Tired of waking up in the middle of the night because you've received a notification? In this post, we’ll look at designing a notification system that takes our beloved users into account.
When designing notification systems, not only is there no single perfect solution, but there are several scenarios. The architecture isn’t the same for a WhatsApp notification letting you know about a new message, as for a YouTube notification about a new video from a creator with millions of subscribers. While the idea is the same, implementing the second one the same way as the first is not advisable due to the cost. Today, we’re going to focus on one-to-one notifications.
I’m doing this intentionally to keep this post in the group of easy reads.
1 - Requirements of a Notification System
A notification system has some straightforward requirements:
- The system must know which notification we want to send.
- Type of notification: whether we want to send an email, SMS, or push notification to the phone.
- Scalable and available: as in every post in this series, the idea is that it should scale smoothly and be available all the time. Peak times are also predictable: daytime hours will see much higher loads than nighttime, and the peaks will differ depending on whether it's an enterprise or an entertainment system.
Additionally, we can mention user priorities. Here, we can let users choose which types of notifications they want to receive and the hours when they do not want to receive certain notifications.
If this were a real interview, you could ask what type of system you're building, or where it will be used. For WhatsApp notifications, you want to receive them as soon as they happen, but for a forum notification, maybe you don’t want to send notifications at night, or you want to batch them together.
2 - Designing the Notification Contract
Let’s move on to defining how message producers—internal applications—generate notifications. Here, we’re defining a contract, not an API as in other posts. The reason is simple: most notification systems work on the producer-consumer pattern, so we generate an event, not an API call.
The contract consists of several parts:
- Metadata: In this section, we specify the unique identifier for the message, the date and time it was generated, and through which channels or what type of notification we want to send. Basically, the information about the notification.
{
  "metadata": { 👈
    "uniqueIdentifier": string,
    "timestamp": datetime,
    "channels": [sms, email, push]
  }
}
- Recipient: The second part is the recipient. It usually contains just the user ID, since the notification system has access to all the internal systems.
{
  "metadata": {
    "uniqueIdentifier": string,
    "timestamp": datetime,
    "channels": [sms, email, push]
  },
  "recipient": { 👈
    "userId": string
  }
}
- Notification Content: Here we have all the information the recipient will get. The object to send is usually a big object containing all the properties/fields for every channel to be used. If a certain type isn’t sent, those fields will be empty.
{
  "metadata": {
    "uniqueIdentifier": string,
    "timestamp": datetime,
    "channels": [sms, email, push]
  },
  "recipient": {
    "userId": string
  },
  "content": { 👈
    "email": {
      "subject": string,
      "body": string,
      "attachments": [ (url, filename, mimeType) ]
    },
    "sms": {
      "content": string
    },
    "push": {
      "title": string,
      "body": string,
      "action": string
    }
  }
}
As seen, it contains all the info for the three delivery options—SMS, push notification, and Email content.
If the notification contains an attachment, in most cases just the URL to the attachment is sent, not the file itself, to avoid file size issues. Still, this depends on the specific scenario: for invoices, the file might be attached to the email; for books, it’s probably not recommended.
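To make the contract concrete, here is a minimal Python sketch of how a producer could build and serialize such an event. The class names, the `build_event` helper, and the choice of a UUID and an ISO 8601 timestamp are my assumptions for illustration; the contract itself only fixes the field names.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Metadata:
    uniqueIdentifier: str
    timestamp: str        # ISO 8601 string so the event serializes cleanly
    channels: list[str]   # any of "sms", "email", "push"


@dataclass
class Recipient:
    userId: str


@dataclass
class NotificationEvent:
    metadata: Metadata
    recipient: Recipient
    # One entry per channel; channels that are not used are left empty.
    content: dict[str, dict] = field(default_factory=dict)


def build_event(user_id, channels, content):
    """Hypothetical helper: reject events that request a channel without content."""
    missing = [c for c in channels if c not in content]
    if missing:
        raise ValueError(f"no content for channels: {missing}")
    return NotificationEvent(
        metadata=Metadata(
            uniqueIdentifier=str(uuid.uuid4()),
            timestamp=datetime.now(timezone.utc).isoformat(),
            channels=list(channels),
        ),
        recipient=Recipient(userId=user_id),
        content=content,
    )


event = build_event(
    "user-123",
    ["push"],
    {"push": {"title": "New message", "body": "Hi!", "action": "open_chat"}},
)
payload = json.dumps(asdict(event))  # this string is what gets published
```

Validating at the producer side keeps malformed events out of the queue, which is cheaper than discovering them in the consumer.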
3 - Designing the Architecture of a Notification System
As I like to do (and recommend doing in interviews), start simple and evolve. In this case, we’ll start with something basic.
We have one or more systems generating notifications, and these notifications are published into a producer-consumer system, like an event bus, queue, etc. A subscriber reads each message, looks up the user’s contact info through the internal user API, and sends the notification to the relevant third-party service for email, push notifications, or SMS.
In the real world, depending on your system size and company infrastructure, this might be enough, but in a design interview, we need to go further.
Next, let's continue with scaling. The first thing we’ll need is more instances of our consumer.
You might think that because we’re behind a producer/consumer system, we can consume at our own pace, and that's true—but not always. If we have too many notifications, a large backlog will build up. For example, if you process 10k messages/second but receive 12k, each second you’re accumulating an extra 2k in backlog.
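The backlog arithmetic is easy to sketch. The snippet below assumes constant rates, an empty queue at the start, and throughput that scales linearly with the number of instances, which real systems only approximate; the function names are mine.

```python
import math


def backlog_after(seconds, incoming_per_sec, processed_per_sec):
    """Messages waiting after `seconds`, assuming constant rates and an empty start."""
    return max(0, incoming_per_sec - processed_per_sec) * seconds


def instances_needed(incoming_per_sec, per_instance_per_sec):
    """Smallest number of consumer instances that keeps up with the incoming rate."""
    return math.ceil(incoming_per_sec / per_instance_per_sec)


print(backlog_after(60, 12_000, 10_000))  # 120000 messages after one minute
print(instances_needed(12_000, 10_000))   # 2
```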
So, we need to scale our consumer with more instances. Depending on your producer/consumer system, you’ll scale differently—Kafka is a common choice for implementing producer-consumer; to learn more about its internals, check out my book Building Distributed Systems.
NOTE: If you went with direct API communication, you’d also need more instances, but you’d need a load balancer in front of them.
We’re not done yet. Instead of sending messages directly from this app, we use it to distribute each message to a dedicated queue per notification type. Ideally, a cloud function (AWS Lambda, Azure Functions, etc.) then processes the message and delivers it to the third-party system.
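As a rough sketch of that distribution step, the router below fans an event out to one queue per requested channel. The in-memory `deque` objects are stand-ins I chose for illustration; in a real system each would be a broker topic or a cloud queue, and the `route` function name is hypothetical.

```python
from collections import deque

# In-memory stand-ins for the per-channel queues (assumption: a real system
# would use one broker topic or cloud queue per channel).
channel_queues = {"email": deque(), "sms": deque(), "push": deque()}


def route(event):
    """Fan an event out to one queue per requested channel; return channels routed."""
    routed = []
    for channel in event["metadata"]["channels"]:
        queue = channel_queues.get(channel)
        if queue is None:
            continue  # unknown channel: skipped here, could be dead-lettered instead
        queue.append({
            "uniqueIdentifier": event["metadata"]["uniqueIdentifier"],
            "userId": event["recipient"]["userId"],
            # Each channel queue only receives the content it needs.
            "content": event["content"].get(channel, {}),
        })
        routed.append(channel)
    return routed
```

Each per-channel message carries only its own slice of the content, so the function behind the SMS queue never has to know what an email looks like.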
As an extra, if you use Azure or AWS, you can create connectors so these actions (email, notification, etc.) can be done directly from the queue, without even needing a function in between.
In most interviews where you’re asked for a notification system, this covers what you’ll be asked for. WhatsApp alerts, monitoring system warnings, etc., all generally work like this.
3.1 - Including User Preferences in a Notification System
For our scenario, let's include user preferences; something, by the way, I should add to my own blog…
For this, we have two key questions:
- First, which layer do we include user preferences in? In the APP layer that redistributes notifications.
- Second, what do we do with notifications that shouldn’t be sent yet? We have several options for this.
Before answering, let’s consider why we put this logic in the redistribution layer. The reason is simple: in many systems, this will be the only app needed, since the rest can be done via configuration. The idea behind the Cloud Native Function is to guarantee that any message received gets sent. No extra functionality—the main goal is single responsibility, so handling user logic (as we’ll see) is considered extra work.
Now comes what to do with notifications.
Suppose a user only wants to be contacted between 8 AM and 4 PM. The rest of the time, they do not want notifications.
For that, we have an API with user info that tells us their settings. If we want to send a notification, we have to check the user’s schedule.
If our notification is within the user’s preferred hours, we propagate it as usual. If it's not, we have two options: drop it, or store it and send it later. Dropping might seem crazy, but for, say, a store with a flash sale that lasts two hours, there’s no point sending that notification if the user’s unavailable. It’s not even worth storing, since it won’t be valid when the user reads it. Similarly, for live stream notifications, there’s no point in sending them after the event ends.
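The decision can be sketched as a small function. Note that `expires_at` is a hypothetical field that is not part of the contract defined earlier; it is my way of modelling "no longer valid when the user reads it". The sketch also assumes naive datetimes already expressed in the user's local time.

```python
from datetime import datetime, time, timedelta


def within_window(moment: time, start: time, end: time) -> bool:
    """True if `moment` is inside [start, end); also handles windows crossing midnight."""
    if start <= end:
        return start <= moment < end
    return moment >= start or moment < end


def next_window_start(now: datetime, start: time) -> datetime:
    """First datetime at or after `now` at which the user's window opens."""
    candidate = datetime.combine(now.date(), start)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


def decide(now: datetime, start: time, end: time, expires_at=None) -> str:
    """Return 'send', 'drop' or 'defer' for a notification arriving at `now`."""
    if within_window(now.time(), start, end):
        return "send"
    # Hypothetical expiry: drop anything that is stale before the window reopens.
    if expires_at is not None and expires_at <= next_window_start(now, start):
        return "drop"
    return "defer"


late = datetime(2024, 1, 1, 22, 0)
print(decide(late, time(8), time(16)))                                    # defer
print(decide(late, time(8), time(16), expires_at=datetime(2024, 1, 2)))   # drop
```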
The real issue is with useful notifications that should be stored and sent when the user becomes available. For this, we can implement a queue for every hour when users want those notifications delivered. For example, a queue (or stream) for notifications to be executed at 8 AM, another at 9 AM, and so on.
Then we have an application that reads from those queues and propagates messages to the original producer-consumer system as if they just arrived.
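A minimal sketch of the hourly queues and the app that drains them could look like this. Everything is in memory and the names (`hourly_queues`, `main_queue`, `defer`, `drain`) are assumptions for illustration; in practice each would map to a real queue or stream.

```python
from collections import defaultdict, deque

hourly_queues = defaultdict(deque)  # hour of day (0-23) -> deferred events
main_queue = deque()                # stand-in for the original queue/stream


def defer(event, deliver_hour):
    """Park an event until the hour at which the user accepts notifications again."""
    hourly_queues[deliver_hour].append(event)


def drain(hour):
    """Move everything parked for `hour` back to the main queue; return the count."""
    moved = 0
    queue = hourly_queues[hour]
    while queue:
        main_queue.append(queue.popleft())  # preserves arrival order
        moved += 1
    return moved
```

A scheduler would call `drain(8)` at 8 AM, `drain(9)` at 9 AM, and so on. Because the events go back through the main queue, they pass the preference check again, which covers users who changed their settings in the meantime.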
The overall flow would be:
1 - An app in our system generates an event.
2 - Our consumer consumes the message.
3 - We check user preferences and read their info (email, phone number).
4 - If it’s out of hours, pass the message to the corresponding hour’s queue.
Note: If it’s within the user’s preferred time, skip to step 9.
5 - A timer or cron job runs an app every hour (or at custom intervals).
6 - The app reads the message from the relevant queue.
7 - Forward the message to the original notification queue/stream.
8 - The message is received and the user info is read again.
9 - Forward the message to the final notification queue.
10 - Read the message and send it to the relevant third-party app for the notification type.
If a message cannot be delivered at any of these steps, we can implement a dead-letter queue for post-mortem review.
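One way to sketch the dead-letter idea is shown below. The retry count and the shape of the dead-letter record are my assumptions, and `send` is a hypothetical callable standing in for the third-party client; many brokers and cloud queues offer dead-lettering as built-in configuration, so you may not need to write this yourself.

```python
from collections import deque

dead_letter_queue = deque()  # in-memory stand-in for a real dead-letter queue


def deliver_with_retries(message, send, max_attempts=3):
    """Try `send(message)` up to `max_attempts` times; dead-letter it on failure."""
    last_error = None
    for _ in range(max_attempts):
        try:
            send(message)
            return True
        except Exception as exc:  # broad on purpose: any failure counts as an attempt
            last_error = exc
    # Keep the message and the last error so someone can review it later.
    dead_letter_queue.append({
        "message": message,
        "error": str(last_error),
        "attempts": max_attempts,
    })
    return False
```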
3.2 - Priority Messages
While this is the architecture, there’s one more design point. Now that we have notifications with scheduling, some messages may need to bypass those rules. For example, if an entire system is down, we must send the notification regardless of user configuration. That’s why the event contract should include a priority field. If the priority is high, we bypass all user checks and deliver the message straight away.
{
  "metadata": {
    "uniqueIdentifier": string,
    "timestamp": datetime,
    "channels": [sms, email, push],
    "priority": low | medium | high 👈
  },
  "recipient": { ... },
  "content": { ... }
}
Just include the value in the metadata to skip any validation and forward it to the correct queue. For this you can use priorities 1, 2, 3, or high, medium, low—whichever works for you.
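In the redistribution app, the bypass can be as small as the sketch below. The function name, the return values, and the default of `medium` for events without a priority are assumptions of mine, not part of the contract.

```python
def next_hop(event, in_preferred_hours):
    """Pick the next queue for an event: 'channel' (send now) or 'hourly' (defer)."""
    priority = event["metadata"].get("priority", "medium")  # default is an assumption
    if priority == "high":
        return "channel"  # bypass every user preference check
    return "channel" if in_preferred_hours else "hourly"


outage = {"metadata": {"priority": "high"}}
digest = {"metadata": {"priority": "low"}}
print(next_hop(outage, in_preferred_hours=False))  # channel
print(next_hop(digest, in_preferred_hours=False))  # hourly
```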
If you run into any problem, you can leave a comment below or contact me through the website's contact form.