Ensuring Event Consistency in Distributed Systems

What happens when our system saves information in the database and then publishes an event, but the event fails and never propagates? I recently received this question on one of my videos, and we are going to discuss it here.

 

 

1 - Problem Context

Let’s imagine we are building a system that needs to store data in the database and then send an event. This could be any kind of application or system, or even one step of a saga, for example.

In this scenario, we have two actions:

 

1 - Update the database

2 - Send the event

[Figure: event consistency]

It doesn’t matter in which order they occur: if the first action succeeds, nothing guarantees that the second one will.
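To make the problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `products` table, the `publish` function standing in for a real broker client, and the `broker_up` flag used to simulate an outage.

```python
import sqlite3

class BrokerDown(Exception):
    """Raised by the hypothetical broker client when publishing fails."""

def publish(event, broker_up, sent):
    # Stand-in for a real message broker client (hypothetical).
    if not broker_up:
        raise BrokerDown("broker unreachable")
    sent.append(event)

def create_product(conn, sent, name, broker_up):
    # Action 1: update the database and commit.
    with conn:
        conn.execute("INSERT INTO products (name) VALUES (?)", (name,))
    # Action 2: send the event. Nothing ties this step to the commit above.
    publish({"type": "ProductCreated", "name": name}, broker_up, sent)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
sent = []
try:
    create_product(conn, sent, "keyboard", broker_up=False)
except BrokerDown:
    pass  # the row is already committed, but no event was sent

rows = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

After this runs, the database contains the new product while `sent` is still empty, which is exactly the inconsistency described above.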

 

One possible solution could be to add a fallback: if sending the event fails, we roll back the database change. But this solution isn’t reliable, because if the process itself crashes between the two actions, the rollback never runs.

Another option is to implement two-phase commit (2PC), but not all databases support it, and neither do all message brokers, so we can rule it out, at least as a general out-of-the-box solution.

So, how can we guarantee both actions happen?

 

 

2 - Outbox Pattern

One of the most common solutions is the Outbox Pattern, which adds a couple of extra steps to the process. To begin with, the code that inserts or modifies data in the database not only performs that action but also inserts the event content into a table called outbox_table_name, which contains the full event to be sent.

[Figure: outbox pattern, part 1]

Both actions therefore occur as part of a single transaction within the same database, so either both are committed or neither is.
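The write side of the pattern could look roughly like the following Python sketch using SQLite. The table names (`products`, `outbox`), the columns, and the `rename_product` function are assumptions chosen for illustration, not a prescribed schema.

```python
import json
import sqlite3
import uuid

def create_schema(conn):
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        " id TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )

def rename_product(conn, product_id, new_name, event_id=None):
    """Update the product and store the event in the same transaction."""
    event_id = event_id or str(uuid.uuid4())
    event = {"id": event_id, "type": "ProductRenamed",
             "product_id": product_id, "name": new_name}
    with conn:  # commits if the block succeeds, rolls back on any exception
        conn.execute("UPDATE products SET name = ? WHERE id = ?",
                     (new_name, product_id))
        conn.execute("INSERT INTO outbox (id, payload) VALUES (?, ?)",
                     (event_id, json.dumps(event)))
    return event_id

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO products (id, name) VALUES (1, 'keyboard')")
conn.commit()
event_id = rename_product(conn, 1, "mechanical keyboard")
```

If the insert into `outbox` fails for any reason, the `UPDATE` is rolled back with it, so the database never holds a change without its corresponding event.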

 

Next, we need an application (which could run every minute or every X seconds) whose job is to read that table, publish each pending event to the message broker, and finally mark the record as sent.

[Figure: outbox pattern implemented]
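A minimal sketch of that relay in Python follows. The `outbox` schema, the `relay_once` function, and the `publish` callable (a stand-in for a real broker client) are all assumptions made for this example; a scheduler or background worker would call `relay_once` on each tick.

```python
import json
import sqlite3

def relay_once(conn, publish):
    """Publish every unsent outbox row, marking each one as sent afterwards."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    published = 0
    for event_id, payload in rows:
        publish(json.loads(payload))  # may raise; the row then stays unsent
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (event_id,))
        published += 1
    return published

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " payload TEXT NOT NULL,"
    " sent INTEGER NOT NULL DEFAULT 0)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox (payload) VALUES (?)",
        [(json.dumps({"type": "ProductRenamed", "product_id": 1}),),
         (json.dumps({"type": "ProductRenamed", "product_id": 2}),)],
    )

delivered = []                        # hypothetical broker: just collects events
first_run = relay_once(conn, delivered.append)
second_run = relay_once(conn, delivered.append)
```

The row is marked as sent only after publishing succeeds, so a failure in `publish` leaves it pending for the next run.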

You might wonder what happens if the application fails after sending the event but before saving the update that marks it as sent. In that case, the event will be sent again on the next run, so delivery is at-least-once. We must therefore ensure that our events can be processed idempotently, for example by giving each one a unique identifier, so that consumers can discard any duplicates they receive.
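On the consumer side, discarding duplicates can be as simple as remembering which event identifiers have already been processed. The sketch below assumes each event carries a unique `id` field and keeps the processed identifiers in memory; a real consumer would most likely persist them. The `StockConsumer` class and the event fields are hypothetical.

```python
class StockConsumer:
    """Hypothetical consumer that ignores events it has already processed."""

    def __init__(self):
        self.processed_ids = set()    # assumption: persisted in a real system
        self.stock = {"keyboard": 10}

    def handle(self, event):
        if event["id"] in self.processed_ids:
            return False              # duplicate delivery: discard it
        self.stock[event["sku"]] -= event["quantity"]
        self.processed_ids.add(event["id"])
        return True

consumer = StockConsumer()
event = {"id": "evt-1", "sku": "keyboard", "quantity": 2}
first = consumer.handle(event)
second = consumer.handle(event)       # the same event, redelivered
```

The second delivery is ignored, so the stock is decremented only once even though the event arrived twice.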

 

NOTE: This extra application could also be a worker in the main application.

 

 

3 - Listen to Yourself Pattern

 

Another, less popular alternative is the Listen to Yourself Pattern, which consists of first publishing an event for consumers to listen to, one of which is the application itself. This internal consumer within the application is responsible for updating the database.

[Figure: listen to yourself pattern]
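As a rough illustration, the flow could be sketched in Python as follows. The `InMemoryBus` class is a hypothetical stand-in for a message broker, and the `orders` table, `place_order` function, and consumer names are assumptions made for the example.

```python
import sqlite3

class InMemoryBus:
    """Hypothetical stand-in for a broker: delivers to subscribers in order."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")

def internal_consumer(event):
    # The application listens to its own event and only now writes to the database.
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)",
                     (event["order_id"], event["total"]))

notifications = []

def notification_consumer(event):
    # Any other consumer in the system reacting to the same event.
    notifications.append(f"order {event['order_id']} received")

bus = InMemoryBus()
bus.subscribe(internal_consumer)
bus.subscribe(notification_consumer)

def place_order(order_id, total):
    # The request handler does not touch the database; it only publishes.
    bus.publish({"type": "OrderPlaced", "order_id": order_id, "total": total})

place_order("o-1", 25.0)
```

The single write (publishing the event) replaces the two separate actions, at the cost of the database being updated only once the internal consumer has processed the event.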

This approach also has its problems. It can produce a race condition in which our "own" consumer (the one belonging to the application that created the event) fails to insert the data into the database, or has not inserted it yet. Meanwhile, another consumer has received the event and might try to query something from the database. That query will fail because the information is missing.

[Figure: listen to yourself with race condition]
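The failure case can be simulated in a few lines of Python. This is only a sketch: the `db` dictionary stands in for the application's database, and the `deliver` function imitates a broker handing the same event to two independent consumers. All names are hypothetical.

```python
db = {}  # hypothetical stand-in for the application's database

def internal_consumer(event, fail=False):
    # Our "own" consumer, responsible for writing to the database.
    if fail:
        raise RuntimeError("database insert failed")
    db[event["order_id"]] = event

def shipping_consumer(event):
    # Another service reacts to the same event and looks the order up.
    order = db.get(event["order_id"])
    return "shipped" if order is not None else "order not found"

def deliver(event, internal_fails):
    """Deliver one event to both consumers independently, as a broker would."""
    try:
        internal_consumer(event, fail=internal_fails)
    except RuntimeError:
        pass  # the failure of one consumer does not block the other
    return shipping_consumer(event)

failed_case = deliver({"order_id": "o-1"}, internal_fails=True)
happy_case = deliver({"order_id": "o-2"}, internal_fails=False)
```

In the failed case, the second consumer receives a perfectly valid event that refers to data that does not exist in the database.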

Not to mention that an event should always describe an action that has already occurred in the system, and here the action has not actually been performed when the event is published. So that’s another issue. One solution, just like we saw in the CQRS post, is to generate a command or a domain event; these commands are handled by a single consumer, which updates the read database (if there’s a separation) and then generates the events that are consumed by the rest of the system.

 

Before moving to the final section where I’ll share my experience, feel free to leave yours in the comments on these patterns! Learning from the experiences of others is always valuable.

 

 

4 - Real Usage in Companies to Ensure Consistency

 

How these mechanisms are implemented in companies can vary greatly, depending not only on the type of company but also on which part of the company or system we’re dealing with.

 

The reason is simple. Under normal circumstances, neither the database nor the message broker (queues, buses) will fail, which means that lacking a mechanism to guarantee that both actions happen is often not a major problem.

 

For example, imagine a service that updates product details such as name, description, or images. If the database is updated but the event system fails, any other system needing that information will still have the old data.

 

If this happens and the user notices, they will most likely try to make the change again or contact support. Here we have to evaluate whether the cost of infrastructure, configuration, and development is worth it compared to, say, the 0.1% of the year (roughly nine hours) during which the message bus might be down.

 

That system is very different from, say, one that charges credit card payments or updates stock inventory. In the case of inventory control, it is important to guarantee that events are delivered, because otherwise we might sell (and thus charge for) products that are out of stock, which can be a major issue.

And of course, in the case of payments, ensuring these events happen and are processed only once is crucial for the system.

 

As we can see, using these patterns depends on what part of the system we are dealing with and how critical it is.

 

This post was translated from Spanish. You can see the original one here.
If there is any problem, you can add a comment below or contact me through the website's contact form.

© copyright 2025 NetMentor | All rights reserved
