Most database installations are based on one master (primary) and zero to N slaves (secondaries). Master is responsible for reading/writing and slaves are ready to become primary if necessary – typically when the original master is down. Slaves can be also used for read-only queries if the synchronization delay is not an issue.
But if we want to scale?
A lot of developers and software architects assume it’s easy to switch to a multi-master database. They assume that if we pay enough money for the database, everything will resolve itself. But that’s not how it works.
There is a big fundamental problem (described as CAP theorem).
We have to choose between availability and consistency.
For example, you have a multi-master configuration with node A and B. If we want to make a hotel room reservation on the node A, we need to be sure that the room is available on the node B as well. But if the node B does not respond we have two options:
- Wait until B responds and we are going to lose availability. If B is stuck, the A is also.
- Confirm reservation and we are going to lose consistency.
If the B does not respond, we do not know if B is down or if there is some network problem. We must assume the worse variant, that is, the connection is only broken and the other node is running and accepting requests.
We can sacrifice availability, but that means we have to abandon JOINS, database transactions, database locks and etc. It can be painful especially if we come from the RDBMS world.
We can sacrifice consistency but it can be used only for special rare cases. We have to implement processes to correct the data.
It is why we should avoid master-master configuration if we want to keep consistency and availability at the same time.
But we really need to scale
The solution is not at the database level, but at the application level. We have to split our data into independent shards.
For example, we have a cloud-based accounting system. We can split our customers geographically. Customers come from the USA and the EU. We can set up two independent environments. Because our customers do not have to see the data from other customers, the shards are completely independent and we are safe. We can have two master databases at the same time. We can take advantage of the RDBMS database.
One thought on “Multi-master database problem”