
In our case, we're designing around INSERT-only tables with a composite primary key that includes the site id, so (in theory) there will never be any conflicts that need resolution.
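
To make that concrete, here is a minimal sketch of the idea using SQLite from Python. The table name `events`, the columns, and the `replicate` helper are all made up for illustration and stand in for whatever the real schema and replication mechanism are; the only point is that rows keyed by (site_id, local_seq) from different sites can't collide when merged.

```python
import sqlite3

def make_site_db():
    # Hypothetical schema: the primary key is (site_id, local_seq), so two
    # sites can never generate the same key for their own inserts.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events ("
        " site_id TEXT NOT NULL,"
        " local_seq INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (site_id, local_seq))"
    )
    return db

def insert_local(db, site_id, payload):
    # INSERT-only: each write is a new row with the next sequence number
    # for this site. Existing rows are never updated.
    (next_seq,) = db.execute(
        "SELECT COALESCE(MAX(local_seq), 0) + 1 FROM events WHERE site_id = ?",
        (site_id,),
    ).fetchone()
    db.execute("INSERT INTO events VALUES (?, ?, ?)", (site_id, next_seq, payload))
    return next_seq

def replicate(src, dst):
    # Stand-in for real replication: copy every row across. OR IGNORE makes
    # re-delivery of an already-seen row a no-op.
    rows = src.execute("SELECT site_id, local_seq, payload FROM events").fetchall()
    dst.executemany("INSERT OR IGNORE INTO events VALUES (?, ?, ?)", rows)

site_a, site_b = make_site_db(), make_site_db()
insert_local(site_a, "A", "order created")
insert_local(site_a, "A", "order shipped")
insert_local(site_b, "B", "order created")

# Replicate in both directions, as after a network partition heals.
replicate(site_a, site_b)
replicate(site_b, site_a)

query = "SELECT * FROM events ORDER BY site_id, local_seq"
rows_a = site_a.execute(query).fetchall()
rows_b = site_b.execute(query).fetchall()
```

Both sites end up with the same three rows regardless of the order in which the copies are applied, which is the "no conflicts" property we're counting on.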


> with a composite primary key that includes the site id

It doesn't look like you'd need multi-master replication in that case? You could simply partition tables by site and rely on logical replication.


I think that's absolutely true in the happy scenario when the internet is up.

There's a requirement that during outages each site continues operating independently and might need to make writes to data "outside" its normal partition. By having active-active replication, the hope is that the whole thing recovers "automatically" (famous last words) to a consistent state once the network comes back.


But if you drop the assumption that each site only writes rows prefixed with its site ID, then you're right back to the original situation where writes can be silently overwritten.

Do you consider that acceptable, or don't you?


Not silently overwritten: the collision is visible to the application layer once connectivity is restored and you can prompt humans to reconcile it if need be.


Sounds like a recipe for a split brain that requires manual recovery and reconciliation.


That's correct: when the network comes back up we'll present users with a diff view and they can reconcile manually or decide to drop the revision they don't care about.

We're expecting this to be a rare occurrence (during partition, user at site A needs to modify data sourced from B). It doesn't have to be trivially easy for us to recover from, only possible.
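
Roughly how the detection side could look, as a sketch only: assume every edit is a new revision row that records which revision it was based on. The `Revision` fields and `find_conflicts` are hypothetical names for illustration, not our actual schema. Two revisions of the same record with the same parent but from different sites are what gets sent to the diff view.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Revision:
    site_id: str                         # site that wrote this revision
    seq: int                             # per-site sequence number
    record_id: str                       # logical record being edited
    parent: Optional[Tuple[str, int]]    # (site_id, seq) this edit was based on
    body: str

def find_conflicts(revisions):
    # Group revisions by the (record, parent) they were derived from.
    children = defaultdict(list)
    for rev in revisions:
        children[(rev.record_id, rev.parent)].append(rev)
    # A conflict is a parent with children from more than one site:
    # both sides edited the same starting point during the partition.
    conflicts = []
    for revs in children.values():
        if len({r.site_id for r in revs}) > 1:
            conflicts.append(sorted(revs, key=lambda r: (r.site_id, r.seq)))
    return conflicts

revisions = [
    Revision("B", 1, "r1", None, "price: 10"),       # original, sourced from B
    Revision("A", 1, "r1", ("B", 1), "price: 12"),   # A edits during the outage
    Revision("B", 2, "r1", ("B", 1), "price: 11"),   # B edits during the outage
    Revision("B", 3, "r2", None, "name: widget"),    # unrelated record, no conflict
]
conflicts = find_conflicts(revisions)
```

Here `conflicts` holds one group containing the A and B edits of `r1`, and nothing is lost because both rows still exist. A human then picks one or writes a new revision on top.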


You could implement a CRDT and partially automate that "recovery and reconciliation" workflow.
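
For example, one of the simplest CRDTs is a per-field last-writer-wins map. The sketch below is only an illustration under assumptions: each field stores a hypothetical (counter, site_id, value) tuple, the counter is a logical clock rather than wall time, and each (counter, site_id) tag is written at most once. Edits to different fields merge automatically; edits to the same field are resolved by the tag, which does discard one side, so you may still want to flag those for a human.

```python
def merge(a, b):
    # Field-wise merge: for each field keep the entry with the larger
    # (counter, site_id) tag. This is commutative, associative and
    # idempotent, so replicas converge whatever the delivery order.
    merged = dict(a)
    for field, entry in b.items():
        if field not in merged or entry[:2] > merged[field][:2]:
            merged[field] = entry
    return merged

base = {"name": (1, "B", "Widget"), "price": (1, "B", "10")}

# During the partition, site A changes the price and site B changes the name.
at_a = dict(base, price=(2, "A", "12"))
at_b = dict(base, name=(2, "B", "Widget v2"))

merged_ab = merge(at_a, at_b)
merged_ba = merge(at_b, at_a)
```

Both merge orders give the same state, with A's price and B's name, so this case never needs to reach the manual diff view.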



