Concurrent editing of JSON on client and server

tmbb · October 17, 2017, 9:30pm

I’m trying to design something like Drab but in which client and server share the state and can both apply changes to the state. State would be represented as JSON or something equivalent. I’d like to support maps and lists.

I keep all state on the server, and the client can’t even touch the state directly. It can only send events to the server so that the server approves of the changes and broadcasts them back.

Currently I have a problem when editing lists. List operations are identified with elements. It makes sense to say: “delete element 1” or “delete element 2”.

When an Item is deleted, the server updates the list, and broadcasts the change to the client. The UI is only updated after the client receives the server’s response.

This is simple, but if the user deletes two elements of the list on the client very quickly, before the server has time to send back to the client the new state, then the operation will have changed meaning, that is, I’ll be removing a different element form the one I wanted. Concretely: if I send: “delete 1” and “delete 2”, the server will delete element 1, then element 3 (because the old element 2 would have moved to position 1 and the old element 3 would have moved to position 2).

I can solve this by selectively blocking parts of the UI, but I’d like to avoid this for several reasons.

It seams like the solution would be to have some kind of replicated datastructure on the client and server that could interpret these changed correctly. From what I’ve read, CRDTs seem like a viable solution, and scale to several clients connected to the same server.

But maybe actually need something simpler, like Operational Transformation (OT); I think there are OT algorithms for JSON.

The design goal here is to have a the server perform all operations on the data so that there is no need for the end-user to write any Javascript. Finally, this is all theoretical, and I have no practical uses for this right now that can’t be accomplished in more traditional
way.

EDIT: maybe this is more general discussion than just Phoenix; the core details are actually independent of phoenix, so maybe this should be moved? I don’t know.

alfert · October 18, 2017, 4:52am

Yes, this problem is orthogonal to Phoenix. I think the major problem is that you use the lists as arrays (i.e. lists with indices) and the index numbers are thus part of the global state. Some issues with your approach:

If you delete an element on the client side, than the deletion changes the amount of elements in the list. Since you wait for an update from the server for the new client state, your delete interaction must take that into account. Either you wait for the server update (= pessimistic locking) or your code is aware that the server state may not be the same as the client state, so any update operation may fail (the entire list may be deleted on the server while you are doing some updates on the client).

A typical solution is optimistic locking with version numbers: The server is the source of truth and maintains a (monotonic increasing) version number. Every manipulative request must send also the version number. If the server detects that his version number and your version number disagree, the client operates on old state and the operation is declined. On the client side you present a nice error message, retrieve the current state and let the user reapply the operation. Your won’t do that automatically, since the state can be changed towards any direction and your program usually would not know what to do next.

If you want to delete two items immediately, than you could send them together to the server (“a single transaction”) and you need to define a protocol between client and server: are the indexes the original ones (before any delete), or modifies each single delete operation inside the transaction the indexes mentioned? In your example above: is delete([1, 1]) or delete([1,2])? Client and Server must follow the protocol and their algorithms to define the index numbers will be different.

Moving towards CRDTs means very very roughly that every client has its own set of version numbers (“vector clocks”) and you have a careful selected set of allowed operations that do not interfere with state changes all over the place. Deletes are notoriously difficult.

It is generally helpful to use idempotent remote operations since they don’t rely on partial global state. But this does not prevent you to think about locking and synchronisation if you leave sequential programming…

tmbb · October 18, 2017, 9:02am

A typical solution is optimistic locking with version numbers: The server is the source of truth and maintains a (monotonic increasing) version number. Every manipulative request must send also the version number. If the server detects that his version number and your version number disagree, the client operates on old state and the operation is declined.

This is probably the way to go, thanks Having operations fail is not a problem. It’ll be rare and I don’t expect many conflicts in practice.

Actually, I won’t be editing arbitrary JSON; it’ll be mostly JSON with a fixed schema so I can be smarter with the locks. Also, there are some things I know I won’t delete, so instead of having a central server I can split independent parts into multiple processes on the server side. Sending data tagged with version numbers will probably be enough for my purposes. Thanks a lot for the detailed writeup.

OvermindDL1 · October 19, 2017, 2:56pm

Precisely that, never use the index in a distributed structure, use some kind of unique identifier for every single ‘element’. CRDT’s are fine for simple things like this.

tmbb · October 19, 2017, 3:09pm

I’ve found an implementation of a JSON OT library in javascript that’s small enough for me to read and reimplement: https://github.com/ottypes/json0

I prefer OT to CRDTs because it has conflict resolution built in and makes it easier to audit the changes, as long as I have access to the server all the time.