I am currently writing a Phoenix server application built around channels. I hope it will be adopted, and it will need to communicate with any number of Android-based clients distributed around Australia in unknown locations. So a key part of my system's design is the ability to self-heal socket communications with clients and to pick up where we left off when there is a network issue.
Let me give an example and some terminology:
Client - The Android client device
Server - My Phoenix server that communicates with the clients
Client Group - A logical group of one or more clients.
Client devices that belong to customers can be arranged into logical groups called “client groups”. Each client group is represented on the server as its own topic, e.g. “client_group:1” or “client_group:2”.
When a client connects to the server and is authorised, the server responds with the name of the client group the client is assigned to, and the client then joins the matching client group topic.
A client can only belong to one client group at a time and the server can request the client to change client groups at any time by issuing a “change_client_group” event with an appropriate payload that tells the client which client group to switch to.
The client will then issue a phx_leave event for the current client group, wait for the :ok from the server and then issue a phx_join event for the new client group (maybe there is a better way to handle this from the server side?)
That is what should happen in a perfect world. Now, if there is a network communication error (or some other unknown error) and the client never receives the :ok reply from the server, the client will never join the new client group it has been assigned to. The client is then left dangling, not belonging to any client group, and my whole application becomes useless… the end…
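To make the switch sequence concrete, here is a minimal Elixir sketch that models it as a pure state machine kept on the server. The module name `ClientGroupSwitch`, its functions, and the state tuples are all hypothetical. How `phx_leave` and `phx_join` would actually be observed from a Phoenix channel (for example from `join/3`) is deliberately left out.

```elixir
defmodule ClientGroupSwitch do
  # Hypothetical pure model of the client-group switch: the server remembers
  # which step of change_client_group -> phx_leave -> phx_join a client reached.
  #
  # States:
  #   {:joined, topic}             client is settled in `topic`
  #   {:awaiting_leave, old, new}  change_client_group pushed, waiting to see `old` left
  #   {:awaiting_join, new}        `old` was left, waiting to see `new` joined

  # Start a switch: only allowed from a settled state, and only to a different topic.
  def request_change({:joined, topic}, topic), do: {:error, :already_in_group}
  def request_change({:joined, old}, new), do: {:ok, {:awaiting_leave, old, new}}
  def request_change(state, _new), do: {:error, {:switch_in_progress, state}}

  # Advance only when the client's events arrive in the expected order.
  def handle_event({:awaiting_leave, old, new}, {"phx_leave", old}),
    do: {:ok, {:awaiting_join, new}}

  def handle_event({:awaiting_join, new}, {"phx_join", new}),
    do: {:ok, {:joined, new}}

  def handle_event(state, event), do: {:error, {:unexpected, event, state}}

  # True once the client has finished joining its assigned group.
  def settled?({:joined, _topic}), do: true
  def settled?(_state), do: false
end
```

A client that never gets past an `{:awaiting_leave, ...}` or `{:awaiting_join, ...}` state is exactly the dangling case described above, so checking `settled?/1` after a deadline is one way to detect it.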
To get around this, my current thinking is to write a “Socket Guardian” that oversees every communication my system needs in order to operate correctly.
Whenever the server sends a message that requires confirmation that the client received it, I am thinking of spawning a “Socket Guardian” process that makes sure the client completes each step of the communication sequence in order.
For example, in the exchange above, once the server sends “change_client_group”, it spawns a process that checks that the client responds with “phx_leave” followed by “phx_join”. The guardian could use a timeout longer than the expected round-trip time between client and server, and resend any message that appears to have been missed. All this logic is rough at the moment, but hopefully it gives you an idea of what I am trying to achieve.
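The retry part of the guardian idea can be sketched as a plain OTP `GenServer`: it pushes one message, waits `timeout_ms` for an acknowledgement, re-pushes on timeout, and gives up after `max_retries` resends. `SocketGuardian`, `ack/1`, and all option names are hypothetical. The `push` function is injected so the sketch runs without Phoenix. In a real app it might wrap your endpoint's `broadcast/3`, and a `handle_in/3` clause for a client "ack" event (also an assumption) would call `ack/1`.

```elixir
defmodule SocketGuardian do
  # Hypothetical sketch: push a message, wait for an ack, re-push on timeout,
  # and report :acked or :gave_up to the `notify` pid before stopping.
  use GenServer

  # opts (all hypothetical names):
  #   :push        1-arity fun that sends the message to the client
  #   :message     the term to (re)send
  #   :timeout_ms  how long to wait for an ack before resending (default 5_000)
  #   :max_retries number of resends after the first push (default 3)
  #   :notify      pid that receives {:guardian_done, guardian_pid, result}
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Called by whatever code observes the client's acknowledgement.
  def ack(pid), do: GenServer.cast(pid, :ack)

  @impl true
  def init(opts) do
    state = %{
      push: Keyword.fetch!(opts, :push),
      message: Keyword.fetch!(opts, :message),
      timeout_ms: Keyword.get(opts, :timeout_ms, 5_000),
      retries_left: Keyword.get(opts, :max_retries, 3),
      notify: Keyword.fetch!(opts, :notify)
    }

    # First push happens immediately; a timer is armed to detect a missing ack.
    {:ok, send_and_arm(state)}
  end

  @impl true
  def handle_cast(:ack, state) do
    Process.cancel_timer(state.timer)
    send(state.notify, {:guardian_done, self(), :acked})
    {:stop, :normal, state}
  end

  @impl true
  def handle_info(:ack_timeout, %{retries_left: 0} = state) do
    # No resends left: report failure so the caller can escalate.
    send(state.notify, {:guardian_done, self(), :gave_up})
    {:stop, :normal, state}
  end

  def handle_info(:ack_timeout, state) do
    # Ack not seen in time: resend and re-arm the timer.
    {:noreply, send_and_arm(%{state | retries_left: state.retries_left - 1})}
  end

  defp send_and_arm(state) do
    state.push.(state.message)
    timer = Process.send_after(self(), :ack_timeout, state.timeout_ms)
    Map.put(state, :timer, timer)
  end
end
```

Injecting `push` keeps the retry logic testable on its own. Because messages may be resent, the client has to tolerate receiving the same `change_client_group` more than once.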
But before I go down that rabbit hole, does anyone know of a library that handles this kind of behaviour, so that I can either leverage it or learn from it?
Long-winded post, so thanks for your time!