I plan to build two Phoenix applications, let's say AppA and AppB.
AppA will have multiple Phoenix containers in a cluster (to meet scalability requirements). AppB (a single node) will maintain state in ETS tables; the ETS tables on AppB can hold at most 5 GB of data. I am looking for a way to set up a persistent connection between the two apps, so that AppA can pull data from AppB.
I could try HTTPoison, but since that goes over HTTP, it could be slow when I try it with a million concurrent connections.
Instead, I am looking for a better alternative, such as a channel. Is it possible to set up a persistent connection between the two applications with a channel and then pull data when required?
Initially I thought of placing my ETS tables inside one Phoenix application. However, due to the data consistency issues that arise during a network split, I wanted to separate state from the application: AppA will be stateless and AppB will hold the ETS tables.
Thanks for taking the time to read my post. Any suggestions in this direction would be helpful.
How much data needs to be sent? And why would there be millions of concurrent connections?
Can you connect the A and B nodes in the same Elixir cluster via Distributed Erlang? In that case, node A could simply send messages to node B and vice versa.
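To illustrate: once the nodes are in the same Distributed Erlang cluster, AppA can call a named process on AppB directly, with no HTTP in between. Below is a minimal sketch under assumed names (`AppB.Store`, the `:app_b_data` table, and the node name are all hypothetical):

```elixir
# On AppB (the stateful node): a named GenServer that owns an ETS table.
defmodule AppB.Store do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  @impl true
  def init(_) do
    # The GenServer owns the table; all access goes through it.
    table = :ets.new(:app_b_data, [:named_table, :set, read_concurrency: true])
    {:ok, table}
  end

  @impl true
  def handle_call({:put, key, value}, _from, table) do
    :ets.insert(table, {key, value})
    {:reply, :ok, table}
  end

  @impl true
  def handle_call({:get, key}, _from, table) do
    reply =
      case :ets.lookup(table, key) do
        [{^key, value}] -> {:ok, value}
        [] -> :error
      end

    {:reply, reply, table}
  end
end

# On AppA: connect once, then address the registered process by {name, node}.
# Node.connect(:"app_b@10.0.0.2")
# GenServer.call({AppB.Store, :"app_b@10.0.0.2"}, {:get, "some_key"})
```

The `{name, node}` tuple in `GenServer.call/2` is what makes this transparent: the same call works whether the process is local or on the remote AppB node.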
The data to be fetched will be around 20 KB.
When I say "when I try with a million concurrent connections", I mean the app will be accessed by users from the web, and the concurrency is expected to be heavy.
@alvises I initially started with the intent of storing the data in ETS tables in a single application cluster. I was not sure how to handle the network split scenarios, and hence thought of splitting it into two applications: one stateless and one holding the state.
So trying Distributed Erlang might land me in a similar scenario? I'm not sure.
Tell me if I get it right: you want A to be replicated without any state (not even a cache, to avoid issues during a network split), and the only source of truth is B, a single node.
If B doesn't do any particular transformation or processing, then instead of building your own DB in AppB, I would simply use a database like Redis or Postgres (which are battle-tested!). Both Redis and Postgres can be partitioned and replicated; I would use them instead of reinventing the wheel.
If instead you have only the AppA replicas storing data in a distributed fashion, without any database or B node, then sure… you have to deal with consistency issues, network partitions and all that jazz. If you can accept eventual consistency, take a look at CRDTs: https://github.com/derekkraan/delta_crdt_ex
@alvises I will be storing my data in Postgres, and that will always be my source of truth. The data I am planning to store on AppB is for faster access (to avoid numerous round trips to the database).
However, it seems that in this case, validating the cached data (to find out whether it is stale) might end up costing more I/O than expected.
With that in mind, for my requirements I think relying completely on the backend database is safe (although it leads to more I/O, it can work well for a scalable application).
Thanks for your time and insight.
If you need faster I/O than Postgres, just use Redis (with the Redix library) instead of re-implementing it yourself in AppB.
IMO your easiest option would be for every app server in the cluster to maintain its own local cache through ETS. All the local caches will warm up eventually.
Otherwise you'll have to deal with a lot of complexity. Going for Redis is not worth it, IMO. If your DB server is decent (no need for the highest end), the latencies from your DB won't be much worse than from Redis: likely 2–20 ms from the DB server compared to 1–5 ms from Redis, in my average experience. Redis is fast but rarely needed (unless you use a framework with big ORM overhead, like Rails, but definitely not Phoenix/Ecto).
In addition to my first suggestion, you can also pre-fill some of the caches on server startup.
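The per-node cache suggested above can be sketched in a few lines of plain ETS: each AppA node keeps its own table and falls back to the database only on a miss. `LocalCache` and the `fetch_fun` callback (standing in for the real Ecto query) are hypothetical names, not anything from the thread:

```elixir
defmodule LocalCache do
  @table :local_cache

  # Call once on node startup; :public so any process on this node can write.
  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    :ok
  end

  # Read-through cache: hit the table first, fall back to the DB on a miss.
  def get(key, fetch_fun) do
    case :ets.lookup(@table, key) do
      [{^key, value}] ->
        value

      [] ->
        value = fetch_fun.()            # cache miss: one DB call
        :ets.insert(@table, {key, value})
        value
    end
  end

  # Drop a key when the underlying row changes.
  def invalidate(key), do: :ets.delete(@table, key)
end
```

Pre-filling on startup is then just calling `LocalCache.get/2` for the hot keys before the node starts serving traffic.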
As a more general answer, in decreasing order of efficiency:
- Don’t split your app into multiple apps. (after all, the fastest kind of communication is no communication.)
- Run your applications in the same cluster, and use Distributed Erlang to communicate. (You can now use Erlang's battle-tested distributed message-passing system, which allows for fine-grained and transparent communication.)
- Run the two applications at remote locations, and communicate over a socket using Erlang's built-in term_to_binary/binary_to_term to encode/decode datatypes. (More overhead, but does not require the apps to run in the same Distributed Erlang cluster.)
- Run two applications and perform HTTP requests to each other. (Slow, but still usable if one of the apps is not a BEAM app, or if a persistent socket connection cannot be established.)
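The term_to_binary option above can be sketched as follows. The encode/decode part is built into Erlang; the host name, port, and `packet: 4` framing in the commented-out socket calls are assumptions for illustration:

```elixir
# Encode any Erlang/Elixir term into the External Term Format.
payload = :erlang.term_to_binary(%{user_id: 42, data: "around 20 KB of data"})

# Sender side (e.g. AppA requesting from AppB) over a plain TCP socket:
# {:ok, socket} =
#   :gen_tcp.connect(~c"app-b.internal", 4040, [:binary, packet: 4, active: false])
# :ok = :gen_tcp.send(socket, payload)
# {:ok, reply} = :gen_tcp.recv(socket, 0)

# Receiver side decodes; the :safe flag refuses to create new atoms,
# protecting the node from untrusted input exhausting the atom table.
decoded = :erlang.binary_to_term(payload, [:safe])
```

Because both ends speak the External Term Format, no JSON serialization layer is needed, yet the apps stay outside a shared Distributed Erlang cluster.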
Note that these all take as a premise that you really want two applications that communicate. The other answers, e.g. about using a shared database, are very well worth considering, depending on the exact situation!
Thank you @Qqwy, @dimitarvp and @alvises for your inputs. As stated earlier, the following are the constraints I have in designing the architecture for this application.
- When users log in and visit a page, data needs to be fetched from the database and served. If around 50,000 users log in at the same time, then 50,000 DB calls would occur for the same data.
- One thing I want to eliminate is the latency of DB calls. Another point is that I will be hosting this application in the cloud, so the more DB calls I make, the higher the cloud usage bill will be. I wanted to remove DB calls wherever possible, so I chose to store data in ETS tables and update them whenever the data is modified.
- If I keep my data in ETS as a local cache in one single application, it might give incorrect results to clients connecting to different nodes (when multiple Docker containers spawn). So my app needs a global cache. And as stated earlier, in case of a network split, keeping the cache free of stale data is a huge challenge (this may end up causing more DB calls than the normal approach).
This condition led me to the idea of separating state from the application and maintaining it in another application.
As for the Erlang distributed message passing option, I am not sure how it will work when AppA is scaling up/down with Docker while AppB is a single node. If anyone can shed some light on this, it would be really helpful.
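One common way to handle nodes joining and leaving dynamically is the third-party libcluster library, whose strategies re-discover peers as containers come and go (for example, the Gossip strategy uses UDP multicast, so a fixed node list is not required). A configuration sketch, where the topology name and supervisor name are made-up placeholders:

```elixir
# config/runtime.exs (sketch, assuming the :libcluster dependency is added)
config :libcluster,
  topologies: [
    app_cluster: [
      strategy: Cluster.Strategy.Gossip,
      config: [port: 45892]
    ]
  ]

# In the application supervision tree (e.g. AppA.Application.start/2):
# topologies = Application.get_env(:libcluster, :topologies)
# children = [
#   {Cluster.Supervisor, [topologies, [name: AppA.ClusterSupervisor]]},
#   ...
# ]
```

With this in both apps, a new AppA container joins the cluster automatically and can immediately message the single AppB node; note that whether Gossip (multicast) is usable depends on the cloud network, and other libcluster strategies exist for environments without multicast.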
So I chose to have AppA make an HTTP call to AppB, with the data stored in ETS tables in AppB.
So far, I have been able to set up the two applications, store data in ETS in AppB, and fetch it into AppA. I am using HTTPoison for the HTTP calls (not sure whether there is a more efficient way than HTTPoison to make faster HTTP calls). Now I am configuring Tsung to run some load tests and see what the response time for a call will be.
Once again, thanks for taking the time to read my post.
With your points you're going full circle: from an external DB, to local state, to local distributed state, and back to an external DB, which you now call an app. At some point you need to look at the trade-offs of all those options and decide which ones are best to take at which time.
Your statement is true. I am trying to consider the various options one after another and eliminate the ones that don't fit. I am a database administrator, not a full-time developer, so I am reading through different sources and trying to arrive at a consensus. This is where valuable input from these forums really matters to me.
Thanks for your time.