smanza
Worker with state vs Cache + Worker
Hello!
I’m wondering about the best approach for handling data produced by worker processes under a supervision tree.
Is it better to keep state per worker (e.g. GenServer, :gen_statem) and restart the workers when they fail (perhaps with a mechanism to save their data while they are down, e.g. ETS), or to use a cache process that is filled by worker processes (e.g. Task)?
The latter approach is more centralized, whereas the former is more distributed/decentralized.
Thanks
Most Liked Responses
gregvaughn
No, please, a thousand times no. Do not use :sys.get_state in production code. It is intended only for debugging purposes. You speak a lot about managing state carefully, but then you want to do this brutal approach.
My suggestion wasn’t so much about Task.Supervisor as it was about async_stream, in which case the results come to you when they complete. No need to reach into another process’ state (which would be akin to me grabbing money out of your wallet because you owe me).
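To make the async_stream idea concrete, here is a minimal sketch. The `ContextBuilder` module and the `{name, fun}` job shape are assumptions for illustration only, not anything from the original poster's code. Each job runs in its own process and its result is delivered to the caller, so no process state is ever inspected from outside.

```elixir
defmodule ContextBuilder do
  # Hypothetical helper: `jobs` is an enumerable of {name, zero-arity fun} pairs.
  def build(jobs, timeout \\ 5_000) do
    jobs
    |> Task.async_stream(
      fn {name, fun} -> {name, fun.()} end,
      timeout: timeout,
      on_timeout: :kill_task
    )
    |> Enum.reduce(%{}, fn
      # A finished job hands its result back to the caller.
      {:ok, {name, result}}, acc -> Map.put(acc, name, result)
      # A job that timed out is killed and left out of the context.
      {:exit, _reason}, acc -> acc
    end)
  end
end

jobs = [user: fn -> %{id: 1} end, price: fn -> 42 end]
context = ContextBuilder.build(jobs)
```

Note that the tasks are linked to the caller here, so a job that raises will also take the caller down. Task.Supervisor.async_stream_nolink is the usual alternative when that is not wanted.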
And gen_statem seems to be overkill for your purposes too. The point of a “gen” style server is to be able to receive messages from other processes. You can use a basic state machine approach with Enum.reduce and an appropriate accumulator map/struct.
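As a rough illustration of the Enum.reduce approach, the sketch below folds a list of events over a struct accumulator. The `JobFlow` module, its states (`:pending`, `:running`, `:finished`, `:failed`) and its events are all made up for the example.

```elixir
defmodule JobFlow do
  # Hypothetical states and events, for illustration only.
  defstruct state: :pending, data: nil, error: nil

  # Fold the events over the accumulator; each step is a state transition.
  def run(events), do: Enum.reduce(events, %JobFlow{}, &step/2)

  defp step(:start, %JobFlow{state: :pending} = job),
    do: %{job | state: :running}

  defp step({:done, data}, %JobFlow{state: :running} = job),
    do: %{job | state: :finished, data: data}

  defp step({:failed, reason}, %JobFlow{state: :running} = job),
    do: %{job | state: :failed, error: reason}

  # Any event that is not valid in the current state is ignored.
  defp step(_event, job), do: job
end

job = JobFlow.run([:start, {:done, 42}])
```

The whole machine is a pure function, so it can run inside whatever process executes the job, and it is easy to test without starting anything.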
shanesveller
I would not depend on Agents for a production use-case, particularly compared against a purpose-built GenServer or gen_statem module. The latter can have fully custom/arbitrary lifecycle behavior, error handling, proper supervision, etc.
For OP’s actual question, can you say more about the nature and use of the data? What’s its volatility, its source of truth, how expensive is it to rebuild from nothing, are concurrent writes necessary, what’s the appetite for eventual consistency, how dangerous is it for multiple BEAM nodes to have disjoint views of the information, etc.
smanza
From what I can say: a BEAM node receives a request and must run concurrent jobs to build a context for a further computation. The node needs to keep the result of each job.
Some jobs are really fast to rebuild and some are really expensive (such as multiple network calls).
Concurrent writing does not really matter, because the jobs produce independent data.
The other question that interests me: is it better to keep multiple processes with small or medium data, or only one process with a lot of data? (even if ETS can be used to keep the data off the process heap)
My current approach is:
1. A top-level Supervisor containing a Registry and a DynamicSupervisor.
2. The DynamicSupervisor spawns a dedicated supervisor for each request.
3. That supervisor spawns the jobs (currently :gen_statem, to simplify state identification and processing).
4. Each job is registered in the Registry and keeps its job result.
When I want to retrieve all the job data, I use Registry.dispatch to broadcast a request for the state and data to the jobs.
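For readers following along, the tree and the retrieval described above could be sketched roughly as follows. This is an approximation under several assumptions: the jobs are plain GenServers rather than :gen_statem, the Registry uses duplicate keys with the request id as the key, and all names (`Jobs`, `Jobs.Worker`, `Jobs.Registry`, `Jobs.Requests`, the `:report` message) are hypothetical.

```elixir
defmodule Jobs.Worker do
  use GenServer

  def start_link({request_id, name, fun}),
    do: GenServer.start_link(__MODULE__, {request_id, name, fun})

  @impl true
  def init({request_id, name, fun}) do
    # Register under the request id so all jobs of one request share a key.
    {:ok, _} = Registry.register(Jobs.Registry, request_id, name)
    {:ok, %{name: name, fun: fun, result: nil}, {:continue, :run}}
  end

  # Run the job after init returns, so the supervisor is not blocked.
  @impl true
  def handle_continue(:run, state), do: {:noreply, %{state | result: state.fun.()}}

  @impl true
  def handle_info({:report, from, ref}, state) do
    send(from, {ref, state.name, state.result})
    {:noreply, state}
  end
end

defmodule Jobs do
  # Top-level supervisor with a Registry and a DynamicSupervisor.
  def start_tree do
    children = [
      {Registry, keys: :duplicate, name: Jobs.Registry},
      {DynamicSupervisor, name: Jobs.Requests, strategy: :one_for_one}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  # One supervisor per request, one worker per job.
  def start_request(request_id, jobs) do
    workers =
      for {name, fun} <- jobs do
        Supervisor.child_spec({Jobs.Worker, {request_id, name, fun}}, id: name)
      end

    spec = %{
      id: request_id,
      start: {Supervisor, :start_link, [workers, [strategy: :one_for_one]]},
      type: :supervisor,
      restart: :temporary
    }

    DynamicSupervisor.start_child(Jobs.Requests, spec)
  end

  # Broadcast a report request to every job of the request, then gather replies.
  def collect(request_id, timeout \\ 1_000) do
    ref = make_ref()
    caller = self()

    Registry.dispatch(Jobs.Registry, request_id, fn entries ->
      for {pid, _name} <- entries, do: send(pid, {:report, caller, ref})
    end)

    expected = length(Registry.lookup(Jobs.Registry, request_id))
    gather(ref, expected, %{}, timeout)
  end

  defp gather(_ref, 0, acc, _timeout), do: acc

  defp gather(ref, remaining, acc, timeout) do
    receive do
      {^ref, name, result} -> gather(ref, remaining - 1, Map.put(acc, name, result), timeout)
    after
      timeout -> acc
    end
  end
end

{:ok, _top} = Jobs.start_tree()
{:ok, _sup} = Jobs.start_request("req-1", user: fn -> %{id: 1} end, price: fn -> 42 end)
results = Jobs.collect("req-1")
```

Each job answers with a message instead of having its state read from outside. A job that crashes is restarted by its request supervisor and recomputes its result, which is the cost to weigh for the expensive jobs.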
Another approach, specifically regarding 3. and 4., would be to put a cache inside the per-request supervisor and run each job as a Task that fills the cache and then dies (maybe freeing some memory).
The data would then be retrieved directly from this cache (but the cache's memory will grow).
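A minimal sketch of this cache variant, assuming the cache is an ETS table owned by whichever process creates it (here the caller, rather than a dedicated cache process under the per-request supervisor). The `RequestCache` module and its functions are hypothetical.

```elixir
defmodule RequestCache do
  # The table is public so tasks can write to it, and it is deleted
  # automatically when the owning (creating) process exits.
  def new, do: :ets.new(:request_cache, [:set, :public, write_concurrency: true])

  # Each job runs as a Task, writes its own key, and then terminates.
  def run(table, jobs, timeout \\ 5_000) do
    jobs
    |> Enum.map(fn {name, fun} ->
      Task.async(fn -> :ets.insert(table, {name, fun.()}) end)
    end)
    |> Task.await_many(timeout)

    :ok
  end

  def fetch(table, name) do
    case :ets.lookup(table, name) do
      [{^name, value}] -> {:ok, value}
      [] -> :error
    end
  end

  def all(table), do: Map.new(:ets.tab2list(table))
end

table = RequestCache.new()
:ok = RequestCache.run(table, user: fn -> %{id: 1} end, price: fn -> 42 end)
context = RequestCache.all(table)
```

Because the jobs produce independent data, each task writes a distinct key and no coordination between writers is needed. The results outlive the tasks but not the table owner, so the owner's lifetime decides when the memory is reclaimed.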









