Manage state in Elixir

Fl4m3Ph03n1x · February 15, 2019, 3:24pm

Background

I have an application that creates 100 processes on startup. Every time I need to perform a specific operation (say, calculate the size of my ego) I pick one of those processes at random to perform the calculation.

So now I need to access those processes, which means I need to save them somewhere, aka, I need state.

State via named ETS tables

This solution advocates that when starting my app, I create all the processes and save their pids into an ETS table with a name (which is a singleton in the end).
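A minimal sketch of this idea, assuming a hypothetical table name `:worker_pids` and plain spawned processes standing in for the real workers:

```elixir
# Create a named, public ETS table (the table name is an assumption for illustration).
:worker_pids = :ets.new(:worker_pids, [:named_table, :set, :public, read_concurrency: true])

for id <- 1..100 do
  # Each placeholder worker just waits for a :stop message.
  pid =
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)

  true = :ets.insert(:worker_pids, {id, pid})
end

# Pick one worker at random by its integer key.
[{_id, worker}] = :ets.lookup(:worker_pids, :rand.uniform(100))
```

Note that the table belongs to the process that created it and disappears when that process exits, so in a real app it would be created by a long-lived process.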

This solution is simple and works as a cache (so far so good), but I really don’t like singletons in my apps because they make code hard to test. I could create a mock of the singleton and use it for tests, but I wonder if there is a simpler way.

State via GenServer

After reading Pragmatic Dave’s approach to components, I understand that he (and some Erlangers) prefer to manage state via GenServers. This would mean my processes need to communicate with this GenServer, which would add a lot of boilerplate code to my app. I also believe it is overkill for the problem at hand: I just want a table of pids after all.

Questions

So this brings me to a couple of questions:

Are there other ways to manage State in Elixir?
Which approach (ETS or GenServer) would you choose and why?

PS: the size of my ego is the same as the size of my intelligence: NaN (so small it doesn’t even qualify)

shanesveller · February 15, 2019, 3:31pm

I would say that this isn’t a question of state so much as of process registration. The workers themselves sound stateless in your description. Any one of them is capable of fulfilling a request, and none of them persist or share anything between subsequent requests.

Read up on how GenServers allow various forms of name registration. Next, take a look at :global from Erlang and Registry from the Elixir stdlib. If you just need a constant-sized worker pool where any worker can handle a given single invocation, look at the :poolboy Erlang library. One of these should fit your use case.

https://hexdocs.pm/elixir/GenServer.html#module-name-registration
http://erlang.org/doc/man/global.html
https://hexdocs.pm/elixir/Registry.html
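As a rough sketch of the Registry option, assuming a hypothetical registry name `WorkerRegistry`, a hypothetical shared key `:workers`, and placeholder workers that answer a single `{:work, from}` message:

```elixir
# A registry with duplicate keys lets many processes register under the same key.
{:ok, _} = Registry.start_link(keys: :duplicate, name: WorkerRegistry)

parent = self()

for _ <- 1..5 do
  spawn(fn ->
    # Each worker registers itself; Registry stores the pid of the caller.
    {:ok, _} = Registry.register(WorkerRegistry, :workers, nil)
    send(parent, :registered)

    receive do
      {:work, from} -> send(from, {:done, self()})
    end
  end)
end

# Wait until all five workers have registered.
for _ <- 1..5 do
  receive do
    :registered -> :ok
  end
end

# Look up every worker under the key and pick one at random.
entries = Registry.lookup(WorkerRegistry, :workers)
{chosen, _value} = Enum.random(entries)
send(chosen, {:work, self()})

result =
  receive do
    {:done, worker} -> worker
  after
    1_000 -> :timeout
  end
```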

alvises · February 15, 2019, 3:35pm

If I understand correctly, you need a process pool. You can use something like Poolboy. With Poolboy you can set up a supervisor that supervises your processes.

kokolegorille · February 15, 2019, 3:40pm

Why not use Registry?

Fl4m3Ph03n1x · February 15, 2019, 3:45pm

I know of poolboy but I chose not to use it for this specific case, because the pool of processes was only an analogy to make the problem easier to understand.

What actually ends up happening is that I have around 10_000 gun connections open and I need to save them somewhere. Upon opening a connection (which is in reality a stream) it remains open forever, waiting for me to send requests for it to send to some domain.

This is why it makes no sense to use poolboy here. In any other case, I would agree.
Should I update the problem and get rid of the analogy?

alvises · February 15, 2019, 3:52pm

Do you need to broadcast the same message to all of them?

You can use a PubSub system. Take a look at :pg2.
Registry is great when each single process needs to have a name/id. In your case you just need access to all the processes together. :pg2 also monitors your processes, so if one crashes, it’s automatically removed from the group.

P.S. It’s also possible to use pg2 across multiple distributed nodes, but be careful: if I remember correctly, replication over multiple nodes is done by locking the pg2 processes.

Fl4m3Ph03n1x · February 15, 2019, 3:56pm

I don’t need to broadcast a message. I need each process to accept a given command and make a request to a domain.

My main question now is: What happens if a connection in Registry dies?

Thanks for the pubsub idea though!

alvises · February 15, 2019, 3:58pm

If the connection process dies it’s automatically removed from the registry. Registry monitors the processes like pg2.

If the processes are from the same module, I would use :pg2, or some sort of pubsub mechanism, to group them together without reinventing the wheel. You could also do it with a DynamicSupervisor, where you just add the new processes to the supervisor and ask the supervisor for the process list…
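A small sketch of the DynamicSupervisor variant, using `Agent` processes as stand-ins for the real workers (an assumption for illustration):

```elixir
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Add three placeholder children to the supervisor.
for _ <- 1..3 do
  {:ok, _pid} = DynamicSupervisor.start_child(sup, {Agent, fn -> :idle end})
end

# which_children/1 returns {id, pid_or_restarting, type, modules} tuples;
# keep only entries that currently have a live pid.
pids =
  for {_id, pid, _type, _modules} <- DynamicSupervisor.which_children(sup),
      is_pid(pid),
      do: pid

chosen = Enum.random(pids)
```

Asking the supervisor for its children goes through the supervisor process on every call, so this is probably better suited to occasional lookups than to a hot path.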

benwilson512 · February 15, 2019, 3:58pm

How would the process get a command if not by receiving a message? Notably if they’re inside a registry you don’t also need pg2, you can use Registry to broadcast, there are examples in the registry docs.
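A minimal sketch of broadcasting through Registry, modelled on the pattern in the Registry docs. The registry name `CommandRegistry`, the key `:commands`, and the message shape are assumptions for illustration:

```elixir
{:ok, _} = Registry.start_link(keys: :duplicate, name: CommandRegistry)

# The calling process registers itself under the shared key.
{:ok, _} = Registry.register(CommandRegistry, :commands, nil)

# dispatch/3 hands the callback every {pid, value} entry registered under the key.
Registry.dispatch(CommandRegistry, :commands, fn entries ->
  for {pid, _value} <- entries, do: send(pid, {:command, :ping})
end)

received =
  receive do
    {:command, cmd} -> cmd
  after
    1_000 -> :timeout
  end
```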

gregvaughn · February 15, 2019, 4:01pm

The same thing that happens when any other BEAM process dies: it depends on what supervisor it has. The Registry also monitors the process and will de-list a dead process. But the supervisor can restart the process and re-register the new instance with the Registry.

peerreynders · February 15, 2019, 4:04pm

You need to differentiate between Singleton and Just Create One.

If you pass in a handle or name when the process spawns it is more like “Just Create One” because you don’t depend on any globally fixed information.

That being said having 10000 processes pounding on one, single process can still yield all the disadvantages of “being shared” (singleness) in the form of a bottleneck - so a single process isn’t necessarily a good replacement for a single ETS table (depending on the circumstances of course).

I still think that you have to reveal the complete interaction pattern before it can become clear what the best solution is.

alvises · February 15, 2019, 4:10pm

Nice! I just saw Registry.dispatch, is this what you were referring to?

I’m used to using Registry with :via, so I hadn’t seen this function.

Fl4m3Ph03n1x · February 15, 2019, 4:29pm

Good question. I actually don’t care about the response. I only care that I made the request. So I won’t be waiting for anything.
I am also using the Registry in a preliminary solution attempt, but thanks for pointing out!

Now this is interesting to me. I know Registry removes dead processes, which is great, but if that is all it does, then over time (as some processes eventually fail) I will end up with an empty Registry.

You mentioned it depends on which supervisor it is using. Great news for me. Can I tell this supervisor to create a new gun connection to replace the dead one? (By passing a function or something?)

I couldn’t find documentation about this.

This is interesting. Do you mean that the Registry process would actually be the bottleneck here? I have trouble understanding that, since the Registry uses the ETS, which goes back to the first approach I mentioned.

I will attempt to create a better post later on, with all my questions and without the (horrible) analogy that is confusing everyone.

You can do it via dispatch, yes. But the docs go even further:

https://hexdocs.pm/elixir/master/Registry.html#module-using-as-a-pubsub

peerreynders · February 15, 2019, 4:30pm

Unlikely as the lookup would be pretty quick. But ETS has been optimized for shared access so any additional layer over top like a guardian process would tend to slow things down - whether that matters is a separate story.

You do likely care that a process received your request. Because a send (or cast) to a dead pid won’t result in an error.
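A quick sketch of that behaviour, using a throwaway process that exits immediately:

```elixir
# Spawn a process that exits right away and wait until it is really gone.
pid = spawn(fn -> :ok end)
ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

alive? = Process.alive?(pid)

# Neither call raises, even though nobody will ever read the message:
# send/2 returns the message itself and GenServer.cast/2 returns :ok.
send_result = send(pid, :hello)
cast_result = GenServer.cast(pid, :hello)
```

If delivery matters, monitoring the target (as done above with `Process.monitor/1`) is one way to find out that it has died.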

Supervisors simply restart processes the same way they started them the last time - for something a bit smarter (and therefore more fragile) look into parent (Rationale).

If for whatever reason a fresh process cannot create its own connection, it could always get another (named, specialized) process to create it.

Looked at The Hitchhiker’s Guide to the Unexpected yet?

Smart systems make stupid mistakes

benwilson512 · February 15, 2019, 4:30pm

There is no single Registry process. Registry is very well optimized having been derived from Phoenix pubsub. It is not likely to be your bottleneck.

Sending a message doesn’t require that you get a response.

gregvaughn · February 15, 2019, 6:48pm

I think you missed my point. I’m not talking about the supervisor of the registry (whose job is to keep the registry running), but the supervisor of your gun processes. That supervisor is given a child spec with a restart: :permanent like any other supervisor can receive, and it will restart your process when it dies. During your process’ startup, it can then register itself.
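A rough sketch of this setup. The names `ConnWorker` and `ConnRegistry` and the key `:connections` are hypothetical, and the worker only stores an id where a real one would open its gun connection:

```elixir
defmodule ConnWorker do
  # restart: :permanent is the default; it is spelled out here for clarity.
  use GenServer, restart: :permanent

  def start_link(id), do: GenServer.start_link(__MODULE__, id)

  @impl true
  def init(id) do
    # Runs on the first start and on every restart, so a restarted
    # worker re-registers itself. A real worker would also open its
    # connection here (not shown).
    {:ok, _} = Registry.register(ConnRegistry, :connections, id)
    {:ok, id}
  end
end

children = [
  {Registry, keys: :duplicate, name: ConnRegistry},
  # Children of one supervisor need distinct ids.
  Supervisor.child_spec({ConnWorker, 1}, id: {ConnWorker, 1}),
  Supervisor.child_spec({ConnWorker, 2}, id: {ConnWorker, 2})
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

registered = Registry.lookup(ConnRegistry, :connections)
```

If one of the workers is killed, the supervisor starts a replacement from the same child spec, and the replacement's `init/1` registers the new pid. The Registry drops the dead pid on its own, though not necessarily at the same instant.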

Docs for the restart values are in the Supervisor module documentation (Supervisor, Elixir v1.16.0).

bottlenecked · February 23, 2019, 10:50am

This still sounds like poolboy to me, unless each connection is to a different endpoint, in which case process registration, as other people mentioned here, would work better. But if you connect to the same endpoint (or just a couple of different ones) and do not wish to pay the upfront cost of opening 10000 connections, you can open them lazily with poolboy using a small size with a large max overflow.