Holding a reference to an ETS table and reusing it in sockets

Hello community!

We use Phoenix and sockets in our application, and keep additional data in the socket state like this:

  defp success_connect(socket, auth_data) do
    socket =
      socket
      |> assign(:current_entity, auth_data)
      # ...
  end

As the application grows, we need to store more and more data per socket connection, so we would like to move that data into ETS tables. We create a new ETS table as soon as the socket connection is established:

  defp success_connect(socket, auth_data) do
    ets_ref = :ets.new(:state_store, [:set, :public, {:read_concurrency, true}])

    socket =
      socket
      |> assign(:store_ref, ets_ref)
      # ...
  end

And here comes the problem. Inside the socket's handle_out/handle_info/handle_in callbacks, we try to work with the ETS table, but we keep getting an error because the application cannot find the table by its reference:

def handle_out(event, data, socket) do
  ets_ref = socket.assigns.store_ref
  # raises ArgumentError: the table cannot be found by this reference
  :ets.lookup(ets_ref, "some_key")
end

Could you please explain why this happens, and how to make this approach work?

Generally it means that the process that created the ETS table has exited. I believe success_connect is called from connect in the socket module?

In that case, this process will exit once the channel process is started. Your handle_out function is executed in the channel process, but by the time it is called, your socket process (which was only started to establish the connection) is long gone, and the associated ETS table was deleted along with it.
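The table-dies-with-its-owner behavior can be reproduced in plain Elixir, without Phoenix at all (a minimal sketch; the table name and key are arbitrary):

```elixir
# Demonstrates that an ETS table is destroyed when its owner exits:
# a short-lived process creates the table, hands the ref to its parent,
# and terminates; the parent's lookup then raises ArgumentError.
parent = self()

pid =
  spawn(fn ->
    ref = :ets.new(:state_store, [:set, :public, {:read_concurrency, true}])
    :ets.insert(ref, {"some_key", 42})
    send(parent, {:table, ref})
  end)

ref =
  receive do
    {:table, ref} -> ref
  end

# Wait until the owner is really gone (monitoring an already-dead pid
# still delivers a :DOWN message immediately).
monitor = Process.monitor(pid)

receive do
  {:DOWN, ^monitor, :process, ^pid, _reason} -> :ok
end

result =
  try do
    :ets.lookup(ref, "some_key")
  rescue
    ArgumentError -> :table_gone
  end
```

This is exactly what happens with the socket process: the table ref in assigns outlives the table itself.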

You are 100% right, just checked it. I had to move the ETS init from the socket to the channel context. Thank you!

Or you can create a GenServer or Agent to deal with this data and still create the ets table from them, thus they could be used from the moment a request is received to mount the socket.
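A minimal sketch of that idea (module and table names are invented): a long-lived GenServer owns a single shared ETS table, so the table survives any individual channel process, while reads and writes go straight to ETS for speed:

```elixir
defmodule MyApp.StateStore do
  use GenServer

  @table :state_store

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Reads and writes hit ETS directly, not the GenServer mailbox,
  # so they stay fast and concurrent.
  def put(key, value), do: :ets.insert(@table, {key, value})

  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  @impl true
  def init(_opts) do
    # The table is owned by this long-lived process, so it is not
    # destroyed when a channel process exits.
    table = :ets.new(@table, [:named_table, :set, :public, {:read_concurrency, true}])
    {:ok, table}
  end
end
```

Started under the application's supervision tree, the table lives as long as the app does; channels can key their entries by socket/user id and come and go freely.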

Indeed @rurkss you are creating a new ETS table per channel where you could use a single table for all your processes.

The number of available ETS tables is limited. When you create an ETS table per channel (and it is destroyed when the channel process exits), there is no advantage over just storing your data in assigns, besides the table being public.

With a single ETS table, we would have to deal with old data in the table and manually remove unnecessary records. This way (creating a new table for every connection), all data is removed along with the whole table as soon as the channel disconnects.

BTW, I did not know there was a limit on the number of ETS tables that may be created.
PS: there is a limit of 1400 tables per node; that is way too low…

It's not too low if you consider an ETS table like a database table. If you want data to die, move your data behind a GenServer and just add the socket id as a parameter. Then you can run sweeps now and then to get rid of old data if that is a concern.

You are using the system at odds with how it was designed to be used.


I concur. I use ETS tables inside a GenServer for session data. In my case, I just keep track of expiration times, and a single :ets.select_delete call executed periodically with Process.send_after takes care of cleaning up any mess.
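That pattern can be sketched roughly like this (module name, table name, and interval are invented; each row carries an expiry timestamp as its third element, and the match spec deletes every row whose expiry is in the past):

```elixir
defmodule MyApp.SessionSweeper do
  use GenServer

  @table :sessions
  @sweep_interval :timer.seconds(60)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def put(key, value, ttl_ms) do
    expires_at = System.monotonic_time(:millisecond) + ttl_ms
    :ets.insert(@table, {key, value, expires_at})
  end

  @impl true
  def init(_opts) do
    :ets.new(@table, [:named_table, :set, :public])
    schedule_sweep()
    {:ok, nil}
  end

  @impl true
  def handle_info(:sweep, state) do
    now = System.monotonic_time(:millisecond)
    # Delete every {_key, _value, expires_at} row with expires_at <= now.
    :ets.select_delete(@table, [{{:_, :_, :"$1"}, [{:"=<", :"$1", now}], [true]}])
    schedule_sweep()
    {:noreply, state}
  end

  defp schedule_sweep, do: Process.send_after(self(), :sweep, @sweep_interval)
end
```

One select_delete per interval walks the table once, regardless of how many entries have expired.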

For a couple of years now, the limit has applied to named ETS tables only. There is no limit on the number of unnamed ETS tables.

The limit can be increased by starting Erlang with the +e Num flag.

http://erlang.org/doc/man/ets.html#max_ets_tables
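For example (the value 20000 is purely illustrative):

```shell
# Raise the ETS table limit when starting the VM directly:
erl +e 20000

# Or pass it through to the VM from an Elixir app:
elixir --erl "+e 20000" -S mix phx.server
```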


The reason we prefer ETS over GenServer state is performance.

Does your data actually need to be accessible to external parties? ETS is indeed the better option than asking an external GenServer for data, but data stored in the process your code runs in is even faster than that. So I’m wondering why you want to move from having state in the channel process itself to some external location.


The reason is the amount of data to be kept in state. Should we keep more than 100_000 records in the socket state map?
We have many socket connections, around 10_000 every day, and each of them has to hold about 100_000 additional key => value pairs in its state.

The amount is less of a concern. Is it the same data across connections? If yes, then sharing a common data source makes sense. If not, then it's better to keep the data local.

No, the data is unique for every connection. We just thought that state is not meant for storing big data.

Unique per connection with channels will still mean shared state. With channels there are n+1 processes involved: 1 process holding the connection and n further processes, one for each channel that connection joined. So if all channels of one connection need the same data, then it's still shared. If, however, you only need the data in one specific channel, it might be worth storing the data in that channel's process.

Not really. It's all in memory anyway; it just depends on where in memory and how fast certain parties can access that part of memory.

I would use a single table with a column holding a channel or user identifier, and a GenServer to monitor the channels. This server can hold the table.

On join, a channel calls the GenServer to be monitored. On channel process exit, the GenServer starts a timer of N seconds. If before N seconds a new channel process starts and calls the GenServer to be monitored with the same identifier, nothing happens. Otherwise, the GenServer deletes all entries in the table belonging to the channel/user identifier.
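The scheme above could be sketched like this (all names are invented; channels would call track/1 from join/3 with e.g. a user id, and the grace period here is 30 seconds):

```elixir
defmodule MyApp.ChannelTracker do
  use GenServer

  @table :channel_data
  @grace_ms 30_000

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Called from a channel's join/3 with a stable identifier.
  def track(id), do: GenServer.call(__MODULE__, {:track, id, self()})

  @impl true
  def init(_opts) do
    :ets.new(@table, [:named_table, :bag, :public])
    # monitors: monitor_ref => id; timers: id => pending cleanup timer
    {:ok, %{monitors: %{}, timers: %{}}}
  end

  @impl true
  def handle_call({:track, id, pid}, _from, state) do
    ref = Process.monitor(pid)

    # If a cleanup timer is pending for this id, the user came back
    # within the grace period: cancel the deletion.
    {timer, timers} = Map.pop(state.timers, id)
    if timer, do: Process.cancel_timer(timer)

    {:reply, :ok, %{state | monitors: Map.put(state.monitors, ref, id), timers: timers}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    case Map.pop(state.monitors, ref) do
      {nil, _} ->
        {:noreply, state}

      {id, monitors} ->
        timer = Process.send_after(self(), {:cleanup, id}, @grace_ms)
        {:noreply, %{state | monitors: monitors, timers: Map.put(state.timers, id, timer)}}
    end
  end

  def handle_info({:cleanup, id}, state) do
    # No channel re-registered this id within the grace period: drop its rows.
    :ets.match_delete(@table, {id, :_})
    {:noreply, %{state | timers: Map.delete(state.timers, id)}}
  end
end
```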

I used this technique for a game server, where after 30 seconds without a new connection, a player is considered gone and I can send an event message to the game state process to tell that the player left the party. I left the code in an unfinished state because it worked for my simple needs (the game was very simple and there were not many players), but it’s a start: mogs/tracker.ex at master · lud/mogs · GitHub

Now about a simple map in socket assigns vs. an ETS table: I guess the performance of ETS will be better if you have to write to the table often. If you build the data early and then mostly do lookups by key, I would go with the simpler route as a good-enough, faster-to-implement way. But I am not even sure that ETS will make a huge difference. Remembering Advent of Code 2020 day 23, working with a map of 1 million elements was not fast, but still way faster than what I would have expected from Erlang maps. I don't know the details of the implementation, but Erlang does not seem to blindly recreate a map with 1 million keys whenever you change a value inside.