ETS Table - read concurrency

Hi everyone.

I have been going around in circles with this issue for a while and would like some help from you guys.

Use case
I am building a web application that helps you find attachments in your Gmail inbox faster.
Basically, you go to this public web application, sign in with your Gmail credentials, and give the app read-only permission to your inbox.
The app concurrently downloads the metadata of all messages that have an attachment and stores it locally.
Finally, you will be able to download the attachment file via a link.
The app works fine with PostgreSQL, but I wanted to use ETS to store the metadata temporarily in memory and delete it when the user signs out.

The issue
When the user signs in, it takes a while to download the metadata, but I have to get back to the user fast, so I use a GenServer cast to start the download and return to the user right away. This works fine.
The problem is that I try to access the ETS table a few seconds after starting the download and I get this error:

So I am unable to load the user's contacts until the download process finishes.

The created ETS table is not that big:

But the download process takes about 5 minutes to download (40k inbox messages).

Questions:

  1. Is ETS suitable for my use case?
  2. How can I access the ETS table while the download process writes to it?

Thanks for your help and comments.

  1. It should work, but I’m not sure why you chose ETS over a plain old map.
  2. Have the GenServer spawn a separate Task to do the actual downloading. The task can communicate back by messages, or write to the table directly if you set it public.
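
A rough sketch of the second suggestion, in case it helps. The module name `InboxCache`, the table name, and `fetch_metadata/0` are made up for illustration (the sleep stands in for the real Gmail download), and this assumes the task writes straight into a public table:

```elixir
defmodule InboxCache do
  use GenServer

  # Hypothetical stand-in for the slow Gmail metadata download.
  defp fetch_metadata do
    Process.sleep(200)
    for i <- 1..5, do: {"msg-#{i}", "file-#{i}.pdf"}
  end

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def download(server), do: GenServer.cast(server, :download)
  def table(server), do: GenServer.call(server, :table)

  @impl true
  def init(:ok) do
    # :public lets the task process write to a table owned by the GenServer.
    {:ok, :ets.new(:inbox_metadata, [:set, :public])}
  end

  @impl true
  def handle_cast(:download, table) do
    # The slow work runs in its own process, so this callback returns at once
    # and the GenServer stays free to answer other calls.
    Task.start(fn -> :ets.insert(table, fetch_metadata()) end)
    {:noreply, table}
  end

  @impl true
  def handle_call(:table, _from, table), do: {:reply, table, table}
end
```

`Task.start/1` is fire-and-forget here; in a real application a `Task.Supervisor` would probably be the safer choice so that a crashed download gets noticed.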

The problem with this solution is that while the ETS table will be accessible almost immediately, it will not contain all the data for a while. Whether this is acceptable depends on your application. This problem does not depend on how you decide to build your table.

The reason for your original error is that there is an implicit 5000 ms timeout in GenServer.call. You can make this longer by adding an extra argument with an explicit timeout. This is probably not a good way to solve your initial problem.
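
To make the timeout concrete, here is a small sketch. `SlowServer` and the `{:work, ms}` message are invented for the example; the only real API involved is the optional third argument of `GenServer.call/3`, which is the timeout in milliseconds:

```elixir
defmodule SlowServer do
  use GenServer

  def start_link(), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:work, ms}, _from, state) do
    # Simulates a callback that takes a long time to reply.
    Process.sleep(ms)
    {:reply, :done, state}
  end
end

{:ok, pid} = SlowServer.start_link()

# A 100 ms timeout is shorter than the 300 ms of work, so the caller exits
# with {:timeout, _}; the default timeout of 5000 ms behaves the same way.
short =
  try do
    GenServer.call(pid, {:work, 300}, 100)
  catch
    :exit, {:timeout, _} -> :timed_out
  end

# A longer explicit timeout lets the same kind of call succeed.
long = GenServer.call(pid, {:work, 300}, 1_000)
```

Note that the server keeps working on the first request even after the caller has given up, which is one reason why raising the timeout only hides the underlying problem.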

@dom, @rvirding thanks for your support.

Robert, I totally agree. Not all the data will be available at first. I will reduce the amount of data downloaded the first time and leave a background job downloading the rest. For example, the first download will focus on the past month only.

I will still need to access this table while it is being written to.

Best regards.

There are no problems reading from an ETS table while it is being updated, as long as the processes can access the table. The interface to ETS tables is safe in the sense that reading and writing at the same time will never corrupt the table or the data in it. That is one thing you don’t have to worry about. This works automatically without using the read_concurrency or write_concurrency flags.
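
A quick way to convince yourself, as a sketch only (the table name and the data are made up): one process inserts rows while another keeps reading, with neither concurrency flag set.

```elixir
# Owned by the calling process; :public so the writer task may insert.
table = :ets.new(:demo, [:set, :public])

writer =
  Task.async(fn ->
    for i <- 1..1_000, do: :ets.insert(table, {i, "meta-#{i}"})
    :ok
  end)

# The calling process reads while the writer may still be inserting.
# Each lookup sees either no row yet or the complete row, never a partial one.
reads = for _ <- 1..50, do: :ets.lookup(table, 1)

:ok = Task.await(writer)
final_size = :ets.info(table, :size)
```

The flags only tune performance under heavy concurrent load; they are not needed for correctness.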

Hi everyone.

@rvirding following up on this issue, I haven’t been able to solve it. I am still getting the following error:

But the table is created successfully in the background, as shown below:

As you can see, the owner is trying to access the table, and it has public access enabled.

Any suggestions or comments are welcome.

Thanks.

From what I can see, a process does a GenServer.call to process #PID<0.1409.0>, with get_contacts it looks like. The error comes from the calling process timing out after 5000 ms. So the GenServer never replies to the calling process, or at least replies too slowly. What does the code in the handle_call callback that handles get_contacts look like? Does it return a reply tuple, or is there some reason it hangs?

Just want to point out that even if there is no explicit timeout given in the GenServer.call there is a default timeout of 5000 ms.

This is the Client API:

And this is the Server Callback:

Thanks

Ok, the handle_call shouldn’t take 5 seconds (the timeout). So perhaps the gen_server is already doing something.

For example, is the GenServer downloading the list and blocking the other callers?

For example if you have:

def handle_call({:download}, _from, state) do
   # Runs inside the GenServer process: no other message is handled until this returns.
   thing = download_takes_long_time(..)
   {:reply, thing, state}
end

you would block the entire GenServer until the {:download} call returns.

Also, I am not sure that an ETS table is the ideal choice for you here. An ETS table is generally used when a GenServer keeping the data in its normal state is too slow, or when you need concurrent reads.

In your case, you are reading from the ets table in the GenServer which doesn’t really need to be the case. You could read directly in the get_contacts function.
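
Something along these lines, as a sketch only. The module name `Contacts`, the table name, and the `{email, name}` row shape are assumptions, since I haven't seen your actual code:

```elixir
defmodule Contacts do
  use GenServer

  # Hypothetical table name; a named table can be read without passing a reference around.
  @table :contacts_demo

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, :ok)

  # Runs in the caller's process: no message is sent to the GenServer,
  # so a busy server cannot make this read time out.
  def get_contacts do
    @table |> :ets.tab2list() |> Map.new()
  end

  def put_contact(server, email, name), do: GenServer.call(server, {:put, email, name})

  @impl true
  def init(:ok) do
    # :protected means any process may read, but only the owner writes.
    :ets.new(@table, [:set, :named_table, :protected, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:put, email, name}, _from, state) do
    :ets.insert(@table, {email, name})
    {:reply, :ok, state}
  end
end
```

Calling `Contacts.get_contacts()` before the server has created the table raises an `ArgumentError`, so the server needs to be started first.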

Can you please copy and paste the code into a fenced code block instead of posting screenshots?


The way you are creating the map feels a bit overcomplicated.

If all entries are equally shaped you should be able to achieve a similar result but maybe faster by doing the following:

def handle_call({:get_contacts}, _from, state = [table|_]) do
  contacts = :ets.foldl(fn {k, v, _, _}, acc -> Map.put(acc, k, v) end, %{}, table)
  {:reply, contacts, state}
end

This is not tested though. Of course you need to adjust the shape of the first argument to the fn. If it is a 2-tuple already you could experiment with just removing the map you currently do…

@nobbz I will do the copy and paste instead of images.

Thanks, the code worked fine. I just modified the first argument as you said.

Still getting the timeout. I will keep on trying.

Thanks.

Then it's probably that you are overloading your GenServer with too much work at once, as @cmkarlsson already guessed.

Yes, I am taking his suggestion. I will change the code accordingly.

Thanks guys @rvirding, @cmkarlsson, @NobbZ

Would you mind posting your modified code?