GenServer Call timeout when connected in circle

Hi,

I want to implement something like this. Lets say I have 10 genserver nodes connected in the circle.
How are they connected ? Every genserver contains “PID” of its successor genserver.
Every genserver contains in the state some random “KEYS” as well. Lets say I start the entire system.
Now all the 10 genservers are up and running. Now I select one genserver and ask for the key using “Genserver.call” [I do this from iex shell] . The current genserver can contain the key or not. If it does not contain the key then the GenServer makes a call on the successor genserver using “GenServer.call”. This is how the cycle goes. Once the keys is found the answer is returned.

But my GenServer.call leads to a timeout. Let’s say I increase the timeout time for 10 nodes. But what if my system contains “1000000” nodes.

How to solve this problem ? I am very new to elixir.

Welcome to Elixir! I’m glad to see you’re jumping in and looking at GenServers. Unfortunately, the way you’re using them here has a few issues, one of which you’re running into now (another is that you’re copying the data N times).

Can you describe at a high level what you want to achieve? It sort of sounds like you want a sharded Key / Value store, is that right?

2 Likes

Yes, I want to create a distributed key/value store. The keys are distributed among ‘N’ genservers.

Is this for your learning purposes or is this for a production situation? If you’re trying to build something for production I’d just use Cachex, it has a distributed mode.

If this is for learning purposes, then let’s dig into what’s going on here a bit. First, can you provide some code? While daisy chaining GenServers isn’t a great approach, it still shouldn’t timeout with just 10. I wonder if you’re creating a deadlock somewhere.

3 Likes

This is a big point here, you don’t want to daisy-chain access, you’d want them each to manage a ‘shard’ of the total key space (like a hash of it is common) and name them based on the shard they manage, so then you can just directly query the one genserver that would have the data without needing to touch any of the others.

2 Likes

No, I am doing it for learning purposes only.

Yes, it seems to work for 10 gen_servers, but what if I have got 10M servers and every server has got huge ‘key/value’ table. That is when it is a problem.

2 Likes

Ah yes, that’d be a huge problem. This is why the daisy chain approach doesn’t work. As noted by @OvermindDL1 what you want is to have a hashing function that tells you which GenServer to ping, and then you ping just 1 genserver.

Here’s another issue with the dailsy chain approach. If it takes N calls to get the key, that’s N genservers that are all blocked waiting for a response. If multiple clients are trying to get keys, they’re going to block each other alot, which kills the point of using multiple GenServers in the first place instead of just 1.

3 Likes

What you are saying is totally right.

But my main question is “Is it possible to make genserver “calls” from one genserver to another genserver?”

Aren’t you already doing that?

Yes, I am doing it. But I am getting a timeout. If my implementation is right and assuming there is no deadlock, it should work right ?

If you’re getting a timeout, what did you mean by:

Yes, it seems to work for 10 gen_servers

Can you provide code? We can’t offer detailed help without it. If you aren’t building things with a deadlock, and if you aren’t doing something particularly slow in each GenServer, then it should not timeout with just 10 genservers.

1 Like

I will try to create a dummy project and will share the repo with you in some time.

I truly appreciate your help. Thank you so much for bearing with me.

1 Like

Well a quick dummy project that passes a value around and gets something that has it might be like:

defmodule Blah do
  use GenServer

  def start_link({range, next}), do: GenServer.start_link(__MODULE__, {range, next}, [])

  def set(pid, key, value), do: GenServer.cast(pid, {:set, key, value})

  def get(pid, key), do: GenServer.call(pid, {:get, key})
  
  def init({range, next}), do: {:ok, {%{}, range, next}}

  def handle_cast({:set, key, value}, {values, myrange..fullrange = range, next})
  when rem(key, fullrange) == myrange do
    values = Map.put(values, key, value)
    {:noreply, {values, range, next}}
  end
  def handle_cast({:set, key, value}, {_, _, next} = state) do
    set(next, key, value)
    {:noreply, state}
  end
  
  def handle_call({:get, key}, _from, {values, myrange..fullrange = range, _next} = state)
  when rem(key, fullrange) == myrange do
    value = Map.get(values, key)
    {:reply, value, state}
  end
  def handle_call({:get, key}, _from, {_, _, next} = state) do
    value = get(next, key)
    {:reply, value, state}
  end
end

nop = spawn(fn -> receive do msg -> IO.inspect(msg, label: UnhandledHash) end end)
pid = Enum.reduce(9..0, nop, fn i, next -> Blah.start_link({i..10, next})|>elem(1) end)

Blah.get(pid, 42)
Blah.set(pid, 42, :boop)
Blah.get(pid, 42)
2 Likes

If every server has a huge key/value table then you want to fail fast, or make the call as parallel as possible.

One idea for failing fast is to have a range index, when the server receives the request it checks if it might have the key. If not it passes it on to the next server. If yes then does the lookup. This implies some ordering of the keys though.

One idea for parallelizing the request, when the server receives the request it spawns a Task which asks the next server in the daisy-chain. It then proceeds to check its own cache using another Task. Should it find it, it informs the next server to stop the search.

All in all, using a daisy-chain is O(n) and will get slower the more servers you add. This is unavoidable.

1 Like

How large is the data structure you are searching. The default time out for a GenServer.call is 5s. As long as your 10 servers take longer than 0.5s for the search each the initial GenServer.call will timeout.

2 Likes

I won’t go into a discussion about how to structure a service like this as others have already done that.

You had a few other questions as well:

  • Any process can make a GenServer.call, the only requirement is that the process you call is a GenServer[*]
  • The GenServer.call is synchronous and has a default timeout of 5 sec. You can set the timeout with a third argument which is the timeout in msec. You can also give it the value :infinity in which case the caller will wait forever.

You have to be careful with synchronous communication as it makes it very easy to hang the system.

[*] Or to be precise obeys the GenServer API. You can write your own if you wish, it is
an interesting exercise.

3 Likes

Hi @benwilson512 @tty @OvermindDL1 , Thank you so much for helping. It means a lot.

I just checked my code. After hours of debugging, I found out that I am actually creating a deadlock. I cant share the entire code here because I have to submit my project. My project was to implement a chord protocol. My genservers are arranged in the circle, and all of GenServers are connected to not only their neighbours but also other GenServers according to some specifications ( I will not bore you by getting into the nitty gritties of the algorithm).

That is why when I use “GenServer.call” , then the deadlock occurs and the all the servers hung up.

I think the best way to acheive this is to use Task.async, in order to query different GenServers accordingly as suggested by @tty

1 Like

Yes, that is why is happening to me. Please have a look here - GenServer Call timeout when connected in circle

This is the actual problem I am facing.

Thank you for the reply. It means a lot.

1 Like