ETS tables unexpectedly stop existing

hey, I’m working on an implementation of the Reticulum Network Stack in elixir. however, I have a strange bug that seems to have appeared out of nowhere: when I run it using mix run -e "RNS.Reticulum.start_link()" --no-halt, I get this output:

Compiling 1 file (.ex)
     warning: variable "source" is unused (if the variable is not meant to be used, prefix it with an underscore)
     │
 448 │       {:inbound, packet, source} ->
     │                          ~
     │
     └─ lib/rns/packet.ex:448:26: RNS.Packet.wait_for_proof/3

Generated rns app

17:29:58.536 [debug] The config file does not contain 'transport'. We will instead use the default value, which is false(disabled).

17:29:58.541 [debug] The config file does not contain 'data_directory'. We will instead use the default value, which is '~/.local/share/reticulum_ex/'.

17:29:58.542 [info] RNS is starting...

17:29:58.562 [debug] Starting interface RNS.Interface.TCPClient with parameters %{"host" => "202.61.243.41", "port" => 4965}...

17:29:58.575 [debug] Interface GenDestination#<6b9f66014d9853faab220fba47d02761> started with pid #PID<0.191.0>

17:29:58.576 [warning] Interface with pid #PID<0.191.0> and id "GenDestination#<6b9f66014d9853faab220fba47d02761>" has exited with reason :shutdown!

17:29:58.646 [debug] Interface TCPClient@202.61.243.41:4965 started with pid #PID<0.192.0>

17:30:00.080 [error] GenServer #PID<0.192.0> terminating
** (ArgumentError) errors were found at the given arguments:

  * 1st argument: the table identifier does not refer to an existing ETS table

    (stdlib 7.1) :ets.member(:packet_hashes, <<110, 88, 109, 8, 49, 139, 206, 247, 78, 43, 81, 190, 92, 0, 35, 150, 18, 130, 108, 79, 1, 25, 167, 1, 191, 114, 235, 23, 112, 226, 244, 72>>)
    (rns 0.1.0) lib/rns/packet_hash_store.ex:32: RNS.PacketHashStore.exists?/1
    (rns 0.1.0) lib/rns/packet_handler.ex:28: RNS.PacketHandler.start/3
    (rns 0.1.0) lib/rns/interface/tcp_client.ex:81: RNS.Interface.TCPClient.handle_info/2
    (stdlib 7.1) gen_server.erl:2434: :gen_server.try_handle_info/3
    (stdlib 7.1) gen_server.erl:2420: :gen_server.handle_msg/3
    (stdlib 7.1) proc_lib.erl:333: :proc_lib.init_p_do_apply/3
Last message: {:tcp, #Port<0.9>, <<126, 81, 2, 55, 62, 77, 215, 139, 61, 49, 156, 165, 232, 81, 188, 237, 168, 25, 133, 188, 73, 236, 11, 4, 111, 1, 18, 35, 183, 192, 70, 227, 194, 22, 90, 0, 144, 60, 127, 114, 193, 25, 155, 71, 6, 159, 246, 237, 187, 141, 105, 92, 65, 92, 251, 102, 31, 114, 49, 54, 213, 64, 245, 185, 217, 12, 179, 112, 13, 176, 104, 80, 11, 81, 35, 5, 142, 104, 198, 225, 147, 107, 213, 81, 23, 33, 122, 168, 169, 240, 34, 128, 125, 94, 202, 46, 255, ...>>}
State: {{:interface, "TCPClient@202.61.243.41:4965", #PID<0.192.0>, :full, nil, false}, #Port<0.9>, false}

here’s the code snippet of the creation of the ETS table:

:ets.new(:packet_hashes, [
  {:read_concurrency, true},
  {:write_concurrency, true},
  :public,
  :named_table
])

see the full code

this happens when a packet is received. the ETS table doesn’t seem to exist at that point, but I have confirmed (with IO.inspect()) that the table is created and does exist for some time before it disappears.
the simplest explanation would be that the owning process died, but I have a supervisor supervising it and I did not get any error messages.
I managed to re-create the table in IEx before the first packet was received, and then I got the same error for another ETS table, so it seems that this has happened to all of my ETS tables.

can anyone tell me what the heck is going on or do I need to call an exorcist?

Do you have the same trouble with RNS.AnnounceHashStore? I ask because it’s a GenServer, which is the way I would normally create this kind of process (while the process giving you trouble is using spawn_link and Process.sleep(:infinity)). I’m wondering if the owning process tied up sleeping is the cause of the problem

Also, welcome @int32!

I have tried making RNS.PacketHashStore a GenServer and the problem remains.
here’s the code of RNS.PacketHashStore as a GenServer:

defmodule RNS.PacketHashStore do
  @moduledoc """
  Stores the hashes of all sent/received packets.
  This is to be sure no packets are received twice, and also limits attacks on announces which could be used to block some paths.
  """

  use GenServer

  @spec start_link(any()) :: {:ok, pid()}
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  # Client
  @doc "Adds a packet hash."
  @spec add(RNS.Packet.hash()) :: :ok
  def add(packet_hash) do
    true = :ets.insert(:packet_hashes, {packet_hash})
    :ok
  end

  @doc "Checks if the packet hash is in the table."
  @spec exists?(RNS.Packet.hash()) :: boolean()
  def exists?(packet_hash) do
    :ets.member(:packet_hashes, packet_hash)
  end

  # Callbacks
  @impl true
  @spec init(any()) :: {:ok, nil}
  def init(_opts) do
    :ets.new(:packet_hashes, [
      {:read_concurrency, true},
      {:write_concurrency, true},
      :public,
      :named_table
    ])

    {:ok, nil}
  end
end

do you think making it a GenServer is better (not for the bug, just as a general question)? since it doesn’t handle any messages, I’m not sure it makes sense to have it as a GenServer. I might even remove the process and have RNS.Reticulum create the table instead.

I agree; there is no need to make a GenServer just to own the ETS table. You can create the ETS table in the supervisor.

If the ETS table does not exist, either it has not been created yet or it has been removed because the owning process has exited.
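
To illustrate the second case, here is a minimal sketch (the table name :demo_table and the messages are made up for the demo, they are not from the RNS code). It shows a named table disappearing as soon as its owner exits, even when the exit is a normal one that produces no log output:

```elixir
parent = self()

# Hypothetical owner process; :demo_table is a made-up name for this demo.
owner =
  spawn(fn ->
    :ets.new(:demo_table, [:named_table, :public])
    send(parent, :created)

    # Stay alive until told to stop, then exit normally.
    receive do
      :stop -> :ok
    end
  end)

# Wait until the table has been created.
receive do
  :created -> :ok
end

# While the owner is alive, the named table can be looked up.
alive_before = :ets.whereis(:demo_table) != :undefined

ref = Process.monitor(owner)
send(owner, :stop)

# Wait for the owner to be completely gone.
receive do
  {:DOWN, ^ref, :process, ^owner, _reason} -> :ok
end

# The table went away together with its owner.
gone_after = :ets.whereis(:demo_table) == :undefined

IO.inspect({alive_before, gone_after})
```

Nothing is logged when this happens, so a table can vanish silently if its owner (or anything above the owner in the link chain) exits with a normal reason.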

IO.inspect has proven that it WAS created and did exist at some point, and shouldn’t the supervisor warn me if the owner exited?

UPDATE: I have now moved the :ets.new call to the RNS.Reticulum supervisor and removed the RNS.PacketHashStore process.

Has your original problem been resolved?

hasn’t!

Ah, the problem is how you are running your application

Try this instead:

iex -S mix
iex(1)> RNS.Reticulum.start_link()

You might want to switch RNS.Reticulum from a Supervisor to Application, and start it from mix.exs

that doesn’t really make sense for a library…

This is bad:

You can’t sleep in init. init callbacks are called sequentially, so if one of them sleeps, nothing after it gets called.

In a general sense, you should refrain from calling Process.sleep/1 pretty much everywhere. What is your intent?

actually, the function was called init, but it was not an init callback. it was the body of a spawned process.
just bad naming :)

Are you expecting the users of your library to start the supervisor themselves, and more specifically would it make sense for them to potentially start more than one of them?

it would not make sense to start more than one of them, but I am expecting users to start the supervisor themselves: if not so that they can start it only when (and if) they need it, then at least so that they can give it the right configuration.

Exactly. The process that executes -e/--eval terminates at the end, which will bring down any linked process.

A cool trick I like to do in processes is this:

parent = self()
spawn(fn ->
  Process.monitor(parent)
  receive do: (msg -> IO.inspect(msg))
end)

This will print when the parent terminates, which should help debug the issue.
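
The effect described at the top of this post can be reproduced in isolation. The following is only a sketch under assumptions: a short-lived spawned process stands in for the one that evaluates the -e expression, and an empty supervisor stands in for RNS.Reticulum. Because a supervisor traps exits and stops when the process that started it exits, it goes down even though the starter exits normally:

```elixir
test_pid = self()

# Short-lived process standing in for the one that evaluates `-e`.
spawn(fn ->
  # An empty supervisor, started and linked from the short-lived process.
  {:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)
  send(test_pid, {:started, sup})
  # The function returns here, so this process exits.
end)

sup =
  receive do
    {:started, pid} -> pid
  end

ref = Process.monitor(sup)

# Wait for the supervisor to go down, or give up after one second.
result =
  receive do
    {:DOWN, ^ref, :process, ^sup, reason} -> {:down, reason}
  after
    1_000 -> :still_alive
  end

IO.inspect(result)
```

Any ETS table owned by that supervisor, or by a process underneath it, is deleted at the same moment, which would match the symptom of every table disappearing at once.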

Just so you know… it can make sense for a library to start an “OTP application” during start of the BEAM. Some examples: IEx, Req, Phoenix.
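
For reference, a rough sketch of what an application callback module can look like. MyLib.Application and MyLib.Supervisor are hypothetical names, not part of the RNS code, and the child list is left empty on purpose:

```elixir
defmodule MyLib.Application do
  # Hypothetical module name, used only for illustration.
  use Application

  @impl true
  def start(_type, _args) do
    # The library's supervised children would be listed here.
    children = []

    Supervisor.start_link(children, strategy: :one_for_one, name: MyLib.Supervisor)
  end
end

# In mix.exs, the application/0 function would then point at this module:
#
#   def application do
#     [mod: {MyLib.Application, []}]
#   end
```

With this in place the supervision tree is started by the runtime when the application boots, so it is not linked to whatever short-lived process happens to evaluate an expression.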

Any supervised process I start in production code I prefer to be OTP-compliant, which is much easier to do when using tools provided by stdlib

well I fixed that problem by simply eliminating that process and creating the ETS table from the supervisor.
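
For anyone finding this later, a minimal sketch of the idea (not the actual RNS.Reticulum code: the module name TableOwningSupervisor and the table name :packet_hashes_demo are made up, and the child list is empty). The table is created inside the supervisor's init/1 callback, which runs in the supervisor process, so the supervisor becomes the owner:

```elixir
defmodule TableOwningSupervisor do
  # Hypothetical module name, used only for illustration.
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    # init/1 runs inside the supervisor process, so that process owns the table.
    :ets.new(:packet_hashes_demo, [
      :named_table,
      :public,
      read_concurrency: true,
      write_concurrency: true
    ])

    # Real children would be listed here.
    Supervisor.init([], strategy: :one_for_one)
  end
end

{:ok, sup} = TableOwningSupervisor.start_link()

# The owner of the table is the supervisor itself.
IO.inspect(:ets.info(:packet_hashes_demo, :owner) == sup)
```

The table still only lives as long as the supervisor does, so the supervisor has to be started from a long-lived process, as discussed above.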
