NIF resource calling dtor randomly

I’m building a project that controls a quadcopter via a USB dongle. I have a C++ API to interact with it via various methods and callbacks, so I wrote a NIF that uses a thread and enif_send to send data back to Elixir/Erlang.

It all works pretty well, but it seems like the GC is calling the dtor function on my resource (which stops my thread) after a fairly short amount of time. That makes me lose the connection to the copter, which then falls out of the air. Is there any info out there on what causes ERTS to garbage collect NIF resources? I know of at least one situation where the reference count goes to zero, i.e.:

Interactive Elixir (1.6.6) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> {:ok, cf} = Crazyflie.connect("radio://0/80/250k") # nif call that starts a `enif_thread`
iex(2)> respawn()
Interactive Elixir (1.6.6) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> rt_dtor called
cleaning up thread

Now, since nothing holds a reference to cf anymore, it gets destructed. Ideally that should be the only time this happens, I would think, but something is causing my cf variable to be garbage collected and then restarted for some reason.

I’m wrapping it in a simple gen_server currently:

defmodule Crazyflie.Server do
  use GenServer

  def start_link(args \\ ["radio://0/80/250k"]) do
    [uri] = args
    GenServer.start_link(__MODULE__, args, name: name(uri))
  end

  def subscribe(uri \\ "radio://0/80/250k") do
    GenServer.call(name(uri), {:subscribe, self()})
  end

  def init([uri]) do
    {:ok, cf} = Crazyflie.connect(uri)
    {:ok, %{cf: cf, uri: uri, reg: nil}}
  end

  def terminate(reason, _) do
    IO.inspect(reason, label: "Server died")
  end

  def handle_info(_info, %{reg: nil} = state) do
    {:noreply, state}
  end

  def handle_info(info, state) do
    send(state.reg, {__MODULE__, {state.uri, info}})
    {:noreply, state}
  end

  def handle_call({:subscribe, pid}, _, state) do
    {:reply, :ok, %{state | reg: pid}}
  end

  defp name(uri) do
    :"#{uri}"
  end
end

The GenServer itself (which is started in a supervision tree) never exits, but the NIF destructor is still being called.


I don’t know if you are using the Port module to call the C code. I think it is very easy to create a Port and use the C code that way. Check out http://erlang.org/doc/reference_manual/ports.html


I can’t use a port because of the latency involved. I’m using a NIF (Native Implemented Function) to get maximum speed for the IO calls.

My intuition was that cf is getting copied somewhere, but I don’t see where. Did you try calling enif_keep_resource?

I didn’t try enif_keep_resource. I’ve never used it before, but it seems like it may be part of what I want.

EDIT:
Just tried it. It didn’t seem to do what I wanted either.

Here is the entire code, by the way. It’s not incredibly complex.

EDIT2:
I just found this in the docs:

A resource object is not deallocated until the last handle term is garbage collected by the VM and the resource is released with enif_release_resource (not necessarily in that order).

This seems to validate my assumptions, although it is not what I’m seeing here. Maybe I’ll try saving the data on the priv_data struct?

The logging in terminate/2 won’t do anything since your server isn’t trapping exits. Did you confirm the process isn’t dying via some other means?
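For reference, a minimal sketch of a server that traps exits, so that terminate/2 also runs when the parent (for example a supervisor) sends it an exit signal. The module name TrapExitSketch and the notify pid are hypothetical and not part of the Crazyflie code:

```elixir
defmodule TrapExitSketch do
  use GenServer

  # `notify` is a hypothetical pid that wants to hear about the shutdown.
  def start_link(notify), do: GenServer.start_link(__MODULE__, notify)

  def init(notify) do
    # Without this flag, an exit signal from the parent kills the
    # process immediately and terminate/2 never runs.
    Process.flag(:trap_exit, true)
    {:ok, notify}
  end

  def terminate(reason, notify) do
    # Report the exit reason instead of (or in addition to) logging it.
    send(notify, {:terminated, reason})
    :ok
  end
end
```

Even with trap_exit set, terminate/2 is not guaranteed to run in every case (a brutal :kill skips it), so it should not be the only place that cleanup or logging happens.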

I’ve turned on OTP and SASL reports and I don’t see anything.

ANOTHER UPDATE:

21:20:26.912 [error] Process :"radio://0/80/250k" (#PID<0.206.0>) terminating
** (FunctionClauseError) no function clause matching in :gen.reply/2
    (stdlib) gen.erl:198: :gen.reply(:selftestPassed, {:error, {:unknown_system_msg, 1}})
    (stdlib) sys.erl:370: :sys.handle_system_msg/8
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Initial Call: Crazyflie.Server.init/1
Ancestors: [Crazyweb, #PID<0.203.0>, #PID<0.79.0>]
Message Queue Length: 41
Messages: [{:flightmode, :althold, 0}, {:flightmode, :poshold, 0}, {:flightmode, :posSet, 0}, {:flightmode, :yawMode, 2}, {:flightmode, :yawRst, 0}, {:flightmode, :stabModeRoll, 1}, {:flightmode, :stabModePitch, 1}, {:flightmode, :stabModeYaw, 0}, {:cmdrCPPM, :rateRoll, 720.0}, {:cmdrCPPM, :ratePitch, 720.0}, {:cmdrCPPM, :rateYaw, 400.0}, {:cmdrCPPM, :angRoll, 50.0}, {:cmdrCPPM, :angPitch, 50.0}, {:locSrv, :enRangeStreamFP32, 0}, {:pid_rate, :roll_kp, 250.0}, {:pid_rate, :roll_ki, 500.0}, {:pid_rate, :roll_kd, 2.5}, {:pid_rate, :pitch_kp, 250.0}, {:pid_rate, :pitch_ki, 500.0}, {:pid_rate, :pitch_kd, 2.5}, {:pid_rate, :yaw_kp, 120.0}, {:pid_rate, :yaw_ki, 16.700000762939453}, {:pid_rate, :yaw_kd, 0.0}, {:pid_attitude, :roll_kp, 6.0}, {:pid_attitude, :roll_ki, 3.0}, {:pid_attitude, :roll_kd, 0.0}, {:pid_attitude, :pitch_kp, 6.0}, {:pid_attitude, :pitch_ki, 3.0}, {:pid_attitude, :pitch_kd, 0.0}, {:pid_attitude, :yaw_kp, 6.0}, {:pid_attitude, :yaw_ki, 1.0}, {:pid_attitude, :yaw_kd, 0.3499999940395355}, {:sensorfusion6, :kp, 0.800000011920929}, {:sensorfusion6, :ki, 0.0020000000949949026}, {:sensorfusion6, :baseZacc, 1.003791332244873}, {:posEst, :estAlphaAsl, 0.996999979019165}, {:posEst, :estAlphaZr, 0.8999999761581421}, {:posEst, :velFactor, 1.0}, {:posEst, :velZAlpha, 0.9950000047683716}, {:posEst, :vAccDeadband, 0.03999999910593033}, {:velCtlPid, :vxKp, 25.0}]
Links: [#PID<0.205.0>]
Dictionary: []
Trapping Exits: false
Status: :running
Heap Size: 610
Stack Size: 27
Reductions: 38149
 
21:20:26.912 [error] Child Crazyflie.Server of Supervisor Crazyweb terminated
** (exit) an exception was raised:
    ** (FunctionClauseError) no function clause matching in :gen.reply/2
        (stdlib) gen.erl:198: :gen.reply(:selftestPassed, {:error, {:unknown_system_msg, 1}})
        (stdlib) sys.erl:370: :sys.handle_system_msg/8
        (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Pid: #PID<0.206.0>
Start Call: Crazyflie.Server.start_link(["radio://0/80/250k"])
Restart: :permanent
Shutdown: 5000
Type: :worker

It flew by so fast I didn’t notice it the first time. Maybe GenServer isn’t able to handle all my messages?

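The problem is that you’re sending messages like {:system, _, _}, because there’s a group called system in the crazyflie code. You can’t use that pattern: it’s reserved for system messages in Erlang behaviours. That is why the log shows :gen.reply(:selftestPassed, ...): the message {:system, :selftestPassed, 1} was handled as a system request, with :selftestPassed taken as the caller to reply to.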
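One way to avoid the collision is to wrap every message in a tuple whose first element is a fixed tag, so a group named system can never end up in the first position. A minimal sketch, assuming a hypothetical :crazyflie_param tag and module name (the actual fix belongs wherever the NIF builds the message it passes to enif_send):

```elixir
defmodule TaggedParamSketch do
  use GenServer

  def init(_), do: {:ok, []}

  # The first element is always the tag, never the group name, so
  # {:crazyflie_param, :system, ...} is delivered to handle_info/2
  # like any other message.
  def handle_info({:crazyflie_param, group, name, value}, seen) do
    {:noreply, [{group, name, value} | seen]}
  end

  def handle_call(:seen, _from, seen), do: {:reply, Enum.reverse(seen), seen}
end

{:ok, pid} = GenServer.start_link(TaggedParamSketch, [])
send(pid, {:crazyflie_param, :system, :selftestPassed, 1})
GenServer.call(pid, :seen)
# => [{:system, :selftestPassed, 1}]
```

The same reasoning applies to other tuple shapes that OTP behaviours give special meaning to, such as {:"$gen_call", _, _} and {:"$gen_cast", _}.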


Oh great catch! I’ll fix that up and I bet it’ll fix it. Will report back!

Alright, that was it! No crashes or anything now. Thanks @dom. @michalmuskala also spotted the problem on Twitter.

Thanks for all the help guys!

I was worried about Erlang not being able to handle the messages fast enough but that does not seem to be a problem.
