Erlport unable to cast to Supervisor on Nerves

My application runs a MyApp.Worker that uses Erlport.

When I try to reply through handle_info and GenServer.reply, the call always times out.

First, I start the worker in MyApp.Application:

defmodule MyApp.Application do
  def start(_type, _args) do
    # ...
    children = [
      {MyApp.Worker, app_dir: "priv/ruby"}
    ]
    Supervisor.start_link(children, options)
  end
end

And the worker uses Erlport to call some Ruby method.

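defmodule MyApp.Worker do
  use GenServer

  # Forward the ping to Ruby and defer the reply until Ruby answers.
  def handle_call(:ping, from, state) do
    state |> :ruby.cast({:ping, from})
    {:noreply, state}
  end

  # Ruby casts {:pong, from} back; reply to the original caller.
  def handle_info({:pong, from}, state) do
    GenServer.reply(from, :pong)
    {:noreply, state}
  end
end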
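
For reference, the deferred-reply pattern used above can be sketched without Erlport. This is only an illustration: DeferredReplySketch is a hypothetical module, and a spawned process stands in for the Ruby side that would normally be reached through :ruby.cast.

```elixir
defmodule DeferredReplySketch do
  # Hypothetical module for illustration; not part of the original project.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def ping(server), do: GenServer.call(server, :ping)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call(:ping, from, state) do
    worker = self()
    # Stand-in for :ruby.cast/2: another process answers later with {:pong, from}.
    spawn(fn -> send(worker, {:pong, from}) end)
    # No reply yet; the caller stays blocked until GenServer.reply/2 runs.
    {:noreply, state}
  end

  @impl true
  def handle_info({:pong, from}, state) do
    GenServer.reply(from, :pong)
    {:noreply, state}
  end
end

{:ok, pid} = DeferredReplySketch.start_link()
DeferredReplySketch.ping(pid)
# => :pong
```

If the {:pong, from} message never reaches handle_info, GenServer.call exits after its default 5000 ms timeout, which matches the symptom described here.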

On the Ruby side, the handler starts a thread and replies asynchronously.

# Inside the message handler
def handle((event, from))
  Thread.new do
    cast @server, [:pong, from]
  end
end

It works well on macOS, but when I deploy it to an RPi3, Ruby cannot cast to MyApp.Worker and I get a timeout error. (handle_info never receives any event.)

But if I cast to another PID, it works well.

The official Nerves systems don’t ship with Ruby. Could it be that there’s an error starting ruby and it’s showing up as a timeout?

It is possible to add Ruby to a custom Nerves system. That would just be Ruby, though. Not sure if that helps.

Frank

I use my custom Nerves system with Ruby, and I correctly receive events from Ruby when it sends to a PID I created myself. But when it tries to send to the supervisor-created worker, nothing gets through. So I am sure Ruby itself works correctly.

But I am not sure what is happening. I cannot see any messages from Ruby’s STDOUT; I guess Nerves does not handle it.

There’s no Nerves code involved at this point. Nerves provides infrastructure around a minimal Linux environment to handle things like userspace initialization, firmware updates, and some embedded systems-specific features. Where you’re at should only involve the BEAM, Linux to handle IPC, and Ruby.

I feel like there’s some configuration or setting that Erlport might be getting wrong or your supervisor-created worker is not running or stuck. It’s hard to say without studying the code.

Do the logs indicate anything? Perhaps enable SASL (add :sasl to your :extra_applications, and handle_sasl_reports: true to the Elixir logger config) to see if your supervision tree is doing anything unexpected.
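
A minimal sketch of those two settings, assuming a standard Mix project layout. The module name MyApp.Application is taken from the first post; the rest of each file is omitted and the exact contents of your mix.exs will differ.

```elixir
# mix.exs (only the application/0 function is shown)
def application do
  [
    mod: {MyApp.Application, []},
    # :sasl starts the SASL application so supervisor and crash reports are produced
    extra_applications: [:logger, :sasl]
  ]
end

# config/config.exs
import Config  # on Elixir versions before 1.9, `use Mix.Config` instead

# Route SASL reports through the Elixir Logger so they show up in the log output
config :logger, handle_sasl_reports: true
```

With this in place, supervisor start, restart, and crash reports should appear alongside the regular log messages.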

I got more information from RingLogger.next, but I am not sure what is wrong.

The log: 02:13:21.944 [error] GenServer #PID<0.1121.0> terminating ** (stop) exited in: - Pastebin.com


My implementation details are here:

There are two ways to send information to the Elixir side. The first is to cast to @server.

@server is the PID from Erlport’s register_handler.

This is the case that times out.

The other is to cast to my own handler, and it works well.

The reply target is created by Tide.Reaction.start_link() and is not started by a Supervisor.

Therefore I am sure Ruby is running and Elixir can communicate with it, but there is some problem with Nerves, or it is limited by some STDIO feature.

But I am unable to stop Erlport from using STDIO, and the STDOUT redirect feature is not working.
Even with STDIO mode disabled, it is still unable to send events back to the GenServer created by the Supervisor.

I’m not sure I understand how it’s a STDIO issue when your cast works and Erlport’s cast doesn’t. It sounds like the pid in @server is different from the pid that works for you.
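
One way to check that is to inspect the pids from IEx on the device. The sketch below is only an illustration: PidCheck is a hypothetical helper, not part of Erlport or Tide, and it uses only Process.info/2 from the standard library.

```elixir
defmodule PidCheck do
  # Hypothetical helper: summarizes a local pid so it can be compared with
  # the pid the Ruby side reports as its cast target.
  def describe(pid) when is_pid(pid) do
    case Process.info(pid, [:registered_name, :message_queue_len, :current_function]) do
      # Process.info/2 returns nil when the process is no longer alive
      nil -> %{pid: pid, alive?: false}
      info -> info |> Map.new() |> Map.merge(%{pid: pid, alive?: true})
    end
  end
end

# Describe the shell process itself as a quick smoke test
PidCheck.describe(self())
```

Supervisor.which_children/1 lists the pids a supervisor actually started, so each of them can be passed to PidCheck.describe/1 and compared with whatever pid the Ruby side holds in @server. A growing message_queue_len would suggest the messages arrive but are not being handled; a different pid would suggest they are going somewhere else.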

Since there’s no “Nerves layer”, it might be possible to reproduce your issue on a Linux desktop computer. I see that you’re running on MacOS. Could you try Linux? Raspbian would work as well, and nerves_system_rpi3 uses Linux kernel versions from the Raspberry Pi Foundation’s Linux kernel repository.

I created a CentOS 7 server with elixir-1.10.2-otp-22, erlang-22.3, and ruby-2.4.9 to build the same environment as the Nerves image I created.

And it works well on my Linux server.


I tested STDIO because Erlport uses STDIO to exchange data between Erlang and Ruby, so it seemed a likely cause of the failure.

Does anyone have any idea about this problem?

I also tried setting the timeout to infinity, but the problem does not seem to be caused by Ruby taking a long time to respond to Elixir.

But I don’t know whether anything inside Nerves differs in how Erlang runs and communicates with Ruby over STDIO.

Could you share an example program that reproduces the issue and instructions for running?

The produced firmware: https://send.firefox.com/download/7e858aa516462489/#J9OYYLhYkcTJwNtHBiTEqA

Please use TideRpi3.exec/0 to test it.

The related code (Ruby package support has to be added with menuconfig):

# mix.exs

# dependencies
defp deps do
  [
    # ...
    {:tide, "~> 0.3"},
    # ...
  ]
end

The Ruby file that returns data to Elixir:

# priv/ruby/app.rb

# frozen_string_literal: true

Elixir::Tide.on('ping') do
  reply :ok, "PONG"
end

In the application, add the Supervisor that starts the Ruby process:

# lib/my_app/application.ex

def start(_type, _args) do
  # See https://hexdocs.pm/elixir/Supervisor.html
  # for other strategies and supported options
  opts = [strategy: :one_for_one, name: TideRpi3.Supervisor]

  children =
    [
      {Tide.Supervisor, root: :code.priv_dir(:tide_rpi3) |> Path.join("ruby"), file: "app"}
    ] ++ children(target())

  Supervisor.start_link(children, opts)
end

In lib/my_app.ex, add the test functions:

# lib/my_app.ex

defmodule MyApp do
  # Times out on the device
  def exec do
    {:ok, agent} = Tide.Agent.start_link()
    {:ok, args} = agent |> Tide.Agent.exec("ping")
  end

  # The async queue receives the reply event
  def emit do
    {:ok, agent} = Tide.Agent.start_link()
    agent |> Tide.Agent.emit("ping")
    {:ok, args} = agent |> Tide.Agent.next
  end
end

I ssh into the device and run MyApp.exec to test it, but I get a timeout error.

It fails both when written into an Elixir file and when executed directly in the shell.

Your firmware image file works for me. I put it on a Raspberry Pi 3 B+ and connected to the IEx prompt on the HDMI port.

Here’s what I see:

iex> TideRpi3.exec
{:ok, ["PONG", "Hi?"]}

It works every time for me.

Just in case there was some weirdness with connecting over ssh vs. the console via HDMI, I used your instructions to build a firmware image myself. (There’s been an update to the Nerves new project generator that creates projects using VintageNet for networking, so I was using that.) I consistently get the "PONG" response with that image too. No timeouts.

I honestly don’t know why you’re seeing timeouts, and it’s even stranger that your firmware image does not time out for me since we should be testing the exact same software.

My only suggestion is to update to use VintageNet since that will make network configuration easier. But that obviously doesn’t address the timeout issue.

I used ssh to test it and got the timeout error. I will try connecting over USB and HDMI and test again.

Another question: is VintageNet built into newer Nerves versions, or do I have to add it myself? I think I used almost the latest version to create my new project, but the network config uses :nerves_network.

Thanks for your help.

I changed to VintageNet and it works well over SSH.

But my RPi3 seems unable to use USB (usb0) or a keyboard connected over USB.