Feedback on my first Elixir library - an Elixir interface to the Janus Web RTC Gateway

ndarilek · March 4, 2017, 1:54pm

After hacking on various Phoenix projects, I’m building my first Elixir library and would appreciate some feedback. In particular, I’m looking for advice around fault-tolerance, and on introducing supervision such that individual component crashes don’t leave invalid state.

I’ve written ElixirJanus, an Elixir interface to the Janus Web RTC Gateway. One challenge is that, when I create a Janus session, there are in essence two parts. The first is an HTTP long-poll that retrieves events from the Janus server, then dispatches them to event handlers (there is a websockets API but, for a first attempt, I thought I’d stick to long-poll.) Next are individual HTTP calls to the API, which may modify state that the long-poll needs to access. Session/handle state is stored in Agent, and I’ve tried to keep the agent-side code as simple as possible to minimize the likelihood of crashes. So, questions:

When a caller creates a Session, it gets a pid for the Agent containing session state. The caller is expected to treat the pid as opaque (I.e. not to get its data and access struct members.) Unfortunately, if the agent crashes, this pid would then be invalid, and all code that needs to access it would be holding onto an invalid pid. I see two possible solutions–either return something else to the client, or introduce a supervisor. The second seems like a better option, but all the supervision examples I’ve seen seem to have a static list of supervisors, and my library lets clients create arbitrary sessions. Do I want to create a supervisor per session somehow, one large supervisor to track all created sessions, or something else?
Should I be using GenServer for Session and Plugin handles? Right now I’m just storing it all in an Agent, but maybe using GenServer would be more idiomatic, or better for supervision.
I’m using Bypass for testing. Are there any examples of using Bypass when a library needs to queue up multiple requests? In this instance, every session starts a long-poll and fires off individual requests to make session changes. Bypass seems great for APIs where a request is sent and a response received, but I need to configure several request/response patterns that will trigger during a single call, and it isn’t immediately obvious to me how to do that. I’d hoped to build out the test suite so I can do things like refactor to GenServer and experiment with more confidence.

Thanks, and any additional feedback on this first attempt at a library would be very welcome.

msambarino · March 4, 2017, 4:05pm

Hey there, first of all, thanks for taking the time to contribute an open source library .

Regarding your questions:

You can solve this by introducing a supervisor and using a registry to register a name for the supervised processes, as the pid’s of the session and plugin processes would change if they’re restarted.

You can use a supervisor to start children dynamically by using the :simple_one_for_one strategy, as the Supervisor docs state:

The :simple_one_for_one supervisor is useful when you want to dynamically start and stop supervised children.

Agent’s are bound to the same name registration rules as GenServers:

Both start_link/3 and start/3 support the GenServer to register a name on start via the :name option. Registered names are also automatically cleaned up on termination. The supported values are:

an atom - the GenServer is registered locally with the given name using Process.register/2.

{:global, term}- the GenServer is registered globally with the given term using the functions in the :global module.

{:via, module, term} - the GenServer is registered with the given mechanism and name. The :via option expects a module that exports register_name/2, unregister_name/1, whereis_name/1 and send/2. One such example is the :global module which uses these functions for keeping the list of names of processes and their associated PIDs that are available globally for a network of Elixir nodes. Elixir also ships with a local, decentralized and scalable registry called Registry for locally storing names that are generated dynamically.

You can use elixir’s Registry (since elixir 1.4) in order to achieve this.

Agent’s are a great fit if all you want is to store state. It´s important to remember that everything that happens outside the function you pass to the agent is executed in the client, so for example in your message function

def message(pid, body, jsep \\ nil) do
    plugin = Agent.get(pid, &(&1))
    msg = %{body: body, janus: "message"}
    post(plugin.base_url, maybe_add_key(msg, :jsep, jsep))
  end

the post function is executed by the client. There is nothing wrong with this, but if you wanted to introduce some sort of coordination to, say, only send one post to request to a plugin at a time, then a GenServer might start proving more idiomatic.

The good news is that you’ve wrapped all access to the agents in a module so switching to GenServer in the future shouldn’t have much of an impact in the rest of the code base.

I’ll let someone else answer #3 as I’ve never used Bypass.

One final thing, on your Sessions init/1 you are linking the event_manager process to whichever process called the init function, not to the agent process that stores the session’s state… I don’t know if this is what you intended…

ndarilek · March 4, 2017, 4:39pm

Sweet, thanks for the detailed response!

So it sounds like one way forward would be:

Change what the session/plugin handle creation functions return to
refer not to pids, but to globally-unique names.
Register agents globally with unique names, perhaps created by
make_ref() so they don’t clash with any other globally-registered
name. Return those references as the session/plugin handles.
Change Janus.Session and Janus.Plugin into :simple_one_for_one
supervisors, create an app module, and start them from there.
Start all agents with start_link so they’re linked to the supervisors.

Am I correct in assuming that, with those changes, agents will be
restarted if they crash? All that is needed is to link them to a
Supervisor?

I’m not sure what to do about the poll private function. I need one
running per session, and ideally would like it to restart if it crashes.
Is that a good fit for Task, and can I just start it from the same
:simple_one_for_one supervisor?

As for Session.init, no, I didn’t intend to link the handler to the
calling process. Because all the supervision examples I’ve seen thus far
seem to have a fixed number of children, I haven’t been too sure of how
to support crashes in situations where I have a dynamic set of child
processes. So I focused on building out the API, then figured I’d add in
fault tolerance later.

Thanks again for all the feedback!

ndarilek · March 4, 2017, 5:15pm

Ah, looks like I can’t use references for globally-unique names because
they need to be converted to strings. Tips on how to acquire a
globally-unique name so I’m no longer returning pids?

Thanks.

OvermindDL1 · March 4, 2017, 5:30pm

Well you could always ‘inspect’ a ref, but those are unique per system run. If you want a globally unique name why not use a UUIDv1 (or UUIDv2 if you want some security on inspecting it)?

EDIT: https://hex.pm/packages/uuid seems decent for UUID v1’s, though no v2 support.
And it looks like it would be easy to add UUID v2 support to it so you could PR that.

ndarilek · March 4, 2017, 5:47pm

You can inspect it, sure, but can you pass the inspected value into
something that returns a ref? I suppose I can treat the inspected value
like a globally-unique string, but that seems a bit ugly.

UUIDs could work, I’m just a bit surprised that Erlang/Elixir doesn’t
natively include a method for getting a globally-unique value that can
be used in this way, or that I can’t use refs to name things. What
would I use them for if that isn’t their use case?

Thanks.

msambarino · March 4, 2017, 8:05pm

Yes, except for the make_ref() part. Seems to me like you already have good candidates for ids both for Session and Plugin. You already get a session id in the response to your request when creating a session, wouldn’t that be suitable as the session’s id within your application? Same goes for Plugin. If the id you get in the response is not unique then I’d probably use UUID as @OvermindDL1 suggested.

Actually, I wouldn’t make Janus.Session and Janis.Plugin supervisors. Rather, I’d create supervisors for the Agents they start, something like:

defmodule Janus.Session do
  def start_link(state, session_id)
    name = {:via, Registry, {Janus.Registry, {:session, session_id}}}
      Agent.start_link(fn -> state end, name: {:via, })
  end

  ....

  def init(url) do
    case post(url, %{janus: :create}) do
      {:ok, body} ->
        id = body.data.id
        
        ...
        
        {:ok, _} = Janus.SessionSupervisor.start_child(session, id)
        id 
      v -> v
    end
  
end


defmodule Janus.SessionSupervisor do
  use Supervisor

  def start_link do
    Supervisor.start_link(__MODULE__, nil, name: __MODULE__)
  end

  def start_child(state, session_id) do
    Supervisor.start_child(__MODULE__, [state, session_id])
  end

  def init(_) do
    [worker(Janus.Session, [])]
    |> supervise(strategy: :simple_one_for_one)
  end

end

defmodule Janus do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    # Define workers and child supervisors to be supervised
    children = [
      supervisor(Registry, [:unique, Janus.Registry]),
      supervisor(Janus.SessionSupervisor, [])
    ]

    # See http://elixir-lang.org/docs/stable/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Janus.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Bare in mind, with this approach, you can’t just pass the id to Agent, you need to either pass the via tuple or fetch the pid from the registry and then pass that pid:

Agent.get({:via, Registry, {Janus.Registry, {:session, id}}}, &(&1))

You can simplify this by creating a helper function in Janus.Session:

def via_tuple(id) do
  {:via, Registry, {Janus.Registry, {:session, id}}}
end

And use it like so:

id
|> via_tuple()
|> Agent.get(&(&1))

Yup you just need to have them supervised.

:simple_one_for_one requires the supervisor specification to only have one child, so no, you can’t start two kinds of processes from the same :simple_one_for_one supervisor.

I haven’t really used Task much, and im not sure tasks are intended to handle execution of something that should run forever, someone might be able to offer more insight on this. That being said, you could start a supervised task for each session to run the poll function. You can use Task.Supervisor just make sure to set its restart strategy to :transient or :permanent as the default is :temporary and it won’t restart tasks that crash.

In that case, you could take a similar approach to the one used for session agents.

def init(url) do
    case post(url, %{janus: :create}) do
      {:ok, body} ->
        id = body.data.id
        {:ok, _} = Janus.EventManagerSupervisor.start_child(id)
        session = %Janus.Session{
          id: id,
          base_url: "#{url}/#{id}",
          event_manager: {:via, Registry, {Janus.Registry, {:event_manager, id}}}
        }
        {:ok, _} = Janus.SessionSupervisor.start_child(session, id)
        id 
      v -> v
    end

I’m storing the via tuple in the event_manager field so that there’s no need to change your current calls to GenEvent.

Hope it helps

ndarilek · March 4, 2017, 8:03pm

OK, so I’ve done a few things:

I’ve created a :simple_one_for_one supervisor. I’ve refactored my code
such that agents are registered via UUIDs. What I’d now like to do is
return their UUID names such that callers reference sessions not by
pids, which can crash and go away, but by globally-unique names which
are replaced should the agent pid crash.

github.com

ndarilek/elixir-janus/blob/supervision/lib/janus/session.ex#L36



def start_link(url, name) do
  case post(url, %{janus: :create}) do
    {:ok, body} ->
      id = body.data.id
      {:ok, event_manager} = GenEvent.start_link()
      session = %Janus.Session{
        id: id,
        base_url: "#{url}/#{id}",
        event_manager: event_manager
      }
      {:ok, pid} = Agent.start_link(fn -> session end, name: {:global, name})
      spawn_link(fn () -> poll(name) end)
      {:ok, pid}
    v -> v
  end
end

@doc """
Attaches the plugin identified by `id` to the specified session.

What I want to do is return the name UUID so the caller has a
universal session handle that won’t change if the pid crashes. But if I
change that to {:ok, name} I get a match error. Seems like the
supervisor is expecting a specific return value which at the very least
includes a pid.

Is there any way to work around this? Is this direction a good one? I
eliminated the start/init distinction, since started sessions can be
reconfigured while they run. This simplifies things in that I can start
all associated processes in a single place, which I assume will place
them under the supervision tree. Still haven’t worked out how to
supervise plugin processes, but one thing at a time.

Thanks for all the help and advice. I’m reading lots of docs, but using
this information to solve my own problem rather than someone else’s is
proving helpful.

OvermindDL1 · March 4, 2017, 8:15pm

Supervisors require a PID or some structure that can grab a PID.

You can register the name globally via the :global registry if it is an atom (not recommended).
You can register the name as a UUID via the new Registry in Elixir or via :gproc or something as well.

msambarino · March 4, 2017, 8:31pm

You should not create the UUID inside start_link as then the UUID would be regenerated each time the supervisor restarts the process. Instead, you should create the UUID somewhere else and pass it as an argument to the supervisor’s start_child which in turn passes it as the argument to start_link. That way, if the session agent crashes, it gets restarted with the same UUID.

Actually, :global accepts any term, not just atoms.

{:global, term}- the GenServer is registered globally with the given term using the functions in the :global module.

ndarilek · March 4, 2017, 9:08pm

Actually, :global accepts any term, not just atoms.
{global, term}- the GenServer is registered globally with the giv

Not refs. When I tried with make_ref(), I got a ProtocolError. Guess it
has to be something that serializes directly to a string.

Thanks for the other pointers, I think I have a greater understanding
for how to make this work. I’ll keep poking at it and will send more
questions if I have them.

ndarilek · March 4, 2017, 9:32pm

Actually, correction, make_ref() works now. Not sure what I did wrong
earlier, but I’m now able to globally name agents as refs and no longer
need UUIDs.

OvermindDL1 · March 4, 2017, 10:47pm

Oh yeah, I forgot that… I’ve been using :gproc for so many years with a gen-leader for distributed variants… ^.^

ndarilek · March 4, 2017, 11:10pm

Thanks again for all the help.

It now has two :simple_one_for_one supervisors started via the
application startup callback, one for sessions, the other for plugins.
When I start a session and attach a plugin, then call count_child on
each supervisor, I see a single worker attached to each. So, to the
extent that a new worker is created at the right time, this seems to work.

Does this look like an OK supervision strategy? Should calling
start_link in these functions correctly link everything into the
supervision tree? If this looks correct then I’ll merge it into master.

Thanks.

msambarino · March 6, 2017, 1:11pm

I’d say you’re on the right track, but there are a couple things to consider here:

No, the event manager and the poll process are linked to the calling process (which is now the supervisor), but this doesn’t mean they’re being supervised, they won’t be restarted if they die.

You could add a :simple_one_for_one supervisor for your event managers, e.g.:

msambarino:

def init(url) do
    case post(url, %{janus: :create}) do
      {:ok, body} ->
        id = body.data.id
        {:ok, _} = Janus.EventManagerSupervisor.start_child(id)
        session = %Janus.Session{
          id: id,
          base_url: "#{url}/#{id}",
          event_manager: {:via, Registry, {Janus.Registry, {:event_manager, id}}}
        }
        {:ok, _} = Janus.SessionSupervisor.start_child(session, id)
        id 
      v -> v
    end

As to the poll process, you could start a supervised task for it.

The other thing to bare in mind is that you’re storing the plugin’s pids in the session handles in attach_plugin, this pids will become stale if the plugin process is restarted by the supervisor, so you should store the name under which the plugin was registered.

ndarilek · March 6, 2017, 2:12pm

Thanks for the pid pointer. I’ll start those up named as well, which
should resolve that.

As for linking the EventManager to the calling process, I’m wondering if
you’re looking at the wrong branch? I eliminated the init function
entirely, since there really isn’t work that can’t be done to a session
that is already polling. So now the EventManager is started in
start_link. Should that be enough to supervise it, or does it need to
return its pid to another supervisor? I’m guessing the latter now that
I think about it, but figured I’d check.

I can create a third supervisor identical to the first two for starting
the EventManager. I’m wondering if there’s some boilerplate to spin up a
named Supervisor in a single line, then maybe pass an {:ok, pid}-returning function to yet another function that supervises the
first? Then I could spin up a bunch of Supervisors in 2 lines, rather
than creating a separate module for each. But maybe that’s a bad idea.

Thanks so much.

msambarino · March 6, 2017, 3:02pm

Yea, i was looking at the correct branch as indeed the init function was gone and replaced by start_link, it’s just that starting the event manager inside Session.start_link/1 doesn’t mean it will be supervised by the same supervisor as Session, you need to create a new supervisor and start the event manager with start_child.

Not sure about creating many module-less supervisors…

ndarilek · March 6, 2017, 3:25pm

Got it, thanks.

Do you have a good reference for moduleless supervisors? I’ve seen them
mentioned in Elixir in Action, but Googling for them just strips the
“less” off of “moduleless,” and searching the Supervisor docs for
“moduleless” doesn’t reveal anything obvious. Yet I see them referred to
by that name in various places, so I’m wondering if it’s a shorthand for
some other concept that is called something else officially.

msambarino · March 6, 2017, 5:04pm

Check Supervisor.Spec docs, you can start module-less supervisors like so:

import Supervisor.Spec

children = [
  worker(MyWorker, [arg1, arg2, arg3]),
  supervisor(MySupervisor, [arg1])
]

Supervisor.start_link(children, strategy: :one_for_one)

However, I’m not sure it’s a good idea to do this in your case, as you’ll need a way to contact the supervisor processes in order to start sessions, plugins, etc. and so you’d either need to hold on to their pids (which will become stale if the supervisors are restarted for some reason) or you’ll need to register the supervisors with a name in :global or some other registry, but then you’d have to monitor those supervisors in order to register them under the same name if they die… so it’s probably not worth all the work just to avoid defining a module for each supervisor.

Disclaimer: I’ve always used module-based supervisors so perhaps im missing something