Several Beginner Questions Megathread

So far I really love Elixir. I’m sure I will have ongoing questions and I thought it’s easier to put them in one place. I’m learning quickly so that’s good at least :smiley:

  1. When someone has these # Client API and # Server API comments in their GenServer modules, are they really just referring to methods called from the client process (normal methods), and methods called in the server context (only the callbacks)?
  2. When using Redix, @josevalim told me to place this in my Supervision tree inside children = [] in application.ex. I was excited that this made perfect sense (because MyApp.Supervisor is watching it)… but then I read this article and it’s confusing again. I did notice a lot of applications autostart… but does that mean they will stay dead if they crash because they’re not being supervised? How do I know when to place something in the application.ex application supervisor vs letting it autoload?
  3. When I have a single Supervisor.start_link in application.ex controlling my GenServers/GenStage/whatever… is that technically a “supervision tree” or do I need yet another layer of supervisors for it to be a tree?
  4. Is it bad practice to have modules in another namespace in my applications supervisor? So for MyApp.Supervisor to be watching something like DataPipeline.ProducerConsumer?
  5. How many Consumers can be added to GenStage before it’s a bad idea (I need to send an enormous amount of simultaneous requests). Could I do 30-50 easily do you think (on the same machine)?
  6. Is there a shorthand way to set up those 30 Consumers from GenStage into my children = []? (Note: I’m not using Supervisor.Spec and all the examples I can find are for that, using id: 1, id: 2, etc). I’m a bit lost how to add multiple consumers with the same module name and using the “tuple method” like {Consumer, []}, {Producer, [0]},...
  7. Let’s say I have a GenStage Consumer that can’t ever have more than 100 events (very strict). And I use @max_demand 100… how would I deal with this inside handle_events if what I’m doing isn’t synchronous? I’m hitting a 3rd party XMPP endpoint asynchronously which responds by sending messages to the Consumer's inbox (with an ACK or NACK). There can never be more than 100 unanswered NACK/ACK requests. The logistics of that I’ll figure out later, but I’m only thinking of it now in terms of “how do I stop handle_events from pulling more demand when it depends on messages in the inbox rather than synchronous work?”

In fact, those are quite a lot of questions, and some have been answered recently in the forums, but let me tackle them one by one anyway.

Those comments are put there by the author of the code; there is no technical necessity for them. Usually Client API refers to the public functions that run in the caller’s process, and Server API refers to the implementations of the callback functions.
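To make the split concrete, here is a minimal sketch of a GenServer laid out with those two comment headers. The module name Counter and its functions are made up for illustration; only the GenServer calls themselves are real API.

```elixir
defmodule Counter do
  use GenServer

  # Client API: plain functions, executed in the process that calls them.

  def start_link(initial \\ 0) do
    GenServer.start_link(__MODULE__, initial)
  end

  # Sends an asynchronous message to the server and returns :ok immediately.
  def increment(pid), do: GenServer.cast(pid, :increment)

  # Sends a synchronous request and waits for the server's reply.
  def value(pid), do: GenServer.call(pid, :value)

  # Server API: callbacks, executed inside the GenServer process.

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_cast(:increment, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:value, _from, count), do: {:reply, count, count}
end
```

Calling Counter.increment(pid) runs in your process and only sends a message; the matching handle_cast clause is what runs in the server.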

As a sidenote, let me remind you that there are no methods in Elixir, only functions. A method is tied to an object and may mutate the object without returning a new instance of it.

Applications that start automatically have their own supervision tree (or do not have startable components at all), so their processes are still supervised.

Usually you know this from the documentation of the application you want to use.

Even not having a supervisor at all is a supervision tree, as nothing specifies that a tree has to have at least one node. Even the empty tree is a valid tree.

Aside from that, you have a supervisor. The supervisor has its workers, so you have a supervisor node with child nodes. Do you see the tree?
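As a small sketch of that picture: one supervisor as the root node and two workers as its child nodes. TreeDemo and the worker ids are hypothetical names, and Agents stand in for whatever GenServers or GenStage stages you actually run.

```elixir
defmodule TreeDemo do
  # Starts one supervisor (the root node) with two Agent workers (its child nodes).
  def start do
    children = [
      # Both children use the Agent module, so each one gets its own :id.
      Supervisor.child_spec({Agent, fn -> :a end}, id: :worker_a),
      Supervisor.child_spec({Agent, fn -> :b end}, id: :worker_b)
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```

After {:ok, sup} = TreeDemo.start(), Supervisor.which_children(sup) lists the two child nodes hanging off the root, and Supervisor.count_children(sup) summarises them.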

I do not consider this a bad practice. Some libraries even require you to supervise them. But usually they give you a supervisor to supervise, rather than workers.

Totally depends on your machine. You need to find the bounds for each machine by trial and error.

I can’t answer your last 2 questions though, as I am not actively using GenStage.


First, thank you so much for answering this stuff.

Great on you for pointing out methods vs functions… that is obvious in retrospect.

2 follow up questions:

  • So in reference to applications getting their own supervision tree, how do I know where to put it if their documentation sucks (so far, elixir docs are pretty bad for libraries)? Is there something I can look at in the code to see? Take Redix for instance, or Romeo. Romeo tells you to put it in there when you don’t have to, and Redix doesn’t tell you to put it in there when you do have to.

  • Someone on IRC said this about the same question about adding applications to my supervisor. “everything is ultimately supervised, it’s just supervised by code you can’t see” … I am confused what this means. Does this mean a process linked with the main Erlang process will get restarted even if I haven’t defined my own supervisor in my application.ex?

There is no need to put redix into a supervision tree.

It is a low-level wrapper, very basic. At least that’s what I read glancing at the README. It is meant to be started where you need the connection and to be terminated when you don’t.

If you want to pool it globally, you need to do that on your own.

Just putting a single redix-connection under supervision is probably not what you want and will bottleneck sooner or later.

romeo, on the other hand, seems to be a bit dated, and its README contains setup instructions that are pre-1.4. Back then you actually needed to explicitly put it under :applications, because there was no inference.


For your second question, let’s take your own application. It gets started and spins up your supervision tree. Even your top-level supervisor is started and supervised by something: the BEAM. But the BEAM’s restart strategy is basically to shut down the other applications that have been started, in order, and then to die. The BEAM will not restart anything.

But those applications that do spin off in a separate tree from yours usually have their own base supervisor, such that it is pretty unlikely that they kill you. And if they do, your trouble is often much greater than that :wink:


Almost universally start_link/* is meant to be used as part of a supervision tree. If it follows the contract of returning {:ok, pid} (where pid is a pid) after its init callback is called, then it can be started as part of a supervision tree. When people use start_link/* in iex, they’re usually doing it for illustrative purposes, not to imply that it shouldn’t be started in a supervision tree.

So here’s a simplified example for both Redix and Romeo:

defmodule My.Application do
  use Application

  def start(_type, _args) do
    children = [
      {Redix, [[host: "example.com", port: 5000], [name: ExampleRedisConnection]]},
      {Romeo.Connection, [[jid: "romeo@montague.lit", password: "iL0v3JuL137"], [name: ExampleXMPPConnection]]}
    ]
    opts = [strategy: :one_for_one, name: My.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Now you may have the aforementioned bottleneck described by @NobbZ. That’s when you want to look into things like pools (so you might have a pool that is a supervisor that supervises many Redixs or many Romeos as one of your My.Supervisor children).

For the time being, the processes started above are named processes. So you can use ExampleRedisConnection in place of the pid for Redix and ExampleXMPPConnection in place of the pid for Romeo.Connection.

If you take a look at Romeo here, it looks like there is no need to start its application at all (BUT in the future there might be a need to have its application started, so explicitly not starting the application is not necessary).


Ah ha! That makes sense about redis… I think that makes sense about BEAM I will need to research and play with it some more. I get that ‘something’ starts my app, I guess it’s a little hazy beyond that. Thank you

Ok cool, is that what ‘poolboy’ is for or whatever?

And so I think I get it about processes. You’re saying most libs are autostarted by BEAM, and it’s up to the lib a lot of times to start their own supervisor as well.

I take it it’s just not often that you actually need to add a library into your own supervisor tree (it makes sense how romeo and redix shouldn’t be there because they create a bottleneck)

I’ll rephrase my question 6:

How do I write this:

children = [
  worker(GenstageExample.Producer, [0]),
  worker(GenstageExample.ProducerConsumer, []),
  worker(GenstageExample.Consumer, [], id: 1),
  worker(GenstageExample.Consumer, [], id: 2)
]

but in a non-deprecated way? (notice the id: 1 and 2)

According to the documentation (https://hexdocs.pm/elixir/Supervisor.html#module-child_spec-1), it should be something like this:

children = [
  {Producer, 0},
  ProducerConsumer,
  Supervisor.child_spec(Consumer, id: 1),
  Supervisor.child_spec(Consumer, id: 2)
]

I get this error that the 2nd one can’t be started: returned an error: shutdown: failed to start child: 2 ** (EXIT) already started: #PID<0.244.0>

I solved it… I just had to remove the name: __MODULE__ from the actual Consumer!
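For the "30 consumers" part of question 6, a comprehension can build the child specs so nobody has to type id: 1 through id: 30 by hand. This is only a sketch: DemoConsumer is a plain GenServer standing in for a real GenStage consumer, and ManyConsumers is a made-up helper. It also shows why dropping name: __MODULE__ fixed the error, since two processes cannot register under the same name.

```elixir
defmodule DemoConsumer do
  use GenServer

  # No :name option here, so any number of copies can run side by side.
  def start_link(n), do: GenServer.start_link(__MODULE__, n)

  @impl true
  def init(n), do: {:ok, n}
end

defmodule ManyConsumers do
  # Builds `count` child specs for the same module, each with a unique id.
  def child_specs(count) do
    for n <- 1..count do
      Supervisor.child_spec({DemoConsumer, n}, id: {DemoConsumer, n})
    end
  end

  # Starts a supervisor whose children are the generated specs.
  def start(count) do
    Supervisor.start_link(child_specs(count), strategy: :one_for_one)
  end
end
```

In application.ex the same idea would read something like children = [{Producer, 0}, ProducerConsumer] ++ ManyConsumers.child_specs(30).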

You might actually want to keep :name but use a :via and Registry as well as give an argument to init, to create the name dynamically. That way you can talk to the process without needing to know its PID.


Is that difficult to do? lol. If it’s simple I will look into it, otherwise I will save it for later.

Not difficult? I think so. Simple? Not necessarily…

You need to set up, start and supervise a registry first, then you need to give your children an argument when starting that is used to distinguish them from each other.

In the started child, instead of name: __MODULE__ you need to pass name: {:via, Registry, {__MODULE__.Registry, local_name}} or something like that.

Then you can use that tuple instead of the name.
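Roughly, those steps could look like the sketch below. NamedWorker, its whoami/1 function, and the :consumer_1 / :consumer_2 names are all invented for the example; Registry and the :via tuple are standard Elixir. A plain GenServer is used here, and I am assuming a GenStage consumer would take the :name option the same way.

```elixir
defmodule NamedWorker do
  use GenServer

  @registry NamedWorker.Registry

  # The argument doubles as the local name the process registers under.
  def start_link(local_name) do
    GenServer.start_link(__MODULE__, local_name, name: via(local_name))
  end

  # Callers address the process by local name, no PID needed.
  def whoami(local_name), do: GenServer.call(via(local_name), :whoami)

  # Builds the :via tuple that routes name lookups through the Registry.
  def via(local_name), do: {:via, Registry, {@registry, local_name}}

  @impl true
  def init(local_name), do: {:ok, local_name}

  @impl true
  def handle_call(:whoami, _from, local_name), do: {:reply, local_name, local_name}
end

defmodule NamedWorker.Demo do
  def start do
    children = [
      # The Registry is listed first so it is running before the workers register.
      {Registry, keys: :unique, name: NamedWorker.Registry},
      Supervisor.child_spec({NamedWorker, :consumer_1}, id: :consumer_1),
      Supervisor.child_spec({NamedWorker, :consumer_2}, id: :consumer_2)
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```

Once NamedWorker.Demo.start() has run, NamedWorker.whoami(:consumer_1) reaches the right process even after a restart has given it a new PID.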


Awesome thanks, I’ll save this in my notes :smiley:

Regarding question #7, have you read Discord’s article about using GenStage with XMPP? https://blog.discordapp.com/how-discord-handles-push-request-bursts-of-over-a-million-per-minute-with-elixirs-genstage-8f899f0221b4


Yes that was a big inspiration for this heh