Process performance question

Hi.

I wrote a server that spawns a stateful service process,
which serves clients over Phoenix.

I'm worried about how many instances of that process I should spawn under a worker pool so that it doesn't become a bottleneck under high load.

In Rust with Tokio, for a critical high-load service, I can create two separate runtimes and communicate between them with message passing. I'm fine there, because I know that even if one runtime comes under load with 1M tasks, the other runtime's threads are always ready to run my service, since that runtime only runs the one task.

But in Erlang/Elixir (the BEAM):

when running 1M processes with more than half of them ready to run, it may take a little (or maybe a lot of) time for the scheduler to get to my service process in each cycle.

1 Like

The Erlang VM starts one scheduler per core in your machine and those schedulers will run many of the computations required by your application. Processes in Erlang are preemptive, meaning that in your case, they all get a chance to run at some point.

But if you have too much work to be done and not enough cores available, then you need to either break the load by having more instances or get more powerful machines.
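To make the preemption point concrete, here is a small illustrative sketch (the numbers are arbitrary): even with the schedulers flooded by busy processes, a separate process still gets scheduled and replies, because every process is preempted after a fixed budget of reductions.

```elixir
# Illustrative only: flood the schedulers with busy processes and check that a
# separate "service" process still gets CPU time thanks to preemption.
service =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, :pong)
    end
  end)

# Many processes that just burn CPU. Each is preempted after its reduction
# budget, so none of them can monopolize a scheduler.
for _ <- 1..100_000 do
  spawn(fn -> Enum.each(1..500_000, fn _ -> :ok end) end)
end

send(service, {:ping, self()})

receive do
  :pong -> IO.puts("the service process still got scheduled and replied")
after
  10_000 -> IO.puts("no reply within 10 seconds")
end
```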

Lukas has great talks on the VM scheduling. Some details may be outdated but the outline is likely the same: Lukas Larsson - Understanding the Erlang Scheduler - YouTube

5 Likes

Thanks, but what about this part:

I'm worried about how many instances of that process I should spawn under a worker pool so that it doesn't become a bottleneck under high load.

In the Akka world this is common; even Apache Mesos internally uses libprocess,
which is an actor framework, and it uses this technique (spawning an actor to handle each request).

Apologies, I am not sure I get your question. Do you mean whether you need to use a pool? The answer is: it depends.

Because the processes are preemptive, even if you have a million of them, they will all do a little bit of work. However, you may also decide that 1 million processes is way above the system's ability and you don't want that to happen in the first place. There are libraries for load regulation in both Erlang and Elixir, and they usually work by analyzing an actual system parameter (for example, memory or load) and then refusing to do some action based on that.
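Just to illustrate the idea (this is not any particular library's API), a regulator boils down to measuring a system parameter and refusing work when it is above a threshold:

```elixir
# Sketch of the idea behind load regulation, not a real library's API:
# measure a system parameter and refuse the work above a threshold.
defmodule LoadGate do
  # Arbitrary threshold for the example: 1 GB of total VM memory.
  @max_memory 1_000_000_000

  def handle(request, fun) do
    if :erlang.memory(:total) > @max_memory do
      {:error, :overloaded}
    else
      {:ok, fun.(request)}
    end
  end
end

# Usage: LoadGate.handle(request, &do_the_expensive_work/1)
```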

A pool would be useful when you need to talk to external resources. For example, if you open 10_000 connections to a website, the Erlang VM would be fine, but that website likely wouldn't. 🙂 So you want a pool to handle HTTP connections.
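As a rough sketch with a pooling library such as poolboy (just one option; MyApp.HTTPWorker, the pool name, and the sizes below are made up), borrowing a worker per request looks roughly like this:

```elixir
# Rough sketch using poolboy. MyApp.HTTPWorker is a hypothetical GenServer
# that owns a single connection to the external service.
children = [
  :poolboy.child_spec(:http_pool,
    name: {:local, :http_pool},
    worker_module: MyApp.HTTPWorker,
    size: 50,        # the external site never sees more than 50 connections
    max_overflow: 0
  )
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Each caller checks out a worker for the duration of the request.
:poolboy.transaction(:http_pool, fn worker ->
  GenServer.call(worker, {:request, "/some/path"})
end)
```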

Another technique that may be useful is partitioning, although that is often done to avoid single-core bottlenecks and the like (but not always).
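A sketch of what partitioning can look like (the module and names are made up): start N processes and route each key to one of them by hashing, so every request for the same key lands on the same process and no single process handles everything:

```elixir
# Sketch of partitioning by key. MyApp.PartitionWorker is a hypothetical
# GenServer; requests for the same key always hit the same partition.
defmodule Partitioned do
  @partitions 16

  def start_all do
    for i <- 0..(@partitions - 1) do
      {:ok, _pid} = GenServer.start_link(MyApp.PartitionWorker, i, name: name(i))
    end
  end

  def call(key, request) do
    GenServer.call(name(:erlang.phash2(key, @partitions)), request)
  end

  defp name(i), do: :"partition_worker_#{i}"
end
```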

2 Likes

Sorry, my question was vague.
Thanks for taking the time.

I actually came to the BEAM world from the Rust+Tokio / Akka world.

For example, I wrote an app in Akka (it was a forum). There, for communication within each group of people (~30 people), I spawned an actor that handled broadcasting and routing between them. It worked well and I never hit a bottleneck.

But here the scheduler is preemptive.
At first I thought I would spawn some workers to do the same thing, but I'm worried about a bottleneck.
How do I do something like that here?

When you start Actors in Akka, the Dispatchers will manage running those on a fixed-size thread pool: just like how the BEAM’s schedulers work.

If you have too much load scheduled in Akka, Actors will have to wait for their turn to run: just like how the BEAM’s schedulers work.

In Akka, even making a blocking call in an actor can jam up the system: from the docs

Without any further configuration the default dispatcher runs this actor along with all other actors. This is very efficient when all actor message processing is non-blocking. When all of the available threads are blocked, however, then all the actors on the same dispatcher will starve for threads and will not be able to process incoming messages.

This is less of an issue with the BEAM; it can still happen if you have long-running NIFs (functions written in C, called from Elixir / Erlang) as described in the :erl_nif manpage.

1 Like

Thanks, I know all that.

In 2021 I don't think any Akka developer uses blocking drivers, because for almost everything there is a non-blocking solution that is well tested and used in production by big companies.

There is also this to consider:
the JVM is faster than the BEAM, so each Akka actor can have more throughput than each BEAM process. On top of that comes preemption:
a process has to wait for the next cycle to run again.

Both of these decrease throughput.
I need a solution where an actor handles requests for more than 30 people in real time.

I did some research and found this solution:
use a pool of workers and store the state in ETS/Mnesia.

Or is there another solution??

Two things in the Erlang/Elixir ecosystem brought me to this world: the first is ETS, the second is Phoenix.

Thanks to their creators.

I’m not clear what your app does exactly, but for “broadcasting and routing” you don’t generally need to go through an actor. You can use Phoenix PubSub, or a more low-level approach like process groups (pg). A process that represents a user’s websocket connection can directly send a message to 30 other processes.
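For example, with Phoenix PubSub (assuming a pubsub named MyApp.PubSub is already in your supervision tree, and a made-up "group:<id>" topic), there is no central actor involved:

```elixir
group_id = 42                        # placeholder group
post = %{author: "anna", body: "hi"} # placeholder message

# Each connected user's process (e.g. a Phoenix channel) subscribes to its group:
Phoenix.PubSub.subscribe(MyApp.PubSub, "group:#{group_id}")

# Any process can broadcast to everyone currently in the group:
Phoenix.PubSub.broadcast(MyApp.PubSub, "group:#{group_id}", {:new_post, post})
```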

Is there a particular reason you feel the need for a central actor?

3 Likes

Avoiding central bottlenecks is great advice. I would also suggest not worrying about "JVM faster than BEAM" and "processes waiting for cycles" at such an early stage. At least in my own experience, my intuitions about bottlenecks are often misplaced. As Joe Armstrong used to say: make it work, make it beautiful, and make it fast (if necessary).

I have definitely worked on applications that had processes handling groups much larger than ~30 people and it has been fine. 🙂

7 Likes

Thanks. One last question.

I'm working on a project where one of the critical components is:

  1. three different tables, each with ~200 entries
  2. it must fetch from them, run a sort with a special algorithm over them, and respond to the client.

=> It must also work in near real time.
=> There are also too many different modes, so I cannot cache the result for each group;
I must do those two steps for every client.

=> The app also serves ~160K clients.

(BEAM + Elixir + ETS) wouldn't be bad for this type of processing, would it?
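Roughly, I have something like this in mind (only a sketch; the table names, the :score field, and the sort are placeholders for my real data and algorithm):

```elixir
defmodule Ranking do
  @tables [:table_a, :table_b, :table_c]

  # Create the three ETS tables once, e.g. at application start.
  # Rows are stored as {key, map} tuples.
  def setup do
    for table <- @tables do
      :ets.new(table, [:named_table, :set, :public, read_concurrency: true])
    end
  end

  # Called per client request: read all three tables (~200 rows each),
  # merge them, and sort them with the custom comparison.
  def ranked_for(client_id) do
    @tables
    |> Enum.flat_map(&:ets.tab2list/1)
    |> Enum.sort_by(fn {_key, row} -> score(row, client_id) end, :desc)
  end

  # Placeholder for the "special algorithm".
  defp score(row, _client_id), do: Map.get(row, :score, 0)
end
```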

Actually, the newly created JIT is what attracted me to it even more.

First I wrote it in Rust (a monolithic application) with in-memory raw data structures, persisting every entry to Postgres.

But mixing the business layer with the storage layer is hard, and ETS has also recently gained new data structures that increase scalability.

Thank you very much for answering me,
because all of my future depends on it.

This sounds like a good fit for the BEAM / Elixir.

This is important though: I would start by learning the fundamentals of the language, and not try to jump straight into building this system. If you try to build this system without learning the fundamentals you’re going to be in over your head with a dozen different tools you don’t really understand.

3 Likes

Thanks for your suggestion.

I was very surprised that the BEAM can handle this much processing.
That was good news for me.

I had doubts before, because I didn't know whether the BEAM is good enough for real-time processing. I had always heard: the BEAM is good for communication apps.

Isn't communication an example of real-time processing?

1 Like

What I meant by real-time processing in the discussion above was fetching a lot of data from three tables and running an algorithm over it, in real time, for each client request. It's like analyzing data and responding on demand, not just communication.

The fetching above is from ETS.

ETS is great because it doesn't need GC,
the performance is great,
and the scalability is great.

What do you think about this type of app in Elixir?