Clarification on Sending Emails Asynchronously

ScriptyScott · December 27, 2021, 8:52pm

Hey Folks,

Quick question: while implementing email functionality in my Phoenix app, I came across multiple sources that claim:

Opposite to some stacks, sending emails, talking to third party apps, etc in Elixir do not block or interfere with other requests, so you should resort to async emails only when necessary.

Taken from the Swoosh Hexdocs

This seems so foreign to me, coming from other platforms like Rails. Can someone briefly explain why this is the case in Elixir and what specific scenarios require async emails/calls? I love how responsive Phoenix and LiveView apps are and would hate to lose that with a call that blocks the UI.

Thanks,
Scott

josefrichter · December 27, 2021, 9:57pm

I think it refers to the fact that every single connection is handled by a separate independent process, so a slow operation, or even a crash, does not slow down or directly affect any other connection. I think this is a good explanation of what’s going on under the hood: Task and gen_tcp - The Elixir programming language
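
A rough way to see that isolation without Phoenix, as a sketch only: the three anonymous functions below stand in for request handlers (the module name and the `fast`, `slow` and `crashing` labels are made up for illustration), and each one runs in its own monitored process.

```elixir
defmodule IsolationDemo do
  # Runs each stand-in "handler" in its own process and reports how each ended.
  def run do
    handlers = [
      fast: fn -> :ok end,
      slow: fn -> Process.sleep(500) end,
      crashing: fn -> exit(:boom) end
    ]

    started = System.monotonic_time(:millisecond)

    # spawn_monitor/1 starts an unlinked process, so a crash there cannot
    # take the caller down; we only get a :DOWN message about it.
    refs =
      for {name, fun} <- handlers, into: %{} do
        {_pid, ref} = spawn_monitor(fun)
        {ref, name}
      end

    # Collect one :DOWN message per handler, in the order they arrive.
    for _ <- 1..map_size(refs) do
      receive do
        {:DOWN, ref, :process, _pid, reason} ->
          elapsed = System.monotonic_time(:millisecond) - started
          {Map.fetch!(refs, ref), reason, elapsed}
      end
    end
  end
end

IO.inspect(IsolationDemo.run())
```

Each result is a `{name, exit_reason, elapsed_ms}` tuple. The slow handler should be the only one that takes around half a second, and the crash shows up as an exit reason rather than as a failure of the other two.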

But I’d too like to see a confirmation or more profound answer from some of the OGs here

al2o3cr · December 27, 2021, 11:04pm

Two things that this might be referring to (but they’re both a stretch):

1. In the early days of Node, libraries would hide blocking calls (like an SMTP send) and cause scheduling woes by blocking the whole event loop. But nowadays there are well-established patterns for handling that situation (async/await etc.)
2. A web server running on the BEAM is going to have a LOT more “workers” than one with traditional UNIX processes, so blocking one Erlang process should have less effect on performance

josefrichter · December 28, 2021, 11:17am

I think that’s the point - in Node or Ruby you have to use these workarounds to make sure you don’t choke the system, and still have fairly limited options. In Elixir you don’t. Because of your bullet 2, basically.

If I overly simplify it, Ruby and Node run in a single thread by default. Elixir runs on all available threads by default. On my MacBook, this means 6 cores = 12 threads, so I kinda get 12x performance out of the box. It’s easy to choke 1 thread, especially if something goes awry and gets stuck there. Async/await helps you optimize within that one thread, but nothing more. You don’t need that much optimization if you run on 12x more threads. So you would most likely send mails asynchronously in Elixir, for example, when it’s imperative that a single user of your application does not wait 0.5s for the mailer response. At least that is my layman’s understanding here; hopefully someone more experienced can chime in?

crova · December 28, 2021, 12:57pm

I can’t find the thread but I remember folks arguing that, since the majority of people will be using a third party to actually send the email, and those third parties usually send emails in a reasonable time, it was OK to simply call the API to fire the message and be done with it.

dimitarvp · December 28, 2021, 1:28pm

It goes even further than that. Elixir’s processes are not OS threads; you can have tens of thousands of them (far more than the number of CPU cores), and even if some of them are stuck, the rest will still continue running.

Nicd · December 29, 2021, 11:11am

Exactly, Erlang runs the processes in schedulers. By default there is one scheduler per logical CPU core (hardware thread), so on a system with 12 logical cores you would have 12 schedulers. The processes are switched pre-emptively, meaning that a process cannot prevent the scheduler from running other processes (except in the case of NIFs but that’s a separate problem). So even if you have one process stuck or spinning infinitely, it will be periodically switched out and the scheduler will run something else.

This is the real reason why you can technically run long-running blocking tasks in the same process as the HTTP request. Assuming you have already sent the response to the user, there is no problem if the process takes a long time to send the email or do something else; it won’t block anything else important running on the system.
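
A small experiment that illustrates the pre-emption described above, offered as a sketch rather than a benchmark. `PreemptionDemo` is a made-up module name; it starts twice as many busy-looping processes as there are schedulers and then checks whether an ordinary message still gets through.

```elixir
defmodule PreemptionDemo do
  # A busy loop that never waits or yields on its own.
  def spin, do: spin()

  def run do
    # More spinning processes than schedulers, so every scheduler has
    # at least one process that wants the CPU all the time.
    spinners =
      for _ <- 1..(System.schedulers_online() * 2) do
        spawn(&spin/0)
      end

    # An unrelated process that only needs a tiny slice of CPU time.
    parent = self()
    spawn(fn -> send(parent, :pong) end)

    result =
      receive do
        :pong -> :responsive
      after
        5_000 -> :starved
      end

    # Clean up the spinners so they do not keep the CPU busy.
    Enum.each(spinners, &Process.exit(&1, :kill))
    result
  end
end

IO.inspect(PreemptionDemo.run())
```

On a normal BEAM this should report `:responsive`, because the spinning processes are switched out regularly even though they never wait for anything.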

josefrichter · December 29, 2021, 1:01pm

Thank you for this. To further understand, when you say “process stuck or spinning indefinitely”, would that mean the process is stuck “in” CPU? Blocking one thread?

And if one core = two threads, then each scheduler has 2 threads to work with, and can use the other one if one of them is blocked?

Also, what decides which of the schedulers takes any given process?

Nicd · December 29, 2021, 1:44pm

By “stuck” I meant a process that is blocked, passively waiting for something. That process will not be scheduled for execution, the scheduler will run other processes. By “spinning indefinitely” I mean a process that is in an infinite loop, constantly executing. It will be executed by the scheduler but periodically switched out, so other processes will also get execution time.

Schedulers are single threads from the OS’s point of view, so for one logical core (hardware thread) you would generally have one scheduler. It doesn’t make sense to say “if one of [the threads] is blocked” as the scheduler is single threaded and won’t get blocked by regular Erlang processes.

The VM orchestrates the schedulers. They can steal work from each other, so if one is overloaded, others can take processes from it to manage the load.
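
The difference between a “stuck” and a “spinning” process can be observed from the outside with `Process.info/2`. The snippet below is only an illustration; `StatusDemo` is a hypothetical module name, and the 100 ms pause is an arbitrary choice to let both processes get going.

```elixir
defmodule StatusDemo do
  # Infinite loop: always has work to do.
  def spin, do: spin()

  def run do
    # "Stuck": blocked in receive, waiting for a message that never comes.
    blocked =
      spawn(fn ->
        receive do
          :never_sent -> :ok
        end
      end)

    # "Spinning": constantly executing, but still switched out periodically.
    spinning = spawn(&spin/0)

    # Give both processes a moment to reach their steady state.
    Process.sleep(100)

    report = %{
      schedulers_online: System.schedulers_online(),
      blocked: Process.info(blocked, :status),
      spinning: Process.info(spinning, :status)
    }

    Enum.each([blocked, spinning], &Process.exit(&1, :kill))
    report
  end
end

IO.inspect(StatusDemo.run())
```

The blocked process should show up as `:waiting`, which means the scheduler does not consider it for execution at all, while the spinning one should be `:running` or `:runnable` depending on when it is sampled.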

josefrichter · December 29, 2021, 1:46pm

Thank you again. What resource would you recommend for reading up more on this, please?

benwilson512 · December 29, 2021, 1:52pm

The Soul of Erlang and Elixir • Saša Jurić • GOTO 2019 - YouTube is a classic talk that covers these cool properties.

Nicd · December 29, 2021, 2:03pm

You may also be interested in the BEAM Book (linked the scheduling chapter directly) if you want to know more about the internals.

dimitarvp · December 29, 2021, 9:49pm

You can have 1000 Elixir processes stuck in an infinite loop (or a very long network call), on a 4-core / 8-thread CPU. The other Elixir processes will still run fine.

Understand that Erlang VM’s parallelism is not going to get deadlocked by a few rogue processes. That’s what makes it different and so desirable.

josefrichter · December 29, 2021, 10:12pm

Yeah, I was just watching Sasa’s talk linked above. It seems like the piece I was missing was the preemptive scheduling, where one rogue process cannot really clog the CPU. It might be coming back over and over again in some cases, maxing out the CPU, but it can never take over and block the other processes. That’s fascinating!

dimitarvp · December 29, 2021, 10:35pm

Yes. Without the preemptive scheduling of the runtime and the 99% transparent parallelism, Erlang / Elixir would remain just some curiosity languages and I personally would not use them.

As I said in other threads: don’t “love a language”. Do love the language’s runtime and (NOT in the case of Erlang and Elixir) compilation strictness (OCaml and Rust shine here).

The language syntax does not matter one bit. There are other pretty nice and solid frameworks out there – PHP’s Laravel is one very good example – but without a proper runtime and/or good compilation-level enforcement they still remain mostly curiosities, at least to serious programmers.

(And yes I am aware PHP and Ruby are used much more than Elixir. That’s a topic that’s beaten to death many times over but TL;DR no, that does not make them good. That makes them a safe choice for many employers and employees. At the end of the day we are in the whole thing for money. Programming is not just a hobby after all.)

ScriptyScott · December 30, 2021, 7:03pm

Thanks all for the great resources! I come from a background of mobile development with a bit of web dev peppered throughout my career. I always found it strange how certain asynchronous tasks (sending emails, async calls, etc.) would require specifically calling the async version of the method and usually some sort of additional queue support, often complicating stacks with things like Redis. I’ve skimmed through some of the links posted here and it looks like Elixir/Erlang handles threading in a manner that’s reminiscent of some of the mobile platforms I’m used to. I’m really glad I asked this question. Thank you so much for the resources; I will be taking a more in-depth look over the next few days.

softrage · January 1, 2022, 2:01pm

Note that more generally, if you don’t want to wait for a long-running process to terminate, e.g. during an HTTP request, you can also easily spawn a new process. To be clear and to reiterate some info in this thread, these processes are very cheap: they are part of the Erlang Runtime System (ERTS), and are not OS processes, and are unrelated to the threads/cores available on the system. You can spawn a new process using one of the methods documented here. A simple example would be:

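def some_route_handler(conn, _params) do
  # Run the slow work in a separate, unlinked process so the request process does not wait for it.
  spawn(fn -> long_running_process() end)
  # Reply right away; 202 signals the work was accepted but has not finished yet.
  send_resp(conn, 202, "Accepted")
end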

Even though a synchronous operation won’t block execution for other users, it will of course block the current user waiting for that process. Spawning another process in this way allows the current process to continue execution without waiting.

I would consider sending email as a long-running process, because you never know how long this might take, and for many situations (e.g. sending a welcome email) there isn’t much I would want to do if the mail didn’t send. I might implement retry logic, but that could go in the spawned process as well. If it ultimately does fail, I probably don’t want to show a cryptic and somewhat distressing “Failed to send welcome email” message to my user, so there is no point in making them wait. If you can show a useful error message, and determine it’s worth the tradeoff of making users wait longer in exchange for being able to show that message when there is a failure, then you can make it synchronous.
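
One possible shape for “retry logic in the spawned process”, as a hedged sketch. `WelcomeMail`, `send_async/3` and `with_retries/3` are hypothetical names, and `deliver` is assumed to be a zero-arity function returning `{:ok, term}` or `{:error, reason}`; in a real app it would wrap whatever mailer call you use.

```elixir
defmodule WelcomeMail do
  # Fire-and-forget: Task.start/1 runs the function in a new, unlinked
  # process and returns {:ok, pid} immediately, so the caller never waits.
  def send_async(deliver, attempts \\ 3, backoff_ms \\ 200) do
    Task.start(fn -> with_retries(deliver, attempts, backoff_ms) end)
  end

  # Calls deliver.() up to `attempts` times, doubling the pause between tries.
  def with_retries(deliver, attempts, backoff_ms) do
    case deliver.() do
      {:ok, _} = ok ->
        ok

      {:error, _reason} when attempts > 1 ->
        Process.sleep(backoff_ms)
        with_retries(deliver, attempts - 1, backoff_ms * 2)

      {:error, _reason} = error ->
        # Out of attempts: give up quietly; a real app might log this.
        error
    end
  end
end

# Usage with a stand-in delivery function that always succeeds.
{:ok, _pid} = WelcomeMail.send_async(fn -> {:ok, :sent} end)
```

Because the sleeping and retrying happen in the spawned process, the request process that called `send_async/3` is free to respond right away.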

I use a mailing library called Bamboo that encourages asynchronous behavior with its deliver_later function, but it looks like with Swoosh you just need to spawn a process yourself if you want that. Conversely, for synchronous behavior Bamboo provides deliver_now, while that’s the default with Swoosh. I’ve mostly gone the asynchronous route, but hopefully this gives you some information on how/when to implement synchronous/asynchronous behavior.

dimitarvp · January 1, 2022, 3:17pm

Yep, both Bamboo and Swoosh are alright.

One more step forward would be to enqueue mail-sending jobs to Oban and do synchronous sending in the working process you’re given when the job’s time to execute has come.

josefrichter · January 1, 2022, 3:38pm

Is spawning a process like this outside of supervision tree ok in this case? I know in the past I actually spawned some Tasks under Task.Supervisor to send out emails. But that was my early days, not sure if that’s not overkill…
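
For comparison, the supervised variant mentioned here is only a few lines. This is a sketch under assumptions: `MyApp.MailTaskSupervisor` and `MailJobs` are hypothetical names, and the supervisor is started by hand so the snippet runs on its own, whereas in an application it would normally be listed as a child in the supervision tree.

```elixir
# In a real app this would be a child spec in application.ex, for example:
#   {Task.Supervisor, name: MyApp.MailTaskSupervisor}
{:ok, _sup} = Task.Supervisor.start_link(name: MyApp.MailTaskSupervisor)

defmodule MailJobs do
  # Starts `deliver` (a zero-arity function) as a task under the supervisor.
  # Returns {:ok, pid} immediately; the caller is not linked to the task.
  def deliver_later(deliver) do
    Task.Supervisor.start_child(MyApp.MailTaskSupervisor, deliver)
  end
end

# Usage with a stand-in for the actual mail delivery.
{:ok, _pid} = MailJobs.deliver_later(fn -> IO.puts("pretend the email was sent") end)
```

Compared with a bare `spawn`, the tasks are visible under a named supervisor and are shut down together with the application, which is the main thing the extra line of setup buys.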

derek-zhou · January 1, 2022, 4:25pm

For small to medium setups, the only 2 sane email sending methods are:

1. Use a reputable 3rd party service, such as Sendgrid or Mailgun
2. Install an SMTP server very close, like in the same data center or even on the same machine

In either case, there is buffering on the other side already and the sending should be instantaneous. I see no point in spawning. In any event, retrying on the application server side will not help.