Clarification on Sending Emails Asynchronously

I agree that these are sane methods. Here’s an unscientific experiment sending bad requests from an instance on GCP to Sendgrid/Mailgun to see response times:

curl -d '{"bad":"data"}' -H "Content-Type: application/json" -X POST https://api.sendgrid.com/v3/marketing/test/send_email -s -o /dev/null -w "%{time_total}\n"
0.169216

curl -d '{"bad":"data"}' -H "Content-Type: application/json" -X POST https://api.mailgun.net/v3/mydomain.com -s -o /dev/null -w "%{time_total}\n"
0.272170

170ms/270ms extra in the normal case, with no HTML email body, is not quite free. When latency increases, or in the worse case when a request fails for some reason (which it will on rare occasions, and that is the case where retrying server-side does help), or when the service has an outage (looking at you AWS!), the user is presumably waiting much longer, until Hackney times out, to get a response. Since spawning takes no additional effort, at least in the basic case, I don’t see a reason not to spawn if I’m not planning to show an error to my user. It doesn’t make it more robust (unlike what @dimitarvp mentions), but if the email is going to send successfully or occasionally die either way, it might as well do it where my user doesn’t have to see :upside_down_face:

A supervisor will restart supervised processes that crash, which could be caused by any number of things, like a hard drive failure, a temporary network outage, memory corruption, etc. If a process is not supervised, it just means in the case where that happens, nothing will be tracking it to start a new process in its place, so whatever that process was going to do will not occur.
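
A minimal sketch of a supervised fire-and-forget send, assuming a hypothetical `Demo.Mailer.deliver/1` in place of the real Swoosh or HTTP call, and a `Task.Supervisor` registered under the made-up name `Demo.TaskSupervisor`. Worth noting: `Task.Supervisor.start_child/3` defaults to `restart: :temporary`, so a crashed task is only restarted if you opt in with `:transient` or `:permanent`.

```elixir
# Hypothetical stand-in for the real mail call (Swoosh, an HTTP client, ...).
defmodule Demo.Mailer do
  def deliver(email) do
    # Simulate a provider round trip.
    Process.sleep(50)
    {:ok, email}
  end
end

# In a real app this child would sit in the application's supervision tree.
{:ok, _sup} =
  Supervisor.start_link(
    [{Task.Supervisor, name: Demo.TaskSupervisor}],
    strategy: :one_for_one
  )

# The caller gets {:ok, pid} back immediately and does not wait for the send.
# restart: :transient asks the supervisor to rerun the task if it exits abnormally.
{:ok, pid} =
  Task.Supervisor.start_child(
    Demo.TaskSupervisor,
    fn -> Demo.Mailer.deliver(%{to: "user@example.com"}) end,
    restart: :transient
  )
```

With a bare `spawn/1` instead, a crash in the function is simply lost: nothing notices it and nothing reruns it.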

Yeah, that’s my reason as well. I don’t think users are interested in a random popup 30 seconds after they made an action on the website that says “sorry, we were unable to send you an email”. This is one of the things that’s expected to always eventually succeed, even if it’s one hour later. So yeah, definitely spawn a process to send the email – or delegate it to a background task library.

GCP is that slow? In my Elixir application (hosted on a cheap VPS on hostwind), sending a short email sync vs. async via Swoosh/Sendgrid differs by only about 10 ms, from the point of view of the access log. Sending through an exim server on localhost is actually somewhat slower; gen_smtp is known for its adherence to the spec, not for speed, and that clocks in at about 15 ms.

In my historical experience, a lot of mail providers don’t guarantee quick responses. Sending mail is not considered something you should put on a hot code path or behind a user-visible delay in the UI.

If I care about the result, being slow occasionally is still better than silent failures.

If I don’t care that much, like for a one-way third-party API call that is nice to have but OK to fail, I will just use Phoenix PubSub to broadcast to a background GenServer and do the remote call there. It is async, but still not spawning on the spot.
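
Roughly what that PubSub-to-GenServer setup could look like, as a sketch. The names `Demo.PubSub`, `Demo.EmailWorker`, the `"emails"` topic and the `{:send_email, email}` message shape are all made up for illustration, and the `:notify` option exists only so the demo can observe the result. It is written as a standalone script, so it pulls in `phoenix_pubsub` with `Mix.install/1`; inside a Phoenix app the PubSub server is already in your supervision tree.

```elixir
Mix.install([{:phoenix_pubsub, "~> 2.1"}])

defmodule Demo.EmailWorker do
  use GenServer

  @topic "emails"

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # The worker subscribes itself; broadcasts arrive as plain messages.
    :ok = Phoenix.PubSub.subscribe(Demo.PubSub, @topic)
    {:ok, %{notify: Keyword.get(opts, :notify)}}
  end

  @impl true
  def handle_info({:send_email, email}, state) do
    # Placeholder for the remote call (Swoosh, HTTP client, ...).
    # Demo-only hook: report back to whoever asked to be notified.
    if state.notify, do: send(state.notify, {:delivered, email})
    {:noreply, state}
  end
end

# PubSub is listed first so it is running before the worker subscribes.
children = [
  {Phoenix.PubSub, name: Demo.PubSub},
  {Demo.EmailWorker, notify: self()}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# The request path only broadcasts and moves on.
:ok = Phoenix.PubSub.broadcast(Demo.PubSub, "emails", {:send_email, %{to: "user@example.com"}})
```

The worker handles one message at a time, so a slow provider backs up its mailbox rather than the request path. A broadcast sent while the worker is down or restarting is dropped, which fits the "OK to fail" case only.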

I am not sure what we’re arguing about. Network problems happen all the time, and cloud providers have downtime all the time. :man_shrugging:

If we go with Oban then you’d get configurable retries for free. If we use our own supervised Tasks then we can just use a recursive function with a retries argument that can have a cap – it’s practically the same.
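
For the hand-rolled variant, something along these lines would do. This is only a sketch: `Demo.Retry.with_retries/3` is a made-up helper, it assumes the send function returns `{:ok, _}` or `{:error, _}`, and the doubling delay is my own choice, not anything Oban or Swoosh prescribes.

```elixir
defmodule Demo.Retry do
  @doc """
  Calls `fun` until it returns `{:ok, _}` or the attempts run out.
  `attempts_left` is the cap; `delay_ms` doubles after every failure.
  """
  def with_retries(fun, attempts_left, delay_ms \\ 100) when attempts_left > 0 do
    case fun.() do
      {:ok, _} = ok ->
        ok

      # Last attempt failed: give up and hand back the final error.
      {:error, _} = error when attempts_left == 1 ->
        error

      {:error, _} ->
        Process.sleep(delay_ms)
        with_retries(fun, attempts_left - 1, delay_ms * 2)
    end
  end
end
```

Run inside a supervised Task, the sleeps only block that task. Unlike Oban, nothing here survives a node restart, so pending retries are lost on reboot.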

Either way, error handling is mostly a luxury. Obviously if there’s a hard requirement that the users must be notified of a failure to deliver email to them (within e.g. 5 minutes), then that’s a whole different story. But I haven’t had such requirements ever (although I am sure they do exist).

Of course. My point is, if it is something I depend on, I’d like it to fail flat on my face if it ever fails.

Case in point: a few years ago, I had a system that often failed to deliver emails containing the one-time code for login. (It was a misconfiguration of some sort.) I was grilled for it badly.

To recap, there are 3 ways to send emails:

  1. just send it in process
  2. spawn and send
  3. throw it out to a background job with PubSub, or Oban if I want scalability or persistence across reboots.

1 has the easiest error handling and is often fast enough. 3 has the latency and throughput advantage; 2 has neither.