My app needs to send out scheduled reminder emails, so it needs job processing and rate limiting.
Does anyone know of an open source library that does BOTH job processing and rate limiting? I’ve been looking at Oban OSS to handle the job processing. Oban Pro also provides rate limiting, but it was way too expensive for me.
I have checked out the excellent blog post by Alex Koutmos (@akoutmos) on Easy and Robust Rate Limiting in Elixir. So worst case, I can implement a rate limiter myself. But I’m curious if there is anything open source that does both job processing AND rate limiting. So far, I haven’t been able to find anything.
In the end I’d write my own rate limiter though. It’s not hard: you can just keep a counter in ETS and reset it at fixed intervals. Easy stuff IMO, and you won’t have to wrestle with 3rd party libraries that are likely not crafted with your exact scenario in mind.
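The fixed-interval ETS counter idea can be sketched roughly like this. Module, table, and option names here are my own for illustration, not from any library:

```elixir
# A minimal fixed-window rate limiter: a counter in ETS,
# reset to zero at fixed intervals by a timer message.
defmodule SimpleLimiter do
  use GenServer

  @table :simple_limiter

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Increments the counter; returns :ok while under the limit
  # for the current window, :rate_limited once it is exceeded.
  def allow? do
    [{:limit, limit}] = :ets.lookup(@table, :limit)
    count = :ets.update_counter(@table, :count, {2, 1}, {:count, 0})
    if count <= limit, do: :ok, else: :rate_limited
  end

  @impl true
  def init(opts) do
    :ets.new(@table, [:named_table, :public])
    :ets.insert(@table, {:limit, Keyword.fetch!(opts, :limit)})
    # Expire the counter at fixed intervals, starting a fresh window.
    :timer.send_interval(Keyword.fetch!(opts, :interval_ms), :reset)
    {:ok, nil}
  end

  @impl true
  def handle_info(:reset, state) do
    :ets.insert(@table, {:count, 0})
    {:noreply, state}
  end
end
```

Because `:ets.update_counter/4` is atomic, callers from any process can check the limit without going through the GenServer; the GenServer only owns the table and resets the window.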
At Keila we’re currently adding rate-limiting with Oban and ExRated: Scheduling is done by Oban and then the rate-limit is checked with ExRated. If the rate-limit is hit, the job is scheduled to re-run at a later point:
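A sketch of that pattern (this is an illustration of the idea, not Keila’s actual code; the worker module, bucket name, limits, and the `deliver_reminder/1` helper are all assumptions):

```elixir
# Oban handles scheduling/persistence; ExRated guards the send rate.
defmodule MyApp.Workers.SendEmail do
  use Oban.Worker, queue: :mailer

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"email_id" => email_id}}) do
    # Allow at most 14 sends per 1_000 ms (e.g. an SES quota of 14 msg/s).
    case ExRated.check_rate("ses", 1_000, 14) do
      {:ok, _count} ->
        # Hypothetical helper that loads the record and sends via Swoosh.
        MyApp.Mailer.deliver_reminder(email_id)

      {:error, _limit} ->
        # Rate limit hit: ask Oban to re-run this job a bit later.
        {:snooze, 1}
    end
  end
end
```

`{:snooze, seconds}` is Oban’s built-in way to reschedule a job without counting it as a failure, which is exactly the “re-run at a later point” behavior described above.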
Could you sanity check my understanding on a couple of things?
First … I’m new to Phoenix. I am working on an app that will send out a lot of emails to large groups. I will be using Amazon SES. I had originally written a GenServer that just Enum-ed through the list, saved each message to the database, and then sent the email off via Swoosh. Then I found out that Amazon SES has rate limits … down the rabbit hole I went.
I stopped programming and started researching. Read through HexDocs and blogs, and then read through Concurrent Data Processing in Elixir by @svilen. Awesome book for beginners!
I honestly knew nothing about job processing or rate limits until last week … so feel free to tell me if I still need to do more research.
Alex Koutmos’ (@akoutmos) article described a Leaky Bucket and a Token Bucket. From what I can tell, it seems like I should be using a Leaky Bucket approach for emails. That way, they are being rate limited and sent at a steady rate. Does that sound right?
Koutmos’ approach uses Erlang’s :queue module and a Task.Supervisor (for both the Leaky Bucket and the Token Bucket). Using this approach would require that I write it myself, but as @dimitarvp pointed out … my rate limiter would then be written to suit my exact requirements. And Alex’s blog makes the code pretty easy to write.
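The leaky-bucket idea from the blog post can be condensed to a few lines (this is my own simplified sketch, not Alex Koutmos’ exact code): work queues up in an Erlang :queue, and a timer drains one item per tick at a steady rate.

```elixir
defmodule LeakyBucket do
  use GenServer

  def start_link(opts) do
    drain_ms = Keyword.get(opts, :drain_ms, 100)
    GenServer.start_link(__MODULE__, drain_ms, name: __MODULE__)
  end

  # Enqueue a zero-arity fun to be run at the steady drain rate.
  def enqueue(fun), do: GenServer.cast(__MODULE__, {:enqueue, fun})

  @impl true
  def init(drain_ms) do
    :timer.send_interval(drain_ms, :drain)
    {:ok, :queue.new()}
  end

  @impl true
  def handle_cast({:enqueue, fun}, queue), do: {:noreply, :queue.in(fun, queue)}

  @impl true
  def handle_info(:drain, queue) do
    case :queue.out(queue) do
      {{:value, fun}, rest} ->
        # In the blog post this hand-off goes through a Task.Supervisor
        # so slow work doesn't block the bucket; running inline here
        # keeps the sketch short.
        fun.()
        {:noreply, rest}

      {:empty, queue} ->
        {:noreply, queue}
    end
  end
end
```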
Hammer is a Token Bucket approach and uses ETS (an in-memory table) instead of Erlang’s :queue. It can also be configured to use Redis instead of ETS. Redis is an in-memory datastore too; the difference I vaguely understand is that an ETS table lives inside a single BEAM node, while Redis can be shared, so I’d reach for Redis if I needed to coordinate the rate limit across nodes in a distributed system. If I go this route, I’ll just start with ETS and look at Redis if I ever need that.
ExRated also takes the Token Bucket approach, and like Hammer it stores its counters in ETS. In Keila’s setup it is paired with Oban (built on PostgreSQL): Oban handles the job scheduling while ExRated handles the rate check.
Can I pick any of these approaches and be fine? Or is one of these solutions better for managing emails? Should I aim for the Leaky Bucket approach for emails … or does it make no difference?
Let’s separate your concerns because you seem to be mixing them somewhat.
Persistence. It depends on how important the sending of these emails is. If they are not mega-important and you don’t care about guaranteeing that each one arrives, then there’s nothing wrong with just putting a few million records in ETS (Erlang’s in-memory DB/cache) and then pulling from it in batches to send mails. Redis works fine for this as well. Kafka too, but it’s too heavy to set up (unless you’re using Docker, then it’s easy-ish). PostgreSQL is just fine for it as well.
Rate limiting. It’s really not as hard to make one yourself as you seem to think. You’d need a combination of :counters.new and one of the functions of the :timer module (both Erlang, but fairly straightforward to use). If you’d rather not write it yourself, it’s still better to use a library made only for rate limiting, so as to reduce confusion.
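The :counters plus :timer combination mentioned above might look something like this (module and option names are illustrative):

```elixir
# Like an ETS-based limiter, but using :counters — a mutable,
# fixed-size array of integers that is cheap to bump and reset.
defmodule CounterLimiter do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def allow?, do: GenServer.call(__MODULE__, :allow?)

  @impl true
  def init(opts) do
    counter = :counters.new(1, [])
    # :timer.send_interval/2 delivers :reset to this process on a fixed cadence.
    :timer.send_interval(Keyword.fetch!(opts, :interval_ms), :reset)
    {:ok, %{counter: counter, limit: Keyword.fetch!(opts, :limit)}}
  end

  @impl true
  def handle_call(:allow?, _from, %{counter: c, limit: limit} = state) do
    :counters.add(c, 1, 1)
    reply = if :counters.get(c, 1) <= limit, do: :ok, else: :rate_limited
    {:reply, reply, state}
  end

  @impl true
  def handle_info(:reset, %{counter: c} = state) do
    # Zero the counter at each tick, starting a fresh window.
    :counters.put(c, 1, 0)
    {:noreply, state}
  end
end
```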
I could likely write you one for your goals but I am super busy and my schedule is not stable so can’t give you a good estimation as to when.
Yes, absolutely. You haven’t indicated anything in your comments that suggests a super complex scenario. Any of the options you enumerated should serve you just fine.
You nailed it @dimitarvp … TOTAL analysis paralysis! LOL Especially since I’m climbing a huge learning curve on all of these technologies.
Really appreciate your offer of help for the rate limiter. I had nearly completed a rate limiter using Koutmos’ blog as my template when I discovered Oban Pro and began wondering if I was reinventing the wheel. So I think I’m pretty close.
The emails my app will be sending are event reminders for large groups, so I need to do some persistence in order to confirm the emails went out. It would be bad if attendees did not get reminders.
Until last week, I hadn’t thought about issues around rate limitation and process crashes. I need to make sure that my GenServer comes back up, figures out what reminders were sent and starts sending from where it left off. My GenServer was not doing that. It was just firing emails off via Swoosh. It was through the Swoosh documentation that I discovered Oban and went down the analysis-paralysis rabbit hole.
I spent today looking at what they did at Keila. I’m leaning towards using a similar strategy … Oban and ExRated or Oban and my own Rate Limiter.
Thanks for confirming that either the Leaky Bucket or the Token Bucket approach would work fine for my scenario. That means I can use ExRated.
Thanks also for confirming that I’m on the right track. That will get me out of paralysis.
If persistence is important then it is best you just insert all the emails in a Postgres table and add fields like last_sent_email_at or whatever serves your scenario best. Then you can just fire up a GenServer that uses an Ecto query like “OK, give me all email records that haven’t received messages in the last 10 days” and just churn through them one by one (or in batches), while making sure you never go above N emails per unit of time.
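That query might be sketched like this (the `Email` schema, field name, and `Repo` are assumptions for illustration; your own data model will differ):

```elixir
import Ecto.Query

# "Give me all email records that haven't received messages in the
# last 10 days" — in id-ordered batches, so a crashed-and-restarted
# worker naturally picks up the records that are still unsent.
ten_days_ago = DateTime.add(DateTime.utc_now(), -10 * 24 * 60 * 60, :second)

MyApp.Repo.all(
  from e in MyApp.Email,
    where: is_nil(e.last_sent_email_at) or e.last_sent_email_at < ^ten_days_ago,
    order_by: [asc: e.id],
    limit: 500
)
```

Because the worker derives its work list from the database on every run, there is no in-process state to lose when it crashes.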
Please don’t get misled by the above advice! It’s given only to illustrate how a background worker would pick up where it left off last time, nothing else. You might actually need a more complex data model, e.g. separate campaigns that link to emails, with records that indicate whether a certain email address has been contacted for that campaign.
It would also help you to sketch your app and have it use Swoosh’s fake email sending in the dev environment (which I think only logs the email messages to the console, but I can’t remember), and then devise a few scripts that fill emails into the DB table, start the GenServer, etc.
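If memory serves, Swoosh ships a `Swoosh.Adapters.Local` adapter that keeps sent emails in an in-memory mailbox for dev (with an optional preview page) and a `Swoosh.Adapters.Test` adapter for tests. A typical dev config looks something like this (`:my_app` and `MyApp.Mailer` are placeholders for your own app):

```elixir
# config/dev.exs — deliver to an in-memory mailbox instead of a real provider.
config :my_app, MyApp.Mailer, adapter: Swoosh.Adapters.Local
```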
Thank you @wmnnd for pointing me to Keila. It has been super helpful! I’ve actually been learning some great Elixir tricks from it! I just have two questions that I’m hoping you could help me with:
Does the “recipients” table ever get flushed? Once a campaign is sent to the project’s recipients, I don’t see anything that empties that table over time. Are there advantages to holding onto that data? You already have a relationship between project and contacts, so you could recreate the list. I’m curious about the value of keeping a record of every recipient’s job once the mail has been sent. I hadn’t thought about keeping that data, but I’m worried I’m missing something.
Do you know when the rate-limiting commits will be merged into Keila? I’m going to basically follow your lead and use this as a design pattern for incorporating rate limiting into Oban. Is the code stable enough for me to follow what you all have done? From the notes, it looks like it is almost ready to be merged.
Thanks again for pointing me in this direction. Was very helpful!
One last question (promise). I have little database experience and no understanding of database performance, so my question pertains to issues from having extremely large data sets.
The recipients table could end up enormous (as will my messages table). The worker has to search recipients to process the mail. I can see that it is just querying by id, so an index lookup should keep that fast. But I assumed there would be a performance hit once a table grows too large. Is that not the case? Should I not be worrying about it?