What's a good way to have limits for calling external APIs on a monthly or daily level?

artem · October 1, 2022, 5:58pm

Hi all

Need for exactly monthly limiting

I am learning Elixir (and Phoenix and LiveView) by creating a simple app that fetches weather from an external service and “creatively” presents it (think “today is a bit warmer than yesterday” instead of “today is +17”).

At some point I’ll make the app public. For a hobby project I certainly don’t want to pay much for the weather API, so I have to stay within the free or cheap pricing tiers. That is, my Elixir backend should call the external API no more than, e.g., 1000 times per calendar month.

Precision/reliability needs

I will probably be fine with the limiter not being super precise: it’s fine if the limiting mechanism sometimes allows, e.g., 1050 calls per month, or if it counts per 30 days from some arbitrary day rather than exactly within a calendar month.
On the other hand, I wouldn’t like somebody to use up the monthly quota in one minute. Better to leave some calls for tomorrow’s visitors, though it’s good to allow small bursts.

So the ideal limiter would simultaneously apply something like:

no more than 1000 calls within 30 days or within a calendar month
no more than 100 calls a day
no more than 50 calls an hour
no more than 10 calls a minute
store whatever it needs in Postgres (okay to keep it in memory for a while, but syncs to the DB are a must)
- it’s a hobby project under active development, and I don’t want to figure out, e.g., how to install and use Redis on gigalixir or fly.io, while I am already using Postgres anyway
- I don’t want to store state in memory only, since the service is very likely to be restarted often. With in-memory state, even one restart mid-month could accidentally let me use up to twice the quota

What are the options?

I figured several options for proceeding and would love to get some advice from the more experienced folks.

Hammer library (with a bit of massaging to combine the buckets) seems to allow for the rate limiting rules I am after. It doesn’t have a Postgres backend option, but maybe one is easy to write.
Write my own rate limiter/counter that saves a log of requests (or just the final counter), starting from mix phx.gen.context, so the context would query the call log table to tell the client how many calls are still allowed within the month/day/hour?
- And if I want “precise” over “fast”, then maybe wrap it in a GenServer that is guaranteed to run as a single copy
Locate some other library that does approximately what I want. Maybe there’s something else besides Hammer that I just failed to locate so far.
Something completely different?

What would you do in this kind of a situation?

Four2 · October 1, 2022, 7:49pm

What would be easy to set up is a DB table with location, current temperature, date & time, and a clever-remark id.
Then, when a user lands on the page, a background query compares the current time with that location’s stored date & time. If the last temperature query is recent enough, display the data from that table row. If not, fetch and parse the RSS feed, then update the temperature and the stored comments for the current weather. The remarks would work well in a DB table of their own (a remarks table with a unique id column), separate from the weather table. A comparison table can also be built up per location over time.
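
The "serve the stored row if it is fresh, otherwise refetch" step above can be sketched as follows. This is a minimal, language-neutral illustration written in Python, not code from the thread: the dict stands in for the weather table, and `get_weather`, `fetch` and `max_age` (30 minutes here) are hypothetical names and values chosen for the example.

```python
def get_weather(cache, location, now, fetch, max_age=1800):
    """Return stored weather for `location`, refetching only when it is stale.

    cache   -- dict standing in for the weather DB table (assumption)
    now     -- current time in seconds
    fetch   -- callable that hits the external feed/API (hypothetical)
    max_age -- how long a stored row counts as fresh, in seconds (assumption)
    """
    entry = cache.get(location)
    if entry is not None and now - entry["fetched_at"] < max_age:
        # Recent enough: no external call is made.
        return entry["data"]
    # Missing or stale: call the external source and store the result.
    data = fetch(location)
    cache[location] = {"data": data, "fetched_at": now}
    return data
```

With this shape, the number of external calls is bounded by the number of locations and the freshness window rather than by the number of page visits.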

Parsing data from a free XML/RSS feed might be a better way of doing the above than a web API.

gregvaughn · October 1, 2022, 9:14pm

I think you’re overthinking this. Write code that does what you described. I’d start with a database table that stores some token/label that represents a “bucket” of limits and a timestamp. Every time an API call is made you write an entry to that table. Before making the call you query for the limits – group by the various time intervals with a count of each and a having clause to enforce your desired constraints. It won’t be a trivial query (possibly window function or lateral join) but you’ll learn good things figuring that out. In the end it’ll be a single query.

zpeters · October 2, 2022, 12:34pm

I would highly encourage you to write this as its own package. This will be a great way to keep your concerns separated. I also think there are probably many other folks who could use a utility exactly like this!

dimitarvp · October 2, 2022, 3:58pm

I am with @gregvaughn here, you are over-thinking it a bit.

Having your own GenServer that’s part of the app’s supervision tree is what I would do. You can either:

Persist every single usage of the API in the database and then aggregate as @gregvaughn advised, so you know how much you have remaining and what waits must be enforced before the next call;
Or persist a monthly digest, which is just a single SQL record, e.g. month="2022-10" and api_calls=174, and then update that from your supervised GenServer on each API call. Write a wrapper around the API that sends a message to the GenServer; the GenServer’s job is to (1) wait until the quota allows another call, (2) increase the counter in the DB, and (3) actually call the API.
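
To make the monthly-digest option concrete, here is a small sketch of the counter update. It is in Python with SQLite purely so it runs standalone; `api_usage`, `create_usage_table` and `try_consume` are hypothetical names, and in Postgres the first statement would be an `INSERT ... ON CONFLICT DO NOTHING`. The increment is conditional, so the counter cannot pass the limit even if two callers race.

```python
import sqlite3

def create_usage_table(conn):
    # One row per month, e.g. month="2022-10", api_calls=174.
    conn.execute(
        "CREATE TABLE api_usage (month TEXT PRIMARY KEY, api_calls INTEGER NOT NULL)"
    )

def try_consume(conn, month, monthly_limit):
    """Increment this month's counter if it is below the limit.

    Returns True when the call may proceed, False when the quota is spent.
    """
    with conn:  # commit both statements together
        # Make sure the month's row exists (no-op if it already does).
        conn.execute(
            "INSERT OR IGNORE INTO api_usage (month, api_calls) VALUES (?, 0)",
            (month,),
        )
        # Only increments while the counter is still under the limit.
        cur = conn.execute(
            "UPDATE api_usage SET api_calls = api_calls + 1 "
            "WHERE month = ? AND api_calls < ?",
            (month, monthly_limit),
        )
    return cur.rowcount == 1
```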

Example 1: if the budget you allocated says you have only 2 requests remaining during the current minute while 10 in total are allowed, and you have 10 seconds remaining, well, enforce a :timer.sleep(5_000) (10 seconds divided by 2 requests) before calling the API.

Example 2: if the minute is ending in 5 seconds but you still haven’t used even 1 out of the 10 allowed requests for this minute, then you should do :timer.sleep(500) (5 seconds divided by 10 requests, i.e. 500 ms).
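
The arithmetic in both examples is the time left in the window divided by the calls left in it. A tiny sketch, in Python for illustration, with `pacing_delay_ms` as a hypothetical helper name; waiting out the rest of the window when no calls are left is an assumption added here, not something stated above.

```python
def pacing_delay_ms(seconds_left, calls_left):
    """Milliseconds to wait before the next call in the current window."""
    if calls_left <= 0:
        # Assumption: with no budget left, wait until the window ends.
        return seconds_left * 1000
    # Spread the remaining calls evenly over the remaining time.
    return seconds_left * 1000 // calls_left

print(pacing_delay_ms(10, 2))   # Example 1: 5000
print(pacing_delay_ms(5, 10))   # Example 2: 500
```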

That’s a super naive approach however. It can be improved and made much more flexible.

It does not look very hard. Like many programming problems, the true concern is (a) clearing up the entire problem in your head and (b) figuring out what data shape will serve you best. Actually coding it is rarely the challenge.

Give it a go and show us what you tried if it’s not quite happening still. I am positive the people will help you finalize it.

Four2 · October 2, 2022, 4:43pm

It’s not my project. But I would think getting the raw data from a free XML/RSS feed would probably be better than relying on someone else’s web API that you have no control over and that might change over time.

dimitarvp · October 2, 2022, 4:47pm

Seems I pressed Reply to the wrong person.

Sorry!

@artem My above long comment is directed at you.

dimitarvp · October 2, 2022, 4:49pm

Agreed. I focused on posting thoughts related to the API route is all.

If you can get the data without having to constantly ping an external service then that would be ideal.

derek-zhou · October 2, 2022, 4:52pm

The only thing I want to add is to separate the concern of avoiding abuse from the concern of budget control. Use the database approach similar to what @gregvaughn has suggested to address your budget concern, and use a GenServer approach similar to what @dimitarvp has suggested to address the abuse avoidance concern.

artem · October 2, 2022, 6:56pm

Thank you, guys. I was also considering just writing a log of requests to the database and using SQL queries to check whether a bucket is full or not. My main concern (that I should have stated more clearly in the beginning) was:

Am I reinventing the wheel? There could already be some known library to do just what I want or something very similar. Like if that Hammer library had a Postgres backend functional out of the box, it could be cheaper and more proper to just tune it to my case.

Now, after this discussion, it seems like my problem either isn’t common enough to have some known solutions or is so simple that nobody bothered to make a library for it.

I guess most hobbyists are using either totally free tiers or ones that have a clear spending limit. The one I am likely to use is “metered”, so I could spend too much by accident. Probably it is not very common.

dimitarvp · October 2, 2022, 7:21pm

You very likely are. However I’d still go for writing this on your own, because:

You will have 100% of it inside your own codebase and it can be tweaked, changed or outright removed very quickly.
Figuring out how to configure the already-coded library might take longer and/or be more confusing than writing it yourself.
Yes, it’s quite a simple thing and it’s very possible that many people didn’t consider it worthwhile to create and open-source it. I don’t claim it as a fact because I haven’t checked Hammer, I’m merely saying that this is the general culture of the Elixir community.

Again, don’t be shy about sharing partially working code. I believe people will be curious and will help you finish it if it proves more challenging than we think right now.

Four2 · October 4, 2022, 11:03pm

@dimitarvp
The traffic wouldn’t be much of a big deal for the RSS server the XML is parsed from. How about a database routine that subscribes to the RSS feed? Then the new data gets pushed into a new row. After that, the current data is pulled from the subscribed database when the main API calls for it.

dimitarvp · October 4, 2022, 11:16pm

I will always avoid DB procedures. Better have that logic in your code. It’s very easy to forget the DB has a function doing things, especially if even one new team member is added.

It’s fairly easy to have a GenServer that wakes up periodically and checks whether it’s time to download the feed again.

Four2 · October 5, 2022, 1:11am

I think multiple users and locations would be difficult to do without some sort of DB.
