How would I build this? (10M API requests/day)

9mm · November 11, 2018, 7:28pm

I need to build an internal app, and I think I can finally use Elixir for it, which I’ve been waiting to do for a long time.

Unfortunately, I’ve forgotten everything I learned because I never used it, so I was hoping someone could help me outline the basic structure of how I would build this:

The server’s job is to receive batches of tokens to send push messages to, along with the message contents, and add them to some sort of ‘waiting’ queue. Then I want 100K workers to churn through this queue, each hitting a 3rd-party HTTP API endpoint (one API request per message).

I need to send 10M pushes 4 times a day, so the incoming requests to Elixir might look like this:

/send/batch (with a limit of, say, 5000 messages per request)

[["111", "push message 1"], ["222", "push message 2"], ... 4,998 more ] 
[["333", "push message 3"], ["444", "push message 4"], ... 4,998 more ] 
[["555", "push message 5"], ["666", "push message 6"], ... 4,998 more ] 
... all the way up to 10 million

So those are the incoming messages that need to be sent, each with the token it is sent to. That is what makes up the ‘backlog’.
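A minimal sketch of the intake side. It assumes the `/send/batch` JSON body has already been decoded into a list of `[token, message]` pairs (for example in a Plug or Phoenix endpoint). `PushBacklog` is a hypothetical module name. It rejects the whole batch if it exceeds the 5000-entry limit or contains a malformed entry, and otherwise appends every pair to an Erlang `:queue`:

```elixir
defmodule PushBacklog do
  # Hypothetical module: keeps the backlog of {token, message} pairs in an
  # Erlang :queue. The 5000 limit is the example limit from the post.
  @max_batch 5_000

  def new, do: :queue.new()

  # Validates one decoded /send/batch body (a list of [token, message]
  # pairs) and appends every pair to the queue, or rejects the whole batch.
  def enqueue_batch(queue, batch) when is_list(batch) and length(batch) <= @max_batch do
    Enum.reduce_while(batch, {:ok, queue}, fn
      [token, message], {:ok, q} when is_binary(token) and is_binary(message) ->
        {:cont, {:ok, :queue.in({token, message}, q)}}

      invalid, _acc ->
        {:halt, {:error, {:invalid_entry, invalid}}}
    end)
  end

  def enqueue_batch(_queue, batch) when is_list(batch), do: {:error, :batch_too_large}
end
```

In a running app the queue would live inside a process, such as a GenServer, rather than being passed around by hand.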

Then I need 100K workers (or some configurable number of workers), each making an HTTP request asynchronously in its own green thread (one thread per message, so 100K HTTP requests in flight at all times):

POST to 3rd-party endpoint: https://example.com/sass/send-message

{body: {token: 'aaa', body: 'test message 1'}}

… if the HTTP request fails, retry it in its own thread using exponential backoff (this isn’t strictly required if it’s a huge pain… maybe it can just wait a random amount of time to keep things simple).
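A minimal sketch of the worker side that uses only the standard library. `send_fun` is a hypothetical stand-in for the real POST to the 3rd-party endpoint, which with HTTPoison would wrap its `post` function. `Task.async_stream/3` with `max_concurrency` plays the role of the "defined # of workers". Each task retries its own message with exponential backoff plus random jitter. The option names and defaults (`max_attempts`, `base_ms`) are made up for the example:

```elixir
defmodule PushSender do
  # Sketch only. `send_fun` is hypothetical and must return :ok or
  # {:error, reason}. All defaults are assumptions, not tuned values.

  # Sends every {token, message} pair with at most `max_concurrency`
  # requests in flight; each task retries its own message on failure.
  def send_all(pairs, send_fun, opts \\ []) do
    max_concurrency = Keyword.get(opts, :max_concurrency, 1_000)
    max_attempts = Keyword.get(opts, :max_attempts, 5)
    base_ms = Keyword.get(opts, :base_ms, 100)

    pairs
    |> Task.async_stream(
      fn {token, message} ->
        send_with_retry(fn -> send_fun.(token, message) end, max_attempts, base_ms)
      end,
      max_concurrency: max_concurrency,
      ordered: false,
      # The default per-task timeout is 5s; raise it so backoff sleeps fit.
      timeout: 60_000,
      on_timeout: :kill_task
    )
    |> Enum.reduce(%{ok: 0, failed: 0}, fn
      {:ok, :ok}, acc -> Map.update!(acc, :ok, &(&1 + 1))
      # Gave up after retries ({:ok, {:error, _}}) or timed out ({:exit, :timeout}).
      _error_or_timeout, acc -> Map.update!(acc, :failed, &(&1 + 1))
    end)
  end

  # Calls `fun` up to `max_attempts` times, sleeping between attempts.
  def send_with_retry(fun, max_attempts \\ 5, base_ms \\ 100, attempt \\ 1) do
    case fun.() do
      :ok ->
        :ok

      {:error, _reason} = error when attempt >= max_attempts ->
        error

      {:error, _reason} ->
        Process.sleep(backoff_ms(attempt, base_ms))
        send_with_retry(fun, max_attempts, base_ms, attempt + 1)
    end
  end

  # base * 2^(attempt - 1), plus 1..base ms of random jitter.
  def backoff_ms(attempt, base_ms) do
    base_ms * Integer.pow(2, attempt - 1) + :rand.uniform(base_ms)
  end
end
```

`Task.async_stream/3` links its tasks to the caller, so an exception raised inside `send_fun` would also take the caller down. Wrap the HTTP call so that it returns `{:error, reason}` instead of raising.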

So the backlog would hold at most about 10M messages.

Each token + message is about 200 bytes, so 10 million messages in the queue take roughly 2 GB of memory.
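The sizing in this post can be checked with a little arithmetic. The payload figure counts only the ~200 bytes per message. It ignores per-entry overhead in the BEAM (tuples, queue cells, binary headers), so actual memory use will be somewhat higher:

```latex
\begin{aligned}
\frac{10^{7}\ \text{messages}}{5000\ \text{messages/request}} &= 2000\ \text{batch requests per run} \\
10^{7} \times 200\ \text{B} &= 2 \times 10^{9}\ \text{B} \approx 2\ \text{GB} \approx 1.86\ \text{GiB} \\
4\ \text{runs/day} \times 10^{7}\ \text{messages} &= 4 \times 10^{7}\ \text{API requests/day}
\end{aligned}
```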

I’m not really sure how to go about this, or which libraries to use (HTTPoison, I suppose). I could use Phoenix for the core API portion… or maybe I only need Cowboy (it’s a very simple app that doesn’t need any other endpoints).

Do I build a GenServer, or something else?

Qqwy · November 11, 2018, 7:48pm

Because you have a lot of work that at first glance seems ‘ridiculously parallelizable’, the first thing to do is probably to think about which parts will actually take significant time (or other computational resources like memory or disk space) to perform.

In this case, that is clearly the remote API request. It is very likely that the external API throttles you in some way, or at least that the network limits the number of outgoing connections from your server. So starting more workers than you can have open connections at any given time is not going to win you much performance.

Instead, you could have workers that each handle the work for one message (i.e. perform one API request) and then move on to the next message.

These kinds of ‘work queue’ setups are a natural fit for the building blocks that GenStage and Flow provide, which let you do what is described above: run code for every item in the collection, as well as regroup items in the collection based on some criteria.
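Before reaching for GenStage, the idea of workers that each take one message and then move on to the next can be sketched with a plain GenServer that owns the backlog. This is one possible answer to the earlier "do I build a GenServer?" question. `WorkQueue` and `PullWorker` are hypothetical names, and `handle_fun` stands in for the real API call:

```elixir
defmodule WorkQueue do
  # Sketch: one GenServer owns the backlog; workers pull one item at a time.
  use GenServer

  def start_link(items), do: GenServer.start_link(__MODULE__, items)

  # Returns {:ok, item}, or :empty once the backlog is drained.
  def next(server), do: GenServer.call(server, :next)

  @impl true
  def init(items), do: {:ok, :queue.from_list(items)}

  @impl true
  def handle_call(:next, _from, queue) do
    case :queue.out(queue) do
      {{:value, item}, rest} -> {:reply, {:ok, item}, rest}
      {:empty, _} -> {:reply, :empty, queue}
    end
  end
end

defmodule PullWorker do
  # Hypothetical worker loop: take one message, handle it, take the next,
  # and return how many messages this worker handled.
  def run(server, handle_fun, handled \\ 0) do
    case WorkQueue.next(server) do
      {:ok, item} ->
        handle_fun.(item)
        run(server, handle_fun, handled + 1)

      :empty ->
        handled
    end
  end

  # Starts `n` workers against the same queue and waits for all of them.
  def run_pool(server, handle_fun, n) do
    1..n
    |> Enum.map(fn _ -> Task.async(fn -> run(server, handle_fun) end) end)
    |> Task.await_many(:infinity)
    |> Enum.sum()
  end
end
```

One caveat: every pull goes through a single process, so at very high rates that GenServer can become the bottleneck. GenStage builds demand-driven back-pressure on top of this same pull pattern.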

tty · November 11, 2018, 7:59pm

What should happen if the HTTP request process dies before receiving a reply, or during a retry?
Do you need a record of each completed request?

Overall, it sounds like each request has at least 3 states (ready_for_processing, waiting_for_reply, completed). Think about how you would manage these as well (see GenStateMachine).
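The three states can be written down as explicit transitions before deciding which process or library manages them. This is a plain-function sketch, not GenStateMachine’s API. The event names (`:send`, `{:reply, status}`, `:timeout`) and the rule that any non-200 reply or timeout sends the request back to `:ready_for_processing` for a retry are assumptions:

```elixir
defmodule RequestState do
  # Sketch of the three request states from the post. Event names and the
  # "non-200 goes back for retry" rule are assumptions for illustration.
  @type state :: :ready_for_processing | :waiting_for_reply | :completed

  def transition(:ready_for_processing, :send), do: {:ok, :waiting_for_reply}
  def transition(:waiting_for_reply, {:reply, 200}), do: {:ok, :completed}
  # Any other status code: queue the request again so it can be retried.
  def transition(:waiting_for_reply, {:reply, _status}), do: {:ok, :ready_for_processing}
  def transition(:waiting_for_reply, :timeout), do: {:ok, :ready_for_processing}
  def transition(state, event), do: {:error, {:invalid_transition, state, event}}
end
```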

9mm · November 11, 2018, 11:24pm

Thank you. I don’t need to record completed requests (for now). If a process dies, that’s OK. I would hope that if a request ‘fails’ outright (like a non-200 status code) it could be retried, but if something crashes and the state gets lost, I’m not too upset about that.
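If crashes are acceptable losses, one option is to run the sends under a `Task.Supervisor` with `async_stream_nolink`. A crash in one request is then reported as `{:exit, reason}` and counted, rather than taking down the caller. Outright failures (`{:error, reason}`, e.g. a non-200 status) are counted separately and could be fed back into a retry. `IsolatedSender` and `send_fun` are hypothetical. In a real app the `Task.Supervisor` would be started in the application’s supervision tree instead of inside the function:

```elixir
defmodule IsolatedSender do
  # Sketch: each send runs in a supervised, unlinked task, so a crash in one
  # send shows up as {:exit, reason} instead of killing the caller.
  # `send_fun` is hypothetical and should return :ok or {:error, reason}.
  def run(pairs, send_fun, max_concurrency \\ 100) do
    {:ok, sup} = Task.Supervisor.start_link()

    result =
      Task.Supervisor.async_stream_nolink(
        sup,
        pairs,
        fn {token, message} -> send_fun.(token, message) end,
        max_concurrency: max_concurrency,
        ordered: false
      )
      |> Enum.reduce(%{ok: 0, failed: 0, crashed: 0}, fn
        {:ok, :ok}, acc -> Map.update!(acc, :ok, &(&1 + 1))
        # Outright failure (e.g. non-200): a candidate for retrying.
        {:ok, {:error, _reason}}, acc -> Map.update!(acc, :failed, &(&1 + 1))
        # Crash: state for this message is simply lost.
        {:exit, _reason}, acc -> Map.update!(acc, :crashed, &(&1 + 1))
      end)

    Supervisor.stop(sup)
    result
  end
end
```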

9mm · November 12, 2018, 12:11am

Actually wow, this is EXACTLY what I needed to know:

I will try to follow along with this, but there are a lot of details I still need to figure out about how the heck this works.

The only difference is that theirs receives messages in real time, whereas mine all arrive at once (so there doesn’t really need to be a ‘receiver’… or, I suppose, the receiver would get them all at once rather than as a flow over time).

It would probably be better to move the entire database into Elixir, just to avoid transferring the data for that many requests :shrug:

9mm · November 12, 2018, 12:45am

Just so I can save this for my notes… this is where it talks about the 100-message limit per worker, as well as connection draining:

https://firebase.google.com/docs/cloud-messaging/server

9mm · November 12, 2018, 10:16pm

More notes to myself: they use https://github.com/discordapp/romeo

outlog · November 13, 2018, 2:59pm

Also check how ForzaFootball are doing it (they might have more posts on it…).