I tend to overcomplicate my architecture with the best of intentions that never materialize. This time I am asking for help picking the right component for my immediate need, so I can be much more pragmatic.
Within a Phoenix app, I have a form where the user submits a URL. I persist the URL in a data store. I then want a background process (or processes) to 1) retrieve the page at the given URL, 2) parse the page for a few meta elements, 3) update the database, and finally 4) notify the user through a Phoenix channel.
Originally, I thought to Keep It Simple and start a single async Task. But with the four different steps, I felt I was stretching the boundaries of how complicated a Task should be. Is this logic too elaborate for a Task? Most documentation I have read suggests using tasks for straightforward computations.
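For reference, here is a minimal sketch of the four steps as a single supervised Task (module names, the `HTTPoison` client, and the helper functions are my assumptions, not anything from the app):

```elixir
defmodule MyApp.PageFetcher do
  # Hypothetical worker: fetch the page, parse a few meta tags,
  # persist the result, then notify the user over a channel.
  def run(url, user_id) do
    {:ok, %{body: body}} = HTTPoison.get(url)   # assumes HTTPoison is a dep
    meta = parse_meta(body)
    MyApp.Repo.update_page_meta(url, meta)      # hypothetical persistence helper
    MyAppWeb.Endpoint.broadcast("user:#{user_id}", "page_ready", meta)
  end

  defp parse_meta(_body), do: %{}  # parsing elided
end

# Fire-and-forget under a Task.Supervisor started in the application tree:
Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
  MyApp.PageFetcher.run(url, user_id)
end)
```

Even with four steps, the Task itself stays simple because it is just one sequential function; the complexity question is really about supervision and concurrency, not the Task abstraction.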
Next, I felt a GenServer was ideal. I could start a GenServer to perform all the steps and pass messages to report progress. But a single process seemed error prone, since some websites might take too long to deliver their page, and I could easily bottleneck requests on a single GenServer. So, should I start a new GenServer for each form submission?
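A GenServer-per-submission version might look like this sketch, with a short-lived worker started under a `DynamicSupervisor` (all names are hypothetical):

```elixir
defmodule MyApp.FetchWorker do
  use GenServer

  # One short-lived GenServer per form submission; it stops itself when done,
  # so slow websites only block their own process.
  def start_link({url, user_id}) do
    GenServer.start_link(__MODULE__, {url, user_id})
  end

  @impl true
  def init({url, user_id}) do
    # Defer the work so init/1 returns immediately.
    {:ok, %{url: url, user_id: user_id}, {:continue, :fetch}}
  end

  @impl true
  def handle_continue(:fetch, state) do
    # 1) fetch, 2) parse, 3) update DB, 4) broadcast -- details elided
    {:stop, :normal, state}
  end
end

# Started per submission under a DynamicSupervisor in the app tree:
DynamicSupervisor.start_child(
  MyApp.FetchSupervisor,
  {MyApp.FetchWorker, {url, user_id}}
)
```

This avoids the single-process bottleneck, at the cost of unbounded concurrency unless you add a pool or demand control on top.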
With my worry about throttling requests, I came to GenStage. I can manage the request bandwidth based on my needs. But should I place all four steps into a single consumer, or daisy-chain them across different consumers? I took a quick look at some Flow discussion and decided I may be overcomplicating the solution.
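If GenStage is the route, the single-consumer variant can be quite small using `ConsumerSupervisor`, which starts one worker per event and caps concurrency via demand. This sketch assumes a hypothetical `MyApp.UrlProducer` that emits submitted URLs and the per-submission worker from the discussion above:

```elixir
defmodule MyApp.FetchConsumer do
  use ConsumerSupervisor

  def start_link(_arg) do
    ConsumerSupervisor.start_link(__MODULE__, :ok)
  end

  @impl true
  def init(:ok) do
    children = [
      # ConsumerSupervisor appends each event (a URL) to the worker's
      # start_link arguments; :transient so finished workers are not restarted.
      %{
        id: MyApp.FetchWorker,
        start: {MyApp.FetchWorker, :start_link, []},
        restart: :transient
      }
    ]

    # max_demand caps how many URLs are processed concurrently.
    ConsumerSupervisor.init(children,
      strategy: :one_for_one,
      subscribe_to: [{MyApp.UrlProducer, max_demand: 10}]
    )
  end
end
```

All four steps living in one worker keeps the pipeline flat; daisy-chaining consumers only pays off if individual steps need different concurrency limits.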
Advice is much appreciated.
If I were you, I'd look at the maximum requests per second you expect. If that number is around 100, I would just spin up new processes and let each one do the whole workflow. This is simple, but doesn't let you control the concurrency. If the traffic were very unpredictable, I would create a worker module that can do the whole thing, spin up 10 or 20 of them at application start, and use something like poolboy to check workers in and out.
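The poolboy variant might be wired up like this sketch (pool name, sizes, and the worker module are assumptions for illustration):

```elixir
# In the application supervision tree: a fixed pool of workers that each
# run the whole fetch/parse/update/notify flow.
poolboy_config = [
  name: {:local, :fetch_pool},
  worker_module: MyApp.FetchWorker,  # hypothetical worker GenServer
  size: 10,                          # workers started up front
  max_overflow: 5                    # extra workers allowed under load
]

children = [
  :poolboy.child_spec(:fetch_pool, poolboy_config, [])
]

# Per form submission: check out a worker, run the job, check it back in.
:poolboy.transaction(:fetch_pool, fn worker ->
  GenServer.call(worker, {:process, url}, 30_000)
end)
```

The pool gives you a hard ceiling on concurrent fetches without bringing in GenStage.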
Use a GenServer. Test it in production. If it is not a bottleneck, well it works. If it is a bottleneck, you can replace it. Your public API will make the refactoring fast.
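The public-API point can be as small as a facade module: callers see one function, so the GenServer behind it can later be swapped for a pool or a GenStage pipeline without touching call sites (names are hypothetical):

```elixir
defmodule MyApp.Pages do
  # Public API: callers never know a GenServer sits behind this,
  # which keeps the later refactoring contained to this module.
  def fetch_async(url, user_id) do
    GenServer.cast(MyApp.PageFetcher, {:fetch, url, user_id})
  end
end
```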
“Make it work, then make it pretty, then, if needed, make it fast.”
I had a similar scenario, though I had to rate limit an external API call (1/sec)… I ended up using a queue (https://github.com/kuon/backy) - it might not be the most idiomatic Elixir, but it gets the job done in a quick and simple way.
NB: with backy you can tune max_concurrency to your liking, and you can easily get the user's position in the queue by querying the queue's DB table.
Looks very feature-rich, but I was hoping for a more native implementation. Another of my bad habits is to always reach for a library instead of first trying a native approach. I will keep this in my back pocket if I stumble or the problem grows in complexity.
I had a similar feeling. Ultimately, I need to let the performance requirements dictate the solution.
Good point on the public API. I will go with your advice. Thanks!