Long running process with periodic tasks

Hi everyone!

I am building an app using Phoenix which holds websocket connections with other servers and receives messages every other second. Currently I am caching all the messages I get from the other servers.
Now I need to make some calculations based on the data I am receiving, at least once a minute.
I thought of a long-running process which would be supervised by another process.
Using Process.sleep(60000) I can force the process to wait for 1 minute before running the calculations again.
Is this optimal, or are there better solutions for this kind of work? Any ideas are welcome 🙂

Usually the approach is to call Process.send_after(self(), :calculate, 60_000), then define a def handle_info(:calculate, state) callback that does the calculation there and calls Process.send_after again to schedule the next run.
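A minimal sketch of that pattern as a plain GenServer. The module name PeriodicCalculator, the :interval option and the :runs counter are hypothetical additions for illustration; the real calculation over the cached messages would replace the counter increment.

```elixir
defmodule PeriodicCalculator do
  use GenServer

  # Default interval: one minute
  @default_interval 60_000

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts)
  end

  @impl true
  def init(opts) do
    interval = Keyword.get(opts, :interval, @default_interval)
    # Schedule the first run; init/1 returns immediately
    schedule(interval)
    {:ok, %{interval: interval, runs: 0}}
  end

  @impl true
  def handle_info(:calculate, state) do
    # Placeholder for the real calculation over the cached messages
    state = %{state | runs: state.runs + 1}
    # Schedule the next run only after this one has finished
    schedule(state.interval)
    {:noreply, state}
  end

  @impl true
  def handle_call(:runs, _from, state), do: {:reply, state.runs, state}

  defp schedule(interval) do
    Process.send_after(self(), :calculate, interval)
  end
end
```

Because the next message is scheduled at the end of handle_info, a slow calculation delays the following run instead of piling up overlapping runs, and the process stays free to answer calls between runs.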


Thank you for the answer.
Does anybody have any thoughts about using crontab for such tasks?

You kind of need to forget what you know about normal programming limitations and embrace recursion and the infinite process loop here.

As @benwilson512 mentioned, the most efficient way is to recursively call yourself after you have done your calculations.

Some things that can go wrong otherwise:
  • Your calculations might take more than 60 seconds to complete.
  • The operating system scheduler might not wake your thread up at the exact time.
  • The cron job handler might have a problem and not launch any of your jobs.
These are all limitations we introduced to programming languages when we made the main loop invisible to consumers.
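The recursive loop described above can be sketched as a bare process without GenServer. This assumes a hypothetical module PlainLoop that takes the interval in milliseconds and a zero-arity function to run; the call to loop/2 is a tail call, so the process can loop forever in constant stack space.

```elixir
defmodule PlainLoop do
  # Spawns an unsupervised process that runs `fun` every `interval` ms
  def start(interval, fun) when is_function(fun, 0) do
    spawn(fn ->
      schedule(interval)
      loop(interval, fun)
    end)
  end

  defp schedule(interval), do: Process.send_after(self(), :tick, interval)

  defp loop(interval, fun) do
    receive do
      :tick ->
        fun.()
        # Schedule the next tick after the work is done, then recurse.
        # This is a tail call, so the stack does not grow.
        schedule(interval)
        loop(interval, fun)
    end
  end
end
```

In practice you would usually prefer a GenServer under a supervisor, since a process started with bare spawn/1 is not restarted if it crashes.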


I’d use Oban (sorentwo/oban on GitHub): robust job processing in Elixir, backed by modern PostgreSQL.
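A minimal config sketch, assuming Oban 2.x with its built-in Oban.Plugins.Cron plugin. The app name :my_app, the repo MyApp.Repo and the worker MyApp.Workers.Calculate are hypothetical placeholders; the worker module would `use Oban.Worker` and implement `perform/1` with the actual calculation.

```elixir
# config/config.exs (hypothetical app :my_app)
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  plugins: [
    {Oban.Plugins.Cron,
     crontab: [
       # Enqueue the hypothetical calculation worker every minute
       {"* * * * *", MyApp.Workers.Calculate}
     ]}
  ]
```

Cron expressions have one-minute granularity, which matches the "at least once a minute" requirement; the trade-off is that jobs are persisted in PostgreSQL, so this adds a database dependency.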


You could use an Elixir Agent to keep the calculation up to date each time you receive a message from the other servers, so there is no need to calculate it periodically. But if you really need to do it periodically, I would go with @benwilson512's suggestion.
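A minimal sketch of that incremental approach, assuming the calculation can be expressed as a running aggregate. Here a hypothetical count/sum pair is used to compute an average; the module name RunningStats and its functions are illustrative only.

```elixir
defmodule RunningStats do
  use Agent

  # State is a running aggregate, updated on every incoming message
  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{count: 0, sum: 0} end, name: __MODULE__)
  end

  # Called by the websocket handlers as each message arrives
  def record(value) when is_number(value) do
    Agent.update(__MODULE__, fn %{count: c, sum: s} ->
      %{count: c + 1, sum: s + value}
    end)
  end

  # Reading is cheap: the aggregate is always up to date
  def average do
    Agent.get(__MODULE__, fn
      %{count: 0} -> nil
      %{count: c, sum: s} -> s / c
    end)
  end
end
```

Since a single named Agent serializes all updates, with hundreds of connections sending messages every other second it is worth checking that it does not become a bottleneck.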

  • Have a GenServer that is responsible for calculating and returning the aggregated data.
  • Have that GenServer maintain this state:
    • Date/time showing when it last did a calculation.
    • Last cached calculation.
  • When the GenServer receives a request for the aggregated data, have it check the date/time: if more than 60_000 ms have passed since the last calculation, recalculate it and return the new result. If it was calculated less than 60_000 ms ago, just return the cached result.
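A minimal sketch of that caching GenServer, assuming a hypothetical zero-arity :calc function passed in at start. It uses System.monotonic_time/1 instead of wall-clock date/time so that clock adjustments cannot affect the age check.

```elixir
defmodule LazyAggregate do
  use GenServer

  @max_age_ms 60_000

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def get(pid), do: GenServer.call(pid, :get)

  @impl true
  def init(opts) do
    {:ok,
     %{
       calc: Keyword.fetch!(opts, :calc),
       max_age: Keyword.get(opts, :max_age_ms, @max_age_ms),
       computed_at: nil,
       cached: nil
     }}
  end

  @impl true
  def handle_call(:get, _from, state) do
    now = System.monotonic_time(:millisecond)

    # Recalculate only on demand, and only if the cached value is too old
    state =
      if state.computed_at == nil or now - state.computed_at > state.max_age do
        %{state | cached: state.calc.(), computed_at: now}
      else
        state
      end

    {:reply, state.cached, state}
  end
end
```

If nobody asks for the data, calc is never run, which is exactly the CPU saving argued for below.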

I personally wouldn’t reach for a fixed time period recalculation. Imagine if nobody pings that state for a week. Why recalculate it every minute?

It depends on the project and how frequently this data will be requested, but in any case, food for thought. I have changed several hobby projects to the pattern above, and my OCD about not wasting CPU resources has calmed down. 💯


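The usual way is to use send_after, as @benwilson512 mentioned, so the process schedules itself. Process.sleep shouldn’t really be used outside of very niche cases (e.g. retry/backoff helpers or tests), because a sleeping process cannot handle any other messages, such as GenServer calls, until it wakes up.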

The right flow depends on what you need to do in that interval. Some considerations:

  • Can the activity you plan to do every interval take more than the interval itself? What should happen if it does?
  • Does it depend on the existence of other processes (e.g. it can schedule itself)? Or can it be always running no matter what, even in the absence of relevant sources/data?
  • How does it get its data? Is the cache/data source serialising access?
  • What side effects does it create?
  • Are you running more than one node, and if so, does that affect how it needs to run? For example, if the side effect is writing stats into the DB or generating a PDF (not the case here, just an example), you might want only a single process across all instances to ever be working on it.
  • Do you need any hard guarantees regarding timing and execution, or is it OK if it occasionally misses something and is just best effort?
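For the first question, one possible answer is to skip a tick while the previous run is still in progress. A minimal sketch, assuming a hypothetical :work function passed at start; the :skipped and :completed counters only exist to make the behaviour observable.

```elixir
defmodule NoOverlapWorker do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      interval: Keyword.get(opts, :interval, 60_000),
      work: Keyword.fetch!(opts, :work),
      task: nil,
      skipped: 0,
      completed: 0
    }

    schedule(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:tick, %{task: nil} = state) do
    # Nothing running: start the work in a separate (linked) task, so a
    # crash in the work also crashes this GenServer and its supervisor restarts it
    task = Task.async(state.work)
    schedule(state.interval)
    {:noreply, %{state | task: task}}
  end

  def handle_info(:tick, state) do
    # Previous run still in progress: skip this tick
    schedule(state.interval)
    {:noreply, %{state | skipped: state.skipped + 1}}
  end

  # Task.async/1 delivers its result as {ref, result}
  def handle_info({ref, _result}, %{task: %Task{ref: ref}} = state) do
    # Drop the task's monitor and any pending :DOWN message
    Process.demonitor(ref, [:flush])
    {:noreply, %{state | task: nil, completed: state.completed + 1}}
  end

  @impl true
  def handle_call(:stats, _from, state) do
    {:reply, Map.take(state, [:skipped, :completed]), state}
  end

  defp schedule(interval), do: Process.send_after(self(), :tick, interval)
end
```

The ticker keeps its regular schedule, and the GenServer stays responsive to calls while the work runs in the task.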

There is also :gen_statem, which has built-in timer support (state timeouts and generic timeouts) and can be useful if the flow resembles a list of steps (or is obviously state-machine-like). For example: set timer → wait → timer fires → load data → do something → do something else → set timer → wait → repeat…
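A minimal sketch of such a timer-driven loop with :gen_statem, assuming a hypothetical module PeriodicStatem with two states, :waiting and :working, and a :runs counter standing in for the real work. The :tick state timeout moves it to :working, and an internal :load event does the work before it returns to :waiting.

```elixir
defmodule PeriodicStatem do
  @behaviour :gen_statem

  def start_link(opts \\ []) do
    :gen_statem.start_link(__MODULE__, opts, [])
  end

  @impl true
  def callback_mode, do: :handle_event_function

  @impl true
  def init(opts) do
    interval = Keyword.get(opts, :interval, 60_000)
    data = %{interval: interval, runs: 0}
    # Start in :waiting with a state timeout that fires after `interval` ms
    {:ok, :waiting, data, [{:state_timeout, interval, :tick}]}
  end

  @impl true
  def handle_event(:state_timeout, :tick, :waiting, data) do
    # Timer fired: switch to :working and immediately trigger the load step
    {:next_state, :working, data, [{:next_event, :internal, :load}]}
  end

  def handle_event(:internal, :load, :working, data) do
    # Placeholder for "load data -> do something -> do something else"
    data = %{data | runs: data.runs + 1}
    # Back to :waiting and arm the timer again
    {:next_state, :waiting, data, [{:state_timeout, data.interval, :tick}]}
  end

  def handle_event({:call, from}, :runs, _state, data) do
    {:keep_state_and_data, [{:reply, from, data.runs}]}
  end
end
```

A state timeout is cancelled automatically on any state change, so each step only has to arm the timer it needs next.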


Those are very good things to consider.
The calculations should be fast, but I'm holding hundreds of websocket connections which are all getting messages every other second. Perhaps there should even be multiple processes for the calculations.
I am actually amazed by the way Elixir handles processes, so spawning more processes for each piece of work should be no problem, I think.
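If the calculations do need to be spread across processes, one hedged option is Task.async_stream/3 over the cached data. Here the map of connection id to cached messages and the per-connection calc_fun are hypothetical; each entry is processed in its own task, with at most one task per scheduler at a time.

```elixir
defmodule ParallelCalc do
  # `messages_by_conn` is a hypothetical map of connection id => cached messages;
  # `calc_fun` is the (hypothetical) per-connection calculation.
  def run(messages_by_conn, calc_fun) do
    messages_by_conn
    |> Task.async_stream(
      fn {conn_id, msgs} -> {conn_id, calc_fun.(msgs)} end,
      # At most one task per scheduler (usually one per CPU core)
      max_concurrency: System.schedulers_online(),
      # With the default on_timeout: :exit, the caller exits if one task exceeds 30 s
      timeout: 30_000
    )
    |> Enum.map(fn {:ok, pair} -> pair end)
    |> Map.new()
  end
end
```

This could be called from the handle_info(:calculate, state) callback of a periodic GenServer, so the scheduling and the parallel work stay separate concerns.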


Dynamically supervised GenServers that use the handle_continue callback and a sleep timer would be another way:

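defmodule PeriodicWorker do
  use GenServer

  @impl true
  def init(state) do
    # Run the first calculation right after init/1 returns
    {:ok, state, {:continue, :my_timer}}
  end

  @impl true
  def handle_continue(:my_timer, state) do
    periodic_calculation()
    # Loop by returning another :continue. Continues run before the mailbox
    # is read, so this process never handles other messages (calls, casts)
    {:noreply, state, {:continue, :my_timer}}
  end

  defp periodic_calculation do
    # do some work
    Process.sleep(60_000)
  end
end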