Structuring a supervision tree for a newsletter application

I am extending an existing application to let users write and send newsletters, and I’m struggling to determine a proper supervision tree (I’m relatively new to OTP, so please bear with me).

Overview

Users can create a single Newsletter, which has “subscribers” via a Subscription schema. A user may write multiple Letters within a given newsletter, each of which may be sent to a subset of their subscribers, which I’ll call “recipients”.

After a user writes a Letter and goes to send it, the plan is to kick off a background job to handle the sending of letters to all recipients.

First Attempt

  • Create a Task.Supervisor called App.MailmanSupervisor and add it to the top-level Application supervisor as a child.
  • Create an App.LetterDeliveryDispatcher, which is a DynamicSupervisor that will supervise individual App.LetterDeliveryService processes, one for each Letter to send. This is also started by the top-level Application supervisor.
  • App.LetterDeliveryService is a GenServer that manages sending a single Letter to all specified recipients. These are created dynamically (by calling App.LetterDeliveryDispatcher.add_child) when it is time to send a given letter.
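To make the tree concrete, here is a minimal sketch of the wiring described in the list above. The module names come from my plan; the App.Supervisor name, the :one_for_one strategies, and the shape of add_child/1 are assumptions on my part rather than settled code.

```elixir
defmodule App.LetterDeliveryDispatcher do
  use DynamicSupervisor

  def start_link(init_arg) do
    DynamicSupervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  # Starts one App.LetterDeliveryService for the given letter id.
  # Relies on App.LetterDeliveryService exposing child_spec/1 (e.g. via `use GenServer`).
  def add_child(letter_id) do
    DynamicSupervisor.start_child(__MODULE__, {App.LetterDeliveryService, letter_id})
  end

  @impl true
  def init(_init_arg) do
    DynamicSupervisor.init(strategy: :one_for_one)
  end
end

defmodule App.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Supervises the short-lived tasks that send individual emails.
      {Task.Supervisor, name: App.MailmanSupervisor},
      # Supervises one delivery service per letter being sent.
      App.LetterDeliveryDispatcher
    ]

    # App.Supervisor is an assumed name for the top-level supervisor.
    Supervisor.start_link(children, strategy: :one_for_one, name: App.Supervisor)
  end
end
```

Sending a letter would then amount to calling App.LetterDeliveryDispatcher.add_child(letter.id).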

The App.LetterDeliveryService

This GenServer would take the Letter.id as a parameter to start_link/1 and add a Map called deliveries to its state in init/1 to keep track of the status of individual deliveries.

Then, to handle the sending of letters to individual recipients, I’d call Task.Supervisor.async_stream_nolink/4 from a handle_continue/2 callback and process the results, handling {:exit, reason} tuples as well (I’m using zip_input_on_exit with async_stream_nolink so that each exit tuple includes the input it failed on, i.e. {:exit, {recipient_id, reason}}). Ultimately, I’d update the deliveries map in the state and return {:stop, :normal, %{state | deliveries: deliveries}} based on these return values, which would yield a map looking like:

%{
  1 => :pending,
  2 => {:delivered, 34},
  3 => {:failed, :timeout}
}
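Here is a rough sketch of the GenServer I have in mind. recipient_ids/1 and deliver/2 are hypothetical stubs standing in for the real queries and email sending, and on_timeout: :kill_task is my assumption: without it, a task timeout would exit the GenServer itself instead of producing an {:exit, {recipient_id, :timeout}} entry. I also set restart: :transient, since with the default :permanent the dispatcher would restart the process even after a :normal stop.

```elixir
defmodule App.LetterDeliveryService do
  # :transient: only restarted by the supervisor after an abnormal exit.
  use GenServer, restart: :transient

  def start_link(letter_id) do
    GenServer.start_link(__MODULE__, letter_id)
  end

  @impl true
  def init(letter_id) do
    # Every recipient starts out as :pending.
    deliveries = Map.new(recipient_ids(letter_id), &{&1, :pending})
    {:ok, %{letter_id: letter_id, deliveries: deliveries}, {:continue, :deliver}}
  end

  @impl true
  def handle_continue(:deliver, state) do
    deliveries =
      App.MailmanSupervisor
      |> Task.Supervisor.async_stream_nolink(
        Map.keys(state.deliveries),
        # Return the recipient id with the result so successes can be keyed too.
        fn recipient_id -> {recipient_id, deliver(state.letter_id, recipient_id)} end,
        zip_input_on_exit: true,
        on_timeout: :kill_task
      )
      |> Enum.reduce(state.deliveries, fn
        {:ok, {recipient_id, email_id}}, acc ->
          Map.put(acc, recipient_id, {:delivered, email_id})

        {:exit, {recipient_id, reason}}, acc ->
          Map.put(acc, recipient_id, {:failed, reason})
      end)

    {:stop, :normal, %{state | deliveries: deliveries}}
  end

  # Hypothetical stub: would look up the recipients chosen for this letter.
  defp recipient_ids(_letter_id), do: [1, 2, 3]

  # Hypothetical stub: would send the email and return the NewsletterEmail id.
  defp deliver(_letter_id, recipient_id), do: recipient_id * 10
end
```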

The individual Task workers would receive a recipient id and they would:

  1. Query to get the full Subscription record using the recipient id
  2. Determine if and how to send to this subscriber based on some business logic
  3. Send the letter to the recipient if applicable
  4. Create a NewsletterEmail database record with foreign keys to both the Subscription and the Letter along with some other metadata related to the sent email.

Retries

One disadvantage of using a Task.Supervisor is that there isn’t any retry logic for individual worker tasks (the ones sending emails and doing database queries and updates). So, I was thinking of relying on the restart behaviour of the LetterDeliveryService GenServer: set its restart option to :transient and have it exit with a reason other than :normal if any of its child Tasks returns {:exit, _}, so that it is restarted by the LetterDeliveryDispatcher.

The key here is that I’d save the deliveries state in terminate/2 when exiting this way. I was thinking of doing this in a jsonb column on Letter called mail_order, which would hold information about the current delivery.

When the LetterDeliveryService is restarted (recall that it takes a Letter.id), I would then check the letter’s mail_order field and see whether any deliveries are still outstanding. If not, I’d terminate with :normal; if so, I’d try to send those again, repeating the process.

I could also easily amend deliveries (which are persisted in the mail_order column on a Letter) to record how many times each delivery has been attempted, so I could cap the number of retries.

Questions

  1. First, this is my first serious attempt at using various OTP behaviours and so I’m not sure whether this is a good approach or whether I’m overthinking things. Would you tackle this in another way?

    I thought I should have a long-running GenServer (LetterDeliveryService) per letter to track the sending to all of its recipients, but I could also imagine a scenario in which the mailings to individual recipients are handled in one sort of “queue”, irrespective of the Letter they belong to.

  2. If the approach is mostly correct, should I use the async_stream_nolink method I described above, handling all results once they come in, or would it be better to use async_nolink and use the handle_info callbacks of the LetterDeliveryService GenServer?

  3. Is there a better way to handle retry logic?

  4. Anything else I’m missing?
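For question 2, this is roughly what I imagine the async_nolink variant would look like. It is only a sketch: the module name, the refs map, and the recipient_ids/1 and deliver/2 stubs are all hypothetical. Each task is tracked by its monitor reference, and results arrive one at a time in handle_info/2 rather than in a single pass.

```elixir
defmodule App.LetterDeliveryService.Async do
  use GenServer, restart: :transient

  def start_link(letter_id), do: GenServer.start_link(__MODULE__, letter_id)

  @impl true
  def init(letter_id) do
    {:ok, %{letter_id: letter_id, deliveries: %{}, refs: %{}}, {:continue, :deliver}}
  end

  @impl true
  def handle_continue(:deliver, state) do
    # Map each task's monitor ref to the recipient it is delivering to.
    refs =
      Map.new(recipient_ids(state.letter_id), fn recipient_id ->
        task =
          Task.Supervisor.async_nolink(App.MailmanSupervisor, fn ->
            deliver(state.letter_id, recipient_id)
          end)

        {task.ref, recipient_id}
      end)

    deliveries = Map.new(refs, fn {_ref, recipient_id} -> {recipient_id, :pending} end)
    maybe_stop(%{state | refs: refs, deliveries: deliveries})
  end

  # A task finished successfully and replied with its result.
  @impl true
  def handle_info({ref, email_id}, state) when is_map_key(state.refs, ref) do
    # The result is in hand, so drop the monitor and any queued :DOWN message.
    Process.demonitor(ref, [:flush])
    record(state, ref, {:delivered, email_id})
  end

  # A task crashed before replying.
  def handle_info({:DOWN, ref, :process, _pid, reason}, state)
      when is_map_key(state.refs, ref) do
    record(state, ref, {:failed, reason})
  end

  defp record(state, ref, status) do
    {recipient_id, refs} = Map.pop(state.refs, ref)
    deliveries = Map.put(state.deliveries, recipient_id, status)
    maybe_stop(%{state | refs: refs, deliveries: deliveries})
  end

  # Stop once no tasks are outstanding.
  defp maybe_stop(%{refs: refs} = state) when map_size(refs) == 0, do: {:stop, :normal, state}
  defp maybe_stop(state), do: {:noreply, state}

  # Hypothetical stubs standing in for the real lookup and email sending.
  defp recipient_ids(_letter_id), do: [1, 2, 3]
  defp deliver(_letter_id, recipient_id), do: recipient_id * 10
end
```

The trade-off as I understand it: this version keeps the GenServer responsive to other messages while deliveries are in flight, but it has no built-in concurrency limit or per-task timeout the way async_stream_nolink does.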

Finally, I am aware that there are third-party libraries like Oban and Parent, but I’d like to try to do this without relying on third-party libraries unless it really makes sense to include them. I want to make sure I understand how to architect such a system with OTP before integrating such libraries. But if there is a very strong case to be made that one of those (or another) library is really what I should be using, then of course I will use it.

After reading some replies to other threads, notably this reply about retrying Tasks from @sasajuric, I think the approach I described above, wherein I rely on the parent GenServer being restarted to retry individual tasks that failed, might not be the best one. It might be better to simply handle the retry logic in the individual Task.
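A minimal sketch of what retrying inside the Task itself could look like. App.Retry and with_retries/3 are hypothetical names of my own, and the sketch assumes the wrapped function returns {:ok, value} or {:error, reason}; it does not rescue exceptions, so a crash would still surface as an {:exit, _} from the task.

```elixir
defmodule App.Retry do
  # Calls `fun` up to `attempts` times, sleeping `delay_ms` between tries.
  # Returns the first {:ok, value}, or the last {:error, reason}.
  def with_retries(fun, attempts, delay_ms \\ 0) when attempts > 0 do
    case fun.() do
      {:ok, value} ->
        {:ok, value}

      {:error, _reason} when attempts > 1 ->
        Process.sleep(delay_ms)
        with_retries(fun, attempts - 1, delay_ms)

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Each worker task would then wrap its send step, for example App.Retry.with_retries(fn -> send_email(subscription, letter) end, 3, 1_000), where send_email/2 is again a placeholder.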

The other questions still remain, however, and I’d appreciate any feedback on the overall approach.