Feeling lost with umbrellas

Straight to the point. I'm probably overthinking this, but give me a try.

The best way to understand something is through a live example. When a user makes a post, every subscriber should receive one notification, which is just a regular record in their feed. So once you have a huge number of subscribers, the posting process becomes slow.

How can I leverage the actor model here and delegate to another process, which would select all my subscribers and insert a notification for each of them?

In my mind this should be solved with an umbrella app that contains one app for the API, as before, and another one for the notifications system. But which approach suits best here? The questions in my head multiply as I go deeper. Do I need to worry about downtime of the notifications system? What does it cost to transfer ~1k user IDs to another app? What if I want to move the notifications system to another physical machine, which increases the chance that the node is down? Should I store the last successful notifications fanout locally?


Firstly: your questions have more to do with how to design applications using processes than with umbrellas. For me, an umbrella application is more a way to organize your code than a way to design the actual software.

The building blocks for OTP are: Releases → OTP applications → Supervision trees → Processes.
The code organization is: Modules → Functions

So in your case you clearly want a separate process for doing the notifications. This can be done in the same module, and even the same function, as your initial code. You are still using the “actor” model.

Once you start formalizing and handling corner cases with plain processes you end up re-inventing OTP, which is why OTP is the recommended approach to start with.

To work a little with your “live example”:

In this case I would have started by hiding the “notification” call behind a module.

Let's say your notifications are email notifications to each subscriber. Your original code looks like this:

def new_post(author, params) do
    {:ok, post} = Blog.new_post(author, params)
    subscribers = Blog.get_subscribers(author)
    Enum.each(subscribers, fn s -> send_email(s, post) end)
end

Then I want to hide the implementation detail of the notifications into a module.

def new_post(author, params) do
    {:ok, post} = Blog.new_post(author, params)
    subscribers = Blog.get_subscribers(author)
    Notification.notify(subscribers, post)
end

defmodule Notification do
  def notify(subscribers, post) do
    Enum.each(subscribers, fn s -> send_email(s, post) end)
  end
end

Once this is done you can refactor Notification whichever way you like. As a first take you just spawn a process to make notify asynchronous.

def notify(subs, post) do
    spawn(fn -> Enum.each(subs, fn s -> send_email(s, post) end) end)
end

Then you want to add some safety to it, so you use spawn_link and/or monitors.
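A minimal sketch of the monitor step, assuming a `send_email/2` that is stubbed out here (the module layout is illustrative, not the original code):

```elixir
# Sketch only: notify/2 with spawn_monitor, so a crashing worker is
# observed instead of silently lost. send_email/2 is a stub.
defmodule Notification do
  def notify(subscribers, post) do
    spawn_monitor(fn ->
      Enum.each(subscribers, fn s -> send_email(s, post) end)
    end)
  end

  defp send_email(_subscriber, _post), do: :ok
end

{_pid, ref} = Notification.notify([:alice, :bob], %{title: "hello"})

receive do
  # reason is :normal on success, or the crash reason on failure
  {:DOWN, ^ref, :process, _pid, reason} -> IO.inspect(reason)
end
```

With spawn_link instead, a crash in the worker would take the caller down too unless it traps exits; a monitor only delivers a message, which is why it fits a "notify and observe" shape.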

Then you move into having the Notification system as a GenServer so that you can have it supervised.
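A sketch of that GenServer step, under the assumption that a cast (fire-and-forget) is acceptable; the module name and the `send_email/2` stub are made up for the example:

```elixir
# Hypothetical GenServer wrapper: callers are never blocked, and the
# process can be supervised and restarted like any other OTP worker.
defmodule Notification.Server do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, Keyword.put_new(opts, :name, __MODULE__))
  end

  # Asynchronous API: the post is saved without waiting for emails.
  def notify(subscribers, post) do
    GenServer.cast(__MODULE__, {:notify, subscribers, post})
  end

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_cast({:notify, subscribers, post}, state) do
    Enum.each(subscribers, fn s -> send_email(s, post) end)
    {:noreply, state}
  end

  defp send_email(_subscriber, _post), do: :ok
end
```

Defined like this, the module can be dropped straight into a supervision tree's child list (`children = [Notification.Server]`).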

Once this is done it can be plugged into any supervision tree. It may run locally in the same OTP application as your original app. Perhaps you want it to be a separate OTP application. Then you have the choice of using an umbrella app, which I tend to do if the OTP application is not really stand-alone and depends on the other apps in the umbrella, or a stand-alone OTP application you bring in through mix dependencies.

If the notification system is truly important, so that you should not be able to save a new post without it, you reflect this either through the supervision tree or by having your original OTP application depend on the notification system.

In either case if the notification system dies, your system will die (and hopefully get restarted by the BEAM VM).

The cost of transfer is really between processes, not between apps. And sending things between processes is basically how you do anything in Elixir, so even sending quite large data structures is OK. Something like 1000 user IDs is no problem at all.

The great thing about Erlang processes is location transparency. You interface with the process the same way whether or not it runs on a separate node. It makes failure handling a little harder, though, as you can't (or should not) use supervision trees across physical nodes, and relying on “Erlang distributed applications” is not really recommended, as they are prone to split-brain scenarios.
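That location transparency can be seen with a plain registered process (all names below are made up for the sketch): a `{name, node}` tuple addresses a process the same way locally and remotely.

```elixir
# A registered process stands in for the notifications system.
pid = spawn(fn ->
  receive do
    {:notify, from, msg} -> send(from, {:ack, msg})
  end
end)

Process.register(pid, :notifier)

# node() is the local node here; for a remote node the tuple would be
# {:notifier, :"notif@otherhost"} with no other change at the call site.
send({:notifier, node()}, {:notify, self(), :hello})

result =
  receive do
    {:ack, msg} -> msg
  after
    1_000 -> :timeout
  end
```

The same shape carries over to `GenServer.cast({Notification.Server, node}, msg)` when the server lives on another node.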

If you worry about downtime here, you would need some sort of synchronous reply from the notification system, at least so that you know it has received your message and perhaps persisted it. What if the reply is missing? Well, if you go down this route you are in for a treat (see the Two Generals' Problem on Wikipedia) and you will either become a professor of distributed systems or (like the rest of us) go completely insane.

At the end of the day you have to decide how important notifications are for you. There are very few notifications systems that can guarantee delivery.

So when you save your post you have to decide how long to wait for. Your notification system could persist the notification task before replying to the caller and then clear it off once you have sent all notifications.
In any case you always run the risk of not sending a notification or sending too many.
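One way to get that synchronous acknowledgement is a GenServer.call with a timeout. Everything below is hypothetical (module name, `enqueue/2`, and the in-memory "persistence", which a real system would replace with disk or a database):

```elixir
# Sketch of a synchronous hand-off: the caller gets :ok only after the
# server has stored the task; a missing reply within the timeout is the
# undecidable case -- the task may or may not have been stored.
defmodule NotifyHandoff do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, [], Keyword.put_new(opts, :name, __MODULE__))
  end

  def enqueue(task, timeout \\ 5_000) do
    try do
      GenServer.call(__MODULE__, {:enqueue, task}, timeout)
    catch
      :exit, _reason -> {:error, :no_ack}
    end
  end

  @impl true
  def init(tasks), do: {:ok, tasks}

  @impl true
  def handle_call({:enqueue, task}, _from, tasks) do
    # "Persist" before replying; clearing happens after fanout succeeds.
    {:reply, :ok, [task | tasks]}
  end
end
```

`{:error, :no_ack}` is exactly the Two Generals' ambiguity from above: the caller cannot tell a lost request from a lost reply.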


Thank you for this good answer!

That's the point. Posting should not depend on notifications. I assume the question then is how to properly communicate between two nodes and avoid termination when the notifications system is not available.
I would transform the code above into:

defmodule Posts do

  alias Ecto.Multi

  def new_post(author, params) do
    cs = post_changeset(author, params)

    result =
      Multi.new
      |> Multi.insert(:post, cs)
      |> Multi.run(:event, fn %{post: post} ->
        cs = Ecto.Changeset.change(%Event{post_id: post.id}, %{action: "create"})
        Repo.insert(cs)
      end)
      |> Repo.transaction()

    NotificationsObserver.notify_subscribers_for_author(author)
    result
  end
end

defmodule NotificationsObserver do
  use GenServer

  # ...

  def handle_cast({:notify, author}, state) do
    events = Blog.get_recent_events(author)

    Enum.each(events, fn event ->
      post = Blog.get_post(event.post_id)
      subscribers = Blog.get_subscribers(author)
      # TODO: call another node to notify `subscribers` with `post`;
      # only remove the event once that call has succeeded
      Blog.remove_events([event])
    end)

    {:noreply, state}
  end
end

So an event here is my guarantee that unacknowledged notifications eventually get sent. While the notifications system is unavailable, posts accumulate events; once it becomes available again, I can recover the missing notifications. Does that make sense? Then new questions appear: should I use a message queue for this? And is it right to call AMQP.Basic.publish inside a transaction?
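The scheme described (persist an event in the same transaction, fan out later, delete only on success) is essentially a transactional outbox. A toy sketch of the drain step, with delivery injected as a function since none of the real modules exist here:

```elixir
# Hypothetical outbox drain: returns {delivered, pending} so that failed
# events stay queued for the next pass. notify_fun stands in for the
# real cross-node notification call.
defmodule OutboxPoller do
  def drain(events, notify_fun) do
    Enum.split_with(events, fn event ->
      case notify_fun.(event) do
        :ok -> true              # delivered: safe to remove from the outbox
        {:error, _reason} -> false # keep it; retry on the next pass
      end
    end)
  end
end

# Event 2 fails to deliver, so it stays pending for the next pass:
{delivered, pending} =
  OutboxPoller.drain([1, 2, 3], fn
    2 -> {:error, :node_down}
    _ -> :ok
  end)
# delivered == [1, 3], pending == [2]
```

This keeps "at least once" semantics: an event removed only after a successful call can still be delivered twice if the acknowledgement is lost, matching the earlier point that you risk sending too many rather than too few.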

OK. If an umbrella app doesn't fit here, do I have to create another app and connect the nodes? In that case the weak spot is the network, since the nodes are connected over a TCP/IP socket.

I don't. The picture is to split posts into posts and notifications. In theory the notifications system could become unavailable in two cases: a power outage or an internet outage.