Planning first Elixir/Phoenix app - state questions

Newbie here, hope question makes sense :slight_smile:

I want to use elixir processes as the main source of truth and use DB only for persistence. Here is what I had in mind:

1: When a user connects to a socket/channel, spin up their own GenServer/Agent process and load the initial state from the DB. This state (3-10 MB) will be updated very often by other Elixir processes and then delivered back to the user over channels.

2: Save it to the DB every x minutes.

3: When the user disconnects, stop the process.

Would it make more sense to hold all users' data in something like mnesia? Another option would be to not even wait for users to connect, but to start the processes with the application - 1000 users with 10 MB of data each… is holding that in memory a problem? The benefit of this approach is that users would probably get their initial data slightly faster when they connect.
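For what it's worth, the three steps above can be sketched as a single GenServer. This is only an illustration of the lifecycle, not a real implementation - `load_state/1` and `save_state/2` are invented placeholders standing in for your actual Ecto/DB calls:

```elixir
defmodule MyApp.UserSession do
  use GenServer
  # Sketch of the per-user process described above: started when the
  # user joins a channel, state loaded from the DB, flushed back every
  # few minutes, and a final save when the process stops on disconnect.
  # load_state/1 and save_state/2 are placeholders for real DB calls.

  @flush_interval :timer.minutes(5)

  def start_link(user_id), do: GenServer.start_link(__MODULE__, user_id)

  def init(user_id) do
    Process.flag(:trap_exit, true)        # so terminate/2 runs on shutdown
    schedule_flush()
    {:ok, %{user_id: user_id, data: load_state(user_id)}}
  end

  def handle_info(:flush, state) do
    save_state(state.user_id, state.data) # step 2: save every x minutes
    schedule_flush()
    {:noreply, state}
  end

  def terminate(_reason, state) do
    save_state(state.user_id, state.data) # final save on disconnect
  end

  defp schedule_flush, do: Process.send_after(self(), :flush, @flush_interval)

  # Placeholders standing in for real persistence.
  defp load_state(_user_id), do: %{}
  defp save_state(_user_id, _data), do: :ok
end
```

In a Phoenix app you would typically start this under a `DynamicSupervisor` from the channel's `join/3` callback, so a crash of one user's process doesn't affect the others.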


I guess the big question is why? (What are you trying to accomplish?) BTW, amnesia would be a really cool name for a DB :slight_smile:

I would ask: have you looked at event sourcing before? One of the key architectural decisions behind most event sourcing frameworks is not considering any command applied until after it has been persisted to the main event store (which is the main source of truth). If you do it the other way around, there can be discrepancies between the client’s state and the server’s state. This changes somewhat if you are planning on really clustering your Elixir nodes to provide redundancy and availability.
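To make the persist-first rule concrete, here is a minimal sketch of an aggregate that appends the event to a store before touching its in-memory state. The module name and `append_event/2` are invented for illustration; a real event store would write to disk or a DB:

```elixir
defmodule Accounting.Account do
  use GenServer
  # Sketch of the "persist before acknowledging" rule from event
  # sourcing: the event is appended to the store first, and only
  # then applied to the in-memory state and acknowledged.

  def init(id), do: {:ok, %{id: id, balance: 0, events: []}}

  def handle_call({:deposit, amount}, _from, state) do
    event = {:deposited, amount}

    case append_event(state.id, event) do          # persist first...
      :ok ->
        {:reply, :ok, apply_event(state, event)}   # ...then apply and ack
      {:error, reason} ->
        {:reply, {:error, reason}, state}          # state unchanged on failure
    end
  end

  defp apply_event(state, {:deposited, amount} = event) do
    %{state | balance: state.balance + amount,
              events: [event | state.events]}
  end

  # Placeholder event store: always succeeds here.
  defp append_event(_id, _event), do: :ok
end
```

The point is the ordering: if the node dies between the append and the reply, the event is still in the store and can be replayed, so the client never sees state the server cannot reconstruct.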

It also depends on how badly your users need accurate information, of course! If you haven’t already come across the term, be sure to read up on “eventual consistency” and its various flavors, to make sure you’re picking the right strategy.

This reminds me that I actually recently saw a good video from Chris McCord about the new Phoenix Presence and the complexities that come with using Elixir nodes without a “proper” DB. But that is about a cluster with no single source of truth.

Also, I am NOT an Elixir expert and am still a newb myself (a few months). But your design has some similar aspects to my own research into the area, so I figured I’d chime in.


EDIT: To be clear, I am NOT advocating the use of mnesia to solve the OP’s question. Rather I’m just pointing out that there is in fact a db by that name.

Yes, but without an A it’s a totally different thing. Amnesia would be an infinitely scalable, write-only DB :slight_smile:

What you’re talking about is called /dev/null :stuck_out_tongue: There’s even an aaS version.

Amnesia, however, is a thing: an Elixir wrapper around mnesia.


Oops, didn’t know there was a wrapper lib by that name :slight_smile:


Ha, this is very interesting. I have been having similar feelings and did some experiments going in this direction.

In my case, I have a system with a very complex permissions setup. Users can have many roles assigned, and on top of that be granted exceptions to access parts of the Records. The Records themselves can have many Positions, and those have many more sub-sections/related tables in the database.

The SQL query that finds the records a given user has access to is literally an A4 page of 12px print. It’s an insane piece of SQL glued together, with a few layers of subqueries. In addition to that, I need to answer the reverse question - which users have access to a given record - and as you may guess, it generates the opposite, equally crazy A4 page of SQL. On top of that, I have to keep these two queries in sync.

I did an experiment, basically replacing my Records with aggregates that are GenServers behind the scenes. Not only could I replace the logic written in unreadable SQL with a nice set of function calls, but I also improved the performance of the system by an order of magnitude (queries down from ~800 ms to ~20 ms).

Moreover, if we localize the updates to the records, we no longer have inserts/updates affecting the performance of the queries. We can also get rid of the majority of database-backed transactions and replace them with in-aggregate, application-level transactions instead.

I think DDD is a natural way of building Erlang/Elixir apps. You may not even call it DDD. But look around the ecosystem and it’s happening all over the place. OTP gives us ready-to-use primitives: GenServers can be started as aggregates; we can hibernate them when not needed, persist their state to DETS / mnesia / an SQL database, and shard them naturally across the cluster. This is how you can build a more horizontally scalable system.
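Hibernation, in particular, comes for free from OTP: a GenServer callback can return a timeout, and on timeout return `:hibernate` to shrink the idle process's heap. A small sketch of an aggregate doing that (the module name and the role-based access rule are invented for illustration):

```elixir
defmodule Inventory.Record do
  use GenServer
  # Sketch: an aggregate process that answers access questions
  # in memory and hibernates when idle. The access rule here
  # (owner/editor roles) is made up for the example.

  @idle_timeout :timer.minutes(10)

  def init(roles_by_user), do: {:ok, roles_by_user, @idle_timeout}

  def handle_call({:can_access?, user_id}, _from, roles) do
    allowed? = Map.get(roles, user_id) in [:owner, :editor]
    {:reply, allowed?, roles, @idle_timeout}
  end

  # No messages for @idle_timeout: hibernate until the next request,
  # compacting the process heap in the meantime.
  def handle_info(:timeout, roles), do: {:noreply, roles, :hibernate}
end
```

A registry (or `:global`/`Swarm`-style registration) then lets you route each record's queries to its own process wherever it lives in the cluster.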

In my case the speed improvement is one thing, but I am also able to localize the queries/updates, allowing me to do more things in parallel. This is a big gain for a complex multi-tenant app that has a lot of data to process but can be sharded in a very nice way.

When you build an app whose whole global state is saved in a SQL (or NoSQL) database, that database is the one source of truth. It is the bottleneck, the place where a lot of convoluted logic lands, and the pain point for scaling, updating, and migrating the data.

On the other hand, a system built around DDD principles can have the “object-oriented” abstractions put in place: aggregates that are processes/GenServers, communicate via message passing, do a lot of things in an async fashion, etc. At the same time you can very well keep the SQL database and use it the way it was meant to be used: to generate reports, query the data ad hoc, analyse it, etc.

OK, that story sort of got out of hand. But my point is: your approach is probably a great choice. You will use the power of Erlang/OTP to help you scale the thing out, and you will end up with a better system architecture at the same time.


Wow, that may be more fine-grained, but I think it is even more extensive than my style. o.O

I use a set of permissions backed by my PermissionEx library on Hex to perform the matching. I never dare let any user know anything about the SQL queries - the models and schemas are disjoint - rather, I do permission testing at all access points, like:

  def edit(conn, %{"slug" => slug_param}) do
    happy_path!(else: handle_error(conn)) do
      @perm true = conn |> can?(edit(%Perms.CheckInOut{section: true}))
      {:ok, section} = verify_single_record get_section_by_slug(slug_param)
      @perm true = conn |> can?(edit(%Perms.CheckInOut{section: section.slug}))
      changeset = CheckInOut.Section.changeset(section)
      render(conn, :edit, changeset: changeset, section: section)
    end
  end

The first check tests that they have any access at all to edit sections, and the second tests whether they have access to that specific section (it is well cached and fast).

Completely agreed. In my (experimental) architecture, I’m actually keeping each transform of each “aggregate” as a separate immutable process, so the user may go back to any point in its history and “fork” that version of the aggregate. Right now I’m sticking with persisting to the DB first, but I’m looking forward to changing this as an optimization: being more optimistic, with replication across a cluster, and having the DB simply be another fault-tolerance mechanism.


Yeah, the system is nuts. It’s not just about having access to certain parts of the record; different users also get a different view of it. I mean, for some users the overall state of the record could be active, while for others (who can’t see certain parts) the state would be inactive, etc., etc. SQL does not make this easy at all.

Yeah that sounds like a hard problem to solve… o.O

This might be relevant
