Vault - a lightweight process-scoped global data storage with immutability guarantees

Vault is a lightweight Elixir library for immutable data storage within a process subtree.

Because of Elixir’s actor model, it’s common for a process to have a global context that is valid for every function call inside the process and its children.

For example, this context can include:

  • A user when processing a request
  • A tenant in a multi-tenant application
  • Rate limiting buckets/quotas
  • Cache namespaces
  • API or client versions
  • And many more, depending on your application domain

Vault.init/1 guarantees that the context can be defined only once per process subtree, so you can’t override it by accident. This makes it easy to reason about where your context originates.

# Initialize vault in parent process
Vault.init(current_user: %{id: 1, first_name: "Alice", role: "admin"})

# Access data from any descendant process, even those that are not linked!
spawn(fn ->
  Vault.get(:current_user) # => %{id: 1, first_name: "Alice", role: "admin"}

  Vault.init(current_user: :user) # => raises, because an ancestor already has a vault initialized
end)

# Access data from the parent process itself
Vault.get(:current_user) # => %{id: 1, first_name: "Alice", role: "admin"}
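To make the "define once per subtree" rule concrete, here is a minimal sketch of how such a check could work with only the standard library. MiniVault and its storage key are hypothetical names, not Vault's implementation. The sketch only consults the :"$callers" and :"$ancestors" entries that Task and OTP behaviours record, so unlike the example above it would not see through a bare spawn/1.

```elixir
defmodule MiniVault do
  # Hypothetical sketch, NOT Vault's real implementation. Ancestors are
  # discovered only via :"$callers" and :"$ancestors"; a bare spawn/1 sets neither.
  @key :__mini_vault__

  # Stores the data once; raises if this process or a known ancestor
  # already holds a vault.
  def init(data) do
    if find_vault() != nil do
      raise "vault already initialized in this process or an ancestor"
    end

    Process.put(@key, Map.new(data))
    :ok
  end

  # Reads a key from the nearest vault: own dictionary first, then ancestors.
  def get(key, default \\ nil) do
    case find_vault() do
      nil -> default
      vault -> Map.get(vault, key, default)
    end
  end

  defp find_vault do
    case Process.get(@key) do
      nil ->
        (Process.get(:"$callers", []) ++ Process.get(:"$ancestors", []))
        |> Enum.find_value(&read_remote/1)

      vault ->
        vault
    end
  end

  # Copies another local process's dictionary via Process.info/2 and picks
  # out the vault entry, if any. Registered names and remote pids are skipped.
  defp read_remote(pid) when is_pid(pid) and node(pid) == node() do
    case Process.info(pid, :dictionary) do
      {:dictionary, dict} ->
        case List.keyfind(dict, @key, 0) do
          {@key, vault} -> vault
          nil -> nil
        end

      nil ->
        nil
    end
  end

  defp read_remote(_name_or_remote_pid), do: nil
end
```

Inside a process started with plain spawn/1, MiniVault.init(current_user: %{id: 1}) succeeds; a Task.async started from that process can then read the user with MiniVault.get(:current_user), while calling MiniVault.init/1 inside that Task raises.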

In my case, repeatedly passing the user from the connection and GraphQL context into lower-level functions became hard to maintain.

A typical flow involved extracting the user from the Absinthe resolution context, performing substantial business logic, and only at the end persisting data or writing to the audit log - both of which also required the user. Maintaining this plumbing was cumbersome.

Because the user is immutable for the lifetime of the request and retrieved only once, storing it in the process dictionary is an elegant way to eliminate redundant parameters and simplify the overall flow.

This approach does introduce an implicit dependency - the need to understand where the value originates - but since it’s initialized exactly once, the trade-off is acceptable. In most cases, callers can simply read the value without needing to think about its source.
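For context on why a library is needed at all: the process dictionary is strictly per process. The short script below (plain Elixir, no Vault involved) shows that a value stored with Process.put/2 is invisible to a Task started afterwards, which is why the value has to be looked up on ancestors.

```elixir
# Plain Elixir, no Vault: Process.put/2 writes only to the calling
# process's own dictionary.
Process.put(:current_user, %{id: 1, first_name: "Alice", role: "admin"})

in_parent = Process.get(:current_user)

# A new process starts with its own dictionary, which holds no :current_user,
# so the child reads nil.
in_child = Task.async(fn -> Process.get(:current_user) end) |> Task.await()

IO.inspect(in_parent, label: "parent sees")
IO.inspect(in_child, label: "child sees")
```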

Properties

  • Immutability guarantees. Initializes only once per process tree: raises if one of the ancestors already has a vault initialized.
  • Familiar API. The API is the same as Elixir’s Map module, except for the Vault.init part.
  • Any child process has access to its parent’s Vault. We use the ProcessTree library by JB Steadman, which does all the heavy lifting of traversing process trees and propagating data back. You can read more about how ancestors are fetched in this amazing blog post by the library’s author.
  • Once the vault is found on one of the parents, it’s cached (set in the child’s process dictionary), so subsequent fetches are faster.
  • A set of unsafe_* functions lets you update an already initialized vault. These updates won’t propagate to children that have already fetched (and cached) the vault.
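The caching and local-update behaviour described in the last two points can be sketched as follows. CachingVault, put_root/1 and local_put/2 are hypothetical names chosen for illustration (local_put/2 only mimics the spirit of the unsafe_* functions, whose exact names aren't given here), and ancestors are again found only through :"$callers" and :"$ancestors" rather than through ProcessTree.

```elixir
defmodule CachingVault do
  # Hypothetical sketch of the caching idea, NOT Vault's implementation.
  # Ancestors are found only via :"$callers" and :"$ancestors".
  @key :__caching_vault__

  # Creates the vault in the current (root) process.
  def put_root(data), do: Process.put(@key, Map.new(data))

  # Own dictionary first; on a miss, read an ancestor's dictionary once and
  # cache the result locally so later reads skip Process.info/2 entirely.
  def get(key) do
    case Process.get(@key) || fetch_and_cache() do
      nil -> nil
      vault -> Map.get(vault, key)
    end
  end

  # Local-only update in the spirit of the unsafe_* functions: it rewrites
  # this process's copy; children that already cached the vault keep theirs.
  def local_put(key, value) do
    Process.put(@key, Map.put(Process.get(@key, %{}), key, value))
    :ok
  end

  defp fetch_and_cache do
    found =
      (Process.get(:"$callers", []) ++ Process.get(:"$ancestors", []))
      |> Enum.find_value(&read_remote/1)

    if found, do: Process.put(@key, found)
    found
  end

  defp read_remote(pid) when is_pid(pid) and node(pid) == node() do
    with {:dictionary, dict} <- Process.info(pid, :dictionary),
         {@key, vault} <- List.keyfind(dict, @key, 0) do
      vault
    else
      _ -> nil
    end
  end

  defp read_remote(_name_or_remote_pid), do: nil
end
```

A child that has already read the vault keeps its cached copy, so a later local_put/2 in the parent changes only the parent's view; a child that hasn't read the vault yet would still see the update on its first lookup.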

I’m really curious what you guys think!

I’m not sure using links is a great idea here. Links connect processes in all manner of configurations, so there’s no clear parent/child hierarchy. For that you’d rather want $ancestors or $callers (see the Task documentation for Elixir v1.18.4). Then you’re also no longer traversing a graph, just a list of parents.
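To illustrate the point about lists versus graphs: Task.async/1 records the chain of calling processes in :"$callers", nearest first, so nested tasks can find their "parents" by reading a list from their own dictionary. A small standard-library sketch:

```elixir
top = self()

# Each Task.async/1 stores [caller | caller's own :"$callers"] in the new
# process's dictionary, so the inner task sees the middle task first.
{middle, inner_callers} =
  Task.async(fn ->
    inner = Task.async(fn -> Process.get(:"$callers") end)
    {self(), Task.await(inner)}
  end)
  |> Task.await()

IO.inspect(Enum.take(inner_callers, 2) == [middle, top], label: "nearest callers first")
```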

This is a really great point! I went back and forth on this initially. I wanted the vault to be as resilient as possible and not tied to OTP primitives (for example, I wanted to support Vault inside Kernel.spawn_link/1,3), which is why I chose links.

But thinking of this more, and considering your feedback, I’ll release a 0.2.1 version which will default to using ancestors and add an option to use links globally (if needed). What do you think?

I’d consider $callers as well. It opens the door to a lot of tooling that makes use of it.

A proper Context for LiveView!

Looks useful. I assume the controversy is that this kind of thing could be seen as counter to FP patterns? I have always followed Plug’s model and just passed stuff like this around, but as @olivermt says, LV is another place where it can be quite painful.

Contexts have a very particular use-case in a React-style engine, namely they allow components to re-render based on dependencies through a memoization barrier, effectively turning the dependency tree into a DAG. You can get pretty far with a tree but the DAG models certain types of dependencies better (theme styling is a common example).

LiveView doesn’t have memoization (though maybe you can do something similar with LiveComponents?), but even if it did the engine has to actually understand the dependency DAG for things to update. If you just write your assigns into the process dictionary you can access them in distant children, but they won’t be able to re-render when things change which breaks the entire declarative model.

Surface actually tried to hack Contexts onto LV and eventually gave up for this reason.

Looks great! I definitely have a few projects where I’d use a $callers version of this!

I’m pretty sure ProcessTree already does that

I want to use it for static data that is only loaded at mount time. I hacked a bit on the original Surface contexts and discussed them a lot with Marlus.

For certain use cases, you don’t care about orphaning the change tracking, which is what happens if you just pull the context on render in a functional component.

It makes for a much cleaner design in those cases.

Hi, good idea. I have a few questions:

  1. Wouldn’t it be faster (in terms of performance) and easier (in terms of maintenance) to use Registry? This way you won’t need to traverse the links tree (which is a pretty expensive thing to do), and you will get automatic cleanup of the data when the owner dies. Process.info(pid, :dictionary) can be a very expensive operation, since it copies the whole dictionary (which can be quite big) into the current process, and if the pid process is running, the VM will block the current process until that process is interrupted, just to copy the dictionary.
  2. If you use Registry and remove the local pdict cache, you can make this data mutable.
  3. It would be nice to have a function like allow(parent_pid, other_pid) which would allow the other_pid process to access the state of the parent_pid process.
  4. Why “Vault”? I thought a vault is a place where people put money so that no one can steal it. The name sounds more related to security, but that’s just my impression.
  5. It has bugs in its design. For example, process a is linked with processes b and c. Process b has key b_key set in its vault, and process c has key c_key set in its vault. From process a, I won’t be able to access both keys on the first call. So some unexpected link will produce a bug that would be very hard to catch.
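The Registry alternative suggested in point 1 could look roughly like this. It is a minimal sketch with hypothetical module and key names: the data lives in the registry's ETS table instead of the owner's process dictionary, so readers never copy the owner's dictionary, and Registry removes an entry automatically when the process that registered it exits. It still assumes the reader already knows the owner's pid, so some way of finding the ancestor is needed on top of it.

```elixir
defmodule RegistryVault do
  # Hypothetical sketch of the Registry-based suggestion, NOT Vault's API.
  @registry RegistryVault.Registry

  # Start once, e.g. under the application's supervision tree.
  def start_link, do: Registry.start_link(keys: :unique, name: @registry)

  # Registers the calling process as a vault owner. The data lives in the
  # registry's ETS table, and Registry drops the entry when this process exits.
  def init(data) do
    case Registry.register(@registry, {:vault, self()}, Map.new(data)) do
      {:ok, _registry_owner} -> :ok
      {:error, {:already_registered, _pid}} -> raise "vault already initialized"
    end
  end

  # Reads from a known owner's vault without touching the owner's process
  # dictionary; finding owner_pid (e.g. via $callers/$ancestors) is left out.
  def get_from(owner_pid, key, default \\ nil) do
    case Registry.lookup(@registry, {:vault, owner_pid}) do
      [{^owner_pid, data}] -> Map.get(data, key, default)
      [] -> default
    end
  end
end
```

A caller would start the registry once, call init/1 in the owning process, and pass the owner's pid to get_from/2 from descendants.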

Overall, I wouldn’t use the library in its current state, but the idea of having storage that is accessible to direct children of a process is pretty good. If I were to solve this problem, I would use $callers and $ancestors with a Registry entry.

And about GraphQL, I had a very similar problem and I just passed the user explicitly. Even if I have to pass it 8 functions deep, I would just type 8 extra words, which is not a big deal, since an explicit argument pays off in readability compared to semi-global, semi-mutable storage.

Hey! I really appreciate the feedback!

If I understand the idea correctly, we’d still need to traverse the process tree and do a Registry lookup for each process? Process.info/2 is indeed expensive, but it’s done only once per process: Vault is cached locally afterwards, so subsequent lookups are cheap and the overall cost should be negligible.

The whole point was to make this data immutable, for ease of reasoning about it. Maybe it’s not an ideal assumption? I didn’t want to break FP purity patterns too much here.

I’m not sure it’s needed, given that we only propagate the state downwards. What would be a typical use case for that?

It’s an easy-to-remember name that hints a little at the immutability “guarantees” of the data stored in it.

This won’t happen, since process C won’t be able to initialize its vault, because process B already did. We have a loose guarantee (excluding dynamic links and creations) of at most one vault per connected component. This might be too restrictive, since we’re traversing not only children but all connections.
Using $ancestors + $callers + :erlang.process_info(self(), :parent) as the list of associated processes should resolve both of these issues, since those form a tree structure, and each subtree is then guaranteed to have a single vault.
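On OTP 25 and later, :erlang.process_info(pid, :parent) reports which process spawned pid, so the chain of spawners can be walked one hop at a time. A hypothetical sketch (ParentChain is an illustrative name, not ProcessTree's or Vault's API); on older OTP releases the call raises ArgumentError and the sketch simply returns an empty chain, and the walk also stops at the first dead process.

```elixir
defmodule ParentChain do
  # Hypothetical helper, NOT ProcessTree's or Vault's API. Walks the
  # "who spawned me" links recorded by the VM (OTP 25+), nearest first.
  def chain(pid \\ self()), do: walk(pid, [])

  defp walk(pid, acc) do
    case parent_of(pid) do
      nil -> Enum.reverse(acc)
      parent -> walk(parent, [parent | acc])
    end
  end

  # Returns the spawning process, or nil when there is none, the process is
  # dead (process_info/2 then returns :undefined), the parent is on another
  # node, or the OTP release does not support :parent (ArgumentError).
  defp parent_of(pid) when node(pid) == node() do
    case :erlang.process_info(pid, :parent) do
      {:parent, parent} when is_pid(parent) and node(parent) == node() -> parent
      _ -> nil
    end
  rescue
    ArgumentError -> nil
  end

  defp parent_of(_remote_pid), do: nil
end
```

Unlike links, the :parent relation is fixed at spawn time and forms a tree, so the walk always terminates.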

Overall, I wouldn’t use the library in its current state, but the idea of having storage that is accessible to direct children of a process is pretty good. If I were to solve this problem, I would use $callers and $ancestors with a Registry entry.

Can you elaborate on why you see Registry as a better fit here? I’m not seeing any clear benefits for now, but I’m most likely missing something.

And about GraphQL, I had a very similar problem and I just passed the user explicitly. Even if I have to pass it 8 functions deep, I would just type 8 extra words, which is not a big deal, since an explicit argument pays off in readability compared to semi-global, semi-mutable storage.

Actually, in relatively complex systems that are already built, this quickly became an issue when we needed to introduce audit logs with an actor. Places where we previously didn’t have access to the user suddenly needed it, and there was no easy way to refactor all those functions to take another argument.

I think it’s actually somewhat different. Vault’s idea is to provide initialization guarantees in addition to access guarantees.

We’ll have one vault per process subtree, and you won’t be able to override it from any of the children, by design. Accessing the parent’s vault from a child process will indeed cache it in the local process dictionary; at first glance, this part is similar to what ProcessTree does.

In the future, we might provide granular caching controls so that you can tailor it to your specific needs.

Very cool! I really appreciate the Process-oriented architecture here, and looking forward to 0.2.1 using $callers.

This feels like an alternative take to Phoenix Scopes – how do you think this could work with Scopes? I realize Vault is focused on secrets, but I figure there’s a lot of overlap in how it would be used, since scopes are also designed to alleviate property drilling.

Thanks!

This feels like an alternative take to Phoenix Scopes – how do you think this could work with Scopes?

I think Phoenix Scopes don’t solve the same problem Vault solves. Scopes standardize a way to initialize the context in a plug, but the problem of passing it down still persists.

I see two ways these two could work together:

  1. You still assign context to your connection/socket in the plug at the beginning of request processing, but in addition, you also initialize a vault with the part of the context that is immutable and valid throughout the whole request.
    Then, further down the flow, if you have access to the conn/socket, you extract the context from there; if you don’t, you use Vault.
  2. Instead of putting the context in assigns, you only put it in Vault. Down the flow you’re always using Vault.
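Option 1's "use the conn/socket if you have it, otherwise fall back" rule might look like the sketch below. It is hypothetical: ActorLookup is an illustrative name, a plain map with an :assigns key stands in for the conn or socket, and Process.get/1 stands in for a Vault read so that the snippet runs without dependencies.

```elixir
defmodule ActorLookup do
  # Hypothetical helper for option 1. Any map with assigns.current_user
  # (a Plug.Conn or LiveView socket has an :assigns field) wins; otherwise
  # fall back to request-wide context stored once per process.
  def current_user(%{assigns: %{current_user: user}}), do: user

  # Process.get/1 is a stand-in for a Vault lookup in this sketch.
  def current_user(_no_context), do: Process.get(:current_user)
end
```

With option 2 the first clause would simply disappear, and every caller would go through the fallback.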

I’m leaning towards option 2 for simplicity, but which one is better depends on the use case.

I realize Vault is focused on secrets, but I figure there’s a lot of overlap in how it would be used, since scopes are also designed to alleviate property drilling.

Actually, Vault has nothing to do with secrets, and it’s probably a bad name for a state-management library, buuuut, I still like it. I chose Vault to imply the immutability guarantees it gives you. So it’s not just randomly putting things in the process dictionary: you can only do this once per subtree, and Vault actually checks whether any ancestor has already initialized a vault. This gives you more confidence about where a value in the vault actually comes from, which arguably keeps you closer to FP best practices.

iex(2)> b = spawn(fn -> Vault.init(b: 1); receive do x -> x end end)
#PID<0.239.0>
iex(3)> c = spawn(fn -> Vault.init(c: 2); receive do x -> x end end)
#PID<0.240.0>
iex(4)> Process.link(b)
true
iex(5)> Process.link(c)
true
iex(6)> Vault.get :b
1
iex(7)> Vault.get :c
nil

And the other way around

iex(2)> b = spawn(fn -> Vault.init(b: 1); receive do x -> x end end)
#PID<0.159.0>
iex(3)> c = spawn(fn -> Vault.init(c: 2); receive do x -> x end end)
#PID<0.160.0>
iex(4)> Process.link(c)
true
iex(5)> Process.link(b)
true
iex(6)> Vault.get :c
2
iex(7)> Vault.get :b
nil

Yeah, you’re right, this is a wrong argument for Registry, but this argument applies to my point about $ancestors and $callers.

Sure, but you already have an API to change the value, which only works for the current process. I was only suggesting a way to improve this situation. At the end of the day, any approach can be made immutable by removing the function that updates the data 😁

Tests, like the common use case of Mox’s allow. Process pools. Or I use a library that spawns a process without a link, and I want to share the state with it. Not all process links form a tree, although that’s the best way to do it imo.

I understand it now. Sounds like a use case for Logger.metadata.

Correct: in the example you’ve provided, you link the processes after initializing the vault, which is why it doesn’t work. But I think not using links will eliminate this problem entirely.

Yeah, you’re right, this is a wrong argument for Registry, but this argument applies to my point about $ancestors and $callers.

Yep, will apply this, thanks!

Sure, but you already have an API to change the value, which only works for the current process. I was only suggesting a way to improve this situation. At the end of the day, any approach can be made immutable by removing the function that updates the data 😁

It’s prefixed with unsafe_*, so it’s unsafe 🤷, and I think we should be fine with not placing any expectations on it? I’m open to alternatives, though.

Tests, like the common use case of Mox’s allow. Process pools. Or I use a library that spawns a process without a link, and I want to share the state with it. Not all process links form a tree, although that’s the best way to do it imo.

I think using $ancestors + $callers + :erlang.process_info(self(), :parent) as a list of associated processes should resolve this issue and you’ll be able to initialize the state inside a setup block.

I understand it now. Sounds like a use case for Logger.metadata.

Not really. I want to persist these audit logs in the database instead of flushing them to some external system. Also, it doesn’t only apply to audit logging: there are a ton of places where you need to refer to the user (setting user-related columns in the DB that represent the actor of the action, verifying permissions closer to the context, and many, many more). Plus, it doesn’t need to be the user; it can be a broader context where you put whatever you like. You just need to guarantee that it doesn’t change during the request (or that changes to these entities don’t affect the lifecycle of your process).

I’ve just released Vault v0.2.1.

Thanks to suggestions from @Asd, @jswanner, and other folks, we now rely on the ProcessTree library to traverse the process tree in an efficient and inclusive way. It relies on Process.info(pid, :parent) and falls back to the process’s $ancestors and $callers on OTP 24 and earlier, or if the parent process is dead, which is exactly what we need in this case. Big kudos to JB Steadman, the author of ProcessTree.
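The fallback order described above, reduced to a single hop, could be sketched like this. It is an illustration under stated assumptions, not ProcessTree's actual code; ParentLookup is a hypothetical name.

```elixir
defmodule ParentLookup do
  # Hypothetical, simplified illustration of the fallback order described
  # above (VM-recorded parent first, then $ancestors/$callers); NOT ProcessTree.
  def immediate_parent do
    parent_from_vm() || parent_from_dict()
  end

  # OTP 25+ records the spawning process; older releases raise ArgumentError.
  # A dead or remote parent is treated as unknown so the fallback kicks in.
  defp parent_from_vm do
    case :erlang.process_info(self(), :parent) do
      {:parent, pid} when is_pid(pid) and node(pid) == node() ->
        if Process.alive?(pid), do: pid, else: nil

      _ ->
        nil
    end
  rescue
    ArgumentError -> nil
  end

  # First live local pid recorded by OTP behaviours or Task; registered
  # names (atoms) and remote pids are skipped.
  defp parent_from_dict do
    (Process.get(:"$ancestors", []) ++ Process.get(:"$callers", []))
    |> Enum.find(&(is_pid(&1) and node(&1) == node() and Process.alive?(&1)))
  end
end
```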

Since ProcessTree does all the heavy lifting for process traversal now, I went back and forth on whether Vault is needed as an abstraction layer on top of it in the first place. I think it still has value in guaranteeing immutability for the process subtree and defining a clean API for initializing and accessing this data.

My primary use-case was for propagating across a single process, so I definitely still see some value being added, but I would love to hear what you guys think!
