Is it a good idea to store context in Process dictionary/Registry for HTTP requests?

minhajuddin · January 5, 2017, 7:20am

Passing arguments around is sometimes a pain when working with Elixir. However, in web apps each request runs in its own process, so we should be able to stick bits of context into the process dictionary. For instance, we could store the user_id, api_key, role and other things that may be needed across the app.

So, you would do this at the beginning of the request (maybe in a plug):

Process.put(:api_key, conn.assigns[:api_key])

And in other modules where we need the API key, we just call Process.get(:api_key). This may also be a good place to store tenant info in multi-tenant apps.
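A minimal sketch of the pattern described above, written without Plug so it runs on its own. The `MyApp.RequestContext` and `MyApp.Billing` modules are hypothetical; in a real app the `put` call would live in a plug and read from `conn.assigns`.

```elixir
defmodule MyApp.RequestContext do
  # Hypothetical helper: keeps per-request values in the current
  # process's dictionary under one namespaced key.
  @key {__MODULE__, :context}

  def put(context) when is_map(context) do
    Process.put(@key, context)
    :ok
  end

  # Returns nil when nothing was stored in this process.
  def get(field) do
    Process.get(@key, %{}) |> Map.get(field)
  end
end

defmodule MyApp.Billing do
  # Reads the API key implicitly instead of taking it as an argument.
  def charge(amount) do
    {:charged, amount, MyApp.RequestContext.get(:api_key)}
  end
end

# At the start of the request (e.g. in a plug):
MyApp.RequestContext.put(%{api_key: "key-123", user_id: 42})

# Anywhere later in the same process:
MyApp.Billing.charge(100)
# → {:charged, 100, "key-123"}
```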

I feel this goes against the purity of functional programming, but it seems like it could have practical use cases. One gotcha is that we must pass the values explicitly whenever things cross a process boundary, but that seems manageable.

Would love to hear what you guys think about this

hubertlepicki · January 5, 2017, 12:30pm

I do not think this is a great idea. It is one of those ideas that are tempting at first but turn out to be a source of long-term pain and unexpected bugs.

When you move code from a web request context to, say, a background job context, you may not notice its dependency on the process variable. The compiler will not warn you, but the value will be nil and your code will crash. You may also have trouble testing this code, and you will need to set these values up whenever you run something from the console, since the code requires these global values to exist.
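The background-job failure above can be reproduced with a `Task`, which runs in a new process whose dictionary does not contain the caller's keys, so the implicit lookup quietly yields `nil` (a real API client would then fail further down). `Invoices` and its functions are hypothetical.

```elixir
defmodule Invoices do
  # Implicit: silently picks up whatever :api_key this process holds.
  def send_invoice_implicit(invoice_id), do: {:ok, invoice_id, Process.get(:api_key)}

  # Explicit: everything the job needs arrives as arguments.
  def send_invoice(invoice_id, api_key), do: {:ok, invoice_id, api_key}
end

# In the request process, where a plug has set the key:
Process.put(:api_key, "key-123")
Invoices.send_invoice_implicit(7)
# → {:ok, 7, "key-123"}

# The same code run as a "background job" in another process sees nil.
Task.async(fn -> Invoices.send_invoice_implicit(7) end) |> Task.await()
# → {:ok, 7, nil}

# Capturing the value and passing it explicitly works in any process.
api_key = Process.get(:api_key)
Task.async(fn -> Invoices.send_invoice(7, api_key) end) |> Task.await()
# → {:ok, 7, "key-123"}
```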

I also do not think that growing the conn struct to a large size and passing it all the way down is a good option. What I like to do is have my controllers issue what I’ll call a “command”. When we issue a command, we take everything we need from conn or from controller functions and trigger the command with only the data it needs. To me, passing the whole conn around seems as bad a practice as global state, since it has similar implications.
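One possible reading of the “command” suggestion, as a sketch: the controller copies only the fields an operation needs out of the request into a plain struct, and the command receives nothing else. `CreateOrder` and its fields are hypothetical, and a plain map stands in for `conn.assigns` so the example runs without Plug or Phoenix.

```elixir
defmodule CreateOrder do
  # Hypothetical command: only the data this operation needs.
  @enforce_keys [:user_id, :api_key, :items]
  defstruct [:user_id, :api_key, :items]

  # Builds the command from request assigns (a plain map here; in a
  # Phoenix controller this would come from conn.assigns).
  def from_assigns(assigns, items) do
    %__MODULE__{user_id: assigns.user_id, api_key: assigns.api_key, items: items}
  end

  # Runs the command. No conn and no process dictionary, so the same
  # call works from a controller, a background job, a test, or IEx.
  def run(%__MODULE__{} = cmd) do
    total = cmd.items |> Enum.map(& &1.price) |> Enum.sum()
    {:ok, %{user_id: cmd.user_id, total: total}}
  end
end

assigns = %{user_id: 42, api_key: "key-123", role: :admin}

assigns
|> CreateOrder.from_assigns([%{price: 10}, %{price: 5}])
|> CreateOrder.run()
# → {:ok, %{user_id: 42, total: 15}}
```

The `:role` assign is deliberately left out of the command, which makes the operation’s real dependencies visible in one place.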

minhajuddin · January 6, 2017, 9:27am

Logger actually uses this strategy to store :metadata, e.g. the :request_id. I can definitely see it becoming a pain if it’s used to store everything, or if you are not careful when things cross process boundaries, but in places where aspect-oriented programming (AOP) fits, this seems compelling. Logger definitely fits the bill for AOP; I’m not sure what else might.
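For comparison, this is roughly what the Logger pattern looks like in use: metadata set once for the current process is read back implicitly by later log calls, and a newly spawned process does not inherit it. The `:request_id` value is made up.

```elixir
require Logger

# Set once for this process, e.g. in a plug at the start of the request.
Logger.metadata(request_id: "req-abc")

# Later calls in the same process see it without it being passed in.
Logger.metadata()[:request_id]
# → "req-abc"

# A new process starts without the caller's Logger metadata.
Task.async(fn -> Logger.metadata()[:request_id] end) |> Task.await()
# → nil
```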

gregvaughn · January 6, 2017, 11:26pm

The first rule of BEAM Club is that we do not talk about the Process dictionary

OvermindDL1 · January 7, 2017, 8:21pm

/me is a closet fan of the process dictionary, you just have to know how to use it correctly.

gregvaughn · January 7, 2017, 8:54pm

Oh, I appreciate it too. My post was tongue-in-cheek. Still, for beginners it’s an easy crutch to fall into, so there are some advantages to keeping it a bit obscure, so that the people who know how to use it correctly are the main ones using it.

minhajuddin · January 8, 2017, 12:56pm

Would love to hear your use cases

dch · January 8, 2017, 3:06pm

The main issue with using the process dictionary is that it goes against the grain of the platform: you now have invisible mutable state that cannot be inspected through the normal OTP tools, does not appear in stack traces, and is not obvious to other developers.

This breaks a long-held debugging expectation: that you can simply copy the arguments from the stack trace of a failing function, apply the function to them again, and debug directly from there without needing any further state.
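A small illustration of the debugging point: once a function reads the process dictionary, the arguments shown in a stack trace are no longer enough to replay the call. `Pricing` and the `:tax_rate` key are hypothetical.

```elixir
defmodule Pricing do
  # Implicit: the result depends on hidden per-process state.
  def gross_implicit(net), do: net * (1 + Process.get(:tax_rate))

  # Explicit: the arguments alone determine the result.
  def gross(net, tax_rate), do: net * (1 + tax_rate)
end

Process.put(:tax_rate, 0.25)
Pricing.gross_implicit(100)
# → 125.0

# Replaying gross_implicit(100) in a clean session where :tax_rate was
# never set fails, because 1 + nil raises ArithmeticError.
Process.delete(:tax_rate)

try do
  Pricing.gross_implicit(100)
rescue
  ArithmeticError -> :cannot_replay
end
# → :cannot_replay

# The explicit version replays anywhere from its arguments alone.
Pricing.gross(100, 0.25)
# → 125.0
```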

You are also assuming that all your code is actually sequential. Many of the BIFs and library functions you call may offer a functional interface but use casts, calls, or plain message passing behind the scenes. Your process dictionary data will not be accessible to them, so you’d have to pass it through a variable anyway. What did you gain, then, by using the process dictionary?
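A compact example of this effect: `Agent.get/2` looks like an ordinary function call, but the function passed to it runs inside the agent’s process, so it sees the agent’s dictionary rather than the caller’s. `Agent` here simply stands in for any library that uses calls internally.

```elixir
Process.put(:api_key, "key-123")

{:ok, agent} = Agent.start_link(fn -> %{} end)

# This anonymous function is executed by the agent process, not by us.
Agent.get(agent, fn _state -> Process.get(:api_key) end)
# → nil

# To use the value there, it has to be passed explicitly after all.
api_key = Process.get(:api_key)
Agent.get(agent, fn _state -> api_key end)
# → "key-123"
```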

In common usage, the process dictionary is applied as a last resort to squeeze a final erg of speed out of the platform, after all other avenues have been exhausted. That’s not to say it’s not super handy in the right circumstances; it’s just that using it by default in something as common as Plug or some other HTTP handler would need a significant speed benefit to justify the loss of straightforward debugging and tracing.

michalmuskala · January 8, 2017, 3:43pm

Two places where popular Elixir libraries use the pdict are:

logger - to store the metadata that is attached to each message. It would be hard to achieve current API without use of process dictionary here - you’d need to pass logger state around to each logger call.
ecto - to keep track of the current transaction. Inside the function passed to Repo.transaction, a database connection is checked out and saved in the process dictionary. Subsequent calls to other Repo functions look for the connection in the process dictionary before checking out another connection. It would be possible to avoid the pdict here with explicit connection passing. Because of Ecto’s architecture, a connection cannot be shared between processes (to have a transaction spanning multiple processes), so the use of the pdict is not particularly limiting, but it does introduce a level of indirection and makes some use cases problematic (those that require dynamic connection management).
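A much-simplified sketch of the transaction idea described above: inside `transaction/1` the “connection” is kept in the process dictionary, and `query/1` reuses it when present instead of checking out a new one. This is not Ecto’s actual implementation; `FakeRepo`, its key, and the `make_ref/0` stand-in for a pool checkout are all hypothetical.

```elixir
defmodule FakeRepo do
  @conn_key {__MODULE__, :conn}

  # Runs fun with a connection stored in the process dictionary.
  # Nested transactions reuse the outer connection.
  def transaction(fun) do
    case Process.get(@conn_key) do
      nil ->
        Process.put(@conn_key, checkout())

        try do
          {:ok, fun.()}
        after
          # Always clear the entry, even if fun raises.
          Process.delete(@conn_key)
        end

      _conn ->
        {:ok, fun.()}
    end
  end

  # Uses the transaction's connection if there is one; otherwise
  # checks out a fresh connection for this single query.
  def query(sql) do
    conn = Process.get(@conn_key) || checkout()
    {conn, sql}
  end

  # Stand-in for a pool checkout: each call returns a unique reference.
  defp checkout, do: make_ref()
end

{:ok, [{c1, _}, {c2, _}]} =
  FakeRepo.transaction(fn -> [FakeRepo.query("SELECT 1"), FakeRepo.query("SELECT 2")] end)

# Both queries inside the transaction used the same connection.
c1 == c2
# → true
```

Because the connection lives in the caller’s dictionary, a `Task` started inside the transaction would not find it, which matches the single-process limitation mentioned above.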

sasajuric · January 8, 2017, 4:28pm

Another example in Erlang stdlib is rand. You can use :rand.uniform/0,1 to get a random number, in which case the state is implicitly managed in the process dictionary.

If you want to be explicit about the RNG state, you can use :rand.uniform_s/1,2.
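A quick comparison of the two styles: `:rand.seed/2` stores the generator state in the process dictionary and `:rand.uniform/1` reads and updates it there, while `:rand.seed_s/2` and `:rand.uniform_s/2` hand the state back to the caller. The seed values are arbitrary.

```elixir
# Implicit: the state lives in this process's dictionary.
:rand.seed(:exsss, {1, 2, 3})
implicit = :rand.uniform(100)

# Explicit: the state is an ordinary value we thread through ourselves.
state0 = :rand.seed_s(:exsss, {1, 2, 3})
{explicit, _state1} = :rand.uniform_s(100, state0)

# Same algorithm and seed, so the first draw matches.
implicit == explicit
# → true

# Reusing state0 replays exactly the same draw.
{^explicit, _} = :rand.uniform_s(100, state0)
```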

OvermindDL1 · January 8, 2017, 7:01pm

My most common use case is holding immutable state through callbacks when a behaviour does not handle passing state around (it’s happened); this is what the Logger module does, for example. In general I treat the process dictionary as a write-once memory store. The rest of the time it has been purely for speed, when even the overhead of a NIF call would dominate (I was doing a lot of odd math work at one point where it helped).
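One way to enforce the write-once discipline mentioned above is a tiny wrapper that refuses to overwrite a key. `WriteOnce` is a hypothetical helper, not a library module.

```elixir
defmodule WriteOnce do
  # Sentinel used to tell "never set" apart from a stored nil.
  @unset :__write_once_unset__

  # Stores value under key only if this process has not set it yet;
  # raises otherwise, so accidental mutation fails loudly.
  def put!(key, value) do
    case Process.get(key, @unset) do
      @unset ->
        Process.put(key, value)
        :ok

      existing ->
        raise ArgumentError, "#{inspect(key)} is already set to #{inspect(existing)}"
    end
  end

  # Reads a value that must have been set earlier in this process.
  def fetch!(key) do
    case Process.get(key, @unset) do
      @unset -> raise ArgumentError, "#{inspect(key)} has not been set"
      value -> value
    end
  end
end

:ok = WriteOnce.put!(:config, %{precision: 4})
WriteOnce.fetch!(:config)
# → %{precision: 4}
```

A second `WriteOnce.put!(:config, ...)` in the same process raises `ArgumentError` instead of silently replacing the value.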

whatyouhide · January 9, 2017, 12:29am

To add to the real-world use cases, Gettext stores the locale for each backend in the process dictionary and reads the locale out of the process dictionary when translating strings.

kelvinst · January 9, 2017, 11:35am

That being said, it’s better not to parallelize something that needs translations, right?
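If work that depends on a per-process value such as the locale does get parallelized, one option is to capture the value before spawning and set it again inside each task. The sketch below uses a plain `:locale` key so it runs without Gettext; with Gettext you would read and set the locale through `Gettext.get_locale/1` and `Gettext.put_locale/2` instead. `ContextTask` is hypothetical.

```elixir
defmodule ContextTask do
  # Starts a Task that inherits the given process-dictionary keys from
  # the caller. Values are copied at spawn time (a missing key is copied
  # as nil); later changes in either process are not shared.
  def async(keys, fun) do
    inherited = for key <- keys, do: {key, Process.get(key)}

    Task.async(fn ->
      Enum.each(inherited, fn {key, value} -> Process.put(key, value) end)
      fun.()
    end)
  end
end

Process.put(:locale, "de")

~w(a b c)
|> Enum.map(fn word ->
  ContextTask.async([:locale], fn -> {word, Process.get(:locale)} end)
end)
|> Task.await_many()
# → [{"a", "de"}, {"b", "de"}, {"c", "de"}]
```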

hubertlepicki · August 3, 2018, 3:18pm

It does seem to me that both of these could be reworked to avoid the process dictionary and rely on something like Registry instead, which would keep track of the logger state/metadata or Ecto’s currently checked-out connection. One advantage is being able to properly clean things up after crashes, compared to storing things in the process dictionary, where they are simply lost when the process dies.
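A rough sketch of what a Registry-based alternative could look like: each request process registers its context under its own pid, and any process that knows that pid (for example, a Task it started) can look the context up. Registry automatically removes a process’s entries once that process exits. `MyApp.ContextRegistry` and the `Ctx` module are hypothetical names.

```elixir
defmodule Ctx do
  @registry MyApp.ContextRegistry

  # Registers the calling process's context under its own pid.
  def put(context) do
    {:ok, _owner} = Registry.register(@registry, self(), context)
    :ok
  end

  # Looks up the context registered by the given process.
  def get(pid) do
    case Registry.lookup(@registry, pid) do
      [{^pid, context}] -> {:ok, context}
      [] -> :error
    end
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: MyApp.ContextRegistry)

:ok = Ctx.put(%{api_key: "key-123"})
parent = self()

# Unlike the process dictionary, the context is readable from another process.
Task.async(fn -> Ctx.get(parent) end) |> Task.await()
# → {:ok, %{api_key: "key-123"}}
```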

minhajuddin · August 12, 2018, 3:20pm

There is a good talk about process dictionaries by @gregvaughn here: https://www.youtube.com/watch?v=zDIoFWwfBO0

gregvaughn · August 12, 2018, 4:25pm

Whoa. Now we’ve come full circle. I also briefly reference this thread in that talk!