Process Dictionary vs. Passing State

tl;dr I have a process that passes around a lot of state. Doing this instead of using the process dictionary affects coding design decisions. What would you do?

I have a project that passes around a lot of state. For example, the state tracks the processing of an OpenAPI description as operations and schemas are discovered and named.

So far, I’ve followed the best practice of always passing this state around as arguments rather than using the process dictionary. It makes a few things awkward: for example, the occasional function that doesn’t need to read or update the state except in rare circumstances must still accept it and pass it through. Because this project is meant to be pluggable for its consumers, I’ve made certain choices to avoid passing state in and out of callbacks in situations where a consumer is likely to mismanage the data. This has secondary effects on the project’s architecture which, although workable, reduce flexibility.
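To make the threading pattern concrete, it looks roughly like this (the module and function names here are invented for illustration):

defmodule MyProcessor do
  # Hypothetical sketch: build_operation/2 does not use the state for its own
  # work, but must still accept it and thread it through so that
  # resolve_type/2 (which does read the state) can be called.
  def build_operation(operation, state) do
    {type, state} = resolve_type(operation.schema, state)
    {Map.put(operation, :type, type), state}
  end

  # Stand-in for a function that actually consults the accumulated state.
  defp resolve_type(schema, state) do
    {Map.get(state.types, schema, :unknown), state}
  end
end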

As the author, I can foresee and enforce that all of the processing will occur in a single process. I can also provide a suite of safe state management functions that discourage any manual editing of the data. The flexibility of always having the state available to read and write is alluring.
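For concreteness, the kind of safe wrapper I have in mind would look something like this if I did reach for the process dictionary (a rough sketch; the module, key, and function names are invented):

defmodule OpenAPIState do
  @key :open_api_processor_state

  # Sketch of wrappers around the process dictionary so that consumers
  # never call Process.put/2 or Process.get/2 directly.
  def put_schema(name, schema) do
    state = Process.get(@key, %{schemas: %{}})
    Process.put(@key, put_in(state, [:schemas, name], schema))
    :ok
  end

  def get_schema(name) do
    state = Process.get(@key, %{schemas: %{}})
    get_in(state, [:schemas, name])
  end
end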

What would you do? Why? If you’re on the side of “never use the Process Dictionary”, can you imagine a circumstance where you would?

2 Likes

I’m on the side of ‘Use the Process Dictionary when it makes sense’. It is often easier to use the process dictionary to store and access ‘context constants’, like the pid of a parent or a user’s name or ID that will never change in the current context, and those processes die at the end of the context, so the entries do not linger. Mutable state, or the side effects caused by changes to the process dictionary, can be an absolute nightmare to debug. In your case, even with everything confined to a single process as you mention, it could be very difficult to manage without care: what if, for example, two instances were being used in the same process?
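For example, something like this for context constants (just a sketch; the keys and message shape are arbitrary):

# In the spawning process:
parent = self()

Task.start_link(fn ->
  # Written once when the worker starts and treated as a read-only
  # 'context constant' for the life of this process.
  Process.put(:parent_pid, parent)

  # Much later, deep in a call stack, with no need to thread the pid
  # through every intermediate function:
  send(Process.get(:parent_pid), {:progress, :halfway})
end)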

So far, I’ve followed the best practice of always passing this state around as arguments rather than using the process dictionary.

I think I agree that there is some cumbersomeness in having to pass a state term to APIs even when it might not be used, and in having every library function return it. Consider this, though: always passing the state and returning the state, even if the library might not actually use it, is easier to remember than ‘does this API require the state object or not…’. A consistent interface is useful.

The alternative to returning a {state, value} tuple is to use getters and have the state object perform all of the tracking:

state_new = OpenAPI.Processor.Type.from_schema(state_old)
value = OpenAPI.Processor.Type.get_schema_value(state_new)
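For contrast, the {state, value} tuple style for the same hypothetical call would be:

{state_new, value} = OpenAPI.Processor.Type.from_schema(state_old)

Either way the state stays explicit; the difference is only in how the caller retrieves the value.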

discourage any manual editing of the data

Good luck :wink:

Then shouldn’t you just use a GenServer and hide the state from the outside?
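Something along these lines, for example (a minimal sketch; the module and API names are made up):

defmodule OpenAPIServer do
  use GenServer

  # The processing state lives inside this process; callers only see the API.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def put_schema(name, schema), do: GenServer.call(__MODULE__, {:put_schema, name, schema})
  def get_schema(name), do: GenServer.call(__MODULE__, {:get_schema, name})

  @impl true
  def init(_opts), do: {:ok, %{schemas: %{}}}

  @impl true
  def handle_call({:put_schema, name, schema}, _from, state) do
    {:reply, :ok, put_in(state, [:schemas, name], schema)}
  end

  def handle_call({:get_schema, name}, _from, state) do
    {:reply, get_in(state, [:schemas, name]), state}
  end
end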

1 Like

An “intermediate” storage strategy might fit better: keep an ETS table (or several) and pass it into functions.

That would give the same “implicit write everywhere” mutability as the process dictionary, without the exactly-one-per-process limitation.
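Roughly like this (a sketch; the table name and keys are arbitrary):

# Create the table once, then hand the table reference to whatever needs it.
table = :ets.new(:openapi_processing, [:set, :public])

# Any function that receives `table` can read and write it, from this
# process or another one, without owning a copy of the state.
:ets.insert(table, {:schema_count, 0})
[{:schema_count, n}] = :ets.lookup(table, :schema_count)
:ets.insert(table, {:schema_count, n + 1})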

4 Likes