4) Lonestar ElixirConf 2018 - Let's Talk Process Dictionary - Greg Vaughn

axelson · March 19, 2018, 8:54pm

Let’s Talk Process Dictionary - @gregvaughn

Well, the first rule of Fight Club, I mean process dictionary, is that we don’t talk about process dictionary. That’s wise, because people new to functional algorithms and immutable data structures may use it as a crutch. However, once we’re comfortable with that background, we should no longer be afraid of it. There are legitimate reasons to use it and reasons not to, which we’ll discuss in this talk.

Audience: Intermediate, Advanced

All talks thread: 0) Lonestar ElixirConf 2018 Talks

gregvaughn · March 19, 2018, 9:14pm

Also available:
Slides: https://speakerdeck.com/gvaughn/lets-talk-process-dictionary
Code: https://github.com/gvaughn/process_dictionary_talk

axelson · March 20, 2018, 7:00pm

@gregvaughn thanks for your talk! Makes the Process Dictionary much more understandable. The use cases for logger and especially :rand/:crypto make a lot of sense (since they make the interface to those modules much easier to use). And I especially enjoyed the Q&A session at the end of the talk.

Although, based on my understanding of the CSV parsing example, couldn’t you have just stored the long query results in a map or in a library like Dataloader?

AstonJ · March 20, 2018, 7:05pm

Looks like a great talk Greg - and I love seeing references to things said on the forum

gregvaughn · March 20, 2018, 7:07pm

@axelson Thanks! It was a lively Q&A, so I felt better about finishing a bit early.

In retrospect, ETS was the proper solution for my CSV example. I wouldn’t have wanted to deal with an extra dependency for something so simple. Storing a map locally would have meant realizing the stream (and pulling all data into memory) sooner, with an Enum.reduce to have an accumulator for the lookup values.

Now for the real confession: that code is no longer used in production anyway
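
For readers who have not seen the talk's code, here is a minimal sketch of the ETS approach mentioned above. It is not the code from the talk: `LookupCache`, `slow_lookup`, and the sample rows are hypothetical stand-ins for the CSV rows and the long-running query. The idea is that the cache lives outside the stream pipeline, so the stream can stay lazy and no accumulator is needed.

```elixir
defmodule LookupCache do
  # Hypothetical helper: memoizes an expensive lookup in an ETS table.
  def new, do: :ets.new(:lookup_cache, [:set, :public])

  def fetch(table, key, fun) do
    case :ets.lookup(table, key) do
      [{^key, value}] ->
        # Cache hit: reuse the stored value.
        value

      [] ->
        # Cache miss: run the lookup once and remember the result.
        value = fun.(key)
        :ets.insert(table, {key, value})
        value
    end
  end
end

# Stand-in for the long-running query; the Agent counts how often it runs.
{:ok, counter} = Agent.start_link(fn -> 0 end)

slow_lookup = fn id ->
  Agent.update(counter, &(&1 + 1))
  "name-#{id}"
end

table = LookupCache.new()

# Stand-in for parsed CSV rows: [id, other_column].
rows =
  [[1, "a"], [2, "b"], [1, "c"]]
  |> Stream.map(fn [id, col] -> [LookupCache.fetch(table, id, slow_lookup), col] end)
  |> Enum.to_list()

lookup_calls = Agent.get(counter, & &1)
```

Here the id `1` appears twice in the input, but `slow_lookup` runs only once for it, so `lookup_calls` ends up at 2 for the three rows.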

JEG2 · March 20, 2018, 9:48pm

I enjoyed this talk. Thanks @gregvaughn.

I did want to add two small clarifications.

First, Greg mentions that :rand “could” allow you to manage passing the seeds around yourself, instead of using the process dictionary. In fact, it does that too! You can choose your API with :rand. Here’s some usage without touching the process dictionary:

iex(1)> seed = :rand.seed_s(:exrop)
{%{
   bits: 58,
   jump: #Function<8.15449617/1 in :rand.mk_alg/1>,
   next: #Function<5.15449617/1 in :rand.mk_alg/1>,
   type: :exrop,
   uniform: #Function<6.15449617/1 in :rand.mk_alg/1>,
   uniform_n: #Function<7.15449617/2 in :rand.mk_alg/1>,
   weak_low_bits: 1
 }, [19318074725392827 | 240284246166041928]}
iex(2)> {num, new_seed} = :rand.uniform_s(seed)
{0.9006764809368724,
 {%{
    bits: 58,
    jump: #Function<8.15449617/1 in :rand.mk_alg/1>,
    next: #Function<5.15449617/1 in :rand.mk_alg/1>,
    type: :exrop,
    uniform: #Function<6.15449617/1 in :rand.mk_alg/1>,
    uniform_n: #Function<7.15449617/2 in :rand.mk_alg/1>,
    weak_low_bits: 1
  }, [40790438808678228 | 56479292960350440]}}
iex(3)> Process.get_keys
[:iex_evaluator, :elixir_parser_columns, :iex_history, :"$initial_call",
 :"$ancestors"]
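
For comparison, a small sketch of the implicit form next to the explicit one. It assumes the `:exrop` algorithm from the session above and an arbitrary seed of `{1, 2, 3}`; the `:rand_seed` key is where `:rand` keeps its state in the process dictionary.

```elixir
# Implicit form: :rand.seed/2 stores the generator state in the
# process dictionary of the calling process.
:rand.seed(:exrop, {1, 2, 3})
pd_has_seed = :rand_seed in Process.get_keys()
a = :rand.uniform()

# Re-seeding with the same values replays the same sequence.
:rand.seed(:exrop, {1, 2, 3})
b = :rand.uniform()

# Explicit form: the state is passed in and handed back,
# and the process dictionary is never consulted.
state = :rand.seed_s(:exrop, {1, 2, 3})
{c, _next_state} = :rand.uniform_s(state)
```

With the same algorithm and seed, `a`, `b`, and `c` are the same number, which is what makes either style usable for repeatable tests.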

The other thing I wanted to mention is that the existence of side-effect requiring concerns, like I/O (which Greg mentions), doesn’t prevent runtimes from keeping your code pure. If you want to perform I/O, you would instead ask the runtime to do it for you. The runtime would then call into your code with details about the success or failure of the operation. Languages like Elm work this way.
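
A rough sketch of that idea in Elixir terms, not how Elm itself is implemented: the pure code returns a description of the I/O it wants, and a thin shell performs it and calls back with the outcome. The `Core` and `Shell` modules, the `{:read_file, path}` command shape, and the temp file name are all made up for illustration.

```elixir
defmodule Core do
  # Pure: describes the I/O it wants instead of performing it.
  def init(path), do: {:read_file, path}

  # Pure: receives the success or failure of the operation from the shell.
  def handle({:ok, contents}), do: {:done, String.length(contents)}
  def handle({:error, reason}), do: {:done, {:failed, reason}}
end

defmodule Shell do
  # Impure: the only place that touches the file system.
  def run({:read_file, path}), do: path |> File.read() |> Core.handle() |> run()
  def run({:done, result}), do: result
end

path = Path.join(System.tmp_dir!(), "pd_talk_example.txt")
File.write!(path, "hello")
ok_result = Shell.run(Core.init(path))

File.rm!(path)
missing_result = Shell.run(Core.init(path))
```

`Core` can be tested by calling `handle/1` with plain tuples, with no files involved.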

Shameless self-promotion: It just so happens that I wrote a blog post about both of these subjects a while back.

I’m being extremely nit picky in all of this though. I really did love Greg’s talk!

gregvaughn · March 20, 2018, 10:28pm

Thanks, @JEG2! I appreciate the accountability. Those who know me have seen some nits I have picked too. I did speak a bit “loosely” in some places.

Funny enough, during a follow-up Slack conversation with the first person in Q&A (asking about :rand specifically), I dug in more and realized that yes, it does allow you to pass the seed explicitly. (Aside: I also learned that the older :random library allows both explicit seed and process dictionary forms, but if you do not initially set the seed explicitly, it falls back to a fixed default seed, so you get repeatable randomness. “It’s a feature, not a bug.”) So, in the bigger picture of Process Dictionary usage, this is a great example of having a pure way of using the library, but also allowing the convenience of the Process Dictionary too! Even better justification for using PD.

I absolutely agree with your I/O point. I was thinking more in terms of the entire system (including the runtime) but did not clarify that point. In any case Elixir is not pure, so the broader point stands.

I really appreciate the discussion, folks. I don’t often get to have this sort of feedback on a talk. Now I’m ready to come up with an improved v2 of this talk. I think I’ll rename it “Mutants in the BEAM”.

JEG2 · March 20, 2018, 10:50pm

This is right, of course, but I do feel like consciously dividing up pure and impure concerns in your Elixir programs is a powerful design tool. I have been hugely inspired by Gary Bernhardt’s Functional Core, Imperative Shell and @sasajuric’s To Spawn or not to Spawn.

We don’t get all the way to Elm’s side-effect free runtime, obviously, but knowing exactly where those lines are for your own code is a very powerful design strategy. I feel this is why you so heavily advocate, in the talk, for isolation of process dictionary code (preferably in one function).
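
One way that isolation might look, as a sketch rather than the code from the talk: a single small module owns the key and is the only code that calls `Process.put/2` and `Process.get/2`, while everything else takes the value as an argument. `RequestContext` and `Formatter` are hypothetical names.

```elixir
defmodule RequestContext do
  # Hypothetical module: the only code that touches the process
  # dictionary for this key. Namespacing the key avoids collisions.
  @key {__MODULE__, :request_id}

  def put(id), do: Process.put(@key, id)
  def get, do: Process.get(@key, :none)
  def clear, do: Process.delete(@key)
end

defmodule Formatter do
  # Pure: everything it needs arrives as arguments.
  def tag(message, request_id), do: "[#{request_id}] #{message}"
end

RequestContext.put("req-42")
line = Formatter.tag("started", RequestContext.get())

RequestContext.clear()
after_clear = RequestContext.get()
```

The impure boundary is three one-line functions, so it is easy to find every place this piece of state can change.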

I hear you though. Elixir’s not pure and it never will be. But I sure learned a ton about how to write Elixir code that I could manage by playing with pure languages.

gregvaughn · March 20, 2018, 11:08pm

@JEG2 Agreed. I think he went off mic for some of the video, but Francesco Cesarini was front-and-center and engaged during Q&A. As CTO of Erlang Solutions, he has many war stories of being brought in to debug as a last resort when a company is having problems on the BEAM. He and I also talked more afterward and he convinced me that I was not cautionary enough about Process Dictionary. It is absolutely another place that state is managed. It should be treated very cautiously. We’re used to GenServers managing state, but Process Dictionary use is yet another way to manage state beyond that.

In addition to your references to Bernhardt and Juric, Rich Hickey talks about how state is not evil, but should be used intentionally, in few places in code. Centralizing state management is a great design guidepost. Thanks for digging deeper into this point with me.

OvermindDL1 · March 21, 2018, 2:58pm

I personally consider the Process Dictionary the ‘Environment’ of a process in operating system terms, except it’s not inherited by children by default.
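
A quick sketch of the "not inherited" part, using `Task` only as a convenient way to start a child process; the `:greeting` key is arbitrary.

```elixir
Process.put(:greeting, "hello")

# A newly started process gets its own process dictionary,
# so the parent's :greeting entry is not visible in the child.
child_sees = Task.async(fn -> Process.get(:greeting) end) |> Task.await()

# To share a value, read it in the parent and pass it explicitly
# (here it is captured by the closure).
greeting = Process.get(:greeting)
child_given = Task.async(fn -> greeting end) |> Task.await()
```

`child_sees` is `nil`, while `child_given` is `"hello"`.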