Some help on designing an Elixir application based on "Data", "Behavior" and "Processes"

qwerescape · January 2, 2018, 4:34pm

In @josevalim’s talk https://youtu.be/IZvpKhA6t8A?t=1299, he mentioned data, behavior and process. I really like the concepts of “Immutable data”, “Pure function behaviors that transform data” and “process capturing state changes with time”.
However, I feel like I really suck at implementing this, mainly because the concept works beautifully when the relationship is one-way (“process calls behavior that transforms data”) but gets weird when behavior needs to know about process. An example would be retrieving a pid from the Registry, something like Foo.get_pid(key, reg \\ Registry.SomeRegistry). This function definitely involves “time”, because whether the key is in the registry depends on what happened before the function call. Does that mean my Foo module should be a GenServer? Do you guys put functions in different modules based on their purity? If so, what are some examples? If not, how do you justify your design?

Thanks in advance

peerreynders · January 2, 2018, 5:47pm

I think we kind of touched on this in your other topic - you may be getting caught up in the syntax (it’s a function call) while ignoring the underlying semantics (it’s an interprocess request/response).

This is why I like a clear demarcation between functions that don’t require access to the concurrency primitives (spawn, send, receive), i.e. that use purely sequential code, and those that do require the concurrency primitives.

While not all “sequential” functions are pure functions, pure functions can only consist of sequential code, so their use will never breach the process boundary. “Pure” functions can therefore only ever be used to effect transformations within the local process. However, they can be used to transform the local process state, a change that can affect how the process will later interact with other processes.

I simply think it needs to be obvious when a function has the potential to breach the process boundary. Segregating functions into distinct modules is a possible approach, though I think it’s probably too heavy-handed in many cases.
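A minimal sketch of that demarcation, assuming a hypothetical Tally module (the module and its function names are illustrative, not from the talk or any library). The pure function stays inside the calling process; the others use spawn, send and receive, and say so in their comments:

```elixir
defmodule Tally do
  # Pure and sequential: same input, same output, never leaves the caller.
  def add(tally, n) when is_integer(n), do: tally + n

  # Crosses the process boundary: spawns a process whose state is a tally.
  def start(initial \\ 0), do: spawn(fn -> loop(initial) end)

  # Crosses the process boundary: fire-and-forget message to the tally process.
  def add_async(pid, n) do
    send(pid, {:add, n})
    :ok
  end

  # Crosses the process boundary: a request/response that looks like a call.
  def total(pid, timeout \\ 1_000) do
    ref = make_ref()
    send(pid, {:total, self(), ref})

    receive do
      {^ref, total} -> {:ok, total}
    after
      timeout -> {:error, :timeout}
    end
  end

  # The process loop reuses the pure function to transform its local state.
  defp loop(tally) do
    receive do
      {:add, n} ->
        loop(add(tally, n))

      {:total, from, ref} ->
        send(from, {ref, tally})
        loop(tally)
    end
  end
end
```

Here everything lives in one module, and only the names, arguments (a pid) and the timeout hint at which functions leave the local process.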

> Does that mean my Foo module should be a GenServer?

No. It could simply play the role of a “utility script” for a frequently needed capability.

In terms of the “time” aspect, I think you need to view it in its role with respect to a protocol: not the Elixir kind, but a protocol that defines an interaction between many participants (processes). For example, the registry’s API may consist of a number of functions to “access” it, but what is really important is how that API is used to implement the protocols of registering, querying, updating and unregistering key/value pairs. So the protocol dictates that Foo.get_pid/2 will return a pid provided one was registered previously for the specified key and hasn’t been unregistered since.

So in a sense process state should be “protocol state”.
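As a rough sketch of that protocol, assuming Foo.get_pid/2 is a thin wrapper around Registry.lookup/2 on a unique registry (the body of get_pid and the "worker" key are assumptions; the thread only gives the function head):

```elixir
defmodule Foo do
  # Hypothetical body: look the key up in a :unique registry.
  def get_pid(key, reg \\ Registry.SomeRegistry) do
    case Registry.lookup(reg, key) do
      [{pid, _value}] -> {:ok, pid}
      [] -> :error
    end
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: Registry.SomeRegistry)

# Nothing registered yet, so the protocol says: no pid.
:error = Foo.get_pid("worker")

# Register the current process under the key.
{:ok, _owner} = Registry.register(Registry.SomeRegistry, "worker", nil)
me = self()
{:ok, ^me} = Foo.get_pid("worker")

# Once unregistered, the same call with the same arguments misses again.
:ok = Registry.unregister(Registry.SomeRegistry, "worker")
:error = Foo.get_pid("worker")
```

The function itself holds no state; what it returns depends on where the register/unregister protocol currently stands.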

qwerescape · January 2, 2018, 6:31pm

@peerreynders I was hoping that you’d reply, thanks!
This is still unclear to me. I might be mixing up your ideas, please bear with me.

> I simply think it needs to be obvious when a function has the potential to breach the process boundary - segregating functions into distinct modules is a possible approach, though I think it’s probably too heavy handed in many cases.

What do you do to make these functions obvious? I also agree that modules are heavy handed (hence this question), so how do you distinguish between “data”, “behavior” and “process”? Maybe Jose was just talking about separating them conceptually and not semantically?

> In terms of the “time” aspect I think you need to view it in its role with respect to a protocol

I am having trouble understanding what you mean by this. Could you explain it in a different way? My definition of “time” is this: a function depends on time if, given the exact same inputs, it can return different outputs when called at different times. Loading a pid from the Registry fits that description. My confusion was: “do we programmers need to worry about functions like that by segregating them into separate modules?”

Thanks

kelvinst · January 2, 2018, 7:15pm

It’s important to note that being time-aware code does not imply the need to implement a GenServer or any other OTP behaviour.

GenServer is a tool, and as a tool it was designed to solve a problem: defining a common interface for the server side of a client-server relationship. The OTP documentation is a great guide to which tool to use for each problem.

Here is a very easy-to-understand example: the DateTime module. You might see it as a very basic module, but even so, DateTime is time-aware (obviously); otherwise DateTime.utc_now/0 would always return the same result, and it obviously doesn’t.

But, as you might notice, there are pure functions in DateTime too, like DateTime.to_string/1, which will always return the same result for the same input.
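A small sketch of the contrast, using only the two functions named above plus DateTime.from_iso8601/1 to build a fixed input (the timestamp is an arbitrary example value):

```elixir
# Pure: a fixed DateTime always renders to the same string.
{:ok, dt, 0} = DateTime.from_iso8601("2018-01-02T19:15:00Z")
"2018-01-02 19:15:00Z" = DateTime.to_string(dt)
true = DateTime.to_string(dt) == DateTime.to_string(dt)

# Time-aware: no arguments at all, yet the result depends on when it is called.
first = DateTime.utc_now()
Process.sleep(10)
second = DateTime.utc_now()
IO.inspect({first, second}, label: "two calls, usually two results")
```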

The point here is that all the DateTime functions are somehow related to date and time, be it converting the current time from the OS into a DateTime struct or converting a DateTime back to the OS representation.

That’s what defines what goes inside DateTime and what doesn’t. The module’s concern defines what its content should be, not the other way around. Otherwise, we would have two modules for DateTime, and that would make the API a little more confusing.

This is actually one of the great advantages of the BEAM languages, IMO: the ability to abstract asynchronous execution inside your behaviour so well that you don’t even notice you are sending messages to and receiving messages from another process.
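A minimal sketch of that kind of abstraction, assuming a hypothetical Stack module built on GenServer (the module is illustrative; only the GenServer callbacks and functions are the real API). Callers just see push/2 and pop/1; the messages are hidden behind them:

```elixir
defmodule Stack do
  use GenServer

  # Client API: ordinary-looking functions that run in the caller's process.
  def start_link(items \\ []), do: GenServer.start_link(__MODULE__, items)
  def push(pid, item), do: GenServer.cast(pid, {:push, item})
  def pop(pid), do: GenServer.call(pid, :pop)

  # Server callbacks: run inside the Stack process and own its state.
  @impl true
  def init(items), do: {:ok, items}

  @impl true
  def handle_cast({:push, item}, items), do: {:noreply, [item | items]}

  @impl true
  def handle_call(:pop, _from, [head | tail]), do: {:reply, {:ok, head}, tail}
  def handle_call(:pop, _from, []), do: {:reply, :empty, []}
end
```

Usage reads like calls on a data structure, for example `{:ok, pid} = Stack.start_link([1])` followed by `Stack.push(pid, 2)` and `Stack.pop(pid)`, even though each one is a message to another process.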

peerreynders · January 2, 2018, 7:48pm

Naming, mostly. That’s one of the reasons I’m very quick to convert anonymous functions into module functions, or at least into a module function that creates the anonymous function (for the benefit of the closure). To me

Foo.get_pid(key, reg \\ Registry.SomeRegistry)

telegraphs intent of crossing the process boundary given that our own pid is available via self() and a “registry” is mentioned which typically identifies a process pid for the process that manages that information.

What would be even more helpful is a Typespec that would identify reg as a pid().
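A sketch of what such a typespec could look like. Two assumptions here: the function body (a Registry.lookup/2 wrapper) is hypothetical, and reg is typed as atom() because the default Registry.SomeRegistry is a registered name; pid() would be the fit if a process reference were passed instead:

```elixir
defmodule Foo do
  # The spec documents that the second argument names a registry process
  # and that a miss is a normal, expected result.
  @spec get_pid(key :: term(), reg :: atom()) :: {:ok, pid()} | :error
  def get_pid(key, reg \\ Registry.SomeRegistry) do
    case Registry.lookup(reg, key) do
      [{pid, _value}] -> {:ok, pid}
      [] -> :error
    end
  end
end
```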

The thing is that some of the naming may be a bit subtle for a beginner - who may need to be hit with a two-by-four to get the idea - but once your mental model for OTP and process based programming has formed sufficiently it should be fairly easy to “telegraph your intent” (which I suspect @OvermindDL1 was talking about).

> I am having trouble understanding what you mean by this, could you explain it in a different way?

To come at it from a different angle I can go back to the OO days of UML and CASE-tools. People often provided lots of information to assemble their static class diagrams - which can be helpful to get a sense of the static partitioning of the logic in the system especially if you are trying to hunt down some logic that you need to tweak.

By and large the class diagrams are utterly useless when you are trying to understand how the system works because that behaviour is dynamic. For that you need activity and sequence diagrams because they describe the protocols enacted between the object instances to get stuff done. Activity and sequence diagrams have a time line which sequences the “messages” as they are exchanged between the objects.

As Alan Kay put it:

> I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea.
>
> The big idea is “messaging”

And we’re not talking about individual messages - we are talking about the “set of messages” that are exchanged between the participants to enact a protocol.

With processes it’s too easy to focus on a single process and what it does - when in fact it is necessary to observe the entire protocol over all the processes involved as the protocol unfolds (evolving protocol states of the participants) over time.

With an API it’s too easy to focus on just the available methods (or functions) - ultimately it’s more important to understand in what sequence (again passage of time) the functions have to be used to effect a desired outcome.

Watching

Torben Hoffmann - Thinking like an Erlanger
Torben Hoffmann - Protocols - The Glue for Applications

will probably give you a better idea of what I’m trying to convey.

qwerescape · January 2, 2018, 8:03pm

Darn, I can’t mark two solutions. Thanks @peerreynders and @kelvinst!