Processes and state: are they objects?

Hey everyone,

So I have been reading two Elixir books: Functional Web Development with Elixir, OTP, and Phoenix and Elixir in Action.
In the first book, it shows you how to hold state using Agents and then later changes the Agents to GenServers. Now, I was starting to really understand what they were trying to convey, and the idea that came up was that processes can be used kind of like objects to save your state. In Elixir in Action - I don’t remember if it was in the book or in one of @sasajuric’s blog posts (sorry, I have been so busy and reading a lot of other books as well, so my mind is kind of all over the place) - I believe he mentions that we should avoid using processes as a means to create objects in Elixir.

Can someone please clarify and elaborate on this? Because now I feel like I am abusing processes… It’s as if I am going back to that database-model way of thinking. I’m finally getting the part where your Phoenix app is not your app; I try to tackle things by building my system separately in Elixir first, to move away from designing my modules around DB models. But now I am confused again.

I feel like I am leaving out things that would help you guys understand better, but I don’t have much time and I really wanted to post this now because it’s been bugging me for a while. I will try to add whatever I forgot later on.

Thanks in advance.

1 Like

Functional Web Development with Elixir, OTP, and Phoenix is currently being revised towards a more idiomatic usage of Elixir processes. I would say that the design @sasajuric shows in Elixir in Action and blog posts like To spawn or not to spawn? is more what you should be paying attention to at this time.

Processes are not Elixir’s equivalent of objects from OO languages. You should model your data as some combination of the provided primitives and model behavior as functions that operate on those data structures. Processes are for isolating subsystems for reasons of performance and error handling.
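
As a minimal sketch of that idea (the BankAccount module here is purely hypothetical, not from either book): the data is just a struct, and the behaviour is plain functions that take the struct and return an updated copy - no process is required to “hold” anything.

```elixir
defmodule BankAccount do
  defstruct balance: 0

  # Behaviour is just functions over the data structure.
  def new(initial \\ 0), do: %BankAccount{balance: initial}

  def deposit(%BankAccount{} = account, amount) when amount > 0 do
    %BankAccount{account | balance: account.balance + amount}
  end
end

# The caller simply keeps the latest value around:
account = BankAccount.new() |> BankAccount.deposit(50)
```

Only when you need concurrency, fault isolation, or serialized access to that data would you reach for a process.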

10 Likes

If you were to draw a crude analogy to the underlying concepts: data is data, processes are like threads, and state in a process is like a mutex.

Everything in a process happens synchronously, meaning state in a process is manipulated synchronously.

If your data is manipulated in a pipeline, then you don’t need a new process to hold state. The process spawned to handle the web request is good enough to maintain the request state and then return it when you’re done.

If you need an asynchronous non-blocking data-manipulation pipeline, then a new process is the right way.

If you have multiple processes that need to manipulate the same data, then a process is how Erlang prevents race conditions (this is essentially what a GenServer is).
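
For that last case, a minimal sketch (the Counter module is hypothetical): many processes can call increment/0 concurrently, and because the GenServer processes one message at a time the updates can never interleave.

```elixir
defmodule Counter do
  use GenServer

  def start_link(initial \\ 0) do
    GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def increment, do: GenServer.call(__MODULE__, :increment)

  @impl true
  def init(initial), do: {:ok, initial}

  # Messages are handled one at a time, so two concurrent
  # increments can never read the same stale count.
  @impl true
  def handle_call(:increment, _from, count) do
    {:reply, count + 1, count + 1}
  end
end
```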

1 Like

I don’t think that works - mutexes (locks) are only necessary when threads share data that they intend to mutate. The shared-nothing approach of actors does away with the need for locks entirely.

I suspect you meant to say:

And race conditions are still a concern in Erlang/Elixir (however OTP has your back for the most part).

Another analogy that I find helpful myself:

Think of your application as a company. Think of a process as an employee of that company. Different employees perform different tasks, preferably with as little coordination as possible, because coordination cannot happen in parallel. When a worker encounters something they cannot cope with, they quit. Then their supervisor replaces them, and possibly their colleagues as well if it turns out the replacement cannot handle the job (with this team) either.

When doing OOP, you are usually only talking about the work that should be done in your company, rather than also about who (which of your company’s workers or teams) should do it.

2 Likes

I said it was a crude analogy. :cry:

But the mutex analogy was because the actor model removes the need for them. I was trying to say that in other models where you might need a mutex, here you would instead use a process that controls the state.

And I was referring to race conditions that might come about when manipulating state. Sorry if I wasn’t clear.

The “synchronous -> sequential” comparison makes more sense though. I was thinking that a GenServer handles one message at a time while other processes remain blocked waiting for a call response.

Also, I should point out that the actor model isn’t unique to Elixir. Java has Akka, which uses the actor model. I bet there are probably tons of objects inside their actors.

I guess a good thought exercise may be: “If I were using objects in my processes, how would I interact with those secluded objects?”

A mutex is a locking mechanism used to synchronize access to a resource - and an actor doesn’t share its state with anyone. My point was that trying to draw an analogy could be counterproductive - what is needed is a shift in thinking.

Yes, but the blocking isn’t fundamentally related to waiting to gain access to a shared resource. A synchronous call is necessary to verify that a message has been successfully processed and/or to deliberately wait for the result. In many cases a cast can be just as effective (and the result can be cast back), so nobody needs to be blocked at all. And ultimately even a GenServer is just another process - the OTP guidelines enforce a certain structure on it, but it’s actually the least special (and therefore most generic and useful) of all the behaviours. Also, synchronous and asynchronous relate to inter-process activity; when we are talking about intra-process activity there is only sequential execution of expressions.
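
As a hedged illustration of the call-vs-cast point (the Worker module below is hypothetical, not from the thread): the caller of compute/2 blocks until the reply arrives, while compute_async/3 returns immediately and the result is later sent back as an ordinary message.

```elixir
defmodule Worker do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Synchronous: blocks the caller until the server replies.
  def compute(pid, input), do: GenServer.call(pid, {:compute, input})

  # Asynchronous: returns immediately; the result is delivered
  # later as a plain message to reply_to.
  def compute_async(pid, input, reply_to) do
    GenServer.cast(pid, {:compute, input, reply_to})
  end

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:compute, input}, _from, state) do
    {:reply, do_work(input), state}
  end

  @impl true
  def handle_cast({:compute, input, reply_to}, state) do
    send(reply_to, {:result, do_work(input)})
    {:noreply, state}
  end

  defp do_work(input), do: input * 2
end
```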

The process would simply act as an aggregate for the objects contained therein - anything outside of the aggregate boundary doesn’t have access to the objects; it has to communicate with the aggregate in order to query or mutate the aggregate’s state.

So in Akka, all that has to happen is that it is somehow guaranteed that at any point in time no more than one thread is running through an actor instance. Mutation isn’t a problem when data isn’t shared. Anything the actor has access to isn’t shared, so there is no need for locks. Any and all collaboration goes through the actor’s mailbox (which is most likely where the locks are hiding in the underlying framework, as there are no BEAM-style schedulers).

3 Likes

Old quote:

  Ten Pounds in a Five-Pound Sack - Representation Is the Essence of Programming

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts, they’ll be obvious.

Frederick Brooks, The Mythical Man-Month (1975)

Even today a lot of Frederick Brooks’ insights still apply - but when it comes to this particular point I personally believe it has become obsolete. By focusing primarily on representation you’ll always end up with a CRUD system.

These days I tend to side with Sam Newman:

These capabilities may require the interchange of information — shared models — but I have seen too often that thinking about data leads to anemic, CRUD-based (create, read, update, delete) services. So ask first “What does this context do?”, and then “So what data does it need to do that?”

Sam Newman, Building Microservices (2015)

Ultimately the application is supposed to support use cases that will enable the users to meet some set of concrete goals, and it’s the use cases that help identify what data is necessary to achieve those goals - so really, what the application is able to do should be the primary concern; the data that enables the “doing” is necessary but secondary. So it makes sense to create some of the logical boundaries around the system’s activities (runtime behavior) rather than just focusing on neatly partitioning the data for persistent storage.

Actors and objects have one thing in common - they need to collaborate. However, the means of that collaboration are quite different - message passing vs. method invocation. The motivations behind the encapsulation of state are quite different as well.

An actor doesn’t share state so it can maintain run-time autonomy and isolation (a byproduct of which is that there is no need for synchronization) - if it dies it doesn’t take the “rest of the world” with it. OOD heuristics on the other hand focus primarily on design-time autonomy and isolation. OO is more focused on segregating commonalities from variabilities and hiding implementation details that may be subject to change during the entire lifecycle of the application. When OO was conceived computers were still mostly uniprocessors and OO never really concerned itself with runtime isolation as fault tolerance and distributed processing were never stated design goals.

Both actors and objects are essentially “black boxes” that expose a (message- or method-based) interface, but the criteria that lead to their “optimal boundaries” are quite different. Now that doesn’t mean that the essence behind some of the OO design principles can’t, after some careful re-interpretation, apply to actor-based systems - but that’s a separate topic.

To spawn, or not to spawn? is quite specific:

  • Use functions and modules to separate thought concerns.
  • Use processes to separate runtime concerns.
  • Do not use processes (not even agents) to separate thought concerns.

Now initially I would have been tempted to use “design concerns” instead of “thought concerns” - but that would have been a grave mistake, because in Erlang/Elixir “runtime concerns” are also “design concerns” - less so because of performance and more so because of fault tolerance. So my interpretation of “thought concerns” is the typical compartmentalization of functionality that we engage in to make things “easier to reason about” while simultaneously structuring the application to align with the user’s domain. So deck, hand, and round align with the domain concepts, while round_server and notifier are primarily concerned with orchestrating how the rounds unfold over time.
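
A rough sketch of that split, using hypothetical modules loosely in the spirit of the book’s example (not its actual code): the domain module is pure data and functions, while the server module only orchestrates it over time.

```elixir
# Thought concern: a pure domain module - no processes involved.
defmodule Round do
  defstruct players: [], plays: []

  def new(players), do: %Round{players: players}

  def play(%Round{} = round, player, card) do
    %Round{round | plays: [{player, card} | round.plays]}
  end
end

# Runtime concern: a process that owns a Round and sequences
# the players' moves as they arrive over time.
defmodule RoundServer do
  use GenServer

  def start_link(players), do: GenServer.start_link(__MODULE__, players)

  def play(pid, player, card), do: GenServer.call(pid, {:play, player, card})

  @impl true
  def init(players), do: {:ok, Round.new(players)}

  @impl true
  def handle_call({:play, player, card}, _from, round) do
    {:reply, :ok, Round.play(round, player, card)}
  end
end
```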

Now it would be a mistake to try to compare the “size” of processes and objects - some processes can be large, i.e. hold complex state that would typically be divided over a large number of objects but there are also processes that are extremely short-lived and trivial (e.g. completing a (blocking) “call” for the parent process which can’t afford to be blocked).

OO design often obsesses over reuse - ignoring YAGNI in the process. Lately a new criterion for choosing boundaries seems to have emerged - impermanence and replaceability. Actually, it’s not all that new:

Program to an interface, not an implementation.

p.18 Design Patterns: Elements of Reusable Object-Oriented Software (1994)

Now typically this is seen as a means to break the coupling to a concrete implementation but it also has the effect that you can “throw away” the current implementation and replace it with an entirely new (rewritten) one.

So recently some teams have been placing more emphasis on composing systems from parts which are “small enough” to be rewritten in one day to two weeks (depending on the circumstances). If any one part gets “large enough” to become “too large or difficult to replace”, it is refactored towards a smaller implementation or broken down into an arrangement of more replaceable parts. So “replaceability” could be another aspect that can potentially help to find better boundaries for something as small as a process or something as large as an OTP application.

9 Likes

That is pretty much what I referred to with the term “thought concerns”.

I deliberately didn’t use “domain concerns”, since sometimes different domain entities might be powered by separate processes. For example, in a chat server, a private conversation might exist in the context of a chat room. Yet, you might still end up using a separate process for that conversation, because it only happens between two members of the room. That would boost scalability (with respect to the number of private conversations in a room) and fault tolerance (failure of one conversation doesn’t take down all other conversations, nor the public chat). Although there is a domain relation between a chat room and private conversations, that doesn’t mean they have to be powered by the same process.
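
A minimal sketch of that arrangement (all names hypothetical): each private conversation is its own process, started under a DynamicSupervisor so that a crash in one conversation affects neither the room nor the other conversations.

```elixir
defmodule Conversation do
  use GenServer

  def start_link({member_a, member_b}) do
    GenServer.start_link(__MODULE__, {member_a, member_b})
  end

  @impl true
  def init({member_a, member_b}) do
    {:ok, %{members: {member_a, member_b}, messages: []}}
  end
end

# ConversationSupervisor is assumed to be a DynamicSupervisor
# started elsewhere in the application's supervision tree.
{:ok, _pid} =
  DynamicSupervisor.start_child(ConversationSupervisor, {Conversation, {"alice", "bob"}})
```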

So, to summarize, in that post “thought concerns” refer to “I want to somehow organize a larger chunk of code, so it’s easier to work with”, while “runtime concerns” refer to “I want to get some observable runtime benefits, such as fault-tolerance, scalability, or potential for parallelism”.

3 Likes

I really like @peerreynders’ response … quality stuff :slight_smile:

In addition to what they wrote, processes behave like objects and can be viewed as such. In fact, they are closer to what objects in early OO languages were envisioned to be than what we typically have today in OO languages: they encapsulate state and pass messages.

Where things diverge, however, is that OO tightly couples the code (as defined in i.e. a class) and the runtime instantiation (the object): the code defines the object. As a result, OOD revolves around sculpting objects that map to a problem space which leads directly to how the code is arranged. That has deep design implications, which can be good or … less … good, largely depending (at least IME) on the domain that is being modeled. (Also the care given and skill of the people doing the design and implementation, but let’s assume we’re all reasonably competent and caring coders here … )

With Elixir’s processes (which are almost-but-not-perfectly Actors, at least academically), those two concerns are separated. The code should be distributed into modules along topics (however you end up defining those) to keep like actions on like concerns together. But then the “objects”, those processes, can run any mix of that code as they wish.

So with Elixir you really have two stages of design thought: the modules that provide the functionality, and (separately!) how those will be combined at run time to achieve different tasks. The former is “static” (written, compiled, runnable) and the latter is “dynamic” (can be entirely non-deterministic in terms of how many processes are created, when they are created, and what code paths they run).

So while in OO you sort of have one design phase where you try to model both the structure of your code and the expected runtime profile of it, with Elixir you think about these things separately.

Which is why, while Elixir processes are pretty much objects, they don’t direct design as they do in an OO language. This in turn is why we shouldn’t try to distribute the code in our Elixir applications in terms of processes (or: “design in a classical OO way with processes being the objects”).

For OO languages, it’s “nice” in that you can see a direct and obvious mapping between the code in a class and the runtime shape of that same thing. They are coupled. This makes it simpler to reason about, but it also ties one’s hands. And this is where mixins and multiple inheritance start popping up to help paper over the pitfalls.

So while in Elixir I usually go through a “two-phase” design process, one thinking about the modules of code and one about the runtime properties (processes) that leverage that code, I am free to do what is best for each without compromise. A downside is that there is no mapping staring at you in the face from your modules of code as to what the runtime shape of things will be.

Currently Elixir requires us to build a separate, and largely implicit, mental model of the runtime shape of our application. It has occurred to me on a few occasions that it would be nice to have a more explicit set of language features for the definition and management of processes: syntactic sugar over the usual mix of supervisors, GenServers, spawn/spawn_link/spawn_monitor, process pools, registrations, etc., but which might grant us an “eagle-eye view” of the runtime shape of our applications.

At one point I even drafted a small spec for what such a process definition DSL might look like … I think it would be doable and useful, though I doubt it would be able to entirely replace the current APIs we use in all situations … but often 95% is better than 0% in these cases.

7 Likes

@aseigo, just curious, can you give a link to the “small spec”?


First, I would like to say thank you for taking the time to write everything that you wrote - and that goes for everyone else as well. I wanted to reply to multiple people with this single reply but I’m not sure how…

Now, in regards to the principle of separating thought concerns, I think I get the picture, but where I kind of get lost is when you need to interact with a database or an entity that holds your state - for example processes, ETS, Mnesia, etc. Usually one would tend to write their modules based on their data storage format. For example, a User module would probably be defined as a struct so you could give it the id, name, and other attributes. But with @sasajuric’s approach, and from reading everyone else’s replies, it seems like the system itself and whatever data storage you use (even memory) would be completely different entities - if I may say, a completely different system that would interact with your main system. Am I wrong?

So, in other words, one would have to create a separate client that would handle retrieving data from the storage and transforming it into the format the main system’s modules require in order to continue processing the data.

So the picture that I have is, for example: System <--- Client ---> Data storage.
(I’m very visual so pictures, some kind of representation or code are very helpful.)

The reason why I am mentioning data storage is that I still feel uneasy about the fact that in FP you don’t have some kind of object/class to hold your state. On the contrary, in FP one would use a function that either takes the whole state as an argument, or something separate, such as a process, that holds the state.

This is why in DDD there is the concept of a repository:

Repository: A mechanism for encapsulating storage, retrieval, and search behavior which emulates a collection of objects.

So while everybody is talking about Ecto’s “repo” - in the DDD sense that is not your application’s repository because Ecto’s repo is leaking all sorts of implementation details about the storage technology (RDBMS and schema details) that are irrelevant to the application’s domain. The application repository would have to wrap Ecto’s repo and “talk” to the rest of the application only in terms of the domain’s types.
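
A minimal sketch of that wrapping (all module names hypothetical): the rest of the application only ever sees the User domain struct and the UserRepository functions; Ecto, schemas and SQL stay behind that boundary.

```elixir
defmodule Accounts.User do
  # The domain type - no Ecto, no schema details.
  defstruct [:id, :name, :email]
end

defmodule Accounts.UserRepository do
  alias Accounts.User

  # MyApp.Repo and MyApp.Schemas.UserRecord (an Ecto schema)
  # are assumed to exist; they never leak past this module.
  def fetch(id) do
    case MyApp.Repo.get(MyApp.Schemas.UserRecord, id) do
      nil -> {:error, :not_found}
      record -> {:ok, %User{id: record.id, name: record.name, email: record.email}}
    end
  end
end
```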

Now for smaller projects this effort may not be worth it.

2 Likes

At the end of the day, those are actually the same thing in Elixir. In a process there is always a function that is currently executing (possibly waiting on a message …), and the state is indeed handed from one function to the next in a state-holding process.

The reason the state is not visible from the “outside” is that the state is held in a function waiting on messages to arrive … when a message arrives (without the state, of course), the receiving function now has the “next thing to do” from the message and the state that it received when it started.

This is why GenServer callbacks (for instance) all take and return a state parameter! So processes are a nice way to wrap those functions-bearing-state in an easy to use package.
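
A minimal sketch of that state-threading (hypothetical Stack module): every callback receives the current state as its last argument and returns the next state; nothing else “holds” the data between messages.

```elixir
defmodule Stack do
  use GenServer

  def start_link(items \\ []), do: GenServer.start_link(__MODULE__, items)

  def push(pid, item), do: GenServer.cast(pid, {:push, item})
  def pop(pid), do: GenServer.call(pid, :pop)

  @impl true
  def init(items), do: {:ok, items}

  # The returned list becomes the state passed to the next callback.
  @impl true
  def handle_cast({:push, item}, items), do: {:noreply, [item | items]}

  @impl true
  def handle_call(:pop, _from, [head | tail]), do: {:reply, head, tail}
end
```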

So now this:

SQL databases, ETS and Mnesia are about “durable” storage → storage that exists outside of the rest of the program and may live a different life cycle … (OK, there are details in there I am glossing over, but let’s roll with that simplification for now).

Data in a process is just parameters to functions (representing state) that are waiting to be used when a message is received.

Yes, they are separate entities … but you can view them all as being “things we wrap in functions”. So for a SQL database we write Ecto queries (or maybe go straight to postgrex if we’re feeling like that :slight_smile: ) inside of functions and write other functions that know how to manipulate those queries and their results (perhaps with structs). Those functions create the surface with which you call out into these “separate systems”, and when using those functions your program can generally ignore the “detail” that they are separate from your program.

I think I may understand where you feel the flow is dropping out from under your feet: with OO there is this make-believe world of data being managed in very solid things (called “objects” even!), but it’s really just functions wrapped behind functions with syntactic sugar that keeps the data<->functions association for you (in that sense, Elixir processes are analogous to objects); with procedural languages like C, there are globals and the endless passing of pointers … but that also is just data being handled by functions.

There is no extra magic, or lack thereof, in functional programming in that regard. It just doesn’t try to hide the fact that data and functions are loosely coupled, if at all … it goes so far as to tell us that functions are also just data, which is obvious when you think about the fact that they are just bits in memory that the computer reads through to know what instructions to execute … bits are bits, right?

When I came to functional programming, I also felt a bit of vertigo over how my data was going to find its way through the program … all the facades I was used to relying on as mental abstractions for the reality were gone. But then you learn not to worry about it … it’s just data, some of which happens to be executable functions :slight_smile:

… and that’s true in EVERY language … they just have different ways of exposing it, and different rules for interacting with the data that is your program.

3 Likes