Structuring an OTP application

Matt · June 21, 2018, 5:00am

I have a question about the structure of an OTP application. Given a fictitious employee time tracking application for my example, I am trying to reason which application layout would make the most sense using OTP.

Note: This is just a fictitious example application, nothing I am working on. Just an example to better understand OTP orchestration.

The fictitious employee time tracking application will have employees, schedules, time tracking (clock in/clock out), vacation requests, and overtime submissions. Following OTP, each one should be a process, gen_server for example. Which application makes the most sense:

Option Number One

Each employee is a gen_server process.
The employee process holds the employee’s name, id number, etc.
Each schedule is a gen_server process.
A schedule holds an employee’s schedule, which days they work, etc. The schedule process is linked to an employee, or a worker process under an employee gen_server?
Each time tracking event is a gen_server process.
This process holds a clock-in or clock-out event. Also linked to an employee, or supervised by an employee gen_server, not sure which is better. This does seem like too many processes may accumulate over time though.
Each vacation request is a gen_server process.
This process holds information regarding an employee’s vacation request, dates, and approval information. Also linked to, or supervised by, an employee gen_server process.
Each overtime submission is a gen_server process.
This process holds overtime information, and who may have approved it. Also linked to an employee, or supervised by an employee gen_server process.

This setup seems like it might spawn too many gen_server processes especially for the clock-in/clock-out. So would the second option be more appropriate?

Option Number Two

Each employee is a gen_server process.
The employee process holds the employee’s name, id number, etc.
Each schedule is a gen_server process.
A schedule holds an employee’s schedule, which days they work, etc. The schedule process is linked to an employee, or a worker process under an employee gen_server?
Each employee’s time events are a gen_server process.
This process holds a list of clock-in or clock-out events. One process for each employee. So, instead of one process for each clock-in or clock-out event, just one process holding a list of clock-in and clock-out events.
Each employee’s vacation requests are a gen_server process.
Same as the above, one process per employee, containing a list of vacation requests instead of each request being a process itself.
Each employee’s overtime submissions are a gen_server process.
Same as the above, one process per employee, containing a list of overtime requests instead of each request being a process itself.
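For concreteness, here is a minimal sketch of what the per-employee time events process in Option Number Two could look like. The module name `TimeEvents`, its function names, and the `{type, timestamp}` event shape are all assumptions made up for illustration, not part of any existing library.

```elixir
defmodule TimeEvents do
  # Hypothetical module: one GenServer per employee holding that
  # employee's clock-in/clock-out events as a list (Option Number Two).
  use GenServer

  # Client API

  def start_link(employee_id) do
    GenServer.start_link(__MODULE__, employee_id)
  end

  def clock_in(pid, at), do: GenServer.call(pid, {:record, :clock_in, at})
  def clock_out(pid, at), do: GenServer.call(pid, {:record, :clock_out, at})
  def events(pid), do: GenServer.call(pid, :events)

  # Server callbacks

  @impl true
  def init(employee_id) do
    {:ok, %{employee_id: employee_id, events: []}}
  end

  @impl true
  def handle_call({:record, type, at}, _from, state) do
    # Prepend for O(1) insertion; the list is kept newest-first internally.
    {:reply, :ok, %{state | events: [{type, at} | state.events]}}
  end

  def handle_call(:events, _from, state) do
    # Return the events oldest-first.
    {:reply, Enum.reverse(state.events), state}
  end
end

{:ok, pid} = TimeEvents.start_link(42)
:ok = TimeEvents.clock_in(pid, ~N[2018-06-21 09:00:00])
:ok = TimeEvents.clock_out(pid, ~N[2018-06-21 17:00:00])
TimeEvents.events(pid)
```

Under Option Number One, each `{type, at}` tuple would instead be its own process, which is where the process count grows with every clock event.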

I’m very curious to hear your thoughts. Please remember this application is not real, not a web app question and not a persistence question. Just simply curious about people’s opinion on structuring processes.

idi527 · June 21, 2018, 8:32am

Have you considered using ETS? It is part of OTP after all.

ericmj · June 21, 2018, 9:14am

It’s a common misconception that you should use processes to structure your application. Instead use modules to organize your application and use processes when you need concurrency or shared state between processes.
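As a rough illustration of organizing with modules first, the sketch below handles time tracking with plain functions over plain data and no process at all. `Timesheet` and its functions are hypothetical names chosen for this example.

```elixir
defmodule Timesheet do
  # Hypothetical pure module: plain data in, plain data out, no process.

  # Append a clock event to a timesheet (a list of {type, NaiveDateTime}).
  def record(events, type, at) when type in [:clock_in, :clock_out] do
    events ++ [{type, at}]
  end

  # Total worked seconds, pairing each clock-in with the clock-out after it.
  def worked_seconds(events) do
    events
    |> Enum.chunk_every(2)
    |> Enum.reduce(0, fn
      [{:clock_in, t_in}, {:clock_out, t_out}], acc ->
        acc + NaiveDateTime.diff(t_out, t_in)

      # Skip any chunk that is not a clock-in/clock-out pair.
      _other, acc ->
        acc
    end)
  end
end

sheet =
  []
  |> Timesheet.record(:clock_in, ~N[2018-06-21 09:00:00])
  |> Timesheet.record(:clock_out, ~N[2018-06-21 17:00:00])

Timesheet.worked_seconds(sheet)
# => 28800
```

A process can be wrapped around this later if several callers need to share one timesheet; the module itself would not have to change.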

I think you should start by considering the external interface of your application, right now it’s a black box so you can design it however you want. But if you add an HTTP interface you probably want to handle requests in parallel, so the web server will start a process per connection or request. So now you need shared state between these processes which can be solved as you proposed, but your solutions are overly complex. Why a process per employee, why not a single process for all employees, or a single process for your whole database?

You should also be careful about using processes to store information that is not transient. You don’t want to lose employee information because the employee process crashed.

outlog · June 21, 2018, 9:57am

lack of persistence, comes to mind… what happens in catastrophic events?

and then the “to spawn or not to spawn”
http://www.theerlangelist.com/article/spawn_or_not

eg. your usage of processes could be construed as OO programming with all the side effects and “spaghetti code” of that…

Wouldn’t the employees also work in different locations, and then in different departments, shifts, etc. and that is perhaps also where a genserver boundary starts to make sense…

I like this talk as a good introduction to context mapping/DDD etc.

Matt · June 21, 2018, 2:52pm

It’s just an example, to ask about processes. Persistence isn’t in question. But I know what you’re saying.

Matt · June 21, 2018, 2:55pm

My question is just about processes only. Not concerned about persistence, nor Phoenix— the application isn’t even real just curious to hear about people’s opinion about structuring processes.

Matt · June 21, 2018, 3:10pm

Perhaps I didn’t form my question correctly. This application isn’t real, my focus isn’t the interface, nor disk-based persistence. I’m not even saying such an application is practical. Just structuring an application with processes: a pure Elixir OTP application using gen servers. The only persistence in my example is the gen servers.

Is it really a misconception to structure an app with processes when you need state? Is that not the whole idea of thinking in processes?

I think you’re right, there’s pretty much no standard; it’s all up in the air as to how you want to do it.

idi527 · June 21, 2018, 3:20pm

ETS tables are stored in memory, so I didn’t suggest it for persistence – I suggested it because having a single “source of truth” for state is simpler (at least in my experience) to manage than if it is distributed over multiple processes.

Plus :ets has many other helpful functions for iterating over the data which you’d probably have to reimplement yourself if you go with bare genservers.
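A small sketch of the kind of helpers meant here, using only `:ets.new/2`, `:ets.insert/2`, `:ets.lookup/2`, `:ets.match/2` and `:ets.foldl/3`. The table contents and the `{id, name, department}` row shape are assumptions invented for the example.

```elixir
# Hypothetical table and row shape: {id, name, department}.
table = :ets.new(:employees, [:set, :public])

:ets.insert(table, {1, "Ada", :engineering})
:ets.insert(table, {2, "Grace", :engineering})
:ets.insert(table, {3, "Linus", :support})

# Point lookup by key.
[{1, name, _dept}] = :ets.lookup(table, 1)

# Pattern match across the whole table without writing a traversal:
# collect the names of everyone in :engineering.
engineers =
  table
  |> :ets.match({:_, :"$1", :engineering})
  |> List.flatten()
  |> Enum.sort()

# Fold over all rows, here counting employees per department.
per_dept =
  :ets.foldl(
    fn {_id, _name, dept}, acc -> Map.update(acc, dept, 1, &(&1 + 1)) end,
    %{},
    table
  )
```

With one GenServer per employee, the last two queries would mean messaging every process and merging the replies by hand.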

ericmj · June 21, 2018, 3:22pm

Yes, that’s the misconception. You should start by structuring your application using modules and then add processes where they are actually needed to solve your business problems.

Matt · June 21, 2018, 3:23pm

Yes, I get it. Thank you for your suggestion. What I’m really after (and it’s hard to get answers like this because we are all speed readers) is people’s opinions on structuring an all-gen_server application, essentially. One process for each element, one process holding a list of elements, etc.

Matt · June 21, 2018, 3:33pm

Yes, ets is awesome. I use it, and mnesia. I don’t know why people hate on mnesia so much. Again, my question isn’t about easy — just: if you were limited to just gens how would you structure this app. Maybe that’s how I should have asked the question

idi527 · June 21, 2018, 4:16pm

Have you read Designing for Scalability with Erlang/OTP: Implement Robust, Fault-Tolerant Systems by any chance? IIRC, in the very first chapters they suggest hiding the implementation details behind a functional interface. And for me what you are describing in OP are implementation details.

So a functional interface is where I’d start. After that I’d begin to add processes as needed. I’d probably not go “all in” with them and introduce one genserver to manage all users, another – to manage all schedules, etc.
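A sketch of that starting point, assuming a hypothetical `Employees` module: callers use `add/2` and `get/1`, and the fact that one GenServer holds every employee stays hidden behind those functions.

```elixir
defmodule Employees do
  # Hypothetical functional interface. Callers only see add/2 and get/1;
  # that a single GenServer holds all employees is an implementation detail.
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  def add(id, name), do: GenServer.call(__MODULE__, {:add, id, name})
  def get(id), do: GenServer.call(__MODULE__, {:get, id})

  @impl true
  def init(employees), do: {:ok, employees}

  @impl true
  def handle_call({:add, id, name}, _from, employees) do
    {:reply, :ok, Map.put(employees, id, %{id: id, name: name})}
  end

  def handle_call({:get, id}, _from, employees) do
    # Map.fetch/2 returns {:ok, value} or :error.
    {:reply, Map.fetch(employees, id), employees}
  end
end

{:ok, _pid} = Employees.start_link()
:ok = Employees.add(1, "Ada")
Employees.get(1)
# => {:ok, %{id: 1, name: "Ada"}}
```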

Once they become a bottleneck, I’d switch to ets. Once a single ets table for a resource (users, schedules, etc) becomes a bottleneck, I’d start sharding the data among cpu cores (one ets table per cpu).

But the functional interface wouldn’t change. Sure, you can implement everything as a process behind this interface, but what’s the point? It would only increase complexity.
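The sharding step mentioned above could look roughly like this. `ShardedStore` is a made-up name, and using `System.schedulers_online/0` as the shard count is one assumption about what "one ets table per cpu" means in practice.

```elixir
defmodule ShardedStore do
  # Hypothetical sketch: one ETS table per online scheduler (roughly one
  # per CPU core), with each key routed to a shard by its hash.

  # Create the shards; the calling process owns the tables.
  def new do
    tables =
      for _ <- 1..System.schedulers_online() do
        :ets.new(:shard, [:set, :public, read_concurrency: true, write_concurrency: true])
      end

    List.to_tuple(tables)
  end

  def put(shards, key, value) do
    :ets.insert(shard_for(shards, key), {key, value})
    :ok
  end

  def get(shards, key) do
    case :ets.lookup(shard_for(shards, key), key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  # :erlang.phash2/2 maps any term to an integer in 0..(range - 1).
  defp shard_for(shards, key) do
    elem(shards, :erlang.phash2(key, tuple_size(shards)))
  end
end

shards = ShardedStore.new()
:ok = ShardedStore.put(shards, {:employee, 1}, "Ada")
ShardedStore.get(shards, {:employee, 1})
# => {:ok, "Ada"}
```

Callers still only see `put` and `get`, so a move from one table to many would not leak into the rest of the code.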

ericmj · June 21, 2018, 4:23pm

This is impossible to answer. If we take your employees as an example, using one process per employee is likely wrong because it’s hard to argue why you should have one process per employee instead of a single process holding a list of employees. You also want to store schedules, should they be stored separately from employees? Possibly, but what would you gain from that?

Eventually you can consider storing the whole database in one process. What are the benefits and downsides of this? One benefit is less complexity, one process is simpler than multiple, another is that you can more easily implement transactions because you don’t need synchronization between multiple processes.

One downside can be seen as code organization, all your database would be in one location, but this is the misconception. Code is organized with modules, so you can have multiple modules for one process. Another downside can be scalability, you may want to shard your database to multiple processes so you can use all your cores. This is when you should start considering using multiple processes.
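To make the "multiple modules for one process" point concrete, here is a sketch under assumed names (`Db`, `Db.Employees`, `Db.Vacations`). The pure modules hold the logic, and a single GenServer owns the state, so a transaction is just a function run inside one `handle_call`.

```elixir
defmodule Db.Employees do
  # Pure functions over the employees part of the state.
  def add(db, id, name) do
    %{db | employees: Map.put(db.employees, id, %{name: name})}
  end
end

defmodule Db.Vacations do
  # Pure functions over the vacations part of the state.
  def request(db, employee_id, days) do
    if Map.has_key?(db.employees, employee_id) do
      vacations = Map.update(db.vacations, employee_id, [days], &[days | &1])
      {:ok, %{db | vacations: vacations}}
    else
      {:error, :unknown_employee}
    end
  end
end

defmodule Db do
  # One process owns the whole state. While handle_call runs the given
  # function, no other message is processed, so nothing can interleave.
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, %{employees: %{}, vacations: %{}})
  end

  # fun receives the current state and returns {:ok, new_state} to commit
  # or {:error, reason} to leave the state untouched.
  def transaction(pid, fun), do: GenServer.call(pid, {:transaction, fun})

  @impl true
  def init(db), do: {:ok, db}

  @impl true
  def handle_call({:transaction, fun}, _from, db) do
    case fun.(db) do
      {:ok, new_db} -> {:reply, :ok, new_db}
      {:error, _reason} = error -> {:reply, error, db}
    end
  end
end

{:ok, db} = Db.start_link()

# Add an employee and file a vacation request as one atomic step.
:ok =
  Db.transaction(db, fn state ->
    state
    |> Db.Employees.add(1, "Ada")
    |> Db.Vacations.request(1, 5)
  end)

# A failed transaction leaves the state as it was.
{:error, :unknown_employee} =
  Db.transaction(db, fn state -> Db.Vacations.request(state, 99, 3) end)
```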

So to summarize, it is hard to answer this question because you are asking how to structure an application using processes when we are saying that you shouldn’t. I would suggest that you start writing the application without processes or with a single process and then when you hit a road block that you think processes will solve come back and ask a question that is specific to your problem and less abstract and hypothetical.

Matt · June 21, 2018, 4:30pm

Thanks for your feedback. Again, I’m not building anything. Just a purely conceptual question that popped into my head after listening to some Erlang developer talking. He does everything in gen_server processes, and I’m curious as to what that looks like.

Matt · June 21, 2018, 4:34pm

My Elixir applications are traditional (if we even have such a thing yet). I was listening to an Erlang programmer talking about how he does not use databases and everything is a process (gen_server) to store state, essentially replacing the database.

“So to summarize, it is hard to answer this question because you are asking how to structure an application using processes when we are saying that you shouldn’t.”

This is what has me so curious about what that looks like translated to Elixir. I think you’re right, there is no right answer really.

tty · June 21, 2018, 4:49pm

The two major models are to map data to process or task to process.

For example: a chat app would likely have a process per person (data) while a banking app a process per debit/credit transaction (task). This allows you to expand the relevant bottleneck i.e. add more transaction tasks independently.

Processes can be rich or thin. A rich process would accept a wide range of messages that call out to other modules (not necessarily other processes). A person process is a good example of this.

Task processes tend to be thin, focusing on specific messages and on interacting with other processes.
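The two models can be put side by side in a short sketch. The person map and the list of transactions are invented for illustration; `Agent` and `Task.async_stream/2` are standard library.

```elixir
# Data-to-process: one long-lived process per person, holding that
# person's state for as long as the person is around.
{:ok, alice} = Agent.start_link(fn -> %{name: "alice", inbox: []} end)
Agent.update(alice, fn person -> %{person | inbox: ["hi" | person.inbox]} end)
inbox = Agent.get(alice, & &1.inbox)

# Task-to-process: one short-lived process per transaction, each doing
# a single piece of work and then exiting.
transactions = [{:credit, 100}, {:debit, 30}, {:credit, 5}]

results =
  transactions
  |> Task.async_stream(fn
    {:credit, amount} -> amount
    {:debit, amount} -> -amount
  end)
  |> Enum.map(fn {:ok, delta} -> delta end)

balance = Enum.sum(results)
# => 75
```

`Task.async_stream/2` caps how many tasks run at once (by default at the number of online schedulers), which is one way to size the transaction side independently of the data side.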

On the larger architectural front, splitting into nodes and applications is key. Fortunately it is easy to begin either top-down or bottom-up when designing the architecture.

Matt · June 21, 2018, 4:53pm

I love it. I see the concepts you’re talking about thank you!!

jeremyjh · June 21, 2018, 10:06pm

You might reference the talk so we can be on the same page with you. My guess is that the application he is talking about is not some CRUD application manipulating and storing business data. More likely it is a soft real-time application, such as may be used in the control plane of a network appliance, or even a manufacturing facility. There is no abstract “best way” to structure such applications - it is all very specific.

stefanchrobot · June 22, 2018, 9:40am

I think you’d be better off if you tried to solve a problem that actually needs concurrency at its core, like writing a reliable message queue consumer or a web crawler.

peerreynders · June 22, 2018, 3:05pm

Employees, timecard events, schedules, vacations, and overtime may be important concepts for structuring data and possibly even parts of application state but really don’t inform much in terms of application behaviour.

Alan Kay (1998)

I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea. The big idea is “messaging” …

This comment highlights how people like to focus in on (static) “objects” because that is comparatively easy when in fact the application value is derived primarily by the (dynamic) “collaborations” that implement application behaviour.

Similarly in a BEAM application the ideal process structure is influenced much more heavily by the behaviour the application is meant to exhibit rather than the structure of the data it is managing or transforming.

This Erlang developer puts a different spin on it:
Lambda Days 2015 - Torben Hoffmann - Thinking like an Erlanger

Processes as the building blocks for protocols - so the big idea is designing protocols realized through communicating processes to implement application behaviour.
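As a small illustration of designing the protocol first, the sketch below defines a hypothetical vacation approval exchange between two bare processes. The message shapes and the `VacationProtocol` module are assumptions for this example and are not taken from the talk.

```elixir
defmodule VacationProtocol do
  # Hypothetical protocol between a requester and an approver process:
  #   requester -> approver: {:vacation_request, ref, requester_pid, days}
  #   approver -> requester: {:approved, ref} | {:denied, ref, reason}

  # Approver: a bare process that answers requests in a loop.
  def start_approver(max_days) do
    spawn_link(fn -> approver_loop(max_days) end)
  end

  defp approver_loop(max_days) do
    receive do
      {:vacation_request, ref, from, days} when days <= max_days ->
        send(from, {:approved, ref})

      {:vacation_request, ref, from, _days} ->
        send(from, {:denied, ref, :too_many_days})
    end

    approver_loop(max_days)
  end

  # Requester side: send the request, then wait for the reply that
  # carries the same reference.
  def request(approver, days, timeout \\ 1_000) do
    ref = make_ref()
    send(approver, {:vacation_request, ref, self(), days})

    receive do
      {:approved, ^ref} -> :approved
      {:denied, ^ref, reason} -> {:denied, reason}
    after
      timeout -> {:error, :timeout}
    end
  end
end

approver = VacationProtocol.start_approver(10)
VacationProtocol.request(approver, 5)
# => :approved
VacationProtocol.request(approver, 30)
# => {:denied, :too_many_days}
```

The process structure here falls out of the conversation (who asks, who answers, what can go wrong), not out of the employee or vacation data itself.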