Should we adopt Dave's way of building applications as a series of components? (Dave's talk has now been added!)

discussion
microservices
elixir-for-programmers-course
replaceable-component-architecture

#49

I lean in the opposite direction. I very much prefer when applications have been tagged as being “library applications” (http://erlang.org/doc/design_principles/applications.html). Then I know it doesn’t need to be part of the supervision tree and I can design my application around it.

They are not exactly the same. A library application is not part of a supervision tree so it matters quite a bit to the VM. A library application only has to be loaded, not started and stopped. This means it doesn’t have any callbacks the VM needs to wait for, and it is not part of the application tree, meaning hot code loading is much easier.

Hiding complexity to make it easier to use. Classic easy vs simple example. I understand that it is a trade-off but I don’t think it is as clear cut as you say. Making people learn how things actually work underneath also has benefits. Can these things be confusing and hard to learn? Yes, but if you have this knowledge you are likely able to make better decisions further down the line.


#50

This is about how you define a contract and how you handle collaborators; there is nothing specific about being one or multiple files. Give me any code that you think is testable because of multiple files and I will present you a direct equivalent using a single module in a single file.

It is still likely that the sum of the churn of the small files will be equal to the churn of the big file. I.e. you are likely still changing the code as frequently, except that now you spread the churn around.

I was making an absurd example on how “hard to test” is not a property of multiple files but a property of the code. I understand if you believe multiple files makes you write more testable code, but it is a by-product and not a direct consequence of multiple files. YMMV.

Many of the functions in the Enum module are small and do belong together. I personally don’t see the appeal in following such a strict rule for breaking every function into a separate file.

Although, as I said earlier, I agree this is a matter of taste. There are, however, a couple things that I can prove would be a downside when splitting the Enum module:

  1. Modules require more space in memory and disk and more time to compile and load - if the whole Elixir stdlib was broken up this way, booting your Elixir apps would take longer and use more memory upfront

  2. Elixir/Erlang can’t inline code across modules - so it would also limit optimizations we could apply

  3. Calling a remote function is more expensive than a local one

It is unlikely that those would matter in your application, except maybe for compilation time, but for building the language itself, splitting every function into multiple modules would directly affect the usability of Elixir as a language, especially functions that may be called under a tight loop such as the ones in Enum.


#51

That’s a great point right there. Sometimes, a library having a supervision tree is just an implementation detail which users should not be aware of.

There’s a good rule of thumb “mutable on the inside, immutable on the outside” which states that as long as the public functions behave like they are operating on immutable values, I shouldn’t care if inside they are mutating some temporary variables - it’s an implementation detail. The case is similar for library/application separation. As long as some piece of code behaves like a library, should we always care that there are some processes running there? I know - it depends. Sometimes it is important. But forcing this distinction on developers could lead to just blindly following the rule without much consideration.


#52

Please see my earlier reply but looking at the supervision tree is not enough to assert this. Maybe the application changes some global state elsewhere. Or maybe the application only has a supervision tree because it simply wants to spawn tasks/processes.

While you should not care about the callback part, the hot code reloading is a good point. But, as I said above, the tree is not enough to determine any code reloading issues. For example, you would still need to double check if an application without a supervision tree is using the app env.

The goal of an application is exactly to hide those concerns because the huge majority of times you shouldn’t care. Knowing how things work is always going to be helpful but it shouldn’t be used as a reason to remove abstractions. For example, learning how maps are laid out in memory and how the VM loads beam modules has helped me in certain situations but I am not advocating for manually managing small/large maps or for loading modules by hand.


#53

@josevalim what is your opinion on GenServer and the Configuration module?

I definitely believe that we could add another layer of abstraction on top of GenServer, especially around the handle_* functions.

I think changing a little bit how things work, following his proposal (despite the names), will help a lot in developing future code.


#54

I think it is important to clarify, and I think Dave would agree, that GenServer is not wrong. Rather we are wrong in using GenServer to build higher level components, often with domain/business logic, and we need higher level abstractions to solve those cases. When building low-level components, the distinction between client, server, and the low-level callbacks is important, but in your application domain you want to get rid of the bureaucracy and put the focus on the domain code. There are also other libraries attempting to tackle this particular issue.

I agree with Dave on many of the problems he raised: yes, we need to improve configuration (there was a whole other thread about it in the forum). Yes, it would be helpful to know if an application is stateful or is a singleton (and as I said in this thread, I don’t think looking at the supervision tree gives the complete answer). As Dave said, there is a lot to explore. :slight_smile:


#55

I think we can figure out other things first before we try to figure out this one. Seeing Configuration as just another external OTP app rather than a black box component will be nice.

No doubt about it.

As this is something that requires a lot of macros, would you consider bringing this into the core instead of letting the community rely on those third-party libraries?


#56

Perhaps I am misunderstanding something but a library application does not have a supervision tree. The .app file doesn’t contain a mod key so there is no tree being started. I guess it could still spawn processes but generally just for short lived stuff which doesn’t require supervision.

Reading through the thread again (because I have a feeling I have misunderstood something) and I guess what tripped me was this:

And I have likely misinterpreted this, because isn’t this exactly what is being done today? The system differentiates between applications with a supervision tree and applications without, i.e. mix new for a library application and mix new --sup for an application with a supervision tree, so the user must already understand the difference.
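To make that difference concrete, here is a rough sketch of what it looks like in mix.exs. MyLib and MyLib.Application are hypothetical names, and the generated file may differ between Elixir versions; the only point is the presence or absence of the :mod entry.

```elixir
# mix.exs (sketch; MyLib and MyLib.Application are hypothetical names)
defmodule MyLib.MixProject do
  use Mix.Project

  def project do
    [app: :my_lib, version: "0.1.0", deps: []]
  end

  # Library application: there is no :mod entry, so no application callback
  # module is invoked and no supervision tree is started for this app.
  def application do
    [extra_applications: [:logger]]
  end

  # With `mix new my_lib --sup` the generated function additionally names
  # the application callback module that starts the supervision tree:
  #
  #   def application do
  #     [extra_applications: [:logger], mod: {MyLib.Application, []}]
  #   end
end
```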


#57

I expect @michalmuskala meant that the user of an application (e.g. me using ecto) does not need to know whether the application starts processes to do its job or not. The implementor of an application (e.g. me creating an api client) surely does need to know how it works.


#58

Yes, that is understood. But as you are writing applications you already know the distinction, so why would you necessarily hide the fact? To me it’s almost like knowing the difference between an operating system service and a command line utility.


#59

I don’t think the solution actually needs to leverage macros. I would like to see such avenues explored as well.

I believe @michalmuskala was talking about the usage perspective. As users of application xyz, we shouldn’t care if it has a supervision tree or not, especially because the presence or absence of a supervision tree does not tell much about an application. The authors of libraries definitely need to know the difference.


#60

As usual, Dave makes some great and provocative points in his talk. There’s definitely a lot of food for thought here.

However, I don’t necessarily agree with all the ideas. In particular, I have a slightly different opinion on GenServers. With GenServer, we end up with the client-side and the server-side code in the same module. Personally, I believe that this is a natural coupling. The client-side code delivers messages to the server-side code, optionally waiting for the response. Splitting these out makes no sense to me, because both sides of the code are revolving around a well defined single concern: client-server communication.

However, I agree that GenServers tend to become bloated. In most cases I’ve seen, this happens because we tend to embed complex server-side logic into our handlers, and thus we end up with two complex concerns in the same module.

Another symptom I’ve noticed is when we mix multiple complex server-side concerns into callback functions. So for example, if we manage some complex pure functional state, and mix this with some time-related concerns such as timeouts, backoff retries, handling :EXIT and :DOWN messages, we may end up with multiple concerns coupled together.

Sometimes, the interface functions have to do a more complex transformation of the input parameters into the message being delivered. That’s yet another concern mixed in the same module.

I think that these other concerns should be carved out if they become too big. So one thing I frequently do is implement a complex state in a separate module. Many times, this separation helps testing, because I can test a complex state without needing a GenServer. Also, I feel that it improves code readability, because I can analyze the state code without worrying about GenServer concerns (and vice versa).

As an example, you can take a look at my parent library. Here, I’ve extracted the state management into a separate module. I did this upfront after some initial premeditation. In fact, the functional part was the first thing I started working on.

I should note though that I only extract the state if it “feels too large”. The Stack server used in the talk is an example of where I’d just leave the state concern in the same module, as it’s very small.
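A minimal sketch of that separation, purely for illustration: the module names Stack.State and Stack.Server are hypothetical, and the stack is used only because it is short (as noted above, a state this small would normally stay in the server module). The pure module can be tested without starting any process, and the callbacks only delegate to it.

```elixir
defmodule Stack.State do
  # Pure functional core: plain data in, plain data out, no processes.
  def new(items \\ []), do: items
  def push(stack, item), do: [item | stack]
  def pop([]), do: {:error, :empty}
  def pop([top | rest]), do: {:ok, top, rest}
end

defmodule Stack.Server do
  use GenServer

  # Client side: packs requests into messages.
  def start_link(items \\ []), do: GenServer.start_link(__MODULE__, items)
  def push(server, item), do: GenServer.cast(server, {:push, item})
  def pop(server), do: GenServer.call(server, :pop)

  # Server side: callbacks only delegate to the pure module.
  @impl true
  def init(items), do: {:ok, Stack.State.new(items)}

  @impl true
  def handle_cast({:push, item}, state) do
    {:noreply, Stack.State.push(state, item)}
  end

  @impl true
  def handle_call(:pop, _from, state) do
    case Stack.State.pop(state) do
      {:ok, top, rest} -> {:reply, {:ok, top}, rest}
      {:error, :empty} -> {:reply, {:error, :empty}, state}
    end
  end
end
```

The communication concern (message shapes, call vs cast, replies) stays together in Stack.Server, while the state logic can be read and tested on its own.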

Either way, when it comes to communicating, I personally feel that this is one concern which belongs to a single module. Packing the data into a message, unpacking it, responding, and waiting for the response are all part of the same job which is client-server communication. Splitting that feels unnatural and confusing to me.


#61

This is simply not true. A library application can have dependencies which have supervision trees and need to be started and stopped. This means you need to start and stop every application to ensure all the dependencies are started correctly, regardless of that application’s own supervision tree. It’s not enough to just load it. This is handled by the VM just fine - applications that don’t have a supervision tree can be started and stopped like anything else - and this is on purpose. The abstraction of an application does not care whether or not there is a supervision tree - from outside of that application it’s completely opaque and irrelevant.

One example: The OAuth2 application does not have a supervision tree, but it depends on hackney that does have a supervision tree. This means it has to be started.

I’ll repeat again that the distinction is not useful in any way, and might even be harmful - for example you could look at the OAuth2 application, see it doesn’t have a supervision tree and assume you don’t have to start anything, which would probably just lead to strange errors about dead processes or missing ets tables. And people who would face those issues would be beginners, not advanced users. That’s why I think introducing this distinction would be more harmful than helpful.
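As a small sketch of the point about starting and stopping, the snippet below uses :eex only because it ships with Elixir and, to the best of my knowledge, defines no application callback module; that choice is an assumption of this example and is unrelated to the OAuth2 case above.

```elixir
# Load the application spec without starting anything.
case Application.load(:eex) do
  :ok -> :ok
  {:error, {:already_loaded, :eex}} -> :ok
end

# For an application without a callback module, the :mod key is expected
# to be empty, i.e. there is no supervision tree of its own to start.
mod = Application.spec(:eex, :mod)
IO.inspect(mod, label: "callback module")

# Starting still works, and starts any dependencies first.
{:ok, _started} = Application.ensure_all_started(:eex)

# Stopping works the same way as for any other application.
:ok = Application.stop(:eex)
```

From the caller's side nothing here depends on whether a supervision tree exists: the same ensure_all_started and stop calls apply either way.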

Could you say what benefit exactly it gives you to know if Ecto or Phoenix, for example, starts a supervision tree or not? How would your interaction with them change based on that knowledge?


#62

I’d even ask: “Would anyone care if a previously stateless application introduced some process or supervision tree, if the external API stays the same?”. Take an imaginary Fibonacci application. It would just get faster if it introduced a cache instead of calculating from the beginning each time again.


#63

You don’t even need a supervision tree to be stateful. Any “library” that uses an ets table is stateful without needing a supervision tree.
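A rough sketch of such a stateful “library”, borrowing the imaginary Fibonacci example from above. FibCache and the table name :fib_cache are made up for illustration, and table creation is deliberately naive (two processes calling it at the same moment could race).

```elixir
defmodule FibCache do
  @table :fib_cache

  # Creates the named table if it does not exist yet. The table is owned by
  # the calling process and disappears when that process exits.
  def ensure_table do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set])
    end

    :ok
  end

  # Same external API as a stateless implementation, but results are
  # remembered in the ets table between calls.
  def fib(n) when is_integer(n) and n >= 0 do
    ensure_table()

    case :ets.lookup(@table, n) do
      [{^n, value}] ->
        value

      [] ->
        value = compute(n)
        :ets.insert(@table, {n, value})
        value
    end
  end

  defp compute(0), do: 0
  defp compute(1), do: 1
  defp compute(n), do: fib(n - 1) + fib(n - 2)
end
```

No supervisor and no application callback are involved, yet calling FibCache.fib/1 leaves global state behind, which is exactly what looking at the supervision tree would not reveal.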


#64

Excuse the longish post - I too watched Dave’s talk and it brings up some important points.

In this posting I want to talk about composability.

To me the unit of composability should be the process and NOT functions - to be clearer, of course functions should be composable, but this problem is nicely solved (F1 |> F2 |> F3 |> …) :slight_smile:

I think the gold standard for composability was Unix pipes, and the beautiful way they could be composed in the shell.

a | b | c | d ...

The principal design idea was “the output of my program should be the input of your program”

This allows a, b and c to all be written in different languages - but this has a few nasty problems:

  1. text flow across the boundaries
    so there is a lot of extra parsing and serialising involved
  2. if something in the middle fails (say c) there is no nice way to close the pipeline down

One excellent feature is that (say) b does not know who sends it inputs and does not know to whom the outputs should be sent.

Now consider Erlang - one of the above problems gets solved - it is not text that flows across the boundaries but Erlang messages. X ! M sends a message, receive M -> … end receives a message, so no parsing and serialising is involved and it’s very efficient.

Processes do not know where they get messages from (=good) but have to know where they send messages to (=bad).

A better way would be to use ports, call them in1, in2, in3 for inputs and out1, out2, out3 for outputs and control1, control2 for controls.

We can now make a component - assume a process x that has an input in1 which doubles its input and sends the result to out1 - this is easy to write in Erlang

   loop() ->
      receive 
          {in1, X} ->
              send(out1, 2*X),
              loop()
      end

Clever people can write this in Elixir as well :slight_smile:

All the component knows how to do is turn numbers on the in1 port into output on out1 but it does not know where in1 and out1 are.

Now we have to wire things up.

The pipe syntax X | Y | Z means “wire up the output of X to the input of Y” (and so on)

The important point is that a) components do not know where they get their inputs from and do not know where they send their outputs to and b) “wiring up” is NOT a part of the component.
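A minimal Elixir sketch of this idea, under stated assumptions: the Doubler module and the {:wire, port, destination} control message are invented here for illustration and are not part of any library. The component only knows the port names :in1 and :out1; the wiring is done from the outside.

```elixir
defmodule Doubler do
  # Starts the component. It knows only the port names :in1 and :out1;
  # where :out1 leads is decided later by whoever wires it up.
  def start, do: spawn(fn -> loop(%{}) end)

  defp loop(wiring) do
    receive do
      # Control message: connect an output port to a destination process.
      {:wire, port, dest} ->
        loop(Map.put(wiring, port, dest))

      # Data message on the input port: double it and emit on :out1.
      {:in1, x} when is_number(x) ->
        emit(wiring, :out1, 2 * x)
        loop(wiring)
    end
  end

  # Sends to whatever the port is wired to; unwired output is dropped.
  defp emit(wiring, port, value) do
    case wiring do
      %{^port => dest} -> send(dest, {:in1, value})
      _ -> :ok
    end
  end
end

# Wiring lives outside the components: a -> b -> the current process.
a = Doubler.start()
b = Doubler.start()
send(b, {:wire, :out1, self()})
send(a, {:wire, :out1, b})
send(a, {:in1, 3})

result =
  receive do
    {:in1, value} -> value
  after
    1_000 -> :timeout
  end

IO.inspect(result, label: "received on in1")
```

Neither Doubler process knows who feeds it or who consumes its output. Non-numeric input is simply left in the mailbox in this sketch; a fuller version would need somewhere to report it.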

Elixir has a great method for wiring up functions X |> Y |> Z but the X, Y’s and Z’s are functions NOT processes.

We can imagine components to be processes with inputs (in1, in2, in3, …) outputs (out1, out2, …) control ports (control1, control2, …) and error ports (error1, error2,…) - what are the error ports?

Error ports are for (guess what) errors - sending an atom to the in1 port of my doubling machine would result in an error message being sent to error1 (or something).

All of this can be nicely specified with some type system -

Machine M1 is

 in1 x N::integer -> out1 ! 2*N :: integer

etc. :slight_smile:

With this kind of structure software starts looking very much like hardware and we can make nice graphic tools to show how the components are wired up. The reason we do not program like this in sequential languages is because all the components MUST run in parallel (which is what chips do)

There is actually nothing new in the above - these ideas were first written down by John Paul Morrison in the early 1970s (see https://en.wikipedia.org/wiki/Flow-based_programming).

This (flow based programming) is one of those ideas we could (and should) revisit and cast into a modern form.

All of this means a bit of a re-think since most frameworks are structured on top of essentially sequential platforms.

Really we should be thinking in terms of “black boxes that send and receive messages” and “how to wire up the black boxes” and NOT functions with inputs and outputs, the latter problem is solved.

Think - “messages between components” and “what messages do I need in my protocol” NOT “input and output types” and “what functions and modules do I need”

(I called this Concurrency Oriented Programming a while back - but the term did not seem to latch on) :slight_smile:

As Alan Kay said “the big idea is messaging”

Cheers

/Joe


#65

Thanks for keeping driving this point home. Please crosspost to the C++ and Java worlds :wink:

One of the things that struck me when picking up Elixir is that in Elixir, and as far as I can tell in Erlang as well, functions seem to be seen as being more important than messages, and messages are just low level implementation details, not the API - the API is always functions.

So, a gen_server will have “client” methods, and tests will typically exercise these. Client methods are just simple facades to hide the implementation detail of the actual messages flowing over the wire, which in gen_server's case are completely hidden from view.

I’ve always found it odd (as a Smalltalker, I’m more than a little bit influenced by what Kay has to say…), and it also makes things harder to test and forces earlier-binding (late binding is another of Kay’s “essential things around OO”); a message is easy to construct at run-time and send to a random port/process, but a function call is resolved at compile time.

Having a truly “message first” thing on top of BEAM is a little experiment that still needs to make it to the top of my todo list; I think it’d be extremely powerful.


#66

When I first ran into this practice I remember being (extremely?) disappointed. From a more traditional background it just seems wrong to have both the client code and the server code in the same module because it just seems right to separate client and server code - irrespective of the notion of “keeping things together that change together”.

And on a more pedantic level I personally felt that the process request and response messages were the “true server API/contract” - not the convenience functions. In fact I felt that the convenience functions were potentially dangerous as they tend to hide the fact that the process boundary is being crossed.

Since then I’ve adopted a “when in Rome, do as the Romans do” attitude toward these sorts of matters and withhold criticism until such time as I have a better understanding of how things came to be.

Frankly this is where the mental model of an OTP application as a “component” completely breaks down for me. The concept suggests that a.) there can be multiple independent instances, b.) each with their own independent, isolated state. From what I understand that isn’t possible for an OTP application on a single node. That would imply that an OTP application would have to manage various “state instances” for its various client applications internally (increasing its internal complexity) or that each client application would have to “adopt” the necessary additional state (into its own) that is then simply managed by the library code. I suspect that the second approach is more scalable - but a library managing external state sounds nothing like a component to me.


#67

I remember that it took me a while to accept this. It’s worth remembering that, running on BEAM, Elixir is specific in that the client and the server are running in the same OS process, and are a part of the same code base. So while it perhaps makes sense to separate the client and the server in other technologies, the same isn’t necessarily true in BEAM.

I agree that messages are the true contract. However, I think the API functions are very useful, because they keep that contract in one place. Imagine if you had {:my_req, foo, bar} sprinkled all over the code, and now you want to change something in the format. At this point you need to manually search, and carefully cherry pick the requests which are issued to that particular GenServer. That’s quite error prone.

With API functions, making the change is localized. Granted, if you change a function signature, you need to update the client sites too, but because invocation is wrapped by module functions, compiler warnings and dialyzer can help you there. With GenServer functions, there’s no such help at all.
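A small sketch of what keeping the contract in one place can look like. The Inventory module and its message shapes are hypothetical; the point is only that the tuples {:add, item, qty} and {:count, item} appear in exactly one module, so a format change never leaks to call sites.

```elixir
defmodule Inventory do
  use GenServer

  # API functions: the only place that knows the message format.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)
  def add(server, item, qty), do: GenServer.call(server, {:add, item, qty})
  def count(server, item), do: GenServer.call(server, {:count, item})

  # Server side: the matching half of the same contract.
  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:add, item, qty}, _from, state) do
    {:reply, :ok, Map.update(state, item, qty, &(&1 + qty))}
  end

  def handle_call({:count, item}, _from, state) do
    {:reply, Map.get(state, item, 0), state}
  end
end
```

Callers write Inventory.add(pid, :apple, 2) rather than GenServer.call(pid, {:add, :apple, 2}), so renaming or reshaping the message touches only this module, and a changed function arity shows up as a compiler warning at the call sites.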


#68

It seems like you are describing a system very similar to Flow. In what ways is your proposed ‘Concurrency Oriented Programming’ different?