Should we adopt Dave's way of building applications as a series of components? (Dave's talk has now been added!)

michalmuskala · June 4, 2018, 11:56am

This is simply not true. A library application can have dependencies which have supervision trees and need to be started and stopped. This means you need to start and stop every application to ensure all the dependencies are started correctly, regardless of that application’s own supervision tree. It’s not enough to just load it. This is handled by the VM just fine - applications that don’t have a supervision tree can be started and stopped like anything else - and this is on purpose. The abstraction of an application does not care whether or not there is a supervision tree - from outside of that application it’s completely opaque and irrelevant.

One example: The OAuth2 application does not have a supervision tree, but it depends on hackney that does have a supervision tree. This means it has to be started.
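To make the point concrete, here is a minimal sketch of starting an application together with its dependencies. It uses `:logger` only because it ships with Elixir; the same call applies to any dependency, with or without a supervision tree.

```elixir
# Start :logger and every application it depends on.
# Returns {:ok, apps}, where apps lists what this call newly started
# (an empty list if everything was already running).
{:ok, started} = Application.ensure_all_started(:logger)
IO.inspect(started, label: "started by this call")

# From the outside, a running application looks the same whether or not
# it has a supervision tree: it simply appears in the started list.
running = for {app, _desc, _vsn} <- Application.started_applications(), do: app
IO.inspect(:logger in running, label: "logger running?")
```

The caller never has to ask whether the application supervises any processes; it only has to make sure it is started.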

I’ll repeat again that the distinction is not useful in any way, and might even be harmful - for example you could look at the OAuth2 application, see it doesn’t have a supervision tree and assume you don’t have to start anything, which would probably just lead to strange errors about dead processes or missing ets tables. And people who would face those issues would be beginners, not advanced users. That’s why I think introducing this distinction would be more harmful than helpful.

Could you say exactly what benefit it gives you to know whether Ecto or Phoenix, for example, starts a supervision tree or not? How would your interaction with them change based on that knowledge?

LostKobrakai · June 4, 2018, 12:03pm

I’d even ask: “Would anyone care if a previously stateless application introduced some process or supervision tree, if the external API stays the same?”. Take an imaginary Fibonacci application. It would just get faster if it introduced a cache instead of recalculating from the beginning each time.

talentdeficit · June 4, 2018, 1:25pm

you don’t even need a supervision tree to be stateful. any “library” that uses an ets table is stateful without needing a supervision tree
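A rough sketch of what that can look like, borrowing the imaginary Fibonacci cache from above. The module name `FibCache` and the table name are made up for illustration; there is no GenServer and no supervisor, yet the library clearly holds state.

```elixir
defmodule FibCache do
  # Hypothetical module: stateful through ETS alone.
  @table :fib_cache

  # Creates the named table on first use. The calling process becomes the
  # owner, so the table disappears when that process exits. Concurrent
  # first calls could race here; this sketch ignores that.
  def ensure_table do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set])
    end

    :ok
  end

  def fib(n) when n in [0, 1], do: n

  def fib(n) when is_integer(n) and n > 1 do
    ensure_table()

    case :ets.lookup(@table, n) do
      [{^n, value}] ->
        value

      [] ->
        value = fib(n - 1) + fib(n - 2)
        :ets.insert(@table, {n, value})
        value
    end
  end
end

IO.inspect(FibCache.fib(50))  # 12586269025
```

Note that `:ets.whereis/1` needs OTP 21 or later. The ownership caveat in the comment is exactly the kind of hidden lifecycle a caller cannot see from the function signature.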

joeerl · June 4, 2018, 4:00pm

Excuse the longish post - I too watched Dave’s talk and it brings up some important points.

In this posting I want to talk about composability.

To me the unit of composability should be the process and NOT functions - to be clearer, of course functions should be composable but this problem is nicely solved ( F1 |> F2 |> F3 |> … )

I think the gold standard for composability was Unix pipes, and the beautiful way they could be composed in the shell.

a | b | c | d ...

The principal design idea was “the output of my program should be the input of your program”

This allows a, b and c to all be written in different languages - but this has a few nasty problems:

text flows across the boundaries, so there is a lot of extra parsing and serialising involved
if something in the middle fails (say c) there is no nice way to close the pipeline down

One excellent feature is that (say) b does not know who sends it inputs and does not know to whom the outputs should be sent.

Now consider Erlang - one of the above problems gets solved - what flows across the boundaries is not text but Erlang messages. X ! M sends a message, receive M -> … end receives a message, so no parsing and serialising is involved and it’s very efficient.

Processes do not know where they get messages from (=good) but have to know where they send messages to (=bad).

A better way would be to use ports, call them in1, in2, in3 for inputs and out1, out2, out3 for outputs and control1, control2 for controls

We can now make a component - assume a process x that has an input in1 which doubles its input and sends the result to out1 - this is easy to write in erlang

   %% send/2 stands for a hypothetical "write to output port" helper, not a BIF
   loop() ->
      receive 
          {in1, X} ->
              send(out1, 2*X),
              loop()
      end.

Clever people can write this in Elixir as well

All the component knows how to do is turn numbers on the in1 port into output on out1 but it does not know where in1 and out1 are.

Now we have to wire things up.

The pipe syntax X | Y | Z means “wire up the output of X to the input of Y” (and so on)

The important point is that a) components do not know where they get their inputs from and do not know where they send their outputs to and b) “wiring up” is NOT a part of the component.
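For what it's worth, the idea can be sketched in Elixir along these lines. This is only one possible reading, not Joe's design: the `Doubler` and `Wiring` modules and the `emit` callback are assumptions made up for illustration. The component only names its ports; the routing table that maps an output port to a destination lives entirely outside it.

```elixir
defmodule Doubler do
  # The component: knows only port names, never destination pids.
  # `emit` is supplied by whoever wires the component up (hypothetical).
  def start(emit) do
    spawn(fn -> loop(emit) end)
  end

  defp loop(emit) do
    receive do
      {:in1, x} when is_number(x) ->
        emit.(:out1, 2 * x)
        loop(emit)
    end
  end
end

defmodule Wiring do
  # The wiring is NOT part of the component: it routes a named output
  # port to a {pid, input_port} pair.
  def connect(routes) do
    fn port, value ->
      {pid, in_port} = Map.fetch!(routes, port)
      send(pid, {in_port, value})
    end
  end
end

# Wire x | y | (this process): x's out1 feeds y's in1,
# and y's out1 is delivered to us as :result.
y = Doubler.start(Wiring.connect(%{out1: {self(), :result}}))
x = Doubler.start(Wiring.connect(%{out1: {y, :in1}}))

send(x, {:in1, 5})

receive do
  {:result, value} -> IO.inspect(value)  # 20
after
  1_000 -> IO.puts("no result")
end
```

Rewiring the pipeline means building different routing maps; the `Doubler` code does not change.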

Elixir has a great method for wiring up functions X |> Y |> Z but the X’s, Y’s and Z’s are functions, NOT processes.

We can imagine components to be processes with inputs (in1, in2, in3, …) outputs (out1, out2, …) control ports (control1, control2, …) and error ports (error1, error2,…) - what are the error ports?

Error ports are for (guess what) errors - sending an atom to the in1 port of my doubling machine would result in an error message being sent to error1 (or something).

All of this can be nicely specified with some type system -

Machine M1 is

 in1 x N::integer -> out1 ! 2*N :: integer

etc.

With this kind of structure software starts looking very much like hardware and we can make nice graphic tools to show how the components are wired up. The reason we do not program like this in sequential languages is because all the components MUST run in parallel (which is what chips do)

There is actually nothing new in the above - these ideas were first written down by John Paul Morrison in the early 1970s (see https://en.wikipedia.org/wiki/Flow-based_programming).

This (flow based programming) is one of those ideas we could (and should) revisit and cast into a modern form.

All of this means a bit of a re-think since most frameworks are structured on top of essentially sequential platforms.

Really we should be thinking in terms of “black boxes that send and receive messages” and “how to wire up the black boxes” and NOT functions with inputs and outputs, the latter problem is solved.

Think - “messages between components” and “what messages do I need in my protocol” NOT “input and output types” and “what functions and modules do I need”

(I called this Concurrency Oriented Programming a while back - but the term did not seem to catch on.)

As Alan Kay said “the big idea is messaging”

Cheers

/Joe

cdegroot · June 4, 2018, 4:14pm

Thanks for continuing to drive this point home. Please crosspost to the C++ and Java worlds

One of the things that struck me when picking up Elixir is that in Elixir, and as far as I can tell in Erlang as well, functions seem to be seen as more important than messages, and messages are just low-level implementation details, not the API - the API is always functions.

So, a gen_server will have “client” methods, and tests will typically exercise these. Client methods are just simple facades to hide the implementation detail of the actual messages flowing over the wire, which in gen_server’s case are completely hidden from view.

I’ve always found it odd (as a Smalltalker, I’m more than a little bit influenced by what Kay has to say…), and it also makes things harder to test and forces earlier-binding (late binding is another of Kay’s “essential things around OO”); a message is easy to construct at run-time and send to a random port/process, but a function call is resolved at compile time.

Having a truly “message first” thing on top of BEAM is a little experiment that still needs to make it to the top of my todo list; I think it’d be extremely powerful.

peerreynders · June 4, 2018, 4:21pm

When I first ran into this practice I remember being (extremely?) disappointed. From a more traditional background it just seems wrong to have both the client code and the server code in the same module because it just seems right to separate client and server code - irrespective of the notion of “keeping things together that change together”.

And on a more pedantic level I personally felt that the process request and response messages were the “true server API/contract” - not the convenience functions. In fact I felt that the convenience functions were potentially dangerous as they tend to hide the fact that the process boundary is being crossed.

Since then I’ve adopted a “when in Rome, do as the Romans do” attitude toward these sorts of matters and withhold criticism until such time as I have a better understanding of how things came to be.

Frankly this is where the mental model of an OTP application as a “component” completely breaks down for me. The concept suggests that a.) there can be multiple independent instances, b.) each with their own independent, isolated state. From what I understand that isn’t possible for an OTP application on a single node. That would imply that an OTP application would have to manage various “state instances” for its various client applications internally (increasing its internal complexity) or that each client application would have to “adopt” the necessary additional state (into its own) that is then simply managed by the library code. I suspect that the second approach is more scalable - but a library managing external state sounds nothing like a component to me.

sasajuric · June 4, 2018, 5:31pm

I remember that it took me a while to accept this. It’s worth remembering that, running on BEAM, Elixir is specific in that the client and the server are running in the same OS process, and are a part of the same code base. So while it perhaps makes sense to separate the client and the server in other technologies, the same isn’t necessarily true in BEAM.

I agree that messages are the true contract. However, I think the API functions are very useful, because they keep that contract in one place. Imagine if you had {:my_req, foo, bar} sprinkled all over the code, and now you want to change something in the format. At this point you need to manually search, and carefully cherry pick the requests which are issued to that particular GenServer. That’s quite error prone.

With API functions, making the change is localized. Granted, if you change a function signature, you need to update the client sites too, but because invocation is wrapped by module functions, compiler warnings and dialyzer can help you there. With GenServer functions, there’s no such help at all.
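A minimal sketch of that arrangement, using a made-up `Counter` module: the tuple shapes `{:increment, by}` and `:value` appear in exactly one module, so changing the message format touches one file, while callers only ever see functions.

```elixir
defmodule Counter do
  use GenServer

  # Client API: the only place that knows the message shapes.
  def start_link(initial \\ 0), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid, by \\ 1), do: GenServer.cast(pid, {:increment, by})
  def value(pid), do: GenServer.call(pid, :value)

  # Server callbacks: the other half of the same contract.
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_cast({:increment, by}, count), do: {:noreply, count + by}

  @impl true
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

{:ok, pid} = Counter.start_link()
Counter.increment(pid)
Counter.increment(pid, 4)
IO.inspect(Counter.value(pid))  # 5
```

A misspelled `Counter.incremnt(pid)` is flagged by the compiler as an undefined function, whereas a misspelled `{:incremnt, 1}` sent through `GenServer.cast/2` only fails at runtime inside the server.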

Qqwy · June 4, 2018, 5:45pm

It seems like you are describing a system very similar to Flow. In what ways is your proposed ‘Concurrency Oriented Programming’ different?

CptnKirk · June 4, 2018, 5:52pm

I agree with this! And yet as a new Elixir user, things like the error below bite me. And this is part of Elixir core…

=ERROR REPORT==== 4-Jun-2018::10:37:34.956251 ===
** Task <0.1255.0> terminating
** Started from <0.1251.0>
** When function  == #Fun<Elixir.App.2.9790988>
**      arguments == []
** Reason for termination ==
** {{badmatch,
        {error,
            {#{'__exception__' => true,'__struct__' => 'Elixir.RuntimeError',
               message =>
                   <<"cannot use Logger, the :logger application is not running">>},
             [{'Elixir.Logger.Config','__data__',0,
                  [{file,"lib/logger/config.ex"},{line,53}]},
              {'Elixir.Logger',bare_log,3,[{file,"lib/logger.ex"},{line,614}]},
              {gen_server,init_it,2,[{file,"gen_server.erl"},{line,374}]},
              {gen_server,init_it,6,[{file,"gen_server.erl"},{line,342}]},
              {proc_lib,init_p_do_apply,3,
                  [{file,"proc_lib.erl"},{line,249}]}]}}},
    [{'Elixir.TcpSocketListener','start!',2,[]},{'Elixir.App',main,0,[]}]}

=CRASH REPORT==== 4-Jun-2018::10:37:34.956191 ===
  crasher:
    initial call: Elixir.GenStage:init/1
    pid: <0.1256.0>
    registered_name: []
    exception error: #{'__exception__' => true,
                       '__struct__' => 'Elixir.RuntimeError',
                       message =>
                           <<"cannot use Logger, the :logger application is not running">>}
      in function  'Elixir.Logger.Config':'__data__'/0 (lib/logger/config.ex, line 53)
      in call from 'Elixir.Logger':bare_log/3 (lib/logger.ex, line 614)
      in call from gen_server:init_it/2 (gen_server.erl, line 374)
      in call from gen_server:init_it/6 (gen_server.erl, line 342)
    ancestors: [<0.1255.0>,'Elixir.TaskSupervisor',<0.1252.0>,<0.1251.0>]
    message_queue_len: 0
    messages: []
    links: []
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 27
    reductions: 283
  neighbours:

As far as a new (or possibly any) developer is concerned, the library was properly imported and called with valid parameters, yet it fails. As a user of this library, I shouldn’t have to care about what it does behind the scenes of its implementation.

But in this case, if I don’t have knowledge about the implementation of this lib, its supervision strategy, and how to properly configure mix to give the library what it needs, it fails. And this is for a core logging library.

Once people develop software with this as a template for what is acceptable, or even encouraged, you have an ecosystem where you have no idea, given a module, how to use that module properly. Configuration and startup semantics absolutely matter, even though I wish they didn’t.

Exadra37 · June 4, 2018, 6:35pm

Do you know an article, video, book or paper that can help me to expand my knowledge on this?

I just gave the Enum example because I was expanding on your example, but I agree that in the language itself may not be the best approach.

CptnKirk · June 4, 2018, 6:36pm

This isn’t always true. In fact, the BEAM makes it easier than most platforms to have the client on one node and the server on another, with both running different module versions.

CptnKirk · June 4, 2018, 6:44pm

The problem with Flow in its current incarnation is that it is actually very difficult to chain custom Flow components together. That is, given GenStage components A, B, C, D, it is annoyingly verbose to attempt to chain these stages together. Also, if you do eventually do this, they don’t operate as a single Flow, they operate as multiple independent Flows stitched across async boundaries that you (the developer) need to manage externally outside of the Flow framework/library.

This is my biggest complaint about Flow. It isn’t actually a library for orchestrating data flow through a GenStage component ecosystem. I very much want such a thing to exist. And when it does, we can argue about where to put code and people will give presentations titled “GenStage is not your application”.

CptnKirk · June 4, 2018, 7:16pm

As an aside, it would be great if erlang:open_port/2 had better support for this model. Trying to get an Erlang/Elixir program to be the pipe is problematic due to Ports’ active IO on the output side. This is problematic when the rate of the output isn’t tied directly to the rate of the input. With the OTP 21 IO changes, I’m hoping that Ports can behave more like TCP sockets.

In more unrelated news, the forum just told me it would rather we had fewer longer posts, instead of a post per thought/response. The discovery seemed apropos in this thread.

joeerl · June 4, 2018, 7:25pm

Another thought.

Let me try and define what a component or service might be.

A service is a long-lived persistent object with a global or local name that responds to a precise set of management messages.
The management messages determine how the service behaves.

Much of the power of the OTP system comes from the fact that the generic servers etc. all obey the same management protocol.

One can imagine making a simple generic service that responds to start/stop/move_to_new_node/checkpoint/hibernate/wake_up/change_code/change_data/connect_port … messages.

In a way this is what an erlang gen_server does - but it only has a very small number of callbacks, such as handle_call, handle_cast and init.

It would be very much more useful if it responded to a larger set of management messages.

Services and components should be understood in terms of the protocols they respond to - and NOT the APIs that hide the protocols.

If there is a fault in how we program Erlang and Elixir it is that we spend too much time thinking about types, modules and functions and not enough time thinking about messages and process structure.

Systems are made from black boxes talking to each other - to understand the system we have to understand the messaging and the interconnections. How the internals of a black box work is of secondary importance.

The Erlang model of the world is that everything is an erlang process - and that the only way to influence anything is to send it messages, the only way to learn what things do is to receive messages. This is the purest form of OO programming.

We should build systems of named objects that can send and receive messages between the objects and be notified of failures. The objects should all speak a generic management protocol AND an object specific application protocol.
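As a rough illustration only, a named object that speaks both protocols might look like the sketch below. The `Service` module, the `{:mgmt, from, request}` envelope and the small set of management messages are all assumptions invented for this example; it covers only a few of the messages listed above and leaves out failure notification.

```elixir
defmodule Service do
  # Hypothetical sketch. `handler` implements the object-specific
  # protocol as a function (message, state) -> new_state.
  def start(name, state, handler) do
    pid = spawn(fn -> loop(state, handler) end)
    Process.register(pid, name)
    pid
  end

  defp loop(state, handler) do
    receive do
      # Generic management protocol, identical for every service
      {:mgmt, from, :checkpoint} ->
        send(from, {:checkpoint, state})
        loop(state, handler)

      {:mgmt, _from, {:change_data, new_state}} ->
        loop(new_state, handler)

      {:mgmt, _from, {:change_code, new_handler}} ->
        loop(state, new_handler)

      {:mgmt, from, :stop} ->
        send(from, :stopped)

      # Object-specific application protocol
      msg ->
        loop(handler.(msg, state), handler)
    end
  end
end

Service.start(:adder, 0, fn {:add, n}, total -> total + n end)
send(:adder, {:add, 2})
send(:adder, {:add, 3})
send(:adder, {:mgmt, self(), :checkpoint})

receive do
  {:checkpoint, total} -> IO.inspect(total)  # 5
after
  1_000 -> IO.puts("no reply")
end
```

Everything a client or an operator does here is a message to a name; no function of the service is called directly after it has been started.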

Think messages and protocols - not functions and APIs.

Cheers

/Joe

sasajuric · June 4, 2018, 7:27pm

Of course, but even then the source is a part of the same project. Granted, when you’re doing a rolling upgrade, there will be some time where a server and a client are running different versions. The same thing could also happen on a single node when doing hot code reloading. Yes, while updating we might experience occasional periods with different versions of some modules, but such is the nature of a distributed system.

But my point still stands. BEAM is the common context of our system, and even if we’re running it on multiple nodes, BEAM is still the context, and the system is defined in one project. And even if we’re using umbrellas, or other means of combining multiple OTP applications, it’s still one project having BEAM as its context.

In contrast, with most other technologies, the OS is the context. In such a setting, you need two OS processes if you want to run a client and a server (e.g. two microservices), and then the code ends up being separated across multiple projects. BEAM imposes no such limitations, not even in a multinode setting, and so there are fewer technical reasons to split the communication protocol across separate modules.

CptnKirk · June 4, 2018, 7:58pm

This sounds like a great idea. But the tooling would need to get a lot better in order for this to become a reality. I did this in a prior Akka project and it was a miserable experience. There was no way to properly document and communicate the message protocols. The compiler didn’t help in any way. Much dragons.

If Erlang/Elixir did have first class support for this, I think you’d get some good adoption. Field of Dreams – If you build it, they will come.

josevalim · June 4, 2018, 8:33pm

Can you please expand on this on a separate discussion and copy me? My initial reaction is that it should be a matter of linking from_stages and into_stages together but I am likely missing some context.

josevalim · June 4, 2018, 8:51pm

It is because the GenServer should be an implementation detail. Imagine that your GenServer is a bottleneck and now you need to replace it by a pool of GenServers… if you treat GenServer.call/cast as your actual API, you just broke all client code while the behaviour is the same! That’s why we expose our functionality through modules and functions.

Unless, of course, you actually rely purely on send/receive, and not the abstractions built on top of them.
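A small sketch of that scenario, with a made-up `Squarer` module and an arbitrary pool size. Callers keep using `Squarer.square/1`; whether one GenServer or several answer the call is decided inside the module. A real pool would normally sit under a supervisor, which this sketch leaves out.

```elixir
defmodule Squarer do
  use GenServer

  @pool_size 4  # arbitrary size chosen for this sketch

  # Public API: unchanged whether one server or a pool sits behind it.
  def square(n), do: GenServer.call(pick_worker(n), {:square, n})

  # Starts the workers, each registered under a derived name.
  def start_pool do
    for i <- 0..(@pool_size - 1) do
      {:ok, _pid} = GenServer.start_link(__MODULE__, :ok, name: worker_name(i))
    end

    :ok
  end

  # Route each key to one of the workers by hashing it into 0..@pool_size-1.
  defp pick_worker(key), do: worker_name(:erlang.phash2(key, @pool_size))
  defp worker_name(i), do: :"squarer_#{i}"

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:square, n}, _from, state), do: {:reply, n * n, state}
end

:ok = Squarer.start_pool()
IO.inspect(Enum.map(1..3, &Squarer.square/1))  # [1, 4, 9]
```

Had clients been calling `GenServer.call(Squarer, {:square, n})` directly, the move to several registered workers would have required touching every call site.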

desmond · June 4, 2018, 9:15pm

Whether or not you’re using GenServer may be an implementation detail, but callers of the code know whether or not they expect a response - that is, they implicitly understand whether the code is synchronous or asynchronous. I think it’s clearer and more explicit to just hit the server at the callpoint:

GenServer.cast(MyModule, {:thing_i_want_to_do, args})

vs

MyModule.thing_i_want_to_do(args)

…not to mention removing a ton of code. It always felt like needless indirection to stick a client function (with a nearly identical name) in the middle of this operation. A refactor to many GenServers like you describe is not a common situation and would probably require a more extensive refactor of the interface.

In the case of renaming/repurposing the client function, you would almost certainly want to rename the message string as well to keep the semantics aligned. A global find-and-replace on the message string is just as effective as a find-and-replace on the function call.

cdegroot · June 4, 2018, 9:20pm

So again there seems to be a conflation of layers/concerns. GenServer (and I’m gonna type gen_server from now on because that’s the actual implementation) has a bunch of messages, some of which are straight erlang messages (handle_info) and some of which are mixed with data to form a protocol around async/sync messaging, and because nothing is generic and there are no helpers to build these messages, it becomes very brittle.

I guess that that is where Joe’s ideas come in. If you give me a destination for a well-specified message, maybe with some generic helper code to morph it into some standard format with metadata like reply address, then the client could just keep sending messages and be utterly oblivious of whether the target mailbox it is sending it to represents a pool or a single genserver or just a blackhole application. For the client, that is indeed an implementation detail, and it shouldn’t matter.

At the end of the day, I guess that the OO ideals of “late binding, message passing” combine to “I don’t want to have compile-time knowledge of the code I’m sending messages to and me sending messages to, say, a GenServer should be seen as a first class action instead of a hack”.