PragDave’s new Component library - his preferred way of building apps

OvermindDL1 · January 17, 2019, 10:16pm

http://erlang.org/pipermail/erlang-questions/2014-November/081843.html

erts_use_sender_punish is a flag hard coded to 1 (true), when it is true
a process sending messages to another process will have its reduction
count reduced by the number of messages in the receivers message queue
multiplied by four. Sending messages to processes with zero messages in
the queue is free in terms of reductions, but sending messages to load
queues is very expensive and will lead the scheduler to context switch
to another process more often.

I.E. reductions are reduced based on the mailbox size of the receiver process.

It’s not designed to ‘save’ a system, just to let it handle load better. If you run away by spamming messages faster than the receiver can process them, you will still eventually run out of memory. You should use call if that’s possible, since it creates a synchronization point. However, not even call will save you if a process is being spawned (like a web request) to make that call: you can still run out of memory that way, although backpressure will still ‘slow it down’ so it won’t happen quite as fast (and in these short-lived process cases cast could in fact be better, as the process could die sooner, saving memory). The BEAM’s built-in backpressure works best when many processes send to a few processes many times.

EDIT: Also, as a side note, using call instead of cast is not backpressure, it is forcing a queue size of 1 instead of N between those two processes, which only helps when one process is sending a lot of messages to another, not when a lot of processes are sending a message to one (which it can hurt).

EDIT2: Oh hey, OTP 19 added a flag (max_heap_size) to set the maximum size of the entire process, including its mailbox, before OTP will kill the process so you can set that to prevent entire VM death.
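A minimal sketch of that flag from Elixir, assuming a hypothetical `HeapLimitDemo` module and an arbitrary limit of 100,000 words (neither is from the thread). It caps the heap of a spawned process and reports how that process exits once it allocates far more than the cap:

```elixir
defmodule HeapLimitDemo do
  # Hypothetical helper: runs a memory-hungry function in a process whose
  # heap is capped, and returns the reason that process exited with.
  def run(max_words \\ 100_000) do
    {pid, ref} =
      spawn_monitor(fn ->
        # :size is measured in machine words. With kill: true the VM kills
        # the process when a garbage collection finds the heap over the limit.
        Process.flag(:max_heap_size, %{size: max_words, kill: true, error_logger: false})

        # Build a list that needs far more heap than the limit allows.
        list = Enum.to_list(1..5_000_000)
        length(list)
      end)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    after
      10_000 -> :timeout
    end
  end
end
```

Only the offending process dies here; the caller just sees a `:DOWN` message and the rest of the VM keeps running.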

EDIT3: You know, the new :atomics module would make it quite nice to make a wrapper to pretend the mailbox of a process is self-limiting, or perhaps pretend it’s like a bounded queue so the caller can do ‘something else’ if it is getting overloaded… This could be a good and cheap pattern for pretending that some messages are allowed to be lossy in certain situations… ^.^
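One way such a wrapper could look, as a rough sketch only. The `BoundedMailbox` module and its function names are made up for illustration and are not an OTP or Elixir API; the only real calls are `:atomics.new/2`, `:atomics.add_get/3` and `:atomics.sub/3`. A shared counter tracks messages in flight, and the sender backs off when the limit is hit:

```elixir
defmodule BoundedMailbox do
  # Hypothetical wrapper: a shared atomic counter tracks how many messages
  # are currently in flight to one receiver.
  def new(limit) when limit > 0 do
    %{counter: :atomics.new(1, signed: true), limit: limit}
  end

  # Sender side: reserve a slot, or give up if the receiver is overloaded.
  def send_bounded(%{counter: counter, limit: limit}, pid, message) do
    if :atomics.add_get(counter, 1, 1) > limit do
      # Over the limit: release the slot we just reserved and report it.
      :atomics.sub(counter, 1, 1)
      {:error, :overloaded}
    else
      send(pid, message)
      :ok
    end
  end

  # Receiver side: call once per handled message to free its slot.
  def ack(%{counter: counter}), do: :atomics.sub(counter, 1, 1)
end
```

The caller decides what `{:error, :overloaded}` means: drop the message, retry later, or do ‘something else’. The receiver has to remember to call `ack/1`, which is the weak point of this sketch.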

EDIT4: Oh hey, @ferd has a webpage about all kinds of different backpressure patterns and ways to handle overload in a variety of different circumstances; it looks pretty comprehensive.
https://ferd.ca/handling-overload.html

sasajuric · January 17, 2019, 10:54pm

Glancing at the code of DirWalker, I see no reasons for it to be a GenServer. It could just as well be a plain functional module and still be able to keep state, perform read-aheads, and provide streaming. I didn’t thoroughly read the code, but I don’t see any current API concern which would justify using a separate process. If I was writing something like this, I’d make it a plain functional module.

I feel a similar way about your last post about state. You seem to argue that using a separate process for the game separates concerns and improves encapsulation. That’s certainly true, but I think that in your example you can get those same benefits by using plain functional abstractions (i.e. functions and modules). For a detailed discussion on this you can check out my To spawn, or not to spawn? post.

The gist of that long post is that I believe that design concerns, such as encapsulation, separation of concerns, or the shape of the API, do not carry any weight in deciding whether something should run in a separate process or not. The reasons for spawning another process are IMO almost (if not completely) mechanical. I reach for a process if I want runtime benefits, such as concurrency and/or fault-tolerance. If I don’t get any of these benefits, then I stick with plain functions/modules.

JEG2 · January 17, 2019, 11:11pm

Wow. Thanks for the lesson!

ericmj · January 17, 2019, 11:58pm

This is no longer true since OTP 21, because it wasn’t a good enough backpressure mechanism.

It is one form of backpressure, it is arguably not enough in some circumstances though.

OvermindDL1 · January 17, 2019, 11:59pm

Ah hah, I thought I recalled reading somewhere about a major change about it, thanks!

Yep, there are many many forms for many many different situations, that’s why I like that ferd link I just found, it has a lot of good information there.

pragdave · January 18, 2019, 1:02am

I did the same. Then, as part of my anti-cargoculting ethic, I tried switching over to making stuff async.

I found it more difficult at first, because I’ve spent a lifetime writing procedures. But once I got some simple patterns into my head, I found it’s actually quite fun.

My current mental model for programming is based to some extent on the idea that 10 years from now we’ll all be working on IoT stuff. In that environment it’s all one-way, transient, pubsub-y and so on. By practicing doing stuff async now, I’m hoping to start building the reflexes that’ll let me think about that future.

pragdave · January 18, 2019, 1:19am

I agree 100% about that particular example.

The problem with examples is coming up with something that demonstrates the stuff you want to demonstrate without also having tons of extra code. In this case, I wanted to demo the actual library.

Couldn’t agree more. And in a case such as Task.async_stream, where the state is all effectively opaque to the outside world, it’s definitely a good idea.

But… the other side of the coin is that when a module lets someone else handle its state, it’s actually exposing its implementation to the outside world. Now you can always say “I’m passing you my state, but your code should treat it as an opaque type.” But my experience is that folks always want to open the kimono.

A great example of this is the way phoenix_ecto populates FormData from an Ecto changeset:

Here’s the code in phoenix_ecto/html:

    def to_form(changeset, opts) do
      %{params: params, data: data} = changeset
      ...

Here we’re initializing fields in one module from fields in the state of an entirely different module: one that’s not even in the same top-level application. We’re coupling to the type of data (must be a map), the names of fields in that map, and the total internal structure of those fields. Make an internal change in ecto, and you could easily break phoenix_ecto.

Now that particular thing isn’t going to happen: it’s a small and very clever team working on both.

But, in the general case, I’d much, much rather see a guarantee of information hiding over the trivial cost of spawning another process.

JEG2 · January 18, 2019, 3:10am

The CPU cost is very low, but I personally find the cognitive load added to the developers to be significant. It goes from “you need to understand X” to “you need to understand a concurrent X,” doesn’t it?

pragdave · January 18, 2019, 3:27am

Only if it’s concurrent

A two_way function acts identically to a regular function call, but with added state.

sasajuric · January 18, 2019, 7:46am

The changeset type is publicly documented, and so the data structure is part of the API. Changing the structure breaks the API, so it’s not an internal change.

Using struct matching would make this a bit more obvious:

    %Ecto.Changeset{params: params, data: data} = changeset

You don’t get that even with GenServer. Peeking into the state and changing it is as easy as invoking :sys.get_state and :sys.replace_state.
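To make that concrete, here is a small sketch with a hypothetical `Vault` server that exposes no getter or setter at all (the module names are made up for illustration). `:sys.get_state/1` and `:sys.replace_state/2` still read and overwrite its state from outside:

```elixir
defmodule Vault do
  # Hypothetical server with no public getter or setter.
  use GenServer

  def start_link(secret), do: GenServer.start_link(__MODULE__, secret)

  @impl true
  def init(secret), do: {:ok, secret}
end

defmodule Peek do
  # Reads and then overwrites the "private" state from the outside.
  def demo do
    {:ok, pid} = Vault.start_link(%{score: 0})
    original = :sys.get_state(pid)
    :sys.replace_state(pid, fn state -> %{state | score: 42} end)
    {original, :sys.get_state(pid)}
  end
end
```

Calling `Peek.demo()` returns the state before and after the replacement, even though `Vault` never offered an API for either.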

I don’t think that cost is trivial at all.

First, by moving from functions to processes, you’re changing the paradigm. All of the sudden, instead of game = Game.something(game, ...) we’re just writing :ok = Game.something(game, ...). In other words we’re moving from immutable functional to side-effectful imperative.
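A side-by-side sketch of the two styles, assuming a toy game that only records guesses. `FunGame` and `ProcGame` are hypothetical names, not code from the library under discussion:

```elixir
defmodule FunGame do
  # Functional version: state goes in, a new state comes out.
  defstruct guesses: []

  def new, do: %FunGame{}

  def guess(%FunGame{} = game, letter) do
    %{game | guesses: [letter | game.guesses]}
  end

  def guesses(%FunGame{guesses: guesses}), do: Enum.reverse(guesses)
end

defmodule ProcGame do
  # Process-based version: the caller only ever holds a pid.
  def start_link, do: Agent.start_link(fn -> [] end)

  # Returns :ok; the state changes as a side effect inside the process.
  def guess(pid, letter), do: Agent.update(pid, &[letter | &1])

  def guesses(pid), do: Agent.get(pid, &Enum.reverse/1)
end

# Functional: every call rebinds the value.
game = FunGame.new()
game = FunGame.guess(game, "a")
IO.inspect(FunGame.guesses(game))

# Process-based: the same pid is reused and each call just returns :ok.
{:ok, pid} = ProcGame.start_link()
:ok = ProcGame.guess(pid, "a")
IO.inspect(ProcGame.guesses(pid))
```

In the first half the old `game` value is still intact after each call; in the second there is only one mutable thing behind the pid.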

This style now becomes similar to OO, except it has some extra issues. For example, it becomes easier to leak processes. If e.g. the game process traps exits, it won’t be immediately taken down when its parent terminates. If there’s some bug, the child might even linger on forever.

Another issue is that debugging becomes harder. If one process crashes, we’ll get cascading crashes (assuming no one in the hierarchy is trapping exits), and that’s going to add a lot of log noise. If a crash happens in a “one-way” handler, the stack trace won’t be complete (i.e. you won’t be able to tell how you arrived at that one-way handler). Tracing multiple processes is going to be harder than tracing a single one.

Going further, if an abstraction is process-based we can’t implement protocols for it. In a hierarchy of process-based entities, we can’t just invoke Jason.encode or :erlang.term_to_binary on the root element. Even ad-hoc debugging with IO.inspect(game) becomes unusable.

Yet another problem is passing the abstraction to other processes. If we pass data to another process, we make a copy, and two processes can safely work on the data concurrently. In contrast, when a pid is passed, it’s almost like pass-by-reference. So again we encounter a paradigmatic switch.

Moving to the performance realm, spawning a process and communicating with it is “cheap” (by some hand-wavy definition of cheap), but it’s still much more expensive than not using a process. In a tight loop where an abstraction is frequently accessed, the performance penalty might become really significant.

You might also experience weird timeouts here. Imagine you pass a million pieces of data to the abstraction in a one-way fashion (you seem to prefer that for mutations), and immediately after that invoke a getter. This might easily lead to a five-second timeout error.

Another problem is memory usage. The overhead of a process is about a few kilobytes, an order of magnitude more than an empty map or a struct. So if we start creating a bunch of process-based entities, and do this for every web request, the memory usage will skyrocket even in a moderately loaded system handling a few hundred or a few thousand connected users.

This is further exacerbated by the fact that the data is copied across process boundaries. So in a two-way invocation of Game.do_something(some_data) we keep two copies of the data in memory, neither of which is garbage. Again, it’s usually not a problem, but overuse processes, multiply by the number of connected users, and you might find yourself in trouble.

Don’t get me wrong, I’m a huge fan of processes. Heck, the main focus of my book is on processes, and even in my aforementioned article I’ve used them extensively. Used judiciously, they can do wonders for our systems. But misused, they will bring a lot of harm with little to no good. I’m speaking from experience here, because I’ve spent my first few years of Erlang programming using processes for encapsulation (mostly influenced by my own OO heritage), and I’ve bumped into most of the issues mentioned above.

So tl;dr - no, I wouldn’t say that the cost of spawning a process is trivial

LostKobrakai · January 18, 2019, 8:03am

Isn’t that how dependencies should work out? Have two applications, which want to work together, create an interface for how to work together (Phoenix.HTML.FormData / phoenix_html) and have a higher level application depending on both and implementing the interface (phoenix_ecto).

The type of a changeset is documented and therefore part of the public API, so I don’t really see a reason not to depend on it. Having accessors for those fields in Ecto.Changeset would have the benefit of ecto having more control over how data is extracted out of their struct and how they can refactor without breaking changes, but they seem to be certain that it won’t change (in a breaking fashion) in the future.

On the other hand MapSet is a map internally, but the community and the docs are very adamant in making sure that people do not depend on that implementation, especially as the implementation indeed was changed once already.

The benefit of having public types of data is that people can customize and add functionality running on the same datatype, while with opaque/hidden ones you’re limited by what you get out of the box from the implementor. Like if you’re missing an accessor to a part of the data then you’re out of luck.

If one still pokes into opaque types, it’s on them to keep up with that implementation detail. I don’t feel that artificially moving computation into processes for essentially making the data private would do us any good. See @sasajuric for the reasons, which he added while I wrote this.

It feels like adding processes for data hiding, but introducing all the issues of distributed computation. Sure, locally we don’t need to think too much about it as there’s no network involved, but it’s essentially that on the BEAM. The simplest example might even be timeouts. As soon as message passing is involved I need to be aware of how long a certain computation might take and how long I’d like to wait for it.

shifters98 · January 18, 2019, 9:28am

Thanks for the example dave!

Interesting discussion being generated!

sztosz · January 18, 2019, 9:47am

I really like the library, although I haven’t had a chance to use it. I worry that hiding so much implementation and adding a layer of abstraction makes it a bit “magical”, but the trade-off might be worth it.

And I really enjoy reading the discussion about using functions over processes etc. It’s so insightful!

jeremyjh · January 18, 2019, 4:45pm

OTP processes are not intended as a mechanism for encapsulation and information hiding, and they are not free. I thought you agreed with this from your response in the related issue I opened. I do think encapsulation is a bit of an issue with structs. If you want to encapsulate data, one possibility is to use private records. Of course people can still fiddle with it, but at least it makes the intent very clear. You can also use @opaque types if you are using dialyzer.
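As a rough sketch of the @opaque route, with a hypothetical `OpaqueStack` module invented for illustration: the struct is still an ordinary struct at runtime, but Dialyzer can flag code outside the module that inspects or builds it directly instead of going through the functions below.

```elixir
defmodule OpaqueStack do
  # Hypothetical example module. Nothing is enforced at runtime; the
  # @opaque type only signals intent and gives Dialyzer something to check.
  defstruct items: []

  @opaque t :: %__MODULE__{items: list()}

  @spec new() :: t()
  def new, do: %__MODULE__{}

  @spec push(t(), term()) :: t()
  def push(%__MODULE__{items: items} = stack, item) do
    %{stack | items: [item | items]}
  end

  @spec peek(t()) :: {:ok, term()} | :error
  def peek(%__MODULE__{items: [top | _]}), do: {:ok, top}
  def peek(%__MODULE__{items: []}), do: :error
end
```

Callers are expected to write `OpaqueStack.new() |> OpaqueStack.push(1) |> OpaqueStack.peek()` and never touch `items` themselves.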

OvermindDL1 · January 18, 2019, 5:06pm

Proper data hiding in a functional language should be handled by the typing system. Hopefully we get a decent one ‘on top of’ elixir someday (wish I had the time to write it!).

Otherwise we have as much data hiding as Python has, which is to say not really any. Even Dialyzer is so optional that most people don’t use it.

pragdave · January 18, 2019, 5:33pm

Sure, but using those is clearly not sanctioned by the owner of the original server, whereas passing out state is. It’s a question of intent.

Yes, I agree that’s something to think about. And that is one of the reasons I’m exploring this. I’m trying not to be doctrinal when I play with all this stuff: OO vs functional etc are just labels, and often labels divide opinion more than they help.

One thing that motivated all this thinking was Phoenix itself. Think about plug. Is there a less functional piece of code in the Elixir world? Side effects, hidden state, the magic conn variable? And routes, with the side effect of creating an entire module of code.

Initially I really disliked this. It went against everything I thought I knew about functional design. But the more that I thought about it, the more I realized that all that design stuff wasn’t more important than the simple model that plug represented. Would I prefer it to be “pure”? Sure. But what’s probably more important is that it presents a model that people can work with.

So, based on that kind of thinking, I’m exploring different ways of thinking about design in Elixir. It has the advantage of not being a fully functional language, so it allows us some wriggle room in which to experiment. I’m seeing what happens if we relax some of the “rules” that everyone says are necessary.

You say my approach is imperative. It might be, in the small. But my mid-term objective is probably closer to a continuation-passing style, or Joe’s idea of process pipelines, or event sourcing.

The experiment is to design code as sets of cooperating but relatively independent processes. Each process acts as a kind of reducer, taking an input and the current state and producing an output and an updated state.

When I first tried coding like this, every function was stateless: it received input and state as arguments and returned an output and updated state.

But if you try that for any real world program, you end up with a big ball of mud: the state becomes unwieldy. So instead I’m currently trying the opposite: each function (I guess really, each server) manages its own state, and that state is always private.

Do you lose some of the benefits of other approaches? Sure. You mentioned error reporting, and I agree that’s a major issue. (But that’s as much the fault of the horrible error reporting that the Beam does. I think that some effort spent there would make life easier regardless of where state is held.)

But my approach is not to focus on the stuff that we might lose, but instead to look at what could be gained. I understand all the negatives, all the reasons that “this is how we do things.” Instead, I’m excited by the idea that there might be changes that end up making it easier to write code in this increasingly complex world.

OvermindDL1 · January 18, 2019, 5:40pm

Isn’t Plug exceedingly functional? It’s a simple pipeline that passes in a Plug.Conn and passes out a new version of it (not editing the original, since it can’t), with no side effects or hidden state. The only magic is the macro pipeline builder (plug Blah instead of just doing |> Blah.call(unquote(Macro.escape(Blah.init([]))))), which is quite easy to understand overall. The only impure parts are where the socket is communicated with, like getting the request body or sending the result, which is done using messages to the socket process; Plug itself is entirely pure otherwise. o.O

Functional doesn’t mean pure, they usually go hand in hand in most cases because purity makes functions trivial, but it’s not a requirement. The BEAM breaks purity because you can pass messages and set process flags and data. OCaml breaks purity with ref’s (although when the Algebraic Effects get pulled in then that will become pure, but right now it’s not), as well as an escape hatch for doing low level assembly calls. Etc… etc… But they are both very Functional, generally some of the prime examples of Functional Languages.

You can always set process flags/data at each call to make it more obvious where some data came from as well as decorating the state immediately with some ‘recently accessed’ info or so, there are patterns.

pragdave · January 18, 2019, 5:53pm

You’re making my point for me. Plug.Builder is remarkably not functional, but no one cares because it is easy to understand.

That’s exactly what I said.

OvermindDL1 · January 18, 2019, 6:17pm

But it is functional though.

Plug.Builder is a set of compile-time macros that transform AST to AST. And all calls made within it are also functional.

pragdave · January 18, 2019, 6:28pm

Sorry, but anything that keeps state in module attributes is not functional. The use of @before_compile is a pretty good hint that we’re storing state somewhere to be used later.

The Plug DSL is decidedly stateful…
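For readers who haven’t looked inside a DSL like this, here is a stripped-down sketch of the general mechanism being argued about. `MiniBuilder`, `step` and `MyPipeline` are hypothetical names and this is not Plug’s actual code. Each `step` call appends to an accumulating module attribute, and `@before_compile` reads the collected list back at the end of compilation:

```elixir
defmodule MiniBuilder do
  # Hypothetical DSL: collects `step` declarations at compile time.
  defmacro __using__(_opts) do
    quote do
      import MiniBuilder, only: [step: 1]
      # accumulate: true means each @steps write appends instead of replacing.
      Module.register_attribute(__MODULE__, :steps, accumulate: true)
      @before_compile MiniBuilder
    end
  end

  # Each call stores one more entry in the module attribute.
  defmacro step(name) do
    quote do
      @steps unquote(name)
    end
  end

  # Runs once the module body is done: turns the stored list into a function.
  defmacro __before_compile__(env) do
    # Accumulated attributes come back newest first, so restore source order.
    steps = env.module |> Module.get_attribute(:steps) |> Enum.reverse()

    quote do
      def steps, do: unquote(steps)
    end
  end
end

defmodule MyPipeline do
  use MiniBuilder

  step :fetch_session
  step :authenticate
end
```

The attribute only exists while `MyPipeline` is being compiled; at runtime `MyPipeline.steps/0` is an ordinary function returning a constant list. Whether that counts as “stateful” is the disagreement above.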