PragDave’s new Component library - his preferred way of building apps

I did the same. Then, as part of my anti-cargoculting ethic, I tried switching over to making stuff async.

I found it more difficult at first, because I’ve spent a lifetime writing procedures. But once I got some simple patterns into my head, I found it’s actually quite fun.

My current mental model for programming is based to some extent on the idea that 10 years from now we’ll all be working on IoT stuff. In that environment it’s all one-way, transient, pubsub-y and so on. By practicing doing stuff async now, I’m hoping to start building the reflexes that’ll let me think about that future.

I agree 100% about that particular example.

The problem with examples is coming up with something that demonstrates the stuff you want to demonstrate without also having tons of extra code. In this case, I wanted to demo the actual library.

Couldn’t agree more. And in a case such as Task.async_stream, where the state is all effectively opaque to the outside world, it’s definitely a good idea.

But… the other side of the coin is that when a module lets someone else handle its state, it’s actually exposing its implementation to the outside world. Now you can always say “I’m passing you my state, but your code should treat it as an opaque type.” But my experience is that folks always want to open the kimono.

A great example of this is the way phoenix_ecto populates FormData from an Ecto Changeset:

Here’s the code in phoenix_ecto/html:

    def to_form(changeset, opts) do
      %{params: params, data: data} = changeset
      ...

Here we’re initializing fields in one module from fields in the state of an entirely different module: one that’s not even in the same top-level application. We’re coupling to the type of the data (it must be a map), the names of the fields in that map, and the entire internal structure of those fields. Make an internal change in ecto, and you could easily break phoenix_ecto.

Now that particular thing isn’t going to happen: it’s a small and very clever team working on both.

But, in the general case, I’d much, much rather have guaranteed information hiding, at the trivial cost of spawning another process.

The CPU cost is very low, but I personally find the cognitive load added to the developers to be significant. It goes from “you need to understand X” to “you need to understand a concurrent X,” doesn’t it?

Only if it’s concurrent :grinning:

A two_way function acts identically to a regular function call, but with added state.

The changeset type is publicly documented, and so the data structure is part of the API. Changing the structure breaks the API, so it’s not an internal change.

Using struct matching would make this a bit more obvious:

    %Ecto.Changeset{params: params, data: data} = changeset

You don’t get that guarantee even with a GenServer. Peeking into the state and changing it is as easy as invoking :sys.get_state and :sys.replace_state.
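
To make that concrete, here is a minimal sketch of a GenServer whose state is meant to be private, and how any caller can still read and overwrite it through the standard :sys functions. SecretCounter is a hypothetical module made up for illustration.

```elixir
# Hypothetical GenServer whose state is meant to stay private.
defmodule SecretCounter do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.cast(pid, :increment)
  def value(pid), do: GenServer.call(pid, :value)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_cast(:increment, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

{:ok, pid} = SecretCounter.start_link(0)
SecretCounter.increment(pid)

# Nothing in the public API exposes the raw state, yet any caller can read it...
:sys.get_state(pid)
# => 1

# ...or overwrite it wholesale.
:sys.replace_state(pid, fn _old -> 100 end)
SecretCounter.value(pid)
# => 100
```

The :sys calls are handled as system messages in mailbox order, so the state they see already includes the earlier cast.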

I don’t think that cost is trivial at all.

First, by moving from functions to processes, you’re changing the paradigm. All of a sudden, instead of game = Game.something(game, ...) we’re writing :ok = Game.something(game, ...). In other words, we’re moving from immutable functional code to side-effectful imperative code.
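
A minimal sketch of the two styles side by side. Game here is a hypothetical struct that only records moves, and GameServer is a hypothetical Agent-based wrapper around it; neither is meant to reflect any real game code.

```elixir
# Functional style: every call returns a new value; the old one is untouched.
defmodule Game do
  defstruct moves: []

  def new, do: %__MODULE__{}

  def move(%__MODULE__{} = game, square) do
    %__MODULE__{game | moves: [square | game.moves]}
  end
end

# Process style: the value lives inside a process and calls just return :ok.
defmodule GameServer do
  def start_link, do: Agent.start_link(fn -> Game.new() end)
  def move(pid, square), do: Agent.update(pid, &Game.move(&1, square))
  def moves(pid), do: Agent.get(pid, & &1.moves)
end

# Functional: the result has to be rebound.
game = Game.new() |> Game.move(:e4) |> Game.move(:e5)
game.moves
# => [:e5, :e4]

# Process-based: the pid stays the same while the state changes behind it.
{:ok, pid} = GameServer.start_link()
:ok = GameServer.move(pid, :e4)
:ok = GameServer.move(pid, :e5)
GameServer.moves(pid)
# => [:e5, :e4]
```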

This style now becomes similar to OO, except it has some extra issues. For example, it becomes easier to leak processes. If, e.g., the game process traps exits, it won’t be immediately taken down when its parent terminates. If there’s some bug, the child might even linger on forever.

Another issue is that debugging becomes harder. If one process crashes, we’ll get cascading crashes (assuming no one in the hierarchy is trapping exits), and that’s going to add a lot of log noise. If a crash happens in a “one-way” handler, the stack trace won’t be complete (i.e. you won’t be able to tell how you arrived at that one-way handler). Tracing multiple processes is also harder than tracing a single one.

Going further, if an abstraction is process-based we can’t implement protocols for it. In a hierarchy of process-based entities, we can’t just invoke Jason.encode or :erlang.term_to_binary on the root element. Even ad-hoc debugging with IO.inspect(game) becomes much less useful, since all it prints is a pid.

Yet another problem is passing the abstraction to other processes. If we pass data to another process, we make a copy, and two processes can safely work on the data concurrently. In contrast, when a pid is passed, it’s almost like pass-by-reference. So again we encounter a paradigmatic switch.
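
A small sketch of that difference: a map sent to another process is an independent copy, whereas an Agent pid handed around behaves like a shared reference. The Sharing module and its function names are hypothetical.

```elixir
defmodule Sharing do
  # Passing data: the spawned process works on its own copy of the map,
  # so the parent's binding still holds the original.
  def copy_semantics do
    parent = self()
    data = %{count: 0}
    spawn(fn -> send(parent, {:updated, Map.put(data, :count, 1)}) end)

    receive do
      {:updated, copy} -> {data.count, copy.count}
    end
  end

  # Passing a pid: an update made by one holder is visible to every other holder.
  def pid_semantics do
    {:ok, counter} = Agent.start_link(fn -> 0 end)
    task = Task.async(fn -> Agent.update(counter, &(&1 + 1)) end)
    Task.await(task)
    Agent.get(counter, & &1)
  end
end

Sharing.copy_semantics()
# => {0, 1}
Sharing.pid_semantics()
# => 1
```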

Moving to the performance realm, spawning a process and communicating with it is “cheap” (by some hand-wavy definition of cheap), but it’s still much more expensive than not using a process. In a tight loop where an abstraction is frequently accessed, the performance penalty might become really significant.

You might also experience weird timeouts here. Imagine you pass a million pieces of data to the abstraction in a one-way fashion (you seem to prefer that for mutations), and immediately afterwards invoke a getter. This can easily lead to a five-second timeout error.
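
A rough sketch of that failure mode, using a hypothetical SlowSink server that does about a millisecond of work per cast. The numbers are scaled down (a thousand casts and a 100 ms timeout instead of a million casts and the default 5000 ms), but the mechanism is the same: the getter's request waits behind everything already queued in the mailbox.

```elixir
defmodule SlowSink do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, [])
  # One-way: returns immediately; the work is queued in the server's mailbox.
  def push(pid, item), do: GenServer.cast(pid, {:push, item})
  # Two-way: waits for a reply, but only up to `timeout` milliseconds.
  def count(pid, timeout), do: GenServer.call(pid, :count, timeout)

  @impl true
  def init(items), do: {:ok, items}

  @impl true
  def handle_cast({:push, item}, items) do
    # Simulate a little work per item.
    Process.sleep(1)
    {:noreply, [item | items]}
  end

  @impl true
  def handle_call(:count, _from, items), do: {:reply, length(items), items}
end

{:ok, pid} = SlowSink.start_link()
Enum.each(1..1_000, &SlowSink.push(pid, &1))

# The :count request sits behind ~1000 queued casts (at least ~1 s of work),
# so the 100 ms timeout expires and GenServer.call exits.
result =
  try do
    {:ok, SlowSink.count(pid, 100)}
  catch
    :exit, {:timeout, _} -> :timed_out
  end
# result => :timed_out
```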

Another problem is memory usage. The overhead of a process is a few kilobytes, an order of magnitude more than an empty map or a struct. So if we start creating a bunch of process-based entities, and do this for every web request, memory usage will skyrocket even in a moderately loaded system handling a few hundred or a few thousand connected users.
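
One way to eyeball this on your own machine is sketched below. It assumes :erts_debug.flat_size/1, an undocumented Erlang debugging helper that reports a term’s size in machine words, plus Process.info/2 for the memory of an idle process. The exact numbers depend on the OTP version and architecture, so no output is shown; MemoryDemo is a hypothetical name.

```elixir
defmodule MemoryDemo do
  # Returns {bytes for an empty map, bytes used by an idle process}.
  def sizes do
    # :erts_debug.flat_size/1 is an undocumented debug helper; it counts words.
    map_bytes = :erts_debug.flat_size(%{}) * :erlang.system_info(:wordsize)

    # A process that does nothing but wait for :stop.
    pid =
      spawn(fn ->
        receive do
          :stop -> :ok
        end
      end)

    {:memory, proc_bytes} = Process.info(pid, :memory)
    send(pid, :stop)
    {map_bytes, proc_bytes}
  end
end

MemoryDemo.sizes()
```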

This is further exacerbated by the fact that data is copied across process boundaries. So in a two-way invocation of Game.do_something(some_data), we keep two copies of the data in memory, neither of which is garbage. Again, it’s usually not a problem, but if you overuse processes and multiply that by the number of connected users, you might find yourself in trouble.

Don’t get me wrong, I’m a huge fan of processes. Heck, the main focus of my book is on processes, and even in my aforementioned article I’ve used them extensively. Used judiciously, they can do wonders for our systems. But misused, they will bring a lot of harm with little to no good. I’m speaking from experience here, because I spent my first few years of Erlang programming using processes for encapsulation (mostly influenced by my own OO heritage), and I’ve bumped into most of the issues mentioned above.

So tl;dr - no, I wouldn’t say that the cost of spawning a process is trivial :slight_smile:

Isn’t that how dependencies should work? Two applications that want to work together define an interface for doing so (Phoenix.HTML.FormData in phoenix_html), and a higher-level application that depends on both implements that interface (phoenix_ecto).
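
A stripped-down sketch of that layering, with hypothetical names standing in for the real libraries: FormLib plays the role of phoenix_html (it owns the protocol), DataLib plays the role of ecto (it owns the struct), and the defimpl plays the role of phoenix_ecto. The merge logic is a simplification, not what phoenix_ecto actually does.

```elixir
# Plays the role of phoenix_html: owns the interface.
defprotocol FormLib.FormData do
  @doc "Turns a data source into a map of form fields."
  def to_form(source)
end

# Plays the role of ecto: owns the data type and knows nothing about forms.
defmodule DataLib.Changeset do
  defstruct params: %{}, data: %{}
end

# Plays the role of phoenix_ecto: depends on both and implements the protocol.
defimpl FormLib.FormData, for: DataLib.Changeset do
  def to_form(%DataLib.Changeset{params: params, data: data}) do
    # Simplified: submitted params override the original data.
    Map.merge(data, params)
  end
end

changeset = %DataLib.Changeset{data: %{name: "old", age: 3}, params: %{name: "new"}}
FormLib.FormData.to_form(changeset)
# => %{age: 3, name: "new"}
```

Neither "library" module references the other; only the glue code knows about both.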

The type of a changeset is documented and therefore part of the public API, so I don’t really see a reason not to depend on it. Having accessors for those fields in Ecto.Changeset would have the benefit of ecto having more control over how data is extracted out of their struct and how they can refactor without breaking changes, but they seem to be certain that it won’t change (in a breaking fashion) in the future.

On the other hand MapSet is a map internally, but the community and the docs are very adamant in making sure that people do not depend on that implementation, especially as the implementation indeed was changed once already.

The benefit of having public data types is that people can customize and add functionality on top of the same data type, while with opaque/hidden ones you’re limited to what the implementor gives you out of the box. If you’re missing an accessor for some part of the data, you’re out of luck.

If someone still pokes into opaque types, it’s up to them to maintain that implementation detail. I don’t feel that artificially moving computation into processes just to make the data private would do us any good. See @sasajuric’s post for the reasons; he posted it while I was writing this.

It feels like adding processes for data hiding also introduces all the issues of distributed computation. Sure, locally we don’t need to think too much about it, as there’s no network involved, but on the BEAM it’s essentially the same thing. The simplest example might be timeouts: as soon as message passing is involved, I need to be aware of how long a certain computation might take and how long I’d like to wait for it.

Thanks for the example, Dave!

Interesting discussion being generated!

I really like the library, although I haven’t had a chance to use it yet. I have my doubts that hiding so much implementation and adding a layer of abstraction makes it a bit “magical”, but the trade-off might be worth it.

And I really enjoy reading discussion about using functions over processes etc. It’s so insightful!

OTP processes are not intended as a mechanism for encapsulation and information hiding, and they are not free. I thought you agreed with this from your response in the related issue I opened. I do think encapsulation is a bit of an issue with structs. If you want to encapsulate data, one possibility is to use private records. Of course people can still fiddle with it, but at least it makes the intent very clear. You can also use @opaque types if you are using dialyzer.
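
A minimal sketch combining both suggestions, with a hypothetical Temperature module: the record macros are private to the module, and the @opaque type lets Dialyzer flag callers that depend on the underlying tuple. Nothing stops code from matching on the tuple at runtime; this documents and checks the intent rather than enforcing it.

```elixir
defmodule Temperature do
  require Record

  # Private record: the generated temp macros only exist inside this module.
  Record.defrecordp(:temp, kelvin: 0.0)

  # Opaque type: Dialyzer can warn when outside code relies on the tuple shape.
  @opaque t :: {:temp, float()}

  @spec from_celsius(number()) :: t()
  def from_celsius(celsius), do: temp(kelvin: celsius + 273.15)

  @spec to_celsius(t()) :: float()
  def to_celsius(record), do: temp(record, :kelvin) - 273.15
end

Temperature.from_celsius(20) |> Temperature.to_celsius()
```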

Proper data hiding in a functional language should be handled by the typing system. Hopefully we get a decent one ‘on top of’ elixir someday (wish I had the time to write it!). :slight_smile:

Otherwise we have as much data hiding as Python has, which is to say not really any. Even Dialyzer is so optional that most people don’t use it. :frowning:

Sure, but using those is clearly not sanctioned by the owner of the original server, whereas passing out state is. It’s a question of intent.

Yes, I agree that’s something to think about. And that is one of the reasons I’m exploring this. I’m trying not to be doctrinal when I play with all this stuff: OO vs functional etc. are just labels, and labels often divide opinion more than they help.

One thing that motivated all this thinking was Phoenix itself. Think about plug. Is there a less functional piece of code in the Elixir world? Side effects, hidden state, the magic conn variable? And routes, with the side effect of creating an entire module of code.

Initially I really disliked this. It went against everything I thought I knew about functional design. But the more I thought about it, the more I realized that all that design stuff wasn’t more important than the simple model that plug represented. Would I prefer it to be “pure”? Sure. But what’s probably more important is that it presents a model that people can work with.

So, based on that kind of thinking, I’m exploring different ways of thinking about design in Elixir. It has the advantage of not being a fully functional language, so it allows us some wriggle room in which to experiment. I’m seeing what happens if we relax some of the “rules” that everyone says are necessary.

You say my approach is imperative. It might be, in the small. But my mid-term objective is probably closer to a continuation-passing style, or Joe’s idea of process pipelines, or event sourcing.

The experiment is to design code as sets of cooperating but relatively independent processes. Each process acts as a kind of reducer, taking an input and the current state and producing an output and an updated state.

When I first tried coding like this, every function was stateless: it received input and state as arguments and returned an output and updated state.

But if you try that for any real world program, you end up with a big ball of mud: the state becomes unwieldy. So instead I’m currently trying the opposite: each function (I guess really, each server) manages its own state, and that state is always private.
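
A sketch of the two shapes described above, using hypothetical Averager and AveragerServer modules that compute a running average: first as a pure reducer that takes input and state and returns output and new state, then the same reducer wrapped in a server so that the state never leaves the process.

```elixir
# Hypothetical running-average reducer.
defmodule Averager do
  # Pure version: input and state in, {output, new_state} out.
  def step(x, {sum, n}) do
    {sum, n} = {sum + x, n + 1}
    {sum / n, {sum, n}}
  end
end

defmodule AveragerServer do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, {0, 0})
  def add(pid, x), do: GenServer.call(pid, {:add, x})

  @impl true
  def init(state), do: {:ok, state}

  # Same reducer, but the state stays inside the process; callers only see outputs.
  @impl true
  def handle_call({:add, x}, _from, state) do
    {avg, new_state} = Averager.step(x, state)
    {:reply, avg, new_state}
  end
end

Averager.step(3, {1, 1})
# => {2.0, {4, 2}}

{:ok, pid} = AveragerServer.start_link()
AveragerServer.add(pid, 2)
# => 2.0
AveragerServer.add(pid, 4)
# => 3.0
```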

Do you lose some of the benefits of other approaches? Sure. You mentioned error reporting, and I agree that’s a major issue. (But that’s as much the fault of the BEAM’s horrible error reporting. I think that some effort spent there would make life easier regardless of where state is held.)

But my approach is not to focus on the stuff that we might lose, but instead to look at what could be gained. I understand all the negatives, all the reasons that “this is how we do things.” Instead, I’m excited by the idea that there might be changes that end up making it easier to write code in this increasingly complex world.

Isn’t Plug exceedingly functional? It’s a simple pipeline that takes a Plug.Conn and returns a new version of it (it can’t edit the original), with no side effects or hidden state. The only magic is the macro pipeline builder (plug Blah instead of just doing |> Blah.call(unquote(Macro.escape(Blah.init([]))))), which is quite easy to understand overall. The only impure parts are when the socket is communicated with, like getting the request body or sending the result, which is done using messages to the socket process; otherwise Plug itself is entirely pure. o.O
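
To illustrate that data-pipeline shape without pulling in Plug itself, here is a sketch using a hypothetical MiniConn struct as a stand-in for Plug.Conn, with a hypothetical MiniPipeline module. Each step takes a conn and returns a new one; the original value is never modified.

```elixir
# Hypothetical stand-in for Plug.Conn with just a few fields.
defmodule MiniConn do
  defstruct path: "/", status: nil, resp_body: nil, assigns: %{}
end

defmodule MiniPipeline do
  # Each step takes a conn and returns a new conn; nothing is mutated.
  def assign_user(%MiniConn{} = conn, user) do
    %MiniConn{conn | assigns: Map.put(conn.assigns, :user, user)}
  end

  def greet(%MiniConn{} = conn) do
    %MiniConn{conn | status: 200, resp_body: "Hello, #{conn.assigns.user}"}
  end

  # Roughly the shape a builder-generated pipeline boils down to: plain calls.
  def call(conn), do: conn |> assign_user("dave") |> greet()
end

original = %MiniConn{path: "/hello"}
result = MiniPipeline.call(original)
result.resp_body
# => "Hello, dave"
original.status
# => nil (the input conn is unchanged)
```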

Functional doesn’t mean pure. They usually go hand in hand, because purity makes functions trivial, but it’s not a requirement. The BEAM breaks purity because you can pass messages and set process flags and data. OCaml breaks purity with refs (although once algebraic effects are pulled in, that will become pure; right now it isn’t), as well as with an escape hatch for doing low-level assembly calls. Etc… etc… But they are both very functional, and generally some of the prime examples of functional languages. :slight_smile:

You can always set process flags/data at each call to make it more obvious where some data came from, or decorate the state immediately with some ‘recently accessed’ info or so; there are patterns for this. :slight_smile:

You’re making my point for me. Plug.Builder is remarkably not functional, but no one cares because it is easy to understand.

That’s exactly what I said.

But it is functional though. :slight_smile:

Plug.Builder is a set of compile-time macros that transform AST to AST, and all calls made within it are also functional. :slight_smile:

Sorry, but anything that keeps state in module attributes is not functional. The use of @before_compile is a pretty good hint that we’re storing state somewhere to be used later.

The Plug DSL is decidedly stateful…

Except it only uses those at compile time, and they are inlined there, like here:


This is where you use Plug.Builder: it defines a few functions, sets a few compile-time variables (behaviour and plug_builder_opts), and at the end it registers an accumulating attribute. However, a module attribute at compile time is the same as a normal binding in a normal function at runtime (compile time is its runtime): it is not accessible before it is defined, and it can be rebound just like a function binding.

Even so, this relates to purity, not to being functional. It is not entirely ‘pure’, as I think Module.register_attribute/3 sets something in an ETS table, but it is still functional. :slight_smile:
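
A tiny sketch of the pattern described above, written in the spirit of Plug.Builder but not taken from its source. TinyBuilder, step, run and Doubler are hypothetical names: the step macro accumulates entries in a module attribute while the module body is compiled, and @before_compile turns that list into an ordinary nested function call.

```elixir
defmodule TinyBuilder do
  # Hypothetical mini-DSL in the spirit of Plug.Builder (not its real code).
  defmacro __using__(_opts) do
    quote do
      import TinyBuilder, only: [step: 1]
      # With accumulate: true, each `@steps x` appends instead of overwriting.
      Module.register_attribute(__MODULE__, :steps, accumulate: true)
      @before_compile TinyBuilder
    end
  end

  # `step :name` just records the name in the accumulating attribute.
  defmacro step(fun_name) do
    quote do
      @steps unquote(fun_name)
    end
  end

  # Runs once, right before the using module is compiled: the collected
  # names (stored newest first) are folded into a nested call expression.
  defmacro __before_compile__(env) do
    steps = env.module |> Module.get_attribute(:steps) |> Enum.reverse()
    arg = quote do: value

    body =
      Enum.reduce(steps, arg, fn name, acc ->
        quote do: unquote(name)(unquote(acc))
      end)

    quote do
      def run(unquote(arg)), do: unquote(body)
    end
  end
end

defmodule Doubler do
  use TinyBuilder

  step :double
  step :inc

  def double(x), do: x * 2
  def inc(x), do: x + 1
end

Doubler.run(5)
# => 11, i.e. inc(double(5))
```

The attribute is only consulted while Doubler is being compiled; the resulting run/1 is a plain function with no state of its own.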

It is stateful in the same vein as bindings in a function are stateful, like this:

    def something(a, b) do
      c = a + b
      blah(c)
    end

This function is stateful in that there is intermediary state before the c = a + b expression, a new state where c is bound, you cannot use the c binding before it is bound (state!), and the state is passed into the blah/1 call, but this is still Pure and Functional both.

Personally I’m interested in the net gain, so I think both should be considered when evaluating some proposal. In this particular case, my impression is that the only supposed benefit is information hiding. Given the downsides I’ve listed, I think we lose much more than we get. In fact, I’m currently not convinced that there’s any benefit here. Information hiding can be done without processes too, by declaring the type as opaque and using dialyzer to verify it.

Note that my comments are related only to this single aspect of your exploration. The components experiment seems to pack a lot more than just separating concerns with processes. In fact, based on what I’ve read so far, I’m not really sure how important the “design with processes” approach is to the entire components thing.

In any case, I wish you the best of luck with your exploration. My comments might be somewhat negative, but I’m actually a fan of challenging old dogmas, so I’m looking forward to seeing how your experiment evolves!

Personally I quite like the idea and the implementation.

Provided you understand when not to (ab)use processes – which many comments here indicated can itself be a problem, but it’s one solved with experience regardless – then I think this library has a really good niche. Even if you use it to only reduce boilerplate, it’s still a big win.

Because a DSL compiles to functional code does not a functional language make :). Though, yes, it is much about “feel” and not technical details.

When I see a plug router with the DSL, I do not see functions that take inputs and return results; at least, I assume Dave is referring to the router, because he mentions the conn variable.

I wonder if it’s a documentation issue… I’ve seen plug as a purely functional construct that passes what is essentially a monad from function to function to function, each returning a new version of it, but I’m not sure I’ve even read half of its documentation; I learned it by reading its code. Maybe the structure of ‘what’ it is could be made clearer in the docs?
