PragDave’s new Component library - his preferred way of building apps

AstonJ · January 9, 2019, 4:08am

I’ve not had a chance to read this myself yet (on mobile) but noticed he’s just released this.

Some older discussions on the topic can be found via the elixir-for-programmers-course tag

For a while now I’ve been doing my Cassandra impersonation, telling everyone who’ll listen (and quite a few folks who won’t) that we need to be writing code in smaller chunks. I know what happens when we don’t, as I was the author of one of the largest early Rails applications (65kloc), and it became a nightmare to work with.

I don’t want the same thing to happen in the Elixir world. But if I’ve learned one thing, it’s that you can’t tell people that something is a good idea and expect them to do it.

No, you have to make it easier to do things the right way.

So, I’m releasing a first version of my Elixir Component library.

Anyway, the philosophy of all this is not to save on typing. Instead the intent is to nudge people into writing their programs using lots of small, independent components, linked via dependencies. That’s how I’ve been coding for the last year or so, and so far I’m really, really liking it.

Tutorial/guide: component/README.md at main · pragdave/component · GitHub

Blog post: Small is Beautiful—The Component Library

GitHub: GitHub - pragdave/component: Experiment in moving towards higher-level Elixir components

Tweet: https://twitter.com/pragdave/status/1082820327904821248?s=21

kblake · January 9, 2019, 6:03pm

I’d recommend reading the README over the blog post as it has more thorough examples. I’m curious to hear what others think!

AstonJ · January 9, 2019, 6:05pm

Yeah I agree, I labelled it as a tutorial in the post above as it seems quite in-depth

Unfortunately I haven’t got time to go through it right now though, so looking forward to hearing what everyone else thinks as well

otuv · January 9, 2019, 8:03pm

Reading the README, it seems about right. I have had similar thoughts for quite a while, but so far I have always failed at the implementation. My hope is that this, or the resulting best practices and examples, could help me get off the ground.

AstonJ · January 9, 2019, 8:21pm

Have you (or anyone else) seen his Elixir for Programmers (PragDave) course (currently on offer for $30!)? Does this differ much from that?

otuv · January 9, 2019, 8:35pm

Unfortunately not. But reading the reviews it sure looks like something I should have been looking for!

axelson · January 9, 2019, 9:08pm

Yeah, it differs substantially, since this is a library that implements the GenServer structure he advocates for in the course, and a little more.

sfairchild · January 9, 2019, 9:30pm

I went through the course about 2 months ago. It was great and he is really a great teacher.

I haven’t had the time to read the full article or README yet, hopefully tonight I can sit down and review it. But from the little snippet I did read it was very similar to the approach he taught.

jeremyjh · January 10, 2019, 12:41am

I’m a little bit concerned that this (blog post at least) encourages people to use servers where only modules are needed. In web applications for example, putting your application logic in a GenServer is an anti-pattern as it will lead to single-threading your requests, not to mention all the copy operations. I can’t argue that raw GenServer requires a lot of boilerplate but ex_actor has been cutting down the boilerplate for 5+ years. Is there more to it?
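To make the single-threading concern concrete, here is a minimal sketch. The `Pricing` and `PricingServer` modules are hypothetical and are not taken from the Component library or ex_actor; the 50 ms sleep stands in for real work. Four concurrent callers run the plain module function in parallel, while the same four calls routed through one GenServer are handled one at a time.

```elixir
defmodule Pricing do
  # Plain module: each caller runs this in its own process.
  def total(items) do
    Process.sleep(50) # stand-in for real work
    Enum.sum(items)
  end
end

defmodule PricingServer do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, nil)
  def total(server, items), do: GenServer.call(server, {:total, items})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:total, items}, _from, state) do
    # Same work, but every request queues up behind this one process.
    {:reply, Pricing.total(items), state}
  end
end

# Run the same job from four concurrent callers and time the whole batch.
time_four = fn job ->
  {micros, results} =
    :timer.tc(fn ->
      1..4 |> Enum.map(fn _ -> Task.async(job) end) |> Enum.map(&Task.await/1)
    end)

  {div(micros, 1000), results}
end

{:ok, server} = PricingServer.start_link()
{direct_ms, direct_results} = time_four.(fn -> Pricing.total([1, 2, 3]) end)
{server_ms, server_results} = time_four.(fn -> PricingServer.total(server, [1, 2, 3]) end)
IO.puts("direct: #{direct_ms} ms, via GenServer: #{server_ms} ms")
```

The GenServer batch cannot finish in less than roughly four times the per-call cost, because the server works through its mailbox sequentially. The arguments and results are also copied between processes on every call.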

axelson · January 10, 2019, 12:46am

@jeremyjh that sounds like a good point to bring up on the issues list

AstonJ · January 17, 2019, 2:08am

New video:

blatyo · January 17, 2019, 4:55am

I’ll say my impression from what I’ve seen is that there is excessive use of GenServer and it appears to be using those GenServers almost like objects. There is this post by Saša Jurić that explains why that approach isn’t so good. In addition to the excessive use of GenServer, I’d say that a project per GenServer seems excessive as well.

I don’t think everything in the video is horribly wrong, though. The things that stand out to me as odd are the DirWalker, which is implemented as a GenServer that produces a stream, and the HashStore as its own GenServer. It’s odd to me that the DirWalker is implemented as a GenServer when it could just as easily be implemented with Stream.resource/3 in a module with just functions. The problem being solved in the video is effectively a map > reduce > filter problem, so I can imagine it being very reasonable to have multiple mappers, though it could very well be slower in this case. My implementation of this problem would likely have been:

def find_duplicate_in(tree) do
  tree
  |> DirWalker.stream() # this wouldn't use a GenServer, why send a blocking request to another process?
  |> Task.async_stream(&HashGenerator.process/1) # Why reinvent tasks? Each element arrives as {:ok, result}
  # or, sequentially: |> Stream.map(&{:ok, HashGenerator.process(&1)}) # Concurrency could perform worse
  |> Enum.group_by(fn {:ok, file} -> file.hash end, fn {:ok, file} -> file.filename end)
  # keep only the hashes shared by more than one file
  |> Enum.filter(fn {_hash, files} -> length(files) > 1 end)
end

pragdave · January 17, 2019, 7:04pm

If DirWalker had no process, where would it keep its state? It creates the path list lazily, which is important on file systems with millions of files.

Internally the Hungry strategy uses async_stream, but it adds some stuff to it. First, it can be used in GenServers without messing up the message mailbox. It also adds convenient callbacks.

Finally, the overall approach above is synchronous. I want my component stuff to default to asynchronous for most uses, because that’s today’s world: event streams and reduces.

But, having said all this, I don’t think you’re wrong. I’m exploring, just as everyone else is. I’m driven by this idea that things should be easier than we make them. The component abstraction is just the starting point for that exploration.

Cheers

Dave

blatyo · January 17, 2019, 8:38pm

With Stream.resource/3 you can return the list of the next items and an accumulator, which is where I would keep track of state. I agree about being lazy.
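A minimal sketch of that idea, assuming a hypothetical `LazyWalker` module (this is not the DirWalker library's code): the accumulator passed through `Stream.resource/3` is the list of paths still to visit, so the traversal state lives in the stream itself and no process is needed to hold it.

```elixir
defmodule LazyWalker do
  # Hypothetical module for illustration only.
  # Lazily yields every regular file below `root`.
  def stream(root) do
    Stream.resource(
      # start: the accumulator is the list of paths still to visit
      fn -> [root] end,
      # next: visit one path, emit zero or one file, return the new accumulator
      fn
        [] ->
          {:halt, []}

        [path | rest] ->
          cond do
            File.dir?(path) ->
              children =
                case File.ls(path) do
                  {:ok, names} -> Enum.map(names, &Path.join(path, &1))
                  {:error, _reason} -> [] # unreadable directory: skip it
                end

              {[], children ++ rest}

            File.regular?(path) ->
              {[path], rest}

            true ->
              {[], rest}
          end
      end,
      # cleanup: nothing to release
      fn _pending -> :ok end
    )
  end
end

# Usage: build a tiny tree in the temp directory and walk it.
root = Path.join(System.tmp_dir!(), "lazy_walker_#{System.unique_integer([:positive])}")
File.mkdir_p!(Path.join(root, "sub"))
File.write!(Path.join(root, "a.txt"), "a")
File.write!(Path.join([root, "sub", "b.txt"]), "b")

found = root |> LazyWalker.stream() |> Enum.sort()
first_only = root |> LazyWalker.stream() |> Enum.take(1)
```

Because the stream is lazy, `Enum.take(1)` stops after the first file is found rather than listing the whole tree. This sketch does not guard against symlink cycles.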

I don’t think I understand what you mean by using it in GenServers without messing up the message mailbox. Could you elaborate?

I agree the callbacks would be convenient in the model of components vs building your own GenServer. But I think, contrasting that against passing just a function, I’d personally pick the function.

I guess I would argue that the code I wrote is as asynchronous as your code is, even though you have more processes. Aside from the mapping stage, which we could easily achieve the same level of concurrency for, the places where you’re using processes, you’re effectively blocking one process to wait on another process that is synchronously doing some work. In my code, I’d argue I just removed the message pass by having it synchronously do that work. Maybe you just mean you’ve broken your code up into more concurrent primitives, but I’d argue that isn’t so important to do on the BEAM, where it has a preemptive scheduler.

The solution I proposed does make the assumption that there aren’t going to be too many files. Were I to find it to be substantially more, I’d probably use Flow, which wouldn’t require much reworking of the initial solution.

I usually avoid creating bespoke GenServers when I can. I prefer using the functional aspect of Elixir over the actor parts. The things I think about that cause me to create processes are:

Does this data have a lifecycle longer than the operation I’m performing?
Do I want to treat the failure of this differently?
Am I performing stuff sequentially that has no dependence between each other?

I think the only places we disagree are where process boundaries should be and when to split code up. Most of the code you wrote, I’d probably copy verbatim if I was doing something similar. It’s also possible I’d be convinced components were a better fit solving a different problem.

Thanks

pragdave · January 17, 2019, 8:55pm

Sure: DirWalker uses Stream.resource internally. But it has a broader API than just streams, and so it uses a GenServer to provide that.

Task.async_stream sends messages to the pid that invokes it to synchronize the worker tasks. The fact that both it and gen_server receive messages on that same mailbox gets things all messed up. If you have a look at the component code, I check to see if async_stream is being used synchronously or asynchronously. In the latter case I spawn a helper process in which the stream runs.
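One way to picture the helper-process idea is the sketch below. It is not the Component library's code: the `Hasher` module, its `hash_all/3` function, and the `{:hashes, results}` message are all hypothetical. The stream is consumed inside a separate linked task, so the worker replies arrive in that task's mailbox and the GenServer's own mailbox only ever sees its normal requests.

```elixir
defmodule Hasher do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{})

  # Asynchronous request: results are sent to `reply_to` when ready.
  def hash_all(server, items, reply_to) do
    GenServer.cast(server, {:hash_all, items, reply_to})
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:hash_all, items, reply_to}, state) do
    # The helper process is the one that invokes async_stream, so the
    # worker tasks synchronize with it rather than with this GenServer.
    {:ok, _helper} =
      Task.start_link(fn ->
        results =
          items
          |> Task.async_stream(&:erlang.phash2/1)
          |> Enum.map(fn {:ok, hash} -> hash end)

        send(reply_to, {:hashes, results})
      end)

    # Return immediately; the server stays free to handle other messages.
    {:noreply, state}
  end
end

{:ok, hasher} = Hasher.start_link()
Hasher.hash_all(hasher, ["a", "b", "c"], self())

hashes =
  receive do
    {:hashes, results} -> results
  after
    1_000 -> raise "no reply from helper"
  end
```

`Task.async_stream/3` preserves input order by default, so the hashes come back in the same order as the items.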

Actually, I don’t think I am blocking anywhere (apart from the call into the HashStore at the end). Everything else is event driven.

gregvaughn · January 17, 2019, 9:37pm

@pragdave I really like this “hungry” component abstraction and definitely plan to dig deeper.

But my initial question is about the one_way macro. I assume that maps to a GenServer.cast? I’ve read and watched plenty about backpressure though I don’t have much practical experience with it. One of the rules of thumb I took from that is to prefer call over cast but I heard you advocating for cast purely on whether the caller needed a response. Have you considered backpressure and if so, how do you approach it?

OvermindDL1 · January 17, 2019, 9:42pm

Backpressure is handled based on message sending, so if the gen_server gets lots and lots of messages, then processes that send it a message get ‘slowed’ down (higher reductions used) to throttle them enough that the system doesn’t get overwhelmed. It doesn’t matter whether call or cast is used for that. Generally you want call when you want the caller to wait for the reply, and cast when you want the request to be asynchronous.

Do note, I think something about the backpressure mechanism changed or is going to change in OTP recently, so this may not be accurate as of the latest version…

gregvaughn · January 17, 2019, 9:46pm

Right. I have heard of that mechanism, but perhaps I misunderstood how completely it handled the situation. Good to know.

JEG2 · January 17, 2019, 10:09pm

I’m not aware of a “backpressure mechanism” in OTP.

One way backpressure is commonly added is to check the process’s mailbox size in the caller, before sending the message. If the size is under some high water mark, a cast is performed. If the size is at or over the same mark, a call is made instead. This forces the sender to wait on the reply, thus slowing them down.
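A rough sketch of that high-water-mark check follows. The `Throttled` and `Worker` modules, the `{:work, payload}` message shape, and the threshold of 100 are all made up for illustration. Note that `Process.info/2` only works for processes on the local node.

```elixir
defmodule Throttled do
  @high_water_mark 100 # hypothetical threshold; tune for the workload

  # Returns :cast or :call so the caller can see which path was taken.
  def send_work(server, work, high_water_mark \\ @high_water_mark) do
    case Process.info(server, :message_queue_len) do
      {:message_queue_len, len} when len < high_water_mark ->
        # Mailbox is short: fire and forget.
        GenServer.cast(server, {:work, work})
        :cast

      {:message_queue_len, _len} ->
        # Mailbox is at or over the mark: wait for the reply,
        # which slows this sender down.
        :ok = GenServer.call(server, {:work, work})
        :call

      nil ->
        {:error, :noproc}
    end
  end
end

defmodule Worker do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, 0)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_cast({:work, _payload}, count), do: {:noreply, count + 1}

  @impl true
  def handle_call({:work, _payload}, _from, count), do: {:reply, :ok, count + 1}
  def handle_call(:count, _from, count), do: {:reply, count, count}
end

{:ok, worker} = Worker.start_link()
quiet = Throttled.send_work(worker, :job_a)     # empty mailbox, under the mark
forced = Throttled.send_work(worker, :job_b, 0) # a mark of 0 always forces a call
handled = GenServer.call(worker, :count)
```

The server has to accept the same message through both `handle_cast/2` and `handle_call/3`, which is the main cost of this approach.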

JEG2 · January 17, 2019, 10:12pm

Yeah, I tend to favor call for the reasons you outlined. If I don’t really need a response I just return :ok. I switch to a cast after identifying a need: raw speed, circular message passing, etc.