Should we adopt Dave's way of building applications as a series of components? (Dave's talk has now been added!)

josevalim · June 3, 2018, 5:49pm

My concern is that having or not having a process tree does not tell anything about a library being well designed or not. For example, I could be using a library without a process tree that spawns processes outside supervisors. Then I could have another library that does the same thing, but correctly spawns processes under supervisors, and therefore has a process tree.

Someone may compare those two libraries and think the first one is simpler because it doesn’t have a tree, when in fact it is poorly designed. So we need to be very careful with the “it has a process tree or not” label.
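
A minimal sketch of the two hypothetical libraries described above, assuming made-up module names (NoTreeLib, TreeLib) and a Task.Supervisor as the supervised alternative. Neither module comes from a real library; the point is only that the version with a tree is the one handling its processes properly.

```elixir
# Hypothetical library without a process tree (illustrative name).
defmodule NoTreeLib do
  # Spawns a bare process: nothing supervises it, so a crash goes
  # unnoticed and it is not shut down together with any application.
  def fire_and_forget(fun) when is_function(fun, 0) do
    spawn(fun)
  end
end

# Hypothetical library offering the same feature, with a process tree.
defmodule TreeLib do
  # Child spec the library would place in its own supervision tree.
  def task_supervisor_spec do
    {Task.Supervisor, name: TreeLib.TaskSupervisor}
  end

  # Same feature, but the process now lives under a supervisor.
  def fire_and_forget(fun) when is_function(fun, 0) do
    Task.Supervisor.start_child(TreeLib.TaskSupervisor, fun)
  end
end

# Started by hand for the sketch; a real library would do this from
# its application callback module.
{:ok, _sup} =
  Supervisor.start_link([TreeLib.task_supervisor_spec()], strategy: :one_for_one)
```

From the caller's side both fire_and_forget/1 functions look equally simple, which is why the presence of a tree alone says little about design quality.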

josevalim · June 3, 2018, 6:07pm

Sorry for being pedantic but, since you are asking someone else to build on their arguments, I would like to point out that you have not explained why any of the points above are true. I.e. it is not clear why the approach you mentioned above is easier to test, change, or reason about.

Just to make an absurd point, maybe the contract you define between those components passes all of the arguments between functions via the process dictionary. Maybe you do this:

# in the controller
Process.put(:product, product)
NewProductHandler.call()

# in the handler
def call() do
  # the input arrives through hidden process state, not through an argument
  product = Process.get(:product)
  # ...
end

Now your code is broken between many files but it is still coupled and hard to test!
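
To make the contrast concrete, here is a hedged sketch with two hypothetical handlers (CoupledHandler and ExplicitHandler are illustrative names, and the {:created, product} return value is an assumption). Both live in separate modules, yet only the second has a contract you can see and test directly.

```elixir
defmodule CoupledHandler do
  # Hidden contract: the caller must have stored :product in the
  # process dictionary before calling this function.
  def call do
    product = Process.get(:product)
    {:created, product}
  end
end

defmodule ExplicitHandler do
  # Visible contract: the input is part of the function signature.
  def call(product) do
    {:created, product}
  end
end
```

Testing CoupledHandler requires setting up process state first, and forgetting to do so silently yields nil; ExplicitHandler can be exercised with a plain function call.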

Every time somebody brings the argument that a module should only have a single function in their domain, I propose this: what if we remove Enum.map/2, Enum.reduce/3, etc and instead we introduce Enum.Mapper.call/2, Enum.Reducer.call/3 and so on?

To clarify: I am not saying breaking your software into multiple files is a bad thing, but programmers clearly leverage different mechanisms to group functionality together. We need to identify when those mechanisms are measurably better (i.e. you can prove it leads to less coupling and better tests) and we need to identify when we feel the code is better.

Theoretically speaking, a big module MyApp with 100 functions call_a, call_b, etc. is no easier or harder to test or change than 100 modules with a single function each, MyApp.A.call, MyApp.B.call, etc. There are, however, differences in how a developer understands the codebase and communicates about it with other developers, which is what causes some people to feel one approach is superior to the other, but I can’t prove one is better than the other.

It is very important to make a distinction between proofs and feelings because they lead to very different discussions.

AstonJ · June 3, 2018, 7:38pm

Dave has just tweeted a blog post called Elixir Project Structure…

https://pragdave.me/blog/2018/06/02/project-structure.html

I think the way we organize our projects’ files obscures the code and make simple things complex. We do it not because we want to, but because we feel we need to in order to tame complexity. There’s another way: stop writing complex code.

Exadra37 · June 3, 2018, 8:07pm

@josevalim not being pedantic at all… I really appreciate that you took the time to comment on my thoughts.

It is easier to reason about because the code in that file is about only one specific action. Using the example of creating a new Product, it is easier to reason about this action if the file contains only code related to it. If you need to scroll through the file and juggle code that handles several actions on the Product, it becomes more difficult to grasp what is going on there.

It is easier to test because I only need to test one action in that file, and thus only need to mock the Collaborators that help with processing that action. So my test files are also smaller, and it is easy to understand what they are testing.

It is less prone to bugs because smaller files are easier to understand, so you are less likely to misunderstand them or to miss important bits, which is what leads to situations where you fix one bug and create one or more others.

Also, smaller files tend to have lower churn, meaning that you touch them less often, which for me means fewer opportunities to create bugs.

josevalim:

Just to make an absurd point, maybe the contract you define between those components passes all of the arguments between functions via the process dictionary. Maybe you do this:

# in the controller
Process.put(:product, product)
NewProductHandler.call()

# in the handler
def call() do
  product = Process.get(:product)

Now your code is broken between many files but it is still coupled and hard to test!

The Handler is the entry point that actually processes the action. While my example only shows the Controller, we can use the Handler from a CLI command or from a Socket to fulfill the same action without duplicating the code across Controller, CLI and Socket, as we would in the one-file approach.

In the OOP I practice, this approach is a no-brainer to test and does not couple at all.

Now, if in Elixir it is harder to test Modules that rely on Collaborators, then your claim about it being hard to test and coupled makes sense to me; otherwise I am missing the point.

I would prefer to split each into its own module and then use the @pragdave approach of having a single module define the public API for all Enum functions. So Enum.map/2 would be defined in the Enum module and would delegate the concrete implementation to another module, Enum.Mapper. From my point of view this would combine the benefits of a nice public API with those of a code base that is easier to understand.
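
A minimal sketch of that facade idea, using defdelegate. The MyEnum, MyEnum.Mapper and MyEnum.Reducer modules are hypothetical stand-ins (the real Enum module is not organized this way), and the reducer simply leans on Enum.reduce/3 for brevity.

```elixir
defmodule MyEnum.Mapper do
  # Single public entry point holding the concrete implementation.
  def call(enumerable, fun) when is_function(fun, 1) do
    for item <- enumerable, do: fun.(item)
  end
end

defmodule MyEnum.Reducer do
  # Delegates to the standard library to keep the sketch short.
  def call(enumerable, acc, fun) when is_function(fun, 2) do
    Enum.reduce(enumerable, acc, fun)
  end
end

defmodule MyEnum do
  # The public API lives in one place; each function forwards to the
  # module that implements it.
  defdelegate map(enumerable, fun), to: MyEnum.Mapper, as: :call
  defdelegate reduce(enumerable, acc, fun), to: MyEnum.Reducer, as: :call
end
```

Callers only ever see MyEnum.map/2 and MyEnum.reduce/3, so the split stays an internal decision of whoever maintains the code.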

michalmuskala · June 3, 2018, 8:08pm

The thing is that this distinction is completely irrelevant. For example, ecto, phoenix and postgrex all have supervision trees for some additional features (mostly caches), though the primary API you use does not leverage them. Cowboy also has a tree, but the only process in it is a simple one that refreshes cached date/time information for requests. Absinthe does not have one.
For me as a user of all of those, this is completely irrelevant. I don’t need to know how the supervision structure looks or what it does. It’s entirely internal to the application. There’s nothing that I, as a consumer of the application, need to do differently depending on whether there is a supervision tree or not.

For example Ecto added the supervision tree in release 1.1 being entirely backwards compatible to 1.0 - nothing changed for the users.

Creating a distinction between applications with and without a supervision tree means that a newcomer must learn about those. The distinction is entirely artificial: as far as the VM is concerned, applications with and without a supervision tree work exactly the same.

If we talk about making it easier for people working on the application and not the consumers, I don’t think doing this is useful. There’s usually a handful of people working on the application and thousands consuming it. We increase cognitive load on those thousands just to make it slightly easier for the few. This is not a reasonable trade-off in my book.

Exadra37 · June 3, 2018, 8:24pm

The public API for your library, component, or whatever you want to call it can be defined in a single Module containing all the endpoints available to its consumers, while the concrete implementation of each endpoint would live in its own Module.

So in my opinion this approach does not increase cognitive load on the thousands consuming it, but will reduce it on the few developing it.

michalmuskala · June 3, 2018, 8:26pm

I was talking specifically about distinguishing applications with and without supervision tree. The discussion about code organisation is a separate one.

peerreynders · June 3, 2018, 11:09pm

Clean doesn’t automatically translate to well-organized, it’s well-organized that has a habit of looking clean.

For the longest time it was a best practice to keep your HTML, JavaScript, and CSS separate in the name of “separation of concerns” (markup vs behaviour vs appearance) but these days people sing the praises of Vue components which let you combine JavaScript, (pseudo-)markup and CSS because it allows them to keep things together that change together (CSS can be a bit tricky because you have to separate structural from cosmetic CSS).

Just my opinion™:

The end result is files of code that rarely go above 200 lines of code and that only have 1 public entry point.

While I agree that small file sizes are desirable, size isn’t an absolute measure. There are always exceptions to the “rule” - but all exceptions have to be justified. And there is such a thing as files that are too small - if they separate things that belong together (i.e. lack of cohesion). That being said, most of the time a large module is concealing multiple small modules struggling to get out. The point is: “it depends”; as unhelpful as that may be in the general case.

Making things easier to reason about is about partitioning your problem space into well defined concepts that build on one another.

Some of the smallest concepts can be represented by a single line of code - which I personally have no qualms sticking in a single named function - not for reuse but because the name is a better representation of the concept than that line of code. Software that Fits In My Head essentially states that code needs to fit on a single screen to be easy to reason about. That single screen can, via named functions, refer to thousands of lines of code off-screen as long as those functions implement clearly (and coherently) defined concepts in the problem domain and still be easy to reason about. Now when a single function doesn’t fit on a single screen questions need to be asked. But (module) file size isn’t necessarily an indicator as to whether it is easy to reason about.

So the folder structure is like:

src
   Resource
       Product
           New
               NewProductController
               NewProductHandler
               NewProductRepository
               NewProductModel
               NewProductLogic
               NewProductView
           Modify
           Discontinue

In my personal judgement some things that jump out at me:

The code organization is dominated with concerns about resources, while at the same time the role that product plays in the problem domain is completely obscured. This system clearly serves resources but what problem do these resources serve? This is similar to a Rails application always looking like a Rails application, regardless of what the Rails application was actually created to do.
Controller, Handler, View deal with the representation. In my mind Logic, Model, and Repo are not part of the network interface implementation so they don’t belong here and there is likely functionality that needs to be shared across New, Modify, Discontinue. Also in REST I would see the controller as a manifestation of the Uniform Interface ({POST,New},{PUT,Modify},{DELETE,Discontinue}) so I would expect it on the Product level, no lower.
REpresentational State Transfer is an architectural style but as such only describes the protocol for how applications can interact over a network that observes the constraints as established by the HTTP protocol. But that doesn’t imply that the “resource organization structure” necessarily has to continue into the deepest bowels of your system. Doing so may superficially make your job easier right now when you are building the application for the first time with a bare minimum of capabilities - but the rigidity of this approach will likely make changes more difficult in the future. The internal system architecture should be optimized for what the system is supposed to do. The resource-oriented interface is then designed and built on top of that, “inventing” whatever additional resources are necessary to enable the necessary business transactions.

Software that is developed with re-usability,

This is an idea whose time has passed.

Intentionally provocative, but the point is that lots of complexity has been committed in the name of reusability that has never delivered on the promise. Often you need more than two different places that want to use something before you can determine what is truly generic/reusable, rather than trying to predict what should be reusable (and how).

The Rocky Mountain Ruby 2016 - Kill “Microservices” before its too late by Chad Fowler topic features lots of content that explores the notion that replaceability rather than reusability can be the more important design concept for successful systems.

cmkarlsson · June 4, 2018, 2:18am

I lean in the opposite direction. I very much prefer when applications have been tagged as being “library applications” (Erlang -- Applications). Then I know it doesn’t need to be part of the supervision tree and I can design my application around it.

They are not exactly the same. A library application is not part of a supervision tree, so it matters quite a bit to the VM. A library application only has to be loaded, not started and stopped. This means it doesn’t have any callbacks the VM needs to wait for, and it is not part of the application tree, meaning hot code loading is much easier.

Hiding complexity to make it easier to use. Classic easy vs simple example. I understand that it is a trade-off but I don’t think it is as clear cut as you say. Making people learn how things actually work underneath also has benefits. Can these things be confusing and hard to learn? Yes, but if you have this knowledge you are likely able to make better decisions further down the line.

josevalim · June 4, 2018, 7:26am

This is about how you define a contract and how you handle collaborators; there is nothing specific about being one or multiple files. Give me any code that you think is testable because of multiple files and I will present you a direct equivalent using a single module in a single file.
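
As a hedged illustration of that claim, here is one single-module sketch where the collaborator is part of the contract. The Products module, its functions, and the in-memory insert/1 stand-in are all hypothetical; a real application would call its persistence layer instead.

```elixir
defmodule Products do
  # The collaborator (the function that persists a product) is an
  # argument, so a test can swap it without any extra files.
  def create(attrs, insert_fun \\ &__MODULE__.insert/1) do
    with {:ok, product} <- validate(attrs) do
      insert_fun.(product)
    end
  end

  # Validation rule invented for the sketch: a non-empty string name.
  def validate(%{name: name} = attrs) when is_binary(name) and name != "" do
    {:ok, attrs}
  end

  def validate(_attrs), do: {:error, :invalid_name}

  # Stand-in for a real persistence call.
  def insert(product) do
    {:ok, Map.put(product, :id, System.unique_integer([:positive]))}
  end
end
```

A test can pass a stub such as fn product -> {:ok, product} end as the second argument, which gives the same isolation one would get from mocking a collaborator defined in another file.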

It is still likely that the sum of the churn of the small files will be equal to the churn of the big file. I.e. you are likely still changing the code as frequently, except that now you spread the churn around.

I was making an absurd example on how “hard to test” is not a property of multiple files but a property of the code. I understand if you believe multiple files makes you write more testable code, but it is a by-product and not a direct consequence of multiple files. YMMV.

Many of the functions in the Enum module are small and do belong together. I personally don’t see the appeal in following such a strict rule of breaking every function into a separate file.

Although, as I said earlier, I agree this is a matter of taste. There are, however, a couple things that I can prove would be a downside when splitting the Enum module:

Modules require more space in memory and disk and more time to compile and load - if the whole Elixir stdlib was broken up this way, booting your Elixir apps would take longer and use more memory upfront
Elixir/Erlang can’t inline code across modules - so it would also limit optimizations we could apply
Calling a remote function is more expensive than a local one

It is unlikely that those would matter in your application, except maybe for compilation time, but for building the language itself, splitting every function into multiple modules would directly affect the usability of Elixir as a language, especially functions that may be called under a tight loop such as the ones in Enum.

mkaszubowski · June 4, 2018, 7:27am

That’s a great point right there. Sometimes a library having a supervision tree is just an implementation detail which users should not need to be aware of.

There’s a good rule of thumb, “mutable on the inside, immutable on the outside”, which states that as long as the public functions behave like they are operating on immutable values, I shouldn’t care if inside they are mutating some temporary variables; it’s an implementation detail. The case is similar for library/application separation. As long as some piece of code behaves like a library, should we always care that there are some processes running there? I know, it depends. Sometimes it is important. But forcing this distinction on developers could lead to just blindly following the rule without much consideration.

josevalim · June 4, 2018, 7:38am

Please see my earlier reply but looking at the supervision tree is not enough to assert this. Maybe the application changes some global state elsewhere. Or maybe the application only has a supervision tree because it simply wants to spawn tasks/processes.

While you should not care about the callback part, the hot code reloading is a good point. But, as I said above, the tree is not enough to determine whether there are any code reloading issues. For example, you would still need to double check whether an application without a supervision tree is using the app env.
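
A small sketch of what that can look like, assuming a hypothetical TreelessLib module and a made-up :locale setting. The library starts no processes at all, yet its behaviour still depends on global state kept in the application environment.

```elixir
defmodule TreelessLib do
  # No processes and no supervision tree, but the result depends on
  # the application environment, which is global to the node.
  def greeting do
    case Application.get_env(:treeless_lib, :locale, :en) do
      :en -> "hello"
      :fr -> "bonjour"
    end
  end
end
```

Anything on the node can call Application.put_env(:treeless_lib, :locale, :fr) and change what greeting/0 returns, so "no tree" does not imply "no state".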

The goal of an application is exactly to hide those concerns because the huge majority of times you shouldn’t care. Knowing how things work is always going to be helpful but it shouldn’t be used as a reason to remove abstractions. For example, learning how maps are laid out in memory and how the VM loads beam modules has helped me in certain situations but I am not advocating for manually managing small/large maps or for loading modules by hand.

yordisprieto · June 4, 2018, 8:24am

@josevalim what is your opinion on GenServer and the Configuration module?

I definitely believe that we could add another layer of abstraction on top of GenServer, especially around the handle_* functions.

I think changing a little bit how things work, following his proposal (despite the names), will help a lot in developing future code.

josevalim · June 4, 2018, 8:37am

I think it is important to clarify, and I think Dave would agree, that GenServer is not wrong. Rather, we are wrong in using GenServer to build higher level components, often with domain/business logic, and we need higher level abstractions to solve those cases. When building low-level components, the distinction between client, server, and the low-level callbacks is important, but in your application domain you want to get rid of the bureaucracy and put the focus on the domain code. There are also other libraries attempting to tackle this particular issue.
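
For a rough sense of what "less bureaucracy" can mean, the standard library's Agent already hides the callback layer for simple state. This is only one existing example, not necessarily the kind of abstraction discussed in the talk or in those libraries; the Inventory module and its functions are hypothetical.

```elixir
defmodule Inventory do
  use Agent

  # Starts a process holding a map of sku => quantity.
  def start_link(initial \\ %{}) do
    Agent.start_link(fn -> initial end)
  end

  # Domain operations read as plain functions; there are no
  # handle_call/handle_cast clauses to write.
  def restock(agent, sku, qty) when is_integer(qty) and qty > 0 do
    Agent.update(agent, fn stock -> Map.update(stock, sku, qty, &(&1 + qty)) end)
  end

  def in_stock(agent, sku) do
    Agent.get(agent, fn stock -> Map.get(stock, sku, 0) end)
  end
end
```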

I agree with Dave in many of the problems he raised: yes, we need to improve configuration (there was a whole other thread about it in the forum). Yes, it would be helpful to know if an application is stateful or is a singleton (and as I said in this thread, I don’t think looking at the supervision tree gives the complete answer). As Dave said, there is a lot to explore.

yordisprieto · June 4, 2018, 8:45am

I think we can figure out other things first before we try to figure out this one. Seeing Configuration as just another external OTP app rather than a black box component will be nice.

No doubt about it.

As this is something that requires a lot of macros, would you consider bringing this into core instead of letting the community rely on those third-party libraries?

cmkarlsson · June 4, 2018, 9:09am

Perhaps I am misunderstanding something but a library application does not have a supervision tree. The .app file doesn’t contain a mod key so there is no tree being started. I guess it could still spawn processes but generally just for short lived stuff which doesn’t require supervision.
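
The mod key can be inspected at runtime with Application.spec/2, which gives a hedged way to see the distinction described above. AppKind is a hypothetical helper; it reports only what the .app resource declares and says nothing about processes the code may spawn on its own.

```elixir
defmodule AppKind do
  # Classifies a loaded application by its :mod key.
  def kind(app) when is_atom(app) do
    case Application.spec(app, :mod) do
      # The application is not loaded, so there is no spec to read.
      nil -> :not_loaded
      # No callback module declared: a library application.
      [] -> :library
      # A callback module is declared and is invoked on start.
      {_module, _args} -> :has_callback_module
    end
  end
end
```

For instance, AppKind.kind(:stdlib) reports a library application, while AppKind.kind(:kernel) reports a callback module.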

Reading through the thread again (because I have a feeling I have misunderstood something) and I guess what tripped me was this:

And I have likely misinterpreted this, because isn’t this exactly what is being done today? The system differentiates between applications with a supervision tree and applications without, i.e. mix new for a library application and mix new --sup for an application with a supervision tree, so the user must already understand the difference.

LostKobrakai · June 4, 2018, 9:19am

I expect @michalmuskala meant that the user of an application (e.g. me using ecto) does not need to know whether the application starts processes to do its job or not. The implementor of an application (e.g. me creating an API client) surely does need to know how it works.

cmkarlsson · June 4, 2018, 9:30am

Yes, that is understood. But as you are writing applications you already know the distinction, so why would you necessarily hide the fact? To me it’s almost like knowing the difference between an Operating System service and a command line utility.

josevalim · June 4, 2018, 10:12am

I don’t think the solution actually needs to leverage macros. I would like to see such avenues explored as well.

I believe @michalmuskala was talking about the usage perspective. As a user of application xyz, we shouldn’t care if it has a supervision tree or not, especially because the presence or absence of a supervision tree does not tell much about an application. The authors of libraries definitely need to know the difference.

sasajuric · June 4, 2018, 10:36am

As usual, Dave makes some great and provocative points in his talk. There’s definitely a lot of food for thought here.

However, I don’t necessarily agree with all the ideas. In particular, I have a slightly different opinion on GenServers. With GenServer, we end up with the client-side and the server-side code in the same module. Personally, I believe that this is a natural coupling. The client-side code delivers messages to the server-side code, optionally waiting for the response. Splitting these out makes no sense to me, because both sides of the code are revolving around a well defined single concern: client-server communication.

However, I agree that GenServers tend to become bloated. In most cases I’ve seen, this happens because we tend to embed complex server-side logic into our handlers, and thus we end up with two complex concerns in the same module.

Another symptom I’ve noticed is when we mix multiple complex server-side concerns into callback functions. So for example, if we manage some complex pure functional state, and mix this with some time-related concerns such as timeouts, backoff retries, handling :EXIT and :DOWN messages, we may end up with multiple concerns coupled together.

Sometimes, the interface functions have to do a more complex transformation of the input parameters into the message being delivered. That’s yet another concern mixed into the same module.

I think that these other concerns should be carved out if they become too big. So one thing I frequently do is implement a complex state in a separate module. Many times, this separation helps testing, because I can test a complex state without needing a GenServer. Also, I feel that it improves code readability, because I can analyze the state code without worrying about GenServer concerns (and vice versa).
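
A minimal sketch of that separation, under the assumption of a hypothetical shopping cart: Cart holds the pure functional state, and CartServer keeps only the client-server communication. Neither module is taken from the parent library mentioned below.

```elixir
defmodule Cart do
  # Pure functional state: no processes, trivially testable.
  defstruct items: %{}

  def new, do: %__MODULE__{}

  def add(%__MODULE__{items: items} = cart, sku, qty)
      when is_integer(qty) and qty > 0 do
    %{cart | items: Map.update(items, sku, qty, &(&1 + qty))}
  end

  def count(%__MODULE__{items: items}) do
    items |> Map.values() |> Enum.sum()
  end
end

defmodule CartServer do
  use GenServer

  # Client side: packs requests into messages.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def add(server, sku, qty), do: GenServer.cast(server, {:add, sku, qty})
  def count(server), do: GenServer.call(server, :count)

  # Server side: unpacks messages and defers to the pure module.
  @impl true
  def init(:ok), do: {:ok, Cart.new()}

  @impl true
  def handle_cast({:add, sku, qty}, cart), do: {:noreply, Cart.add(cart, sku, qty)}

  @impl true
  def handle_call(:count, _from, cart), do: {:reply, Cart.count(cart), cart}
end
```

The Cart functions can be tested with plain function calls, while CartServer tests only need to cover message passing.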

As an example, you can take a look at my parent library. Here, I’ve extracted the state management into a separate module. I did this upfront after some initial premeditation. In fact, the functional part was the first thing I started working on.

I should note though that I only extract the state if it “feels too large”. The Stack server used in the talk is an example of where I’d just leave the state concern in the same module, as it’s very small.

Either way, when it comes to communicating, I personally feel that this is one concern which belongs to a single module. Packing the data into a message, unpacking it, responding, and waiting for the response are all part of the same job which is client-server communication. Splitting that feels unnatural and confusing to me.