Elixir mono-repo best practices

pnezis · March 7, 2023, 3:02pm

We are currently restructuring our codebase. Until now we have followed the monolith approach (well structured, though), but we have started facing scaling issues, especially in CI pipelines. An umbrella is not an option since we have multiple applications and we want to split the configuration as well.

The main goals of this restructuring are:

Speed up CI pipelines - by splitting the codebase into multiple packages we can build, lint and test only the affected parts of the codebase
Code organization and clear separation of concerns
Better development experience - a developer working on the API does not need to compile dozens of Broadway pipelines or LiveView applications

A poncho-like project with path dependencies seems like the best option:

Maximize code reuse
Consistent tooling, code guidelines, CI pipelines
No internal dependencies nightmare

On the other hand there are some caveats, the most important of which is:

Each project will have its own lockfile, deps and _build paths, making CI configuration more complex (e.g. deps caching) and introducing incompatibilities between external dependencies

Since custom paths for the lockfile, dependencies and build output are supported for umbrella projects, I was considering solving this issue with a shared artifacts folder for all packages/applications (something like Rust workspaces, with a common lockfile and output directory for all packages). Imagine a folder structure like the following:

project
├── .artifacts
│   ├── _build
│   └── deps
├── applications
│   ├── admin_ui
│   │   └── mix.exs
│   ├── api
│   │   └── mix.exs
│   └── data_pipelines
│       └── mix.exs
├── mix.lock
└── packages
    ├── package_a
    │   └── mix.exs
    ├── package_b
    │   └── mix.exs
    └── package_c
        └── mix.exs

Where each mix.exs will have deps_path, lockfile and build_path properly defined:

def project do
  [
    # ...
    deps_path: deps_path(),
    lockfile: lockfile_path(),
    build_path: build_path()
  ]
end
# ...
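For illustration, here is one way the three helpers could be defined in applications/api/mix.exs, assuming the folder layout above. The module name, app name, version and the package_a path dependency are hypothetical placeholders; only :deps_path, :lockfile and :build_path are the options under discussion. Treat it as a sketch, not a recommended setup.

```elixir
defmodule Api.MixProject do
  use Mix.Project

  # Repo root, two levels up from applications/api (assumed layout).
  @root Path.expand("../..", __DIR__)

  def project do
    [
      app: :api,
      version: "0.1.0",
      deps: deps(),
      deps_path: deps_path(),
      lockfile: lockfile_path(),
      build_path: build_path()
    ]
  end

  # Internal packages are pulled in as path dependencies (hypothetical example).
  defp deps do
    [{:package_a, path: Path.join(@root, "packages/package_a")}]
  end

  # Shared locations that every mix.exs in the repo would point at.
  defp deps_path, do: Path.join(@root, ".artifacts/deps")
  defp lockfile_path, do: Path.join(@root, "mix.lock")
  defp build_path, do: Path.join(@root, ".artifacts/_build")
end
```

Every other mix.exs in the repo would repeat the same three helpers, so they could also live in a small shared file that each project loads.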

I have tested this on a sample project and it seems to work fine. The documentation of build_path though suggests avoiding overriding this variable:

This option is intended only for child apps within a larger umbrella application so that each child
app can use the common _build directory of the parent umbrella. In a non-umbrella context,
configuring this has undesirable side-effects (such as skipping some compiler checks) and should be avoided.

What are these side effects? Does anybody have experience with Elixir mono-repos? Any advice or warnings about the aforementioned structure?

Thanks!

BradS2S · March 7, 2023, 6:55pm

Agreed that diverging config by app under an umbrella project is an anti-pattern.

Functional Web Development with Elixir, OTP, and Phoenix starts by building the business logic as a separate application, without Phoenix. Might be worth thinking about breaking out your app as separate applications that are added as dependencies to the main app.

D4no0 · March 7, 2023, 7:22pm

I don’t think what you are trying to do is wise. From what I understood, you want a single runtime but multiple independent codebases. This will require the whole codebase to be compiled at the end of the day, because at some point you will want to start a part of the application in dev.

Instead of making this abomination, just take a step back and think about how you can refactor it in a smart and easy to use way:

business domain - this definitely should be refactored into a library, as your business domain logic should contain no runtime logic.
endpoints, workers, etc. - refactor into entirely separate standalone services, then use a communication channel if you need sync or async communication between them. I would start with the most basic thing, like inter-node communication and a library like swarm. As for the database, you can easily point all the services to the same database at the beginning (hoping that normalization was ensured when it was designed), then slowly refactor to separate ones if there is a need.

While all of this sounds more complicated than the solution you proposed, this approach will ensure total separation, making it much easier to monitor, deploy and debug the applications.

pnezis · March 7, 2023, 8:14pm

I don’t think what you are trying to do is wise. From what I understood, you want a single runtime but multiple independent codebases. This will require the whole codebase to be compiled at the end of the day, because at some point you will want to start a part of the application in dev.

You misunderstood, I want the exact opposite - multiple runtimes from a single codebase. I want to avoid compiling the complete codebase since it includes multiple applications that should be separated. But I want a single codebase, mono-repo style, in order to have everything under a single git repository. The CI would then build, lint and test only the affected packages based on the dependency graph and the modified files.
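The "affected packages" idea can be sketched as follows: map each changed file to the package that owns it, then keep adding every package that depends on an already affected one until nothing new appears. The module name, the shape of the two maps and the dependency edges in the usage example are all illustrative assumptions, not something taken from an existing tool.

```elixir
defmodule Affected do
  # deps:  map of package name => internal packages it depends on (assumed shape)
  # roots: map of package name => package directory relative to the repo root
  # Returns the sorted names of all packages that need to be rebuilt.
  def packages(deps, roots, changed_files) do
    directly_changed =
      for {name, root} <- roots,
          Enum.any?(changed_files, &String.starts_with?(&1, root <> "/")),
          do: name

    directly_changed
    |> MapSet.new()
    |> expand(deps)
    |> Enum.sort()
  end

  # Add every package that depends on an affected one, and repeat
  # until a pass adds nothing new (a fixed point).
  defp expand(affected, deps) do
    next =
      for {name, pkg_deps} <- deps,
          Enum.any?(pkg_deps, &MapSet.member?(affected, &1)),
          into: affected,
          do: name

    if MapSet.equal?(next, affected), do: affected, else: expand(next, deps)
  end
end

# Hypothetical dependency edges for the folder structure from the first post.
deps = %{
  "api" => ["package_a"],
  "admin_ui" => ["package_a", "package_b"],
  "data_pipelines" => ["package_c"],
  "package_a" => [],
  "package_b" => ["package_c"],
  "package_c" => []
}

roots = %{
  "api" => "applications/api",
  "admin_ui" => "applications/admin_ui",
  "data_pipelines" => "applications/data_pipelines",
  "package_a" => "packages/package_a",
  "package_b" => "packages/package_b",
  "package_c" => "packages/package_c"
}

IO.inspect(Affected.packages(deps, roots, ["packages/package_c/lib/foo.ex"]))
```

In a real pipeline the changed files would come from something like a git diff against the target branch, and the dependency map would be read from the path dependencies declared in each mix.exs.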

Instead of making this abomination, just take a step back and think about how you can refactor it in a smart and easy to use way:

business domain - this definitely should be refactored into a library, as your business domain logic should contain no runtime logic.

endpoints, workers, etc. - refactor into entirely separate standalone services, then use a communication channel if you need sync or async communication between them. I would start with the most basic thing, like inter-node communication and a library like swarm. As for the database, you can easily point all the services to the same database at the beginning (hoping that normalization was ensured when it was designed), then slowly refactor to separate ones if there is a need.

The application, even in its current monolithic state, is already well separated into multiple layers, including business domains, helper libraries and applications (presentation layer). So no actual refactoring is needed, only a mv of some folders into new mix projects under the same git repo.

What I want to avoid is having separate deps/_build folders per internal package. Umbrella projects already support this and I am wondering if it will work with the suggested approach as well.

D4no0 · March 7, 2023, 8:19pm

Is that only because of CI build time, or is there another reason behind keeping the code in a single git repository?

pnezis · March 7, 2023, 8:20pm

Totally agree, and thanks for the reference, I have already read through it. The application is already well structured. The business logic, presentation layers (API, UI) and the data layers are completely separated with boundaries enforced. The plan is to split it into multiple applications, but I don’t want to go the umbrella way for multiple reasons, one of which is the application config, the other being that the codebase includes completely independent applications.

pnezis · March 7, 2023, 8:24pm

I want to keep the code in a single repository for multiple reasons:

atomic commits between changes across packages
avoid a dependency management nightmare (we have tried the poly-repo approach with a few internal packages and it means multiple commits in different repos for a single change)
better visibility for the team
common tooling, CI rules, and coding standards for all applications
consistent e2e testing

BradS2S · March 7, 2023, 8:25pm

What are the upsides of this? It seems like having it split up means fewer opportunities to break things or have two people work on the same file, etc. Also, as your team scales, you have a natural demarcation of what different teams should support. Just my two cents.

Yes, this is the main benefit cited when considering whether to go with an umbrella project.

BradS2S · March 7, 2023, 8:28pm

We haven’t talked about configuration yet, but from here we can build the intuition that all configuration and dependencies are shared across all projects in an umbrella, and it is not per application.

Source: https://elixir-lang.org/getting-started/mix-otp/dependencies-and-umbrella-projects.html#umbrella-projects

D4no0 · March 7, 2023, 8:41pm

If the packages are truly standalone this shouldn’t be the case.

This is true, however having 2 people work on the same file is not better.

I would never want to have all my codebase in a single place. The principle of least privilege should apply: you don’t want to hand a new employee your entire codebase on a platter.

Can be achieved by referencing the script from a common source.

This is a valid point, however taking into consideration multiple runtimes I don’t think it will be any easier than having separate applications.

From my personal experience, umbrella projects are more trouble than they are worth. I find them a good tool when refactoring from a monolithic application to services, but after that they become a mess, starting with dependency management and ending with releasing the application. I wasted a lot of time on a few projects dealing with that and ended up either refactoring to a monolithic application or migrating to services. If you don’t need services at this stage of your product, I would recommend leaving things the way they are. If developers are organized there will never be a mix of concerns in your code, and as for compilation, everything is cached, so the whole project is compiled only once.

pnezis · March 7, 2023, 9:40pm

From the same page:

Umbrella projects are a convenience to help you organize and manage multiple applications. While it provides a degree of separation between applications, those applications are not fully decoupled, as they share the same configuration and the same dependencies.

The pattern of keeping multiple applications in the same repository is known as “mono-repo”. Umbrella projects maximize this pattern by providing conveniences to compile, test and run multiple applications at once.

If you find yourself in a position where you want to use different configurations in each application for the same dependency or use different dependency versions, then it is likely your codebase has grown beyond what umbrellas can provide.

This is our case: the codebase is very big (thousands of Elixir files), handling fully decoupled applications with multiple domains, and the split is necessary for scaling.

pnezis · March 7, 2023, 10:00pm

Having to update multiple repositories (version bumps) for a change in a single package breaks the atomicity.

If 2 people need to work on the same file, they will need to work on the same file even if it is in a separate package.

We don’t apply any privilege control and I don’t subscribe to the philosophy that employees should have limited visibility. All our employees get access to all our GitLab repos (>100). Also scaling is not an issue: even Google, Microsoft and Facebook mostly use monorepos and their codebases are huge. A good read on monorepos: https://monorepo.tools

That’s correct

I agree and that is the reason I want to avoid an umbrella. And this is also the reason I don’t want to split it into multiple services on their own repos. With path dependencies on the same repo and independent mix projects you solve all of these issues. With the proper tooling you can also speed up the CI pipelines no matter how big the codebase is.

D4no0 · March 7, 2023, 10:21pm

Then the only way is to put all projects in the same repo, use the domain logic as a library referenced by relative path, and use any communication tool for runtime interaction.

As for keeping the lockfile and deps in the same folder, it doesn’t seem to be the best idea: you will lose the flexibility of running some of the libraries with older versions, and because Elixir library dependency management sucks it will only get worse with time.

c4710n · March 8, 2023, 1:28pm

Besides umbrella, there’s also poncho, which was introduced by the Nerves community. You can check it out.

There are also some topics on this forum; you can use the site search for relevant information. The keyword is “poncho”.

mgibowski · March 8, 2023, 1:41pm

Your setup seems sound to me.
I don’t have much experience with this, but if you wanted to take it further (and have the time/resources for it), you could investigate adopting Bazel.
I imagine you could declare dependencies between your projects and build them in parallel, and get caching benefits as well.

There is a related thread:

pnezis · March 8, 2023, 1:42pm

Poncho is the closest to what I want to achieve. My only concern with it is the separate deps and build paths per project.

What I want to achieve is something like Cargo workspaces in Rust:

Cargo offers a feature called workspaces that can help manage multiple related packages that are developed in tandem.
A workspace is a set of packages that share the same Cargo.lock and output directory.

lafka · July 18, 2023, 9:46pm

Did you end up using a monorepo structure?

I’ve just started an experiment, exmonorepo, to validate whether mix can use a monorepo with a directory structure similar to yours, where the Elixir apps/libraries are automatically discovered. I would be interested to learn about your experience implementing monorepos in Elixir.
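To make "automatically discovered" concrete, here is a minimal sketch that finds every mix project one level below the applications and packages folders from the layout in the first post. It is only an assumption about how discovery could work, not a description of what exmonorepo does; the module and function names are made up for the example.

```elixir
defmodule Discover do
  # Returns sorted {name, relative_dir} pairs for every folder that sits
  # directly under one of the group folders and contains a mix.exs file.
  def projects(root, groups \\ ["applications", "packages"]) do
    # Work with an absolute root so the relative paths below are well defined.
    root = Path.expand(root)

    found =
      for group <- groups,
          mix_file <- Path.wildcard(Path.join([root, group, "*", "mix.exs"])) do
        dir = Path.dirname(mix_file)
        {Path.basename(dir), Path.relative_to(dir, root)}
      end

    Enum.sort(found)
  end
end
```

Calling Discover.projects(".") from the repo root would list the projects to feed into whatever builds the dependency graph.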

pnezis · July 19, 2023, 2:33pm

We are still in the process of transforming a very big codebase into a mono-repo. So far it works great.

At the same time I am creating a library for managing Elixir mono-repos, which I plan to open-source soon.

D4no0 · July 19, 2023, 5:18pm

Have you decided to go with umbrella projects, or a different solution?

pnezis · July 20, 2023, 9:40am

Plain mix projects with path dependencies under a single git repo.