Managing codebases of large umbrella projects

Hello fellow beamers!! I wish you a beautiful…? no, wonderful…? no, charming…? oh, to heck with it. I wish you a downright exquisite and merry Christmas… Don’t celebrate it? Don’t worry! It seems to have turned into a consumeristic way to celebrate something that we don’t really value or understand anymore anyway… so you are not missing anything big… moreover, you are free to gift and love yourself (and/or others) any day of the year. (just a bit of black humor before I begin :stuck_out_tongue: , no offence ;)).

But no, really, Christmas is nice, enjoy it :slight_smile: , then remind yourself that the rest of the year is also nice, and enjoy that too!! :smiley:

Sooo… this post is intended to start a discussion about good and bad practices for managing large umbrella projects, where “large” means containing many applications. Not sure if it’s a Git or an Elixir discussion.
I’ll share a story with my opinions and views, but my main concern is to hear what the rest of you think about this “issue” and how you handle it in your projects.
Hopefully we can get a conversation started about all sorts of interesting things around this, maybe including how to manage these large apps in production across multiple servers and such.
Below starts my story/view on the subject. It’s long and more of a fun thing, so if you are opinionated you might just want to skip it and give your opinion to start a discussion.

Suppose we are given the job of building an office application. We are told that it should take care of many different tasks: for example there should be some domain logic, a web part, a backoffice application, invoice management and generation, a crawler to track stuff on the web, connections with 3rd parties, analytics, the list just goes on… and as one might suspect, there is no guarantee it won’t grow in the future.

Naturally we decide to go with an umbrella application and separate the concerns as needed. Instantly we are faced with our first code management decision: how do we source control this system? We certainly could create one big repository and work with that, but being a bit far-sighted, that does not feel appropriate. Sure, we are small now, with not many developers, and the code for the applications will be small and simple (for some time at least), but we assume things will change. It is probable that soon the developers responsible for one or more of the applications will want to manage them directly, without the main repository. Maybe they’ll want a separate issue tracker, and they will definitely want to have a git log dedicated to the specific problem (makes things easier to follow, does it not?).

“Submodules to the rescue!!” we exclaim, and surely they do feel like the perfect solution. That is, until we realize that we will have to create a non-negligible number of repositories (how many did we mention, like 7? try adding 3-4 more and now it’s like a lot!) before we can even begin to code (well, not really). And while creating them locally is pretty straightforward, creating, linking and generally managing them online (in services like GitHub) is certainly going to be an undesirable chore… We pause, thinking to ourselves… maybe we should just man/woman :wink: up and go through this initial (and possibly later as well) pain.
Or maybe we could compromise, creating a big repository upfront and later extracting applications into submodules when we deem it necessary. This seems like a pretty good compromise, however we would be losing some commit history for the respective applications.

Options with trade-offs start to appear. We decide to consult the all-knowing Google, so we open up a browser and type something along the lines of “git manage large umbrella”, but the most useful/insightful thing we stumble upon seems to be the following phrase:

“Shaumik is an optimist, but one who carries an umbrella.”

  • The All Most Knowing Google

At this point the rest of the team will probably just slip away.

But you are still there. You wonder for a few seconds whether that sentence is part of an article, a comment or some kind of ad. So you click on it. Turns out it’s none of those…
It’s a post though, so you read it… And it’s quite interesting, and entirely unrelated. (no!.. I won’t take “but it’s about git!!!” as an argument)

So you say to yourself… We got this!! We’ll write down the trade-offs and figure this out.
Of course by now you are alone, but that never stopped you. You take out your big-ass block, your mechanical pencil, and that eraser you end up never using while destroying the on-pencil one.
And you start. You write, you draw, you graph, then contemplate, then write, graph and draw some more…

After some time and a lot of struggle and consideration, a fellow team member comes to inform you that your work is no longer necessary. He continues by explaining that while you were trying to figure this out, a super intelligent AI had come into existence and all human-made software was no longer necessary, and also that it’s no longer the “norm” to have humans solve problems…
Apparently you were concluding for quite some time now…
He also mentions that the project turned out to be a huge success (how could it not?! it was built with Elixir, right? :smiley: ). Finally he notifies you that the team just ended up using a huge git repository anyway. And that, quote:
“It was a little annoying at times, but one could get things done.”

So yes… :stuck_out_tongue: let me repeat my questions. What do the rest of you think about this, and how do/did you handle the codebases of large umbrella projects?

I really hope not to hear something along the lines of, “Well… git is really bad for umbrellas, you should use [insert_your_scm_here] instead.”

P.S. Sorry if this was too long. I kinda like writing. Like a lot…

I certainly don’t think there’s anything wrong with starting out in a single git repository! You may know / believe / hope that your project will quickly outgrow that solution, but if it does… Just extract the individual applications that need it to their own repo; and no, that doesn’t have to mean losing history:

https://help.github.com/articles/splitting-a-subfolder-out-into-a-new-repository/
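For illustration, here is a minimal sketch of such a split. It assumes a throwaway umbrella built on the spot, with hypothetical apps named `invoices` and `crawler`, and uses `git filter-branch --subdirectory-filter`, which ships with Git. The Git project itself now recommends the separate git-filter-repo tool for this kind of rewrite, so treat this as one possible approach rather than the definitive one.

```shell
#!/bin/sh
set -eu

# Identity for the demo commits, and silence filter-branch's startup warning.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"

# Hypothetical umbrella repository with two apps and interleaved history.
git init -q umbrella
cd umbrella
mkdir -p apps/invoices apps/crawler
echo "invoices v1" > apps/invoices/README.md
git add .
git commit -q -m "invoices: initial"
echo "crawler v1" > apps/crawler/README.md
git add .
git commit -q -m "crawler: initial"
echo "invoices v2" > apps/invoices/README.md
git add .
git commit -q -m "invoices: second change"
cd ..

# Rewrite history in a clone so the umbrella itself stays untouched.
git clone -q umbrella invoices
cd invoices

# Keep only the commits that touch apps/invoices and make that
# directory the root of the new repository.
git filter-branch --prune-empty --subdirectory-filter apps/invoices HEAD >/dev/null

# Only the two invoices commits remain, and README.md now sits at the root.
git log --format=%s
ls
```

The resulting `invoices` clone can then be pushed to a new remote and, if desired, added back to the umbrella as a submodule or used as a regular dependency.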

As for submodules, sure, if you’re comfortable working with them… But at that point I’d almost rather publish separate Hex packages, and use them like regular dependencies. If it’s internal use only, one should be able to set up a local Hex repo with a bit of work.

Then again, I’ve never worked with submodules - only heard discouraging tales about them - so I might yet be enlightened, who knows? :sunglasses:

I did not know about the splitting! Thanks for sharing that! <3

I’ve used submodules a couple of times before: once in a project for some external dependencies we did not want to or couldn’t handle with package managers, or maybe we also wanted to have the source code lying around, I can’t really remember. Then I simply used one in my dotfiles for similar reasons. But that is a completely different use compared to how one would use submodules in an umbrella context. In the former we hardly if ever looked at or modified the contents, while in an umbrella the submodules would actually be your project, and therefore they would be used and modified constantly.

In fact I have been using submodules for an Elixir umbrella project for quite some time now, so I’ll share my experience with that.
I was assigned this long-running (development-wise), multipurpose application, similar to the one I mentioned in the first post. Being new to Elixir at that time, I created a Phoenix monolith to test out the technology and rapidly create a prototype. Elixir turned out to be a better fit than I could ever have imagined and all went well. After some time I decided to get serious and separate the concerns into an umbrella, like the community and many influential talks encourage you to. I had also heard of submodules being used for the separate applications, so I decided to use them as well, even though I was and will likely stay the sole developer on the project.

Turned out both of the ideas suited me and the project pretty well. Umbrellas are simply amazing, we all (should) know that; if not, there are plenty of talks and discussions to learn about them. But the submodule approach also turned out to be very helpful, in ways I did not initially expect.

Specifically, it made it a lot easier for me to find files and navigate through the projects. I guess this depends a lot on your editor and practices… Using emacs (spacemacs to be exact), I most often navigate with projectile, which uses a fuzzy finder that searches for files in the specific git repo. Having separate git repos significantly reduces the number of files in each one, which is especially noticeable when there are Phoenix apps, since they usually have a lot of files.

But that’s just the start of it. Elixir applications tend to have some standard files, most notably mix.exs, or others like model.ex and test_case.ex; it’s also common to have application.ex for supervised apps. In a unified repo these files can be annoying to deal with, as you always have to carefully read what you are about to open, and you are still bound to open the wrong file from time to time. Inside a submodule, however, you are sure to have a single mix.exs, model.ex or application.ex, so you don’t have to break your focus to find the right one.

This “effect” has grown on me, and now I often rely on it for my domain logic. Different applications might “adopt” the same entities in a different context: for example tours, external_services and invoices might all have a safari.ex, but as expected each would have entirely different functionality. This makes file names a lot simpler and nicer; having these files namespaced (how does external_service_safari.ex or safari_external_service.ex make you feel?) would definitely look worse. Once we have adapted a bit to this system, we can get to our files almost instantly and, most importantly, while breaking little to no focus. One thing I did not mention is that we can also switch projects with projectile, again through the convenience of a fuzzy finder, and in this kind of setup we would do so often.

Now one could point out that this is way too specific to my development setup, and that the approach might be horrible for most. I can only agree with such an argument… But then again, maybe it’s time to step up your practices. That is, if you think it will bring benefits.

Another “side-effect” of submodules I found interesting is that they make it harder/more unlikely to commit many files at once. It often happens that I start working on a feature and along the way do some refactoring, or maybe even fix an unrelated bug. At the end I have to either stash files separately, which I usually struggle with, or make one ugly commit that looks something like "add feature A, refactor B, fix C". I always hate myself for doing these commits, but you know… it’s just easier sometimes. With submodules, because of the separation, I find that this happens less frequently… way less, and if it happens it’s usually not a drag to stash and commit files separately. I also find my commit messages have gotten more specific and cleaner in general. On the other hand, with submodules you always end up having to do more commits, as a single feature might involve 2-3 repositories and you have to make separate commits for each. Honestly I don’t think that is necessarily something bad, but others might think differently.

Up till now I’ve only used this submodule practice as a sole developer on one machine, and I have yet to see the benefits or drawbacks in other aspects and/or on larger teams. I suspect that the approach could benefit projects and teams, but being new both to Elixir and as a developer in general, I have failed to see this approach in the wild other than in my own creations.

That said, I’m particularly curious to see how one might leverage this practice of using git submodules, and what the potential “true” benefits for larger projects and teams could be. I am also interested in whether it can be helpful in deployment processes, even though I would expect larger and more mature systems to use OTP releases, and therefore not to be directly influenced by how the source is managed.

I have an umbrella project with 4 applications in it today:

  • 2 Phoenix without database (web for customers, web for admin),
  • 1 Phoenix with database (provides access to database for previous two and common functions),
  • 1 Elixir with database (user management system).

One of the applications (user management system) will be used in future projects.

I found the git submodules approach very handy for development:

  • each project can be used separately elsewhere,
  • each project has its own git history.

A bit frustrating is having to make a git commit in the umbrella project with the message “Update apps” after commits in the submodules.

I’ve yet to use umbrella projects myself. Instead I make a top-level application that includes the parts as dependencies, then just bootstraps them as necessary. I’m actually not entirely sure of the drawbacks or benefits of it compared to umbrellas. For configs I just call my dependencies’ default config files, then make edits as I wish. Views are pretty transparently handled. I include other routers via a plugin system I built, etc…

The main advantage of an umbrella vs multiple packages is that you usually keep the umbrella in a single VC repository and version it together. From my experience maintaining a Ruby app that was composed of multiple gems, almost half of the commits were “Bump version of something”, which was very annoying.
Besides that, there are some mix tasks that handle umbrellas differently, which is very helpful.

Everything is the same at runtime - umbrella is mostly about managing code as in the source files and directories it goes to.

I’d say that using submodules with umbrellas completely defeats the purpose of an umbrella project.

Makes sense, but I actually do keep the project-specific dependencies in the same git repo anyway, brought in as a dep via file. :slight_smile:

I’ve probably rebuilt umbrella apps, honestly, but I came from an old Erlang perspective where umbrellas were not really an idea. ^.^

Yes, that can be a bit frustrating. However I came up with a discipline early on to avoid these commits. According to it one would commit to the main/umbrella repo only under 3 circumstances.

1st. A feature request/story is ready across the apps. The commit message would look something like “Completed feature A”.
2nd. A minor or major release of the software as a whole (useful for bugfixes). For this I would simply write “Version bump x.xx.xx”.
3rd. A new submodule/app is created. There the commit could be “Application xxxxx added”.

This means there are times I work in the submodules for days without committing to the main repo.

Oh, there is also the extra case of when I change something on the umbrella itself, like the mixfile or something, but that’s rare.
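As a rough sketch of how that discipline plays out on the command line, the script below builds a throwaway umbrella with a single submodule and only commits to the umbrella when an app is added and when a feature is complete. The repository names (`tours_origin`, `umbrella`), the file `safari.ex` and the local path standing in for a hosted remote are all assumptions made for the demo; the `protocol.file.allow=always` override is needed only because recent Git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
set -eu

# Identity for the demo commits.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the hosted repository of one app.
git init -q tours_origin
cd tours_origin
echo "tours" > README.md
git add .
git commit -q -m "tours: initial"
cd ..

# The umbrella repository tracks each app as a submodule under apps/.
git init -q umbrella
cd umbrella
git -c protocol.file.allow=always submodule --quiet add "$work/tours_origin" apps/tours
git commit -q -m "Application tours added"

# Days of work happen inside the submodule, with its own focused commits.
cd apps/tours
echo "defmodule Safari do end" > safari.ex
git add safari.ex
git commit -q -m "Add safari"
echo "# refined" >> safari.ex
git commit -q -am "Refine safari"
cd ../..

# The umbrella only sees that the submodule pointer moved.
git status --short

# One umbrella commit once the feature is ready across the apps.
git add apps/tours
git commit -q -m "Completed feature A"
git log --format=%s
```

In a real setup the submodule commits would be pushed to their remote before the umbrella commit is pushed, so that other clones can resolve the recorded pointer.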

Hmmm, I can somewhat understand that, but up till now I have only positive things to say about using submodules for my umbrella. In fact, I would say that submodules come naturally and fit really well with umbrellas. The only negative experience is having to create the online repos for each of the submodules, especially since I don’t really need them: I only want to be able to sync with production and don’t need an issue tracker and the like. I would love some git server that could handle a repo with submodules using nice abstractions, preferably nearly invisible but flexible ones.
I might even work on a personal solution that would do that.

Thank you for the hints on commits :slight_smile: