Hello,
I want to share a project I’ve been working on for a while:
Background
Some time ago I came across a talk: How we scaled GitLab for a 30k employee company.
(basic overview of the system architecture)
The presentation was about how the team at GitLab solved scaling issues on their platform. After a few slides I wondered how this could be approached with languages like Erlang and Elixir.
After some moments of reflection, I had it! A basic concept on how things could fit together:
(joking aside, that's pretty much what I came up with)
Building blocks
Erlang/Elixir and OTP provide a lot of building blocks to power a scalable Git platform. The idea was to use nothing but Elixir and libgit2.
So here’s a more detailed overview of the architecture:
- Phoenix does multiple things:
  - handles incoming Git HTTP commands.
  - renders HTML for browsing users, repositories, etc.
  - provides a basic GraphQL API for alternative clients (mobile, etc.).
- The SSH server implements :ssh_daemon_channel to handle Git SSH commands.
- Ecto stores application data such as users, repositories, etc.
Authentication
A really nice thing about :ssh is that it provides support for authentication via password and public/private keys out of the box:
- Password based authentication is supported both for HTTP and SSH.
- Users can provide one or more SSH public keys to authenticate with.
NIFs / libgit2
If you are not familiar with libgit2, it's an implementation of the Git core methods and functions written in C. One fairly distinctive feature of the library is that you can provide your own storage backend, which means you can plug in your own distributed K/V database instead of writing everything to the filesystem.
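To make the storage backend idea a bit more concrete, here is a rough sketch in Python (not the project's Elixir/C code, and not libgit2's actual backend interface). It only shows the underlying principle: Git objects are content-addressed, so any K/V store that maps an object id to bytes can hold them. The `KVObjectStore` class and its method names are made up for illustration, and a plain dict stands in for the distributed database.

```python
import hashlib
import zlib
from typing import Tuple


class KVObjectStore:
    """Toy content-addressed object store (hypothetical, for illustration).

    The dict stands in for a distributed K/V database. This mirrors the
    idea behind a custom object backend, not libgit2's real interface.
    """

    def __init__(self):
        self._kv = {}

    def write(self, obj_type: str, data: bytes) -> str:
        # Git derives the object id from SHA-1("<type> <size>\0<content>").
        header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
        oid = hashlib.sha1(header + data).hexdigest()
        # Store the value zlib-compressed, like Git's loose objects.
        self._kv[oid] = zlib.compress(header + data)
        return oid

    def read(self, oid: str) -> Tuple[str, bytes]:
        raw = zlib.decompress(self._kv[oid])
        # The header ends at the first NUL byte; the rest is the content.
        header, _, data = raw.partition(b"\x00")
        obj_type, size = header.decode("ascii").split(" ")
        if int(size) != len(data):
            raise ValueError("corrupt object: size mismatch")
        return obj_type, data

    def exists(self, oid: str) -> bool:
        return oid in self._kv
```

With this sketch, `KVObjectStore().write("blob", b"hello\n")` yields the same id that `git hash-object` prints for a file containing `hello` and a newline, which is what lets a different storage layer stay compatible with regular Git clients.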
I heavily used code from the Erlang :geef library, refactored a good part and added a bunch of missing functions. Check the Elixir module and the C bindings.
Git transfer protocol & Packfile format
This is the fun part of the project.
libgit2 does not support server-side commands; it only focuses on the client implementation. In the first iteration I cheated and used Ports to execute git-upload-pack and git-receive-pack. It worked well, both for SSH and HTTP.
But I wanted to have more control over the process (hooks, etc.) and having to depend on git only for the transfer protocol was a shame…
So I started digging into the protocol internals and docs. I have worked with a lot of different network protocols in my career (medical field: DICOM, HL7, etc.) but I must admit, the Git transfer protocol and the Git Packfile format were quite heavy sh**t to grasp:
- It has lots of different binary optimisations.
- It uses zlib to deflate chunks but only gives you the size of the inflated data, not the length of the compressed stream, so I had to come up with my own zlib C implementation.
- The transfer protocol differs depending on the transport protocol.
- Documentation is hard to find, scarce, well, hmm.
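To illustrate the zlib point, here is a small Python sketch (again, not the project's C code). Each packed object records how big its data is once inflated, but not how many compressed bytes it occupies, so the reader has to ask zlib where the stream ended in order to find the next object. The helper name `inflate_entry` is made up for this example; Python's `zlib.decompressobj` exposes the leftover input as `unused_data`, which is the piece of information needed here.

```python
import zlib


def inflate_entry(buf: bytes, offset: int, expected_size: int):
    """Inflate one zlib stream starting at buf[offset].

    Returns (data, consumed): the inflated bytes and the number of
    compressed bytes the stream occupied in buf.
    """
    d = zlib.decompressobj()
    # Slicing copies the tail of the buffer; fine for a sketch,
    # wasteful for a real multi-gigabyte packfile.
    data = d.decompress(buf[offset:])
    if not d.eof:
        raise ValueError("truncated zlib stream")
    if len(data) != expected_size:
        raise ValueError("inflated size mismatch")
    # Bytes after the end of the stream end up in unused_data, so the
    # compressed length is what was fed in minus what was left over.
    consumed = len(buf) - offset - len(d.unused_data)
    return data, consumed
```

Calling it in a loop, advancing `offset` by `consumed` (plus the size of each object header, which is not parsed here), is enough to walk back-to-back zlib streams without knowing their compressed lengths in advance.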
It's currently quite messy, but have a look here for implementation details.
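For readers who have never looked at the wire format, the basic framing used by the transfer protocol (pkt-line) can be sketched in a few lines of Python. This is only an illustration of the framing, not the project's Elixir implementation: the function names are made up, and the special packets introduced by protocol v2 are deliberately left out.

```python
FLUSH_PKT = b"0000"  # zero-length marker that ends a section of the stream


def pkt_line(payload: bytes) -> bytes:
    """Frame payload as a pkt-line: 4 hex digits of total length, then payload."""
    size = len(payload) + 4  # the length prefix counts its own 4 bytes
    if size > 65520:
        raise ValueError("payload too large for a single pkt-line")
    return f"{size:04x}".encode("ascii") + payload


def parse_pkt_lines(data: bytes):
    """Split a byte string into pkt-line payloads; None marks a flush-pkt."""
    pos, out = 0, []
    while pos < len(data):
        size = int(data[pos:pos + 4], 16)
        if size == 0:  # "0000" is the flush-pkt
            out.append(None)
            pos += 4
            continue
        # Lengths 1 to 3 are not handled in this sketch (protocol v2
        # uses some of them as special packets).
        if size < 4 or pos + size > len(data):
            raise ValueError("malformed pkt-line")
        out.append(data[pos + 4:pos + size])
        pos += size
    return out
```

As an example of the transport differences, the smart HTTP ref advertisement starts with an extra line, `pkt_line(b"# service=git-upload-pack\n")` followed by a flush-pkt, which the SSH transport does not send.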
Project state
Still a proof of concept. It's working, but still: there are almost no tests, so unexpected things may happen.
If you are interested, download the code and give it a try. PRs are very welcome.