I want to share a project I’ve been working on for a while:
Some time ago I came across a talk: "How we scaled GitLab for a 30k-employee company".
(basic overview of the system architecture)
The presentation was about how the team at GitLab solved scaling issues on their platform. After a few slides I wondered how this could be approached with languages like Erlang and Elixir.
After a moment of reflection, I had it! A basic concept of how things could fit together:
(Joke aside, that’s pretty much what I came up with.)
Erlang/Elixir and OTP provide a lot of building blocks to power a scalable Git platform. The idea was to use nothing but Elixir and libgit2.
So here’s a more detailed overview of the architecture:
- Phoenix does multiple things
- SSH server implements :ssh_daemon_channel to handle Git SSH commands.
- Ecto stores application data such as users, repositories, etc.
A really nice thing about :ssh is that it provides support for authentication via password and public/private keys out of the box:
- Password-based authentication is supported both for HTTP and SSH.
- Users can provide one or more SSH public keys to authenticate with.
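With public-key authentication, the :ssh application lets the server decide which client keys to accept through a key callback module, so the application side mostly boils down to comparing the key a client presents with the keys the user has registered. The Python sketch below illustrates that comparison, plus an OpenSSH-style SHA256 fingerprint (the format `ssh-keygen -l` prints), which is handy for listing a user's keys. The helper names are hypothetical; this is not the project's code and not the :ssh API.

```python
import base64
import hashlib
import struct


def parse_authorized_key(line: str) -> tuple[str, bytes]:
    """Split an OpenSSH public key line ("<type> <base64> [comment]")
    into the key type and the raw decoded key blob."""
    parts = line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64> [comment]'")
    key_type, encoded = parts[0], parts[1]
    blob = base64.b64decode(encoded, validate=True)
    # The blob begins with the key type as an SSH string:
    # a big-endian uint32 length followed by that many bytes.
    (n,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + n].decode("ascii") != key_type:
        raise ValueError("key type does not match the key blob")
    return key_type, blob


def fingerprint(blob: bytes) -> str:
    """OpenSSH-style fingerprint: base64 of SHA-256(blob), '=' padding stripped."""
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def is_authorized(presented_blob: bytes, stored_lines: list[str]) -> bool:
    """True if the presented key blob equals any key the user registered."""
    return any(parse_authorized_key(l)[1] == presented_blob for l in stored_lines)
```

Comparing the decoded blobs rather than the text lines means comments and whitespace in the stored keys do not matter.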
NIFs / libgit2
If you are not familiar with libgit2, it’s a C implementation of the Git core methods and functions. One unique feature of the library is that you can provide your own storage backend. This means you can plug in your own distributed K/V database instead of writing everything to the filesystem.
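What makes a K/V backend a natural fit is that Git's object database is content-addressed: every object is stored under a key derived from its own content. In libgit2 a custom backend is plugged in through its object database (ODB) backend interface (git_odb_backend); the Python sketch below does not use that API, it only illustrates the idea, assuming a SHA-1 repository and any dict-like store. KVObjectStore is a hypothetical name.

```python
import hashlib
import zlib


def object_id(obj_type: str, content: bytes) -> str:
    """Git object ID in a SHA-1 repository: sha1("<type> <size>\\0" + content)."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


class KVObjectStore:
    """Hypothetical content-addressed object store over any dict-like K/V."""

    def __init__(self, kv=None):
        # Swap the dict for a client of a distributed K/V database.
        self.kv = kv if kv is not None else {}

    def write(self, obj_type: str, content: bytes) -> str:
        oid = object_id(obj_type, content)
        # Keep the type next to the zlib-compressed content.
        self.kv[oid] = (obj_type, zlib.compress(content))
        return oid

    def read(self, oid: str) -> tuple[str, bytes]:
        obj_type, data = self.kv[oid]
        return obj_type, zlib.decompress(data)

    def exists(self, oid: str) -> bool:
        return oid in self.kv
```

Because the key is a hash of the content, writing the same object twice is idempotent, which keeps replication in a distributed store simple.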
Git transfer protocol & Packfile format
This is the fun part of the project.
libgit2 does not support server-side commands; it focuses only on the client implementation. In the first iteration I cheated and used Ports to execute git-receive-pack. It worked well over both SSH and HTTP.
But I wanted more control over the process (hooks, etc.), and depending on git just for the transfer protocol felt like a shame…
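Replacing git-receive-pack means speaking the wire format directly. Messages in the Git transfer protocol are framed as pkt-lines: four hex digits giving the total length (including those four bytes), then the payload, with the special flush-pkt "0000" marking the end of a section. Over smart HTTP, the info/refs response additionally starts with a "# service=..." announcement. The Python sketch below covers only data packets and flush-pkts (the protocol v2 special packets are rejected); the function names are hypothetical.

```python
FLUSH_PKT = b"0000"
MAX_PKT_LEN = 65520  # total pkt-line length limit, including the 4-byte prefix


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length + payload."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("pkt-line payload too long")
    return f"{length:04x}".encode("ascii") + payload


def read_pkt_lines(data: bytes):
    """Yield pkt-line payloads from a byte string; None stands for a flush-pkt."""
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated length prefix")
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            yield None
            pos += 4
        elif length < 4:
            raise ValueError(f"special packet {length:04x} not handled here")
        else:
            if pos + length > len(data):
                raise ValueError("truncated pkt-line")
            yield data[pos + 4:pos + length]
            pos += length


def http_advertisement_prefix(service: str) -> bytes:
    """Smart HTTP only: service announcement that precedes the ref advertisement."""
    return pkt_line(f"# service={service}\n".encode("ascii")) + FLUSH_PKT
```

Over SSH the server starts directly with the ref advertisement, so the HTTP prefix is one concrete place where the two transports differ.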
So I started digging into the protocol internals and docs. I have worked with a lot of different network protocols in my career (medical field: DICOM, HL7, etc.), but I must admit the Git transfer protocol and the Git Packfile format were quite heavy sh**t to grasp.
- It has lots of different binary optimisations.
- It uses zlib to compress objects but only gives you the size of the inflated (uncompressed) data, not the length of the compressed stream, so I had to come up with my own zlib C implementation.
- The transfer protocol differs depending on the transport (SSH vs. HTTP).
- Documentation is hard to find, scarce, well, hmm.
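To make the zlib issue concrete: each packfile entry starts with a variable-length header (3 type bits and a size spread over 7-bit groups) followed by a zlib stream, and the size in the header is the inflated size. The only way to find where the next entry begins is to run the inflater until the stream ends and check how much input it consumed. Python's zlib exposes that through `unused_data`; the sketch below shows one way to do it, not necessarily the approach of the project's C code. It assumes non-delta objects only, because ofs_delta and ref_delta entries carry a base reference between the header and the zlib data. The function names are hypothetical.

```python
import zlib

OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag", 6: "ofs_delta", 7: "ref_delta"}


def read_object_header(data: bytes, pos: int) -> tuple[str, int, int]:
    """Parse a packfile entry header at pos; return (type, inflated size, next pos)."""
    byte = data[pos]
    type_code = (byte >> 4) & 0x07   # bits 6-4: object type
    size = byte & 0x0F               # bits 3-0: lowest 4 bits of the size
    shift = 4
    pos += 1
    while byte & 0x80:               # MSB set: another size byte follows
        byte = data[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    if type_code not in OBJ_TYPES:
        raise ValueError(f"invalid object type {type_code}")
    return OBJ_TYPES[type_code], size, pos


def inflate_object(data: bytes, pos: int, expected_size: int) -> tuple[bytes, int]:
    """Inflate the zlib stream at pos; return (content, position right after it)."""
    d = zlib.decompressobj()
    content = d.decompress(data[pos:])
    if not d.eof:
        raise ValueError("truncated zlib stream")
    if len(content) != expected_size:
        raise ValueError("inflated size does not match header")
    # Bytes past the end of the stream are left in unused_data.
    consumed = len(data) - pos - len(d.unused_data)
    return content, pos + consumed
```

A pack reader would loop over these two calls, using the returned position as the start of the next entry.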
It's currently quite messy, but have a look here for implementation details.
It's still a proof of concept: it works, but there are almost no tests, so unexpected things may happen.
If you are interested, download the code and give it a try. PRs are very welcome.