I’m implementing a pure Elixir Git server (supporting SSH and HTTP) for fun and profit.
One of the tedious tasks I’m currently working on is parsing Git packfiles.
The pack-format documentation states the following:
packed object header:
1-byte size extension bit (MSB)
type (next 3 bit)
size0 (lower 4-bit)
n-byte sizeN (as long as MSB is set, each 7-bit)
size0..sizeN form 4+7+7+..+7 bit integer, size0
is the least significant part, and sizeN is the
most significant part.
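To make the layout concrete, here is how I read it for a made-up two-byte header. The bytes `0x95 0x0A` are my own illustration, not taken from a real packfile, and the variable names are arbitrary:

```elixir
import Bitwise

# 0x95 = 1_001_0101 -> MSB set, type 1, size0 = 5
# 0x0A = 0_0001010  -> MSB clear, size1 = 10
<<msb::1, type::3, size0::4, more::1, size1::7>> = <<0x95, 0x0A>>

# size0 holds the lowest 4 bits, so size1 starts at bit position 4.
size = size0 ||| (size1 <<< 4)

IO.inspect({msb, type, size0, more, size1, size})
# prints {1, 1, 5, 0, 10, 165}
```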
With Elixir we can pattern-match on bitstrings, which is great. To find the variable-length size of an object, I came up with the following implementation:
# MSB is not set, size is contained in four bits.
defp unpack_obj_head(<<0::1, type::3, size::4, rest::binary>>) do
  {type, size, rest}
end

# MSB is set, read next byte
defp unpack_obj_head(<<1::1, type::3, obj_num::bitstring-4, rest::binary>>) do
  {size, rest} = unpack_obj_size(rest, obj_num)
  {type, size, rest}
end

# MSB is not set, calculate size based on acc_num and obj_num
defp unpack_obj_size(<<0::1, obj_num::bitstring-7, rest::binary>>, acc_num) do
  with acc <- <<acc_num::bitstring, obj_num::bitstring>>,
       len <- bit_size(acc),
       <<num::integer-size(len)>> <- acc, do: {num, rest}
end

# MSB is set, read next byte
defp unpack_obj_size(<<1::1, obj_num::bitstring-7, rest::binary>>, acc_num) do
  unpack_obj_size(rest, <<acc_num::bitstring, obj_num::bitstring>>)
end
But I’m missing something here: the resulting size does not match the size of the object body…
Clearly, I’m doing something wrong here:
  with acc <- <<acc_num::bitstring, obj_num::bitstring>>,
       len <- bit_size(acc),
       # next line is not correct, we need to shift things here
       <<num::integer-size(len)>> <- acc, do: {num, rest}
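A small experiment suggests the order of concatenation is what goes wrong. This is only a sketch with made-up chunk values (a 4-bit `size0` of 5 and a 7-bit `size1` of 10); appending each new chunk puts `size0` in the most significant position, while prepending it keeps `size0` in the least significant one:

```elixir
size0 = <<5::4>>
size1 = <<10::7>>

# What the code above does: size0 ends up as the HIGH bits.
<<appended::integer-size(11)>> = <<size0::bitstring, size1::bitstring>>

# Newer chunk first: size0 stays in the LOW bits, as the format requires.
<<prepended::integer-size(11)>> = <<size1::bitstring, size0::bitstring>>

IO.inspect({appended, prepended})
# prints {650, 165}
```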
I came across a great article, "Git clone in Haskell from the bottom up". It states:
The overall length is then size0 + size1 + … + sizeN, after shifting each part 0, 4 + (n-1) * 7 to the left. size0 is the least, sizeN the most significant part.
Is there a simple way to achieve this with Elixir?
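For reference, this is how I would translate the shifting rule into Elixir. It is a rough sketch, not a verified packfile parser: the module name `PackHeader` is hypothetical, the functions are made public only so they can be called directly, and the extra `shift` argument is my own addition to track how far the next 7-bit chunk has to move left.

```elixir
defmodule PackHeader do
  # Hypothetical module name, used only for this sketch.
  import Bitwise

  # MSB is not set: the size fits in the lower four bits.
  def unpack_obj_head(<<0::1, type::3, size::4, rest::binary>>) do
    {type, size, rest}
  end

  # MSB is set: size0 is the lowest 4 bits, the next chunk starts at bit 4.
  def unpack_obj_head(<<1::1, type::3, size0::4, rest::binary>>) do
    {size, rest} = unpack_obj_size(rest, size0, 4)
    {type, size, rest}
  end

  # MSB is not set: this is the last (most significant) chunk.
  defp unpack_obj_size(<<0::1, part::7, rest::binary>>, acc, shift) do
    {acc ||| (part <<< shift), rest}
  end

  # MSB is set: merge this chunk, then move 7 bits further left.
  defp unpack_obj_size(<<1::1, part::7, rest::binary>>, acc, shift) do
    unpack_obj_size(rest, acc ||| (part <<< shift), shift + 7)
  end
end

IO.inspect(PackHeader.unpack_obj_head(<<0x95, 0x0A, "tail">>))
# prints {1, 165, "tail"}
```

Matching the chunks as integers (`part::7`) rather than as bitstrings avoids the final conversion step, since the accumulator is already a number.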