Merkel - A balanced merkle binary hash tree

I wrote a library a couple weeks ago that implements a balanced, merkle binary hash tree.

All this bitcoin talk has gotten me inspired to learn more about merkle trees, which are a beautiful data structure for ensuring data integrity by elegantly chaining the hashes of leaves all the way up to the root and using the root signature as a sort of fingerprint for the content of the leaves. Given n leafs (transactions) they provide a O log (n) way of creating and verify an audit proof

I wrote this not to solve any particular problem, but I figured writing one would help my learning and would be fun.

I noticed that other merkle trees in many other languages either had different requirements e.g. size being a power of 2, or if they were odd had extra orphan nodes; or just plain too cumbersome to read through. I figured that with a balancing heuristic it would make the tree simpler and more flexible at the slight cost of minor updating (re-hashing up to the root and/or re-setting inner node search keys) overhead…

I don’t expect this library to take over the world, but please kindly take a look at it and play with it. You are welcome to formally let me know your feedback via github or can append to this thread. The test coverage is 100% but the tests could be more extensive…

More info is located on the github page

10 Likes

Great work! And I love the naming ā€˜coincidence’ with Germany’s Bundeskanzlerin :stuck_out_tongue_winking_eye:.

One thing I did realize, is that you currently support some of the hashing algorithms that Erlang itself provides. Have you thought about allowing the user to specify a custom hashing function instead? There are quite a few libraries on Hex.pm that provide nice hashing functions of both the cryptographically secure and normal-but-fast kinds.

EDIT Oh, and you listed under ā€˜Future’: ā€œSerialization format?ā€.

Erlang has built-in functions to allow you to serialize/deserialize any built-in datatype with term_to_binary(term) and binary_to_term(binary). (Okay, almost all. The exception being references, pids and other obviously ephemeral structures)

1 Like

Thanks! :grinning:

That’s a good idea. I think it makes sense to support other crypto hash methods beyond the erlang ones like Tiger and Adler32 etc …

I noticed a few others had incorporated such a thing but I left it out for simplicity. I actually noticed two different Elixir approaches - one was via a use macro and another by passing in a lambda of the hash function desired to the constructor. Not sure which way is best. Thoughts?

I would assume the macro version would be more elegant by doing something like below, but honestly wasn’t sure how to pass in a custom function into the library. Usually for macro code injection you are injecting the macro code into the caller context, not the other way around from what I understand.

defmodule Foo do
  use Merkle, &Custom.hash_function/1

  def bar do
    m = Merkel.new...
  end
end

Other wise it would be something like the latter version in the form.
I could use another field in the Merkel struct to track the hash lambda function…

Merkel.new(list, hash_lambda)

As far as the serialization format, I’ve used the term_to_binary type functions in Erlang but this was more of a reference to something called THEX or a tree hash exchange format.

One of the things it defines is a serialization format as well as specifying a different hash function for leaf nodes and inner nodes and also handling incomplete trees. I guess I liked the idea of standardizing the format more than anything else. The site which described it is down at the moment THEX

I first read about it here quux00

The goal would be to generate such a tree from a file and perhaps have this be language agnostic so that people could read a Merkle tree generated in Elixir into something in Go or Haskell or something :slight_smile:

1 Like

For configurable things like in this case the hashing function, I can give you two tips:

  1. You can refer to a function directly, with someone typing indeed &Custom.hash_function/1 in the configuration file.
  2. You can have Merkle.new contain an optional second argument and have it fall back to the configuration setting when it was not passed. This approach I used for instance in the queues/deques library okasaki (relevant part of the code).

You can indeed keep this inside the Merkel struct.


Ah! A language agnostic serialization format. Yeah, that would be cool :slight_smile: .

2 Likes

You’d be surprised how many languages have libraries that can convert the erlang external term format into native data structures!

5 Likes

3 Likes

Yep! Even javascript! The ETF is really easy to parse. :slight_smile:

1 Like

Thanks again for the tips, you folks rock! Yes etf is super cool actually. I’m super impressed someone wrote an okasaki library for Elixir also!!! No more having to read it in ML…

I ended up making a few changes given the feedback. Happy playing!..

Interesting threads on serialization…
https://medium.com/@niamtokik/serialization-series-do-you-speak-erlang-etf-or-bert-part-1-ff70096b50c0

1 Like