Merkel - A balanced merkle binary hash tree

I wrote a library a couple weeks ago that implements a balanced, merkle binary hash tree.

All this bitcoin talk has gotten me inspired to learn more about merkle trees, which are a beautiful data structure for ensuring data integrity by elegantly chaining the hashes of leaves all the way up to the root and using the root signature as a sort of fingerprint for the content of the leaves. Given n leafs (transactions) they provide a O log (n) way of creating and verify an audit proof

I wrote this not to solve any particular problem, but I figured writing one would help my learning and would be fun.

I noticed that other merkle trees in many other languages either had different requirements e.g. size being a power of 2, or if they were odd had extra orphan nodes; or just plain too cumbersome to read through. I figured that with a balancing heuristic it would make the tree simpler and more flexible at the slight cost of minor updating (re-hashing up to the root and/or re-setting inner node search keys) overheadā€¦

I donā€™t expect this library to take over the world, but please kindly take a look at it and play with it. You are welcome to formally let me know your feedback via github or can append to this thread. The test coverage is 100% but the tests could be more extensiveā€¦

More info is located on the github page

10 Likes

Great work! And I love the naming ā€˜coincidenceā€™ with Germanyā€™s Bundeskanzlerin :stuck_out_tongue_winking_eye:.

One thing I did realize, is that you currently support some of the hashing algorithms that Erlang itself provides. Have you thought about allowing the user to specify a custom hashing function instead? There are quite a few libraries on Hex.pm that provide nice hashing functions of both the cryptographically secure and normal-but-fast kinds.

EDIT Oh, and you listed under ā€˜Futureā€™: ā€œSerialization format?ā€.

Erlang has built-in functions to allow you to serialize/deserialize any built-in datatype with term_to_binary(term) and binary_to_term(binary). (Okay, almost all. The exception being references, pids and other obviously ephemeral structures)

1 Like

Thanks! :grinning:

Thatā€™s a good idea. I think it makes sense to support other crypto hash methods beyond the erlang ones like Tiger and Adler32 etc ā€¦

I noticed a few others had incorporated such a thing but I left it out for simplicity. I actually noticed two different Elixir approaches - one was via a use macro and another by passing in a lambda of the hash function desired to the constructor. Not sure which way is best. Thoughts?

I would assume the macro version would be more elegant by doing something like below, but honestly wasnā€™t sure how to pass in a custom function into the library. Usually for macro code injection you are injecting the macro code into the caller context, not the other way around from what I understand.

defmodule Foo do
  use Merkle, &Custom.hash_function/1

  def bar do
    m = Merkel.new...
  end
end

Other wise it would be something like the latter version in the form.
I could use another field in the Merkel struct to track the hash lambda functionā€¦

Merkel.new(list, hash_lambda)

As far as the serialization format, Iā€™ve used the term_to_binary type functions in Erlang but this was more of a reference to something called THEX or a tree hash exchange format.

One of the things it defines is a serialization format as well as specifying a different hash function for leaf nodes and inner nodes and also handling incomplete trees. I guess I liked the idea of standardizing the format more than anything else. The site which described it is down at the moment THEX

I first read about it here quux00

The goal would be to generate such a tree from a file and perhaps have this be language agnostic so that people could read a Merkle tree generated in Elixir into something in Go or Haskell or something :slight_smile:

1 Like

For configurable things like in this case the hashing function, I can give you two tips:

  1. You can refer to a function directly, with someone typing indeed &Custom.hash_function/1 in the configuration file.
  2. You can have Merkle.new contain an optional second argument and have it fall back to the configuration setting when it was not passed. This approach I used for instance in the queues/deques library okasaki (relevant part of the code).

You can indeed keep this inside the Merkel struct.


Ah! A language agnostic serialization format. Yeah, that would be cool :slight_smile: .

2 Likes

Youā€™d be surprised how many languages have libraries that can convert the erlang external term format into native data structures!

5 Likes

3 Likes

Yep! Even javascript! The ETF is really easy to parse. :slight_smile:

1 Like

Thanks again for the tips, you folks rock! Yes etf is super cool actually. Iā€™m super impressed someone wrote an okasaki library for Elixir also!!! No more having to read it in MLā€¦

I ended up making a few changes given the feedback. Happy playing!..

Interesting threads on serializationā€¦
https://medium.com/@niamtokik/serialization-series-do-you-speak-erlang-etf-or-bert-part-1-ff70096b50c0

1 Like