I wrote a library a couple weeks ago that implements a balanced, merkle binary hash tree.
All this bitcoin talk has gotten me inspired to learn more about merkle trees, which are a beautiful data structure for ensuring data integrity by elegantly chaining the hashes of leaves all the way up to the root and using the root signature as a sort of fingerprint for the content of the leaves. Given n leafs (transactions) they provide a O log (n) way of creating and verify an audit proof
I wrote this not to solve any particular problem, but I figured writing one would help my learning and would be fun.
I noticed that other merkle trees in many other languages either had different requirements e.g. size being a power of 2, or if they were odd had extra orphan nodes; or just plain too cumbersome to read through. I figured that with a balancing heuristic it would make the tree simpler and more flexible at the slight cost of minor updating (re-hashing up to the root and/or re-setting inner node search keys) overhead…
I don’t expect this library to take over the world, but please kindly take a look at it and play with it. You are welcome to formally let me know your feedback via github or can append to this thread. The test coverage is 100% but the tests could be more extensive…
Great work! And I love the naming ‘coincidence’ with Germany’s Bundeskanzlerin .
One thing I did realize, is that you currently support some of the hashing algorithms that Erlang itself provides. Have you thought about allowing the user to specify a custom hashing function instead? There are quite a few libraries on Hex.pm that provide nice hashing functions of both the cryptographically secure and normal-but-fast kinds.
EDIT Oh, and you listed under ‘Future’: “Serialization format?”.
Erlang has built-in functions to allow you to serialize/deserialize any built-in datatype with term_to_binary(term) and binary_to_term(binary). (Okay, almost all. The exception being references, pids and other obviously ephemeral structures)
That’s a good idea. I think it makes sense to support other crypto hash methods beyond the erlang ones like Tiger and Adler32 etc …
I noticed a few others had incorporated such a thing but I left it out for simplicity. I actually noticed two different Elixir approaches - one was via a use macro and another by passing in a lambda of the hash function desired to the constructor. Not sure which way is best. Thoughts?
I would assume the macro version would be more elegant by doing something like below, but honestly wasn’t sure how to pass in a custom function into the library. Usually for macro code injection you are injecting the macro code into the caller context, not the other way around from what I understand.
defmodule Foo do
use Merkle, &Custom.hash_function/1
def bar do
m = Merkel.new...
Other wise it would be something like the latter version in the form.
I could use another field in the Merkel struct to track the hash lambda function…
As far as the serialization format, I’ve used the term_to_binary type functions in Erlang but this was more of a reference to something called THEX or a tree hash exchange format.
One of the things it defines is a serialization format as well as specifying a different hash function for leaf nodes and inner nodes and also handling incomplete trees. I guess I liked the idea of standardizing the format more than anything else. The site which described it is down at the moment THEX
The goal would be to generate such a tree from a file and perhaps have this be language agnostic so that people could read a Merkle tree generated in Elixir into something in Go or Haskell or something
For configurable things like in this case the hashing function, I can give you two tips:
You can refer to a function directly, with someone typing indeed &Custom.hash_function/1 in the configuration file.
You can have Merkle.new contain an optional second argument and have it fall back to the configuration setting when it was not passed. This approach I used for instance in the queues/deques library okasaki (relevant part of the code).
You can indeed keep this inside the Merkel struct.
Ah! A language agnostic serialization format. Yeah, that would be cool .