The Distributed Hash Tree

All right. This is something that I came up with about half a year ago, and since then it has been delayed over and over again.

Now, with multiple people here talking about possibly implementing a Blockchain or a censorship-resistant protocol, I've decided that it is better to present the unfinished paper on the Distributed Hash Tree to the world now than to possibly never publish it at all.

The Paper

For people wanting to read everything, here is the white paper I have written. I do not think that the paper is entirely done, but it might be better to get feedback now.
The Distributed Hash Tree - White Paper v0.6.2

Earlier versions:
v0.5.0

Tl;Dr:

A Distributed Hash Tree (DH3) is a distributed network that allows in-band content discovery. It is built upon the concept of the Distributed Hash Table, which has seen a lot of usage lately.

In some ways it is similar to a Blockchain, with the major advantages that the whole state does not need to be copied to every party in the network, and that it is not necessary to spin up a new cryptocurrency to keep it running. The major drawback is that the resulting consensus is tree-shaped, with unordered siblings.

This makes it highly useful for things like distributed bulletin-board systems, distributed voting systems or distributed content discovery systems.
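
As a rough illustration of the tree shape (the paper contains the precise definitions; the key derivation below is deliberately simplified), a tree-element could be addressed by a hash of its parent's key and its own content, which makes siblings content-addressed and therefore unordered:

```elixir
# Simplified illustration, not the exact scheme from the paper:
# a tree-element's key is derived from its parent's key plus its own
# content, so siblings are content-addressed and carry no ordering.
defmodule DH3.TreeElement do
  defstruct [:key, :parent_key, :content]

  def new(parent_key, content) do
    %__MODULE__{
      key: :crypto.hash(:sha256, parent_key <> content),
      parent_key: parent_key,
      content: content
    }
  end
end
```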


What is next?

I created a proof-of-concept implementation in Ruby/Rails a few months back. This implementation was made while simultaneously figuring out how all the moving parts would work together, and it is therefore very ad hoc. It will also break under anything but the tiniest loads.

I believe that Elixir is a very good candidate language to write a DH3 implementation in.
I will be creating a DH3 implementation over the next few months, but I would be very happy if you'd like to join me. After all, more eyes perceive more, and I am certain that there are smarter people than me amongst you.


All right!

This is the first time I've publicized this idea, so any feedback, questions, ideas etc. are more than welcome.

8 Likes

I read your white paper. I think it's genius. I recommend you look at vector clocks.

4 Likes

Thank you!

Vector Clocks are certainly interesting, although I don’t immediately see how they might be useful in a distributed system where an unknown (varying) number of nodes collaborate. Please do elaborate!
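
For reference, my current mental model of a vector clock is a map of node IDs to counters, along these lines:

```elixir
# My (possibly naive) mental model of a vector clock: a map of
# node_id => counter, incremented on local events and merged by
# taking the per-node maximum.
defmodule VectorClock do
  def new, do: %{}

  # Record a local event at `node_id`.
  def tick(clock, node_id), do: Map.update(clock, node_id, 1, &(&1 + 1))

  # Merge a clock received from another node.
  def merge(ours, theirs) do
    Map.merge(ours, theirs, fn _node, a, b -> max(a, b) end)
  end
end
```

What I don't see yet is how this stays manageable when the set of node IDs is unknown and keeps changing.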

1 Like

They used that principle in the Phoenix framework to build a feature called Presence, which seems to address some of the technical issues you mentioned.

Phoenix Presence

Phoenix Presence is an upcoming feature in Phoenix 1.2 which brings support for registering process information on a topic and replicating this information transparently across a cluster. Its simplest use-case would be showing which users are currently online in an application, but we’re excited about other lower-level uses such as service discovery. This feature at first seems simple and mundane, but most libraries fail to properly tackle it and those that do introduce extra dependencies without solving edge-cases. Not so with Phoenix.
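
From the documentation, basic usage looks roughly like this (the MyApp module names are placeholders from a generated app):

```elixir
# Rough usage sketch based on the Phoenix docs; MyApp.* are placeholders.
defmodule MyApp.Presence do
  use Phoenix.Presence,
    otp_app: :my_app,
    pubsub_server: MyApp.PubSub
end

defmodule MyApp.RoomChannel do
  use Phoenix.Channel
  alias MyApp.Presence

  def join("room:lobby", _params, socket) do
    send(self(), :after_join)
    {:ok, socket}
  end

  def handle_info(:after_join, socket) do
    # Register this channel process under the user's id; Phoenix
    # replicates the entry across the cluster without a central store.
    {:ok, _ref} = Presence.track(socket, socket.assigns.user_id, %{
      online_at: System.system_time(:second)
    })

    push(socket, "presence_state", Presence.list(socket))
    {:noreply, socket}
  end
end
```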

2 Likes

Also, I am available anytime you wish to discuss. I'm very interested in this project, but I am quite new to Elixir and to coding in general.

2 Likes

Thank you for sharing the white paper! It was not immediately obvious to me that Distributed Hash Table = DHT and Distributed Hash Tree = DH3, but this certainly looks like a valuable evolution of the concept. Is it of your own making? Based on research from others? I did not find a Wikipedia entry, so I assume it is new.

I do have some feedback on the paper. I was a little confused about the in-band and out-of-band discovery modes, and I wish you would elaborate there to make the distinction between DHT and DH3 absolutely clear. An example would help illustrate the problem which DH3 is solving. Perhaps some history of the DHT would be appropriate: why Kademlia was created and what it is. Has the DHT been insufficient in other ways? Also, do you have any expected performance characteristics for DH3?

I would like to contribute to this project. I suggest forking the MLDHT project and starting to add the DH3 pieces.

4 Likes

Thank you for your reply! This is some very useful feedback. To answer some of your questions:

  • This is indeed a new concept that I have come up with. Of course, it is based on papers and works of other people, which will be properly listed in the final version of the paper: most notably the Kademlia DHT concept, the Merkle-tree property, (EC)DSA and Bcrypt.
  • I wholeheartedly agree that a clarification of 'in-band' would be good, as well as some history of DHTs.
  • As for insufficiencies in other areas: the main thing I can think of is the Sybil attack (which is itself not elaborated on in the paper, but should be, I think). A solution for this is also presented in the paper.
  • I do not have any concrete expected performance characteristics; much depends on the actual constants that end up being chosen (how many servers a value is replicated on, for instance, as well as how often servers should re-send stored values). It is a performance<->integrity trade-off. I expect it to be slower than systems like e.g. Riak (but less complex, and usable in networks with untrusted peers), yet much faster, less resource-intensive (storage space and CPU cycles) and, again, less complex than Blockchain-based systems. DHTs are able to load-balance very well, and this would be the same for DH3s.

Thank you for linking to the MLDHT Elixir library. I was not aware that there already was a DHT implementation in Elixir, so this is great (both making the DH3 easier to implement and helping MLDHT out).

3 Likes

Great that you are interested! All feedback and discussion is welcome, as there will probably be enough work on the project once it starts, even for less experienced coders.

Based on your tips about Vector Clocks and Phoenix Presence, I came across Alexander Songe's talk about CRDTs (conflict-free replicated data types), which have some overlap with the DH3.

I will type more specifics about that tomorrow once I am not bound by the limits of this mobile phone touch screen keyboard.

Have a wonderful day,

~Qqwy/Wiebe-Marten

3 Likes

Great, I look forward to it.

I think forking MLDHT is a great idea. The Mainline DHT network would be a great asset for bootstrapping peers. I think we should just fork it so that our users can pass additional metadata over existing infrastructure.

2 Likes

What we should not do, however, is combine the actual network with the Mainline DHT. Because the DH3 sits differently in the performance <-> integrity trade-off (in the DH3, integrity is more important than performance), and because keys are not content-addressable (as I believe they are in the Mainline DHT), it is better not to try to combine the networks.

3 Likes

I took some time today to rewrite some sections of the paper, using the feedback obtained so far. The new version can be read here.

4 Likes

Hey, thanks for this. I will read it ASAP. In the meantime, I hope you take a closer look at the Phoenix framework. I think they have done a lot of work for us.

Here is an interesting video if you have time: https://www.youtube.com/watch?v=E6RTBlO9TZQ

2 Likes

Hey, I have a question about how you mitigate Sybil attacks. I'm not sure how you plan to make them "computationally expensive". I wonder if someone with more powerful/accelerated hardware could overcome this defence (if they really wanted to). Perhaps some kind of reputation score might also work: something like measuring uptime, past reliability metrics, etc. Reliability data gathered by each node could be analyzed individually rather than collectively; this would allow for a more robust network with higher performance, and fewer resources would need to be allocated to verification.

2 Likes

Another question is about root nodes. What happens when they go offline?

2 Likes

Another attack vector you didn't mention is IP address blocking by a firewall or national firewall policy. It is therefore important that this network have some compartmentalization. Though I think it would take a while before anyone went to the trouble of blocking all dynamic IP addresses in popular DHT networks, it could still happen. So unfortunately this means that we will still need an out-of-band solution for bootstrapping peers. I suggest email, because the collateral damage of turning off email would be too high. Another option would be image steganography, which would allow users to bootstrap peers via any medium that transmits images.

2 Likes

The ability to 'clog' the area around a single key you want to block (i.e. launch a successful Sybil attack) hinges on two things:

  • being able to calculate Server-IDs that are the closest to this key;
  • and, following from that, network size (if there are only 20 nodes, it is not hard to find 30 Server-IDs that are close to some key you want to 'clog', but if there are 2000, then there will probably be some that are closer).

By making the calculation computationally expensive, it becomes infeasible to calculate many similar keys: BCrypt with a cost parameter of 10 takes multiple seconds to compute on a normal personal computer. Even on dedicated servers that are only tasked with computing BCrypt hashes, it will take months if not years to have a reasonable chance of finding enough Server-IDs that are close to the wanted key.
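
A sketch of the mechanism, with iterated SHA-256 standing in for BCrypt (the actual cost function and constants are still open):

```elixir
defmodule DH3.ServerID do
  import Bitwise

  # Illustrative cost only; the paper proposes bcrypt with a tuned
  # cost parameter as the expensive function.
  @iterations 1_000_000

  # Deriving a Server-ID from a node-specific seed is deliberately
  # slow, so grinding out many IDs near a chosen key is infeasible.
  def derive(seed) do
    Enum.reduce(1..@iterations, seed, fn _i, acc ->
      :crypto.hash(:sha256, acc)
    end)
  end

  # Kademlia-style XOR distance: a smaller result means closer.
  def distance(id, key) do
    bxor(:binary.decode_unsigned(id), :binary.decode_unsigned(key))
  end
end
```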

It is slightly confusing, because the term node can refer both to an element in the tree and to a server running the DH3 application. Root tree nodes never go offline by themselves, as they are saved at multiple places in the network (and, just like all (key -> value) pairs in the network, they are re-broadcast every so often to ensure they remain stored even when some of the servers they were saved on go offline).
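
Sketched, that re-broadcast could look something like this (DH3.Storage and DH3.Network are hypothetical placeholder modules, and the interval is illustrative):

```elixir
defmodule DH3.Republisher do
  use GenServer

  # Illustrative interval, not a constant from the paper.
  @interval :timer.hours(1)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def init(:ok) do
    schedule()
    {:ok, %{}}
  end

  def handle_info(:republish, state) do
    # Re-send every locally stored (key -> value) pair into the network,
    # so the data survives servers going offline.
    for {key, value} <- DH3.Storage.all() do
      DH3.Network.store(key, value)
    end

    schedule()
    {:noreply, state}
  end

  defp schedule, do: Process.send_after(self(), :republish, @interval)
end
```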

There is no real concept of a root server. There will be a few servers known in the source code so that people starting the application for the first time can bootstrap from them, but which ones these are might (and will) change over time, and people will have the ability to alter this list manually.

Yes, it is entirely possible to block IP addresses using firewalls at a personal, organisational or Internet-Service-Provider level. However, as there are always new servers joining, old ones leaving and some servers changing their IP every so often, the network would change faster than blacklisting could keep up with.

But yes: it is possible that new servers have trouble bootstrapping for this reason, so yes: it is necessary to allow people to manually add one or multiple known servers. How this information would best reach new server owners is something that the DH3 itself should not care about.

What would be a possibility, by the way, is to have multiple ways of connecting to other servers at some point: not only UDP, but also TCP/IP, and through a varying range of ports. This is also something that can be greatly expanded upon in the future (it is not necessary in the MVP), but it will make it much harder to block access through e.g. port-disabling.
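
A first sketch of such a multi-transport setup (ports and option lists are illustrative, not from the paper):

```elixir
defmodule DH3.Transport do
  # UDP socket: incoming packets are delivered as messages to the
  # owning process.
  def open_udp(port) do
    :gen_udp.open(port, [:binary, active: true])
  end

  # TCP listening socket on the same (or any other) port, as a
  # fallback when UDP is filtered.
  def open_tcp(port) do
    :gen_tcp.listen(port, [:binary, active: true, reuseaddr: true])
  end
end
```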

3 Likes

Thanks for this. I know that the following questions are more intuition-driven.

  • How will the encryption overhead affect bandwidth performance?
  • How do you think the performance of this network would compare to BitTorrent?
  • What would the delays be on a live stream broadcast?
  • How is the tree sharded to reduce storage requirements for each node?
  • When you say "server", how is that different from "node", and is it necessary to have more than one class of nodes in the network?

I agree that compromises need to be made for the MVP but these are things we need to anticipate.

3 Likes

The extra metadata the DH3 sends will increase bandwidth usage. But, assuming that the data being sent is larger than this metadata, I don't see this as a large problem.

As to how much it will affect bandwidth usage in practice, I have no idea, because I have no concrete bandwidth figures yet :stuck_out_tongue_winking_eye:.

This is somewhat of a trick question, as the Mainline DHT in BitTorrent is only used to find out which peers have a certain torrent locally; the downloading is done by connecting to these peers directly, without using the DHT. Nevertheless, DHTs are fairly fast. (I do not have a source to quote on that; I'd be very happy to read a paper on this, if someone can find one.) That said, writing/reading data is by no means 'real-time'. The DH3 very much chooses the C and P of the CAP theorem.

Because a broadcast is inherently centralized (there is one source creating data and broadcasting it to others), this is something that cannot be solved well by any distributed system. There are solutions like Peercasting, but these have their own problems.

Another problem with streaming is that a lot of data is generated, and often the value of this data lies mostly in its freshness, i.e. it is not very useful to keep this data persistently for the rest of time.

Tl;Dr: the DH3 is not suited for (live) streaming. The DH3 would be suited to telling whoever wants to listen in that a stream starts at location X, with all further traffic handled by location X in an application-specific manner.

Servers store the data that is closest to them. For safety, this data is not saved at a single location but at the k closest servers, where k is an application-specific constant (20 in the original Kademlia specification). This means that data is stored at at least 20 locations in the network. A larger network means that each server needs to store less data (though more data will probably be produced as well by then).
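
In code, picking the replication targets could look something like this (the server struct and its id field are placeholders; the XOR distance is as in Kademlia):

```elixir
defmodule DH3.Replication do
  import Bitwise

  # k = 20 follows the original Kademlia specification.
  @k 20

  # Pick the @k servers whose IDs are XOR-closest to the key;
  # the value is then stored on each of them.
  def closest_servers(servers, key) do
    servers
    |> Enum.sort_by(fn server -> distance(server.id, key) end)
    |> Enum.take(@k)
  end

  defp distance(id, key) do
    bxor(:binary.decode_unsigned(id), :binary.decode_unsigned(key))
  end
end
```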

I use the word server for a computer (node) running the DH3 program that communicates with other computers running the DH3 program.

I use the word tree-element for an element (node) in a tree.

The word node is ambiguous, as it might refer to either of these, so I try not to use it, to prevent confusion.

No, there is only a single kind of server in the network.


I hope that clears some things up.

Thank you for your questions! :smile:

4 Likes

Hey thanks again.

Let me give you a bit more context about where I’m coming from with this.

I just finished beating my head against the performance limitations of the Bitmessage protocol. Four-minute round trips for messages, combined with the computationally expensive proof-of-work scheme that prevents spam, mean that file sharing over this network requires 15 minutes per megabyte and 100% of your CPU… The protocol does not have the performance required to be useful and thus gain traction. Something that cannot gain traction isn't very interesting to commercial sources of funding. Bitmessage may therefore have been interesting, but it is not something that could become relevant.

You mentioned having a proof of concept in Rails. Do you have any performance data? I would be more than happy to run tests on this with you.

2 Likes