Nested Set in elixir

Hello everybody, I was trying to implement hierarchical tree structure in Elixir. At first I implemented a tree of processes, linked by parent, children properties. It was fast, even with about 50_000 nodes, but not very functional.

Then I implemented a map data structure which looks like a DB record %{id: %{id, parent_id, lft, rgt…}}

It tooks 37 minutes for the same 60_000. humm, updating lft and rgt is time consuming after each insert. But I don’t care too much about slow writing, I care about fast reading.

With Map, order is not maintained over 32 records.

With Keyword, atoms are not garbage collected.

My Question : What is the recommanded way to deal with tree/nested set?

spawn processes for each node? (quick)
use Map? (order not maintained, slow)
use Keyword? (atom consuming, slow)
Forget about trees?

I made a sample repo for the data struct here : https://github.com/kokolegorille/nested_set

Thanks for taking time

1 Like

Maybe gb_trees http://erlang.org/doc/man/gb_trees.html?

Example in elixir code https://github.com/meh/elixir-datastructures/blob/master/lib/data/dictionary/balanced_tree.ex

3 Likes

Thanks, I should have thought erlang has already done it before :slight_smile:

The main problem with building a nested tree from e.g. maps is that insertion of a child of some element will take O(log n). If you already know that each node has an unique ID (either because they are incrementing integers, or because they are an UUID), then you can instead go for a ‘flat’ approach in which you have a map where the keys are the nodes IDs, and each node contains a parent_id field that points to another node.

Note that it is possible to store any kind of graph in this kind of flat representation, including graphs with cycles. There is nothing preventing this from happening, so care needs to be taken while inserting/updating.

building such a tree now takes O(n) instead of O(n log n). However, finding all children of a certain parent will take O(n), since we need to check the parent IDs for all nodes.


As for your questions:

Correct. But if you build a ‘nested set’, then order should not matter.

Correct. It is unfortunate that Elixir does not allow Keyword to be used with string keys. I do not know the rationale behind this restriction.

This is definitely not the right way, since spawning a process for each node will take memory, and your data structure is a ‘flat data’ one, without any kind of automation.

I think using maps remains the best way. Either a flat map or a nested map in which the child nodes are part of the parent (From the example in your first post it seems that you are doing the opposite, where the parent nodes are part of the child. I believe this would create n! nodes in the internal representation, which would explain why it is so slow).

Maybe interesting to you is that I built Tensor, the sparse vector/matrix/tensor library on top of sparse rose trees, in other words: nested maps:

Tensor.new([[[1,2],[3,4]],[[5,6],[7,8]]], [2, 2,2]) |> IO.inspect(structs: false)
%{__struct__: Tensor,
  contents: %{0 => %{0 => %{0 => 1, 1 => 2}, 1 => %{0 => 3, 1 => 4}},
    1 => %{0 => %{0 => 5, 1 => 6}, 1 => %{0 => 7, 1 => 8}}},
  dimensions: [2, 2, 2], identity: 0}
3 Likes

Hi Qqwy, Thank You for your answer.

But maybe I should have mentionned I care about descendants and ancestors of a given node, not about parent children relationship. Adding lft and rgt allows me to query in one filter. But it needs to be ordered by lft to gave the correct answer.

I use the same metadata as in http://we-rc.com/blog/2015/07/19/nested-set-model-practical-examples-part-i and the same kind of insert, add, delete… except it happens with an Enum.filter instead of sql_query.

By the way, I looked at Tensor, which is very nice for what I am doing.

To gave a little context, this is the process I try to solve.

transform flat text file containing all opening moves for the game of go into tokens with lexer. (about 60_000 nodes)
I> parse those tokens into a hierarchical tree of nodes, each nodes represents one or more actions on the board…
I> validate each action with an elixir struct, I published this on hex here https://github.com/kokolegorille/go

File |> Tokens |> Tree/Node |> Actions |> Board

While all this flow is working, I try to optimize point 2.

I could parse with yacc, but i am to noob to understand how to add graph metadata while parsing with yacc.
I could play with processes,
I could use Mnesia, but Mnesia works best with records, not map
I could use Keyword

I feel none of those fit completely… But as I am new to Elixir, I learn a lot in the failure process.

But thanks to your reply, I will try to solve this with map.

Tensor can help me represent the board as a Matrix, and do rotation as needed. That is really cool!

1 Like

I feel functionaly stupid trying to reproduce database behaviour with FP.

There is no need to store lft and rgt because it is db related, to avoid n queries problem.

But that is not the case in FP, recursion is optimized, so querying descendants can be done recursively, with speed.

I finally load 60_000 nodes in 4.5 seconds, and I can quickly retrieve ancestors, or descendants.

From 37 minutes to ~4 seconds, that is nice!

I am still trying hard to unlearn technics I use with other paradigms, and use more FP thinking.

Thanks for support.

1 Like

Erlang also offers graphs based on ETS (digraph module), and sets of sets (sofs). digraph is quite approachable and digraph_utils has a nice collection of algorithms. They’re not really functional, though.

It breaks the erlang style of keywords (which I love) to be able to give them a more ruby’ish syntax. Some comparisons between erlang/elixir keywords (we really really need the table extension for discourse…), this is all in elixir syntax:

Erlang            | Elixir
-------------------------------------
[{:blah, 42}]     | [blah: 42]
[{:blah, true}]   | [blah: true]
[:blah]           | [blah: true] # have to expand it as it is unrepresentable as Elixir KWL, but is valid Erlang KWL
[{"blah", 42]     | [{"blah", 42}] # unrepresentable as Elixir KWL, but is valid Erlang KWL

I’d prefer something like this for Elixir KWLists that give the same power as Erlang KWLists and for uniformity with the Map syntax:

Erlang            | Elixir
-------------------------------------
[{:blah, 42}]     | [blah: 42]
[{:blah, 42}]     | [:blah => 42]
[{:blah, true}]   | [blah: true]
[:blah]           | [blah: true]
[:blah]           | [:blah]
[{"blah", 42]     | ["blah" => 42]

However yes, this is how I would represent most trees in functional languages in most styles, otherwise I’d probably use tuples or so…

1 Like