Map with ordered keys

I’ve done a lot of GitHub searching but I can’t find an erlang/elixir data structure similar to Java’s SortedMap.

I’ve been using ETS :sortedset but the performance hasn’t been sufficient for my needs. Anyone got any suggestions ?

Need to do iterations and multiple lookups …

1 Like

Have you considered using this library?

5 Likes

There is an Erlang gb_tree that is built in. For ordered iteration, there is an iterator/1, and to_list/1 can be used with Enum I think. gb_sets is also available.

Sadly it’s interface isn’t compatible with Elixir pipes, because the data structure is the last argument but an Elixir wrapper shouldn’t be too hard.

6 Likes

If you need custom ordering, consider AVLTree

3 Likes

I’m not sure from the project docs it’s doing what I need.

Had a look. I’ll throw in a couple of examples now and maybe benchmark later.

Create new :

:gb_trees.empty

Populate with 10 elements :
gb = 1..10 |> Enum.reduce(:gb_trees.empty, fn x,acc -> :gb_trees.insert(x,x,acc) end)

Iterating elements :
{k,v,next} = :gb_trees.next(iter)

I will mention I started looking at native code to solve my issue (need high performance). Here’s what I also considered… After some investigation, I came across klib library (pure C, license X11/MIT) and it’s KBtree ordered map implementation. Considering wrapping it with a NIF.

If you need the high performance, you could still leverage ets for direct access and use a separate data structure just for iteration of the keys, but that does get cumbersome.

Nifs are tricky :slight_smile: you can take a look at this (very similar) nif I write to wrap a C data structure (also a single header lib). If you need to load a huge amount of items into that tree consider using a dirty nif as not to block the whole VM while it scans the tree. That also comes with overhead of it’s own though.

This probably calls for a generic Rustler-based Elixir library that bridges various Rust data structures to Elixir code. In your case that’d be Rust’s BTreeMap.

Discord have a sorted set library with bindings to Rust – not a map but still, in case you need this one as well. You can also just take a look at Discord’s libraries in general, they are quite excellent.

3 Likes

Yeah everyone speaks well of Rust for native code - I haven’t taken the time to learn it so worried about the cognitive load but I do know C/C++ . Should probably watch Scroggin’s talk again but I’m spending so much time on beam/infrastructure I’ve not much chance to learn a new language. Excuses. Excuses.

1 Like

Thanks @mpope some good points there. https://github.com/mpope9/exor_filter/blob/master/c_src/xor_filter_nif.c yeah good example of resource allocation/deallocation - at least it’s reasonably straightforward in BEAM compared to say for example writing a Java native extension. Good point about the dirty NIF.

1 Like

Can you share the use case for an ordered map? I super curious!

Typical use case would be to match people looking to rent or let property by geographic area. Perform a periodic rollup across all users. Remove users who had already been matched. Have tried with ETS but was only able to process 80,000 per second. Personal challenge is to find a way to match 200,000.

The BEAM languages and Rust are a natural pair – both emphasize safety and never actually crashing (in terms of an OS process).

I know what it is to be very busy and to have an eternally huge backlog. So definitely not excuses as you said, we just have only so much focused creative time per day.

But my recommendation stands for when you have the time. Rustler works very well.

Can you explain the problem a bit more? I’m not sure I understand how an map with ordered keys will help with this exactly. Maybe there are other data structures that would be suited better?

1 Like

Need to continually perform the following operation:

  1. Iterate through the list of objects, (ordered by key (monotonically incrementing integer))
    1.1 for each object:
    1.1.1 Iterate again through the list of objects, make a list of objects which can be matched with the current object
  2. delete the current object, delete the list of matched objects (both operations by key)
    2.1 send a message with the object and it’s matches to another system
    2.1 GOTO 1

Am I understanding it correctly like this:

If you have structure like this

objects = [1: a, 2: c, 3: x, 4: c, 5:c, 6: a, 7: d]

Your algorithm loops over each element, finds 1:a and searches for all a (6 in this case).
Delete 1 and 6 and send a message to some process?
And then repeat this for the remaining entries? Starting with 2: c?

Is this correct?

If it’s correct, then can’t you sort the objects by the items first and then do the other actions on them, so sorted like: `a: 1, 6 - c: 2, 4, 5 - x: 3 - d: 7]?

This brings the logarithmic complexity down from an O(n2) to an O(n log n) operation and should be much faster

  1. Are the keys the order of requests, and it’s desired to process in FIFO order?

  2. Does the list of matches also need to be returned in FIFO order?

You may also consider indexing your objects on every write, based on the property that binds distinct groups of them together across the ordered data structure.

This will increment memory usage and make each individual write more expensive - but not much so, if the right data structures are leveraged (e.g. a gb_tree for the ordered bit and a Map of lists for indexing stuff.)

The coordinated management of the two data structures (the main one plus the index) can be kept behind an API to keep it simple to use and reason about.

A theoretical example

  • The main data structure, a gb_tree with:
    [{0, %{user: :a}}, {1, %{user: b}}, {2, %{user: z}}, {3, %{user: a}}
  • The index, a Map of lists with:
    %{a: [0, 3], b: [1], c: [2]

You then iterate over the gb_tree until you find something you want to delete/process; then, to find out which other entries you need to delete, you consult the index rather than iterating over all the entries.

Then, once you’ve finished processing that particular group of objects, you can continue iterating where you left of (gb_tree does provide a function for this.)

All that it’s left is to write the bookkeeping code necessarily to ensure the two structures remain consistent with each other for every write.

2 Likes