Arrays - Fast and versatile arrays with swappable implementations

Qqwy · May 27, 2022, 3:26pm

I’ve been doing some more work on ArraysRRBVector:

Now that there is a reasonably simple way to call an Elixir function from within Rust (c.f. RustlerElixirFun: repo, topic), it seemed sensible to try this out by implementing a new version of map (as in Enum.map, but keeping it as an array) for this native array datastructure.

The new approach iterates over the leaf nodes in the reduced-radix-binary tree, which (when fully filled, which all except the last one are), each contain 64 elements. This means that:

We call into Elixir only ~~O(log_64(n))~~ times if you have an array with n elements. EDIT: Actually, this is incorrect as we do not drill down to a single leaf node, but rather iterate over all leaves. Thus, we still run in O(n/64) = O(n) time. Still a significant reduction of function call overhead though.
We only convert between Rust datastructures and Elixir datastructures 64 elements at a time. This greatly reduces memory usage if you have very big vectors (as we do not need to create side-by-side copies of the full vector anymore).

I have not done any benchmarks yet. I’m pretty sure that it is still not as fast as staying in Elixir the whole time (when e.g. using an ErlangArray, MapArray or Aja.Vector), but it definitely is faster than the old approach of turning the whole vector fully into a list, mapping over that in Elixir, and then turning it back.

Most importantly, it might be a nice example for people of you to use RustlerElixirFun to call Elixir code from inside Rust in their own projects.