Best practices for NIF complex data structures

Vapebringer · February 10, 2022, 8:25pm

Hello

Currently working on an existing erlang project that does quite a bit of processing on some large data structures (graph where the nodes are balanced trees) within some hot code paths. We have a pretty good setup with gb_trees but are searching to see if we can improve this code more.

This code is ingrained in current business logic (kind of a mess tbh), but we’ve profiled and found that it’s within a hot loop in a hot spot of our code, and that we might be able to reduce memory pressure and reduce the time in this loop by utilizing a slightly more performant structure.

In other languages, I would write a native extension, however that seems to not be very popular whenever I’ve searched for strategies in BEAM systems.

Here’s what I’ve gathered might be a good approach for this:
A NIF can be implemented that returns a pointer to the data structure, and we’d need to create all the methods to operate on the structure we would need to expose to our user code.
Because of timing requirements in NIFs, it might be best to write this as a dirty NIF classified as a CPU bound job, rather than adding lots of complex logic to deal with what I assume is a “pause and continue” method of execution where our NIF allows itself to be paused.

Are there any resources for this? Seeing elixir on leetcode almost made me think that someone would have already created something like this just to game that, but searching “NIF” on hex.pm I see a lot of stateless functions and libraries, not much in the case of typical container+logic object data structures (think tries, graphs, etc).