Serde_rustler - Serde Serializer and Deserializer for your Rust/Rustler NIFs

sunny-g · June 9, 2019, 5:42am

Hi everyone!

After having spent some time with Rust and Elixir, a few weeks ago I set out to learn more about NIFs and benchmarking. There’s a Rust library I use a lot called serde, which essentially defines traits and types for serializing and deserializing between native Rust types and some encoding format like JSON, CBOR, Protocol Buffers, etc. While serde_eetf exists to translate between Rust and the Erlang term format, it still requires you to convert those binaries into terms.

So instead, I wrapped the Rust NIF library rustler with some Serde traits and out came serde_rustler, which you can use in your Rust NIFs to natively convert your Rust types into Elixir terms and vice versa! It’s also on crates.io and docs.rs, though the documentation is a little sparse.

The encoding and decoding benchmarks look extremely promising (like, insanely, unbelievably promising), and given I haven’t published much in either Rust or Elixir, I’d be really grateful if anyone could point out how to reconfigure the benchmarks or otherwise improve the code, write better tests, or polish the API.

svilen · June 12, 2019, 9:46pm

I learnt some Rust recently as well, and I’m also interested in how to get started with NIFs. If you could add some examples, a guide or even a short tutorial, that would be super useful!

OvermindDL1 · June 12, 2019, 10:55pm

serde isn’t just a common (de)serializer for rust, it is overwhelmingly used in the rust ecosystem. ^.^

Very cool creation of this library though, very very nice!

I don’t suppose you could add normal erlang’s term_to_binary/binary_to_term to the encode/decode benchmarks? They are not actually that fast so I’d like to see how they rank among everything as it is?

sunny-g · June 13, 2019, 2:40am

I’m too lazy to put up a blog post, but I can walk you thru specific files that might help get you started (and someone please correct me if I get some specific detail wrong):

lib.rs: This is the Rust half of the NIF. Things to note:
- rustler_export_nifs!: this is the macro that exposes functions to Elixir, where the first parameter specifies the full Elixir module atom name of the Elixir half of the NIF, and the second defines a list of exported functions, where SchedulerFlags::DirtyCpu signifies that this function should run in a dirty scheduler.
- the function signatures of readme and transcode: these are the function signatures of all basic NIFs.
serde_rustler_tests.ex: This is the Elixir half of the NIF. By default, all functions should return/throw a NIF error b/c the BEAM has yet to replace these functions with the real NIFs (which won’t happen until compile time).
readme.ex uses the NIF created in the previous file to run the readme function from lib.rs and uses a simple doctest to assert its correctness. Notice the other modules defined in that file, as they define Records and Structs that map directly to types defined in…
types.rs: these are just some enums and structs to test serialization and deserialization against. Note that they all derive Serialize and Deserialize as those traits define the serialization and deserialization behaviour and are required by serde. Also note the few #[serde(rename = "Elixir...")] annotations - these tell serde to rename these fields or types during serialization to this full name, b/c doing so allows serde to create atoms for those names (b/c those atoms already exist) rather than the default of creating bitstrings; the right atom names are required by Elixir to directly map these types to Records and Structs (instead of tuples and maps).
serde_rustler_tests_test.exs and test.rs define the actual tests, and lastly
benchmarks.exs defines the Benchee benchmark jobs that produced the aforementioned results.

Hope that helps!

sunny-g · June 13, 2019, 9:53pm

I can and I will! I’ll update the thread when they’re done.

@svilen forgot to tag you in the previous response.

edescourtis · August 27, 2019, 2:53pm

@sunny-g
Okay, so some feedback on serde_rustler. I am in a situation where I want to replace JSON serialization in a JSONRPC protocol over HTTP with Erlang Terms in a NIF. Unfortunately serde_rustler is not well suited for this very common use case because the terms produced are awkward to work with (tagged tuples etc). Ideally, we need a serializer that is equivalent to serializing Rust as JSON and decoding that same JSON in Elixir and not this strange format. Why? Because most repositories I work with are already doing quite a bit of polishing of the types for JSON serialization. For example for use in JSONRPC calls. When it becomes too heavy to do the JSONRPC over HTTP or a TCP protocol we immediately are tempted to turn to something like Rustler to avoid the network and serialization overhead. Now if the serialization was equivalent to the JSON one I could simply swap out the JSONRPC implementation with Jason.decode to Rustler plus some serializer and be done with it. But unfortunately, this forces me to change all of my code to read the structure. Furthermore, I have situations where the structure is simply passed through my system and should be treated as opaque but now as a result of the serialization scheme I cannot treat it as opaque because the structure has changed from what is expected by the downstream system.

Thanks for all your hard work. Please see this as constructive criticism.

Qqwy · August 27, 2019, 3:41pm

To be honest, I am happy that JSON-style (de)serialization is not the default, because JSON is less expressive (it has fewer kinds of types) than both Elixir and Rust.

In your specific case, what about doing the JSON deserialization in Rust (using a Rust JSON decoder), and then transforming the resulting Rust structs into the format serde_rustler/Elixir expect?

edescourtis · August 27, 2019, 3:49pm

JSON serialization is precisely what I am trying to avoid because of the overhead of having to look at every byte (versus term construction which doesn’t require looking at all the bytes). The main issue here is that the NIF is here to address performance issues from things like JSON and HTTP and JSONRPC.

Unfortunately, that is half of the use cases I encounter where a Rust NIF is actually useful. Basically the use cases fall into two categories: either you are trying to link into a large project or you are doing a simple binding to an external library. But to be honest, for the latter case I typically go directly to C for the NIF implementation since the complexity doesn’t justify using Rust in the majority of cases.

I am talking from a pragmatic point of view. I actually am using these tools and I see these shortcomings. The fact is serde_rustler creates more problems for me than it solves.

sunny-g · August 27, 2019, 6:16pm

@edescourtis So to make sure I understand your use case, you have a JSON-RPC server sending and receiving JSON, and you would ideally want serde_rustler to decode JSON into Erlang terms and vice versa?

Unfortunately serde_rustler is not well suited for this very common use case because the terms produced are awkward to work with (tagged tuples etc).

As-is, all the library enables is producing a specific, known-structure-at-compile-time Rust value for a given Elixir term and producing an Elixir-term equivalent for that value, at which point you can use serde attributes to configure exactly what the mapping between your Elixir terms and Rust values should be. A JSON-specific library could be written as a thin wrapper around serde_json, serde_transcode and serde_rustler, similar to what I did here for the JSON tests, but I haven’t gotten around to it yet.

Regardless, can you provide some data structure examples (or preferably code snippets) to illustrate this shortcoming?

JSON serialization is precisely what I am trying to avoid because of the overhead of having to look at every byte (versus term construction which doesn’t require looking at all the bytes). The main issue here is that the NIF is here to address performance issues from things like JSON and HTTP and JSONRPC.

But this library only addresses performance issues for NIFs performing JSON (or any kind of) (de)serialization, so it’s still unclear to me what exactly is not working for you.

Thanks for all your hard work. Please see this as constructive criticism.

Appreciate it, and I totally do, but I’d still like to know what I can/should change, so please provide more details.

sunny-g · August 27, 2019, 6:40pm

@edescourtis Also, I expect that if you’re seeing more tagged tuples than expected, it’s possibly because where serde_json would normally serialize struct Rgb(u8, u8, u8) as [u8, u8, u8], serde_rustler tries to preserve as much information about the Rust value as possible by default, opting instead to serialize the value as {:Rgb, u8, u8, u8} (both as a tuple and with its tag as an atom, aka as a tagged tuple/Record).

I made this choice for newtype structs + variants and tuples + tuple structs + variants deliberately, but am up for discussion to change it.
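A minimal sketch of the two shapes described above, using only the Rust standard library. The `Term` enum, `encode_json_like`, `encode_tagged` and `render` are hypothetical names made up for illustration; they are not part of serde_rustler, rustler or serde_json, and the real crates do this work through serde's `Serializer` trait rather than hand-written functions.

```rust
// Hypothetical, simplified model of an Elixir term (illustration only,
// not a type from rustler or serde_rustler).
#[derive(Debug, Clone, PartialEq)]
enum Term {
    Atom(String),
    Int(i64),
    List(Vec<Term>),
    Tuple(Vec<Term>),
}

// The tuple struct from the post.
struct Rgb(u8, u8, u8);

// JSON-like shape: the tuple struct becomes a plain sequence and the
// type name is dropped.
fn encode_json_like(c: &Rgb) -> Term {
    Term::List(vec![
        Term::Int(c.0 as i64),
        Term::Int(c.1 as i64),
        Term::Int(c.2 as i64),
    ])
}

// Tagged shape: the type name survives as a leading atom, giving a
// Record-style tagged tuple.
fn encode_tagged(c: &Rgb) -> Term {
    Term::Tuple(vec![
        Term::Atom("Rgb".to_string()),
        Term::Int(c.0 as i64),
        Term::Int(c.1 as i64),
        Term::Int(c.2 as i64),
    ])
}

// Render a term roughly the way Elixir would print it.
fn render(t: &Term) -> String {
    match t {
        Term::Atom(name) => format!(":{}", name),
        Term::Int(n) => n.to_string(),
        Term::List(items) => format!("[{}]", join(items)),
        Term::Tuple(items) => format!("{{{}}}", join(items)),
    }
}

// Comma-separate the rendered elements of a list or tuple.
fn join(items: &[Term]) -> String {
    items.iter().map(render).collect::<Vec<_>>().join(", ")
}
```

For `Rgb(255, 128, 0)`, `render(&encode_tagged(&c))` yields `{:Rgb, 255, 128, 0}` while `render(&encode_json_like(&c))` yields `[255, 128, 0]`. The tagged form keeps enough information to map back to a specific Rust type; the JSON-like form does not.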

OvermindDL1 · August 27, 2019, 6:57pm

Shouldn’t be too hard to add an attribute to allow people to define how to serialize something back and forth, either custom serialization or some pre-built ones, such as for struct representations or so?

edescourtis · August 27, 2019, 8:24pm

@sunny-g

@edescourtis So to make sure I understand your use case, you have a JSON-RPC server sending and receiving JSON, and you would ideally want serde_rustler to decode JSON into Erlang terms and vice versa?

No, I would simply like the structure to be essentially the same as for the JSON serialization. Meaning if I used the JSON serializer (serde_json) and then ran Jason.decode/1, I would get the same value as getting an Erlang term from serde_rustler.

Regardless, can you provide some data structure examples (or preferably code snippets) to illustrate this shortcoming?

Rust: struct Rgb { r: u8, g: u8, b: u8 }  ->  Elixir: %Rgb{ r: u8, g: u8, b: u8 }
Rust: struct Rgb { r: u8, g: u8, b: u8 }  ->  JSON: {"r": u8, "g": u8, "b": u8}
Rust: struct Millimeters(u8)  ->  Elixir: {:Millimeters, u8}
Rust: struct Millimeters(u8)  ->  JSON: u8

The main issue here is when things get big and nested.

In my view it would be way better if it looked like:

Rust: struct Rgb { r: u8, g: u8, b: u8 }  ->  Elixir: %{r: u8, g: u8, b: u8}
Rust: struct Millimeters(u8)  ->  Elixir: u8

But this library only addresses performance issues for NIFs performing JSON (or any kind of) (de)serialization, so it’s still unclear to me what exactly is not working for you.

I was just explaining the common use case. A NIF avoids interprocess boundaries and network boundaries, Erlang terms avoid high serialization and deserialization overhead (this is where serde_rustler plays a role).

That said I am not saying that serde_rustler should not be able to annotate types the way it does (maybe there are use cases for that). I am simply saying that for most use cases I care about it creates problems significant enough I can’t use it at all.

edescourtis · August 27, 2019, 8:49pm

I made this choice for newtype structs + variants and tuples + tuple structs + variants deliberately, but am up for discussion to change it.

I think it would be beneficial to have both and let the developer choose which one to use.
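To make the "have both" idea concrete, here is a rough standard-library-only sketch of an encoder that takes a style switch. Everything in it (`Term`, `Style`, `encode_rgb`, `encode_mm`) is a hypothetical name invented for this illustration; serde_rustler does not expose such an option in this form, and a real implementation would sit behind serde's `Serializer` trait.

```rust
use std::collections::BTreeMap;

// Hypothetical model of an Elixir term (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Term {
    Atom(String),
    Int(i64),
    Tuple(Vec<Term>),
    // Map keys stand in for atom keys such as :r, :g, :b.
    Map(BTreeMap<String, Term>),
}

// Hypothetical switch between the two output shapes discussed in the thread.
#[derive(Clone, Copy)]
enum Style {
    Tagged, // keep type names: structs and tagged tuples
    Plain,  // JSON-equivalent: bare maps and bare values
}

struct Rgb {
    r: u8,
    g: u8,
    b: u8,
}

struct Millimeters(u8);

fn encode_rgb(c: &Rgb, style: Style) -> Term {
    let mut fields = BTreeMap::new();
    fields.insert("r".to_string(), Term::Int(c.r as i64));
    fields.insert("g".to_string(), Term::Int(c.g as i64));
    fields.insert("b".to_string(), Term::Int(c.b as i64));
    if let Style::Tagged = style {
        // An Elixir struct is a map that carries its module under `__struct__`.
        fields.insert(
            "__struct__".to_string(),
            Term::Atom("Elixir.Rgb".to_string()),
        );
    }
    Term::Map(fields)
}

fn encode_mm(m: &Millimeters, style: Style) -> Term {
    match style {
        // {:Millimeters, 7}
        Style::Tagged => Term::Tuple(vec![
            Term::Atom("Millimeters".to_string()),
            Term::Int(m.0 as i64),
        ]),
        // 7
        Style::Plain => Term::Int(m.0 as i64),
    }
}
```

With `Style::Plain`, the output has the same structure that decoding the serde_json output would produce (apart from atom versus string keys); with `Style::Tagged`, it matches the struct and tagged-tuple forms listed earlier in the thread.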

sunny-g · September 4, 2019, 3:03am

@edescourtis

Shouldn’t be too hard to add an attribute to allow people to define how to serialize something back and forth, either custom serialization or some pre-built ones, such as for struct representations or so?

Rust: struct Rgb { r: u8, g: u8, b: u8 }  ->  Elixir: %Rgb{ r: u8, g: u8, b: u8 }
Rust: struct Rgb { r: u8, g: u8, b: u8 }  ->  JSON: {"r": u8, "g": u8, "b": u8}
Rust: struct Millimeters(u8)  ->  Elixir: {:Millimeters, u8}
Rust: struct Millimeters(u8)  ->  JSON: u8

I think it would be beneficial to have both and let the developer choose which one to use.

I believe there is already an opt-in fix, one I mentioned earlier - attaching serde attributes to your types to dictate how they should be serialized or deserialized (or, barring that, implementing Serialize and Deserialize manually for those types).

So the newtype and braced single-field structs should be tagged as such:

#[derive(Serialize)]
#[serde(transparent)]
struct Millimeters(u8);

and any enums should look like:

#[derive(Serialize)]
#[serde(untagged)]
enum Coordinates {
  TwoD(u8, u8),
  ThreeD(u8, u8, u8),
}

Let me know if that covers most of the problems. If not, I’ll look into adding a macro or two.

edescourtis · September 18, 2019, 5:00pm

Yes, I could do that if it was my code. Unfortunately, most of the time I am integrating code I didn’t write with some Elixir project I am working on by creating a NIF library (which includes the foreign code as a submodule). The main issue I have with manually implementing Serialize and Deserialize is that these types change and cause my library to break often. This is not good for maintenance and produces a fragile environment where versions have to be matched closely and changes need to be made constantly. I would like to avoid that if possible. If someone else who owns the foreign codebase is maintaining some existing JSON serialization, it would be nice to piggyback on top of that.