NIF vs Port Driver

Hi folks, I’m a tad surprised to find no debate on using port drivers vs NIFs. Perhaps I am using the wrong search words; if so, I’d much appreciate pointers.

Can you please comment on the following?

  1. As far as I understand, port drivers and NIFs both run in-process, so a crash in either can bring down the whole VM. Yes?

  2. I don’t understand why the Erlang community makes such a fuss about the dangers of linked-in native code, especially now that there are dirty schedulers. Yes, the process can crash, but so what? 99% of Erlang code out there isn’t running an MRI machine or the space shuttle. All Python extensions are natively linked, and they get on just fine; crashes rarely happen, and when they do, it is fine to restart manually or via some external infrastructure. That’s what Amazon and WhatsApp do anyway. There is no need to be so skittish. Comments?

  3. Port drivers are like Python’s ctypes: they let you do all the wrapping of C data types in Erlang space and then call a dynamically loaded function directly. Is this understanding correct?

  4. Port drivers are always asynchronous. NIFs are synchronous by default. Yes?

  5. Is there a reason why one should prefer NIFs over port drivers, or vice versa? Can a port driver’s functions be selectively scheduled on a dirty scheduler? One reason I can think of is that with port drivers I cannot call C++ code (because I don’t want to deal with mangled names).

Any tips/tricks/advice/pushback most appreciated. Thank you much.

1 Like

Not really: a Port dying will simply send an exit message to a monitoring process on the Erlang side. Many people use this technique to restart intermittently failing external tools, with great success.

Apparently the Erlang/Elixir world disagrees with you about not caring whether the BEAM VM crashes? :man_shrugging:

Can’t speak for everyone, but in 5+ years with Elixir I’ve only had a handful of full VM crashes, and they were all the fault of external C libraries or tools. The BEAM VM’s promise of being rock solid does deliver, so I see no reason to deliberately sabotage that with native code that can crash at any time.

If I were to do that, then why would I need Erlang/Elixir in the first place? You’d be better off writing your thing in a more mainstream language where you can get tons more help on StackOverflow / Reddit / Twitter / everywhere else.

The BEAM VM – and thus Erlang/Elixir and many other BEAM languages – has a unique value offering; people choose the ecosystem for that offering and stick to its advantages.

I don’t remember all of it, but I’d say not exactly. A Port is basically a pipe through which you communicate with something external. Proper (de)serialization is IMO not guaranteed and depends on both sides. But I might be wrong here; somebody else should chime in.

Yes. I’ve made a Rust NIF myself that does asynchronous work and returns a handle which you can poll in the future. But that’s a workaround, because yes, NIFs are always synchronous.
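
For concreteness, here is the same pattern sketched in plain C rather than Rust. This is an untested sketch, and the module and function names (async_demo, start_work) are invented: the NIF returns immediately, and a worker thread sends the result back to the caller as a message later.

```c
#include <erl_nif.h>

/* State handed to the worker thread. */
typedef struct {
    ErlNifPid caller;   /* who to send the result to */
    ErlNifTid tid;      /* worker thread id (never joined in this sketch) */
} task_t;

static void* worker(void* arg)
{
    task_t* task = (task_t*)arg;
    /* ... long-running work would go here ... */
    ErlNifEnv* msg_env = enif_alloc_env();
    ERL_NIF_TERM msg = enif_make_atom(msg_env, "work_done");
    /* NULL caller env: we are not running on a scheduler thread here. */
    enif_send(NULL, &task->caller, msg_env, msg);
    enif_free_env(msg_env);
    enif_free(task);
    return NULL;
}

static ERL_NIF_TERM start_work(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
    task_t* task = enif_alloc(sizeof(task_t));
    enif_self(env, &task->caller);
    if (enif_thread_create("worker", &task->tid, worker, task, NULL) != 0) {
        enif_free(task);
        return enif_make_atom(env, "error");
    }
    /* Returns immediately; the result arrives in the mailbox later. */
    return enif_make_atom(env, "ok");
}

static ErlNifFunc nif_funcs[] = {
    {"start_work", 0, start_work, 0}
};

ERL_NIF_INIT(async_demo, nif_funcs, NULL, NULL, NULL, NULL)
```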

Ports can be thought of as a tool through which you communicate with another program using the UNIX | pipe, or special files that contain inputs and outputs. So yep, they are async by nature, although you can still block on reading them.
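
To make the pipe idea concrete, the external side of a basic Port can be as small as the program below (a hypothetical "echoer"; the Erlang side would open it with something like open_port({spawn_executable, "./echoer"}, [{line, 1024}, exit_status])):

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[1024];

    /* Read requests from stdin, write replies to stdout: that is the
     * whole "protocol" of a basic Port. */
    while (fgets(line, sizeof line, stdin) != NULL) {
        /* strip the trailing newline, if any */
        line[strcspn(line, "\n")] = '\0';
        printf("echo: %s\n", line);
        fflush(stdout);  /* pipes are block-buffered; flush every reply */
    }
    return 0;
}
```

When this process exits (EOF on stdin here), the exit_status option turns that into a message to the owning Erlang process, which is the monitoring behaviour mentioned above.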

For the things I am working on I prefer NIFs due to performance. I only create and use Rust ones, because Rust is also almost crash-proof and pairs naturally with the BEAM VM.

People usually use Ports when they need to interface with an external program whose functionality they cannot easily (or at all) duplicate in Erlang/Elixir.

So at least in my experience – which is not universal or all-encompassing, of course – Ports are mostly used when you don’t have much other choice. I’d usually work really hard to get the job done with a Rust NIF first.

3 Likes

I think port ‘drivers’ are distinct from ports: they are linked into the VM and expose a driver interface, typically for handling I/O. They are expected to complete their callbacks quickly and are asynchronous by nature.

A simple driver is substantially more complex than a simple NIF.
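
To give a feel for that complexity, here is a rough, untested skeleton of a linked-in driver that just echoes data back. It is modeled on the shape of the docs’ example; the exact ErlDrvEntry fields can vary across OTP versions, so treat this strictly as a sketch:

```c
#include <erl_driver.h>

typedef struct {
    ErlDrvPort port;
} drv_state;

static ErlDrvData drv_start(ErlDrvPort port, char *command)
{
    drv_state *st = (drv_state *)driver_alloc(sizeof(drv_state));
    st->port = port;
    return (ErlDrvData)st;
}

static void drv_stop(ErlDrvData handle)
{
    driver_free((void *)handle);
}

/* Called when Erlang sends data to the port; echo it straight back. */
static void drv_output(ErlDrvData handle, char *buf, ErlDrvSizeT len)
{
    drv_state *st = (drv_state *)handle;
    driver_output(st->port, buf, len);
}

static ErlDrvEntry example_driver_entry = {
    NULL,                           /* init */
    drv_start,                      /* start, when port is opened */
    drv_stop,                       /* stop, when port is closed */
    drv_output,                     /* output, when Erlang sends data */
    NULL,                           /* ready_input */
    NULL,                           /* ready_output */
    "example_drv",                  /* driver_name, used by open_port */
    NULL,                           /* finish */
    NULL,                           /* handle, reserved */
    NULL,                           /* control */
    NULL,                           /* timeout */
    NULL,                           /* outputv */
    NULL,                           /* ready_async */
    NULL,                           /* flush */
    NULL,                           /* call */
    NULL,                           /* event, reserved */
    ERL_DRV_EXTENDED_MARKER,        /* extended_marker */
    ERL_DRV_EXTENDED_MAJOR_VERSION, /* major_version */
    ERL_DRV_EXTENDED_MINOR_VERSION, /* minor_version */
    0,                              /* driver_flags */
    NULL,                           /* handle2, reserved */
    NULL,                           /* process_exit */
    NULL                            /* stop_select */
};

/* The Erlang side would do erl_ddll:load_driver(".", example_drv)
 * followed by open_port({spawn_driver, "example_drv"}, []). */
DRIVER_INIT(example_drv) /* must match driver_name above */
{
    return &example_driver_entry;
}
```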

The Erlang documentation has explanations and examples, but I’m on mobile and don’t have the links, sorry.

1 Like

I should add that where I said ‘linked into the VM’ I should have said ‘runs in the same address space’. Most third-party port drivers would be dynamically linked, but it is possible to recompile the VM statically against your port driver.

1 Like

Yeah, I knew I was missing part of the picture. Thanks for chiming in!

Just to respond to your questions more directly…

  1. Yes, they can both crash the VM.
  2. I think it’s a fair point that under current deployment practices (containers etc.), restarting the whole VM is perfectly feasible, and ultimately required anyway. There are a couple of other considerations, though. First, the runtime guarantees are weakened: e.g., if I have an Erlang VM instance hosting many thousands of client connections, I very much value the ‘error containment’ model the BEAM offers. Second, the cognitive load of development goes up once I have native code running in-process. If I can reasonably avoid that, I will.
  3. I can’t comment on this too much; I don’t know Python that well. Port drivers are more than wrappers for C types, though, in the sense that they are active/reactive components of your VM, not ‘just’ data wrappers.
  4. Yes.
  5. In principle, provided the exported symbols of your port driver are not mangled (i.e. extern “C”), you should be able to write your port driver in C++, but take extra care to ensure C++ exceptions do not escape from your driver’s entry points (see the sketch after this list).
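
As an illustration of the unmangled-symbols point, a shared header along these lines (the file and function names are hypothetical) keeps the entry points callable with C linkage whether the implementation is compiled as C or C++:

```c
/* mydrv_api.h (hypothetical): declarations shared between the C++
 * implementation and the driver glue. The extern "C" guard keeps these
 * symbols unmangled when this header is included from C++. */
#ifdef __cplusplus
extern "C" {
#endif

/* Entry points implemented in C++; each implementation should catch all
 * C++ exceptions internally so none can escape into the VM. */
void mydrv_handle_output(void *state, const char *buf, unsigned long len);

#ifdef __cplusplus
}
#endif
```
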
1 Like

Thank you for your response.

Not really: a Port dying will simply send an exit message to a monitoring process on the Erlang side. Many people use this technique to restart intermittently failing external tools, with great success.

No, that’s a port, not a port driver (also called c_portdriver).

Can’t speak for everyone, but in 5+ years with Elixir I’ve only had a handful of full VM crashes, and they were all the fault of external C libraries or tools. The BEAM VM’s promise of being rock solid does deliver, so I see no reason to deliberately sabotage that with native code that can crash at any time.

Anecdotally, I have used Python with a wide variety of libraries over the past 22 years, and can’t remember being plagued with crashes. Millions of people use OpenCV, TensorFlow, etc. with Python, but aren’t swept up in fear of the Python VM going down. As I said, most work is non-critical in the larger scheme of things, and crashes are rare for solid libraries.

As for performance, I don’t see any data on port drivers vs NIFs, but I suspect (like you say) that the latter should perform better.

I am wrapping OpenCV, so alas, Rustler is not an option. I was trying to reduce the amount of boilerplate code and thought that the port-driver option was more succinct, but I’m now convinced that it isn’t for this project, because it is more suited to C than to C++. With a port driver, one has to deal directly with name mangling. Ugh.

Thanks for your response.

The first part doesn’t worry me at all. If I have a rock-solid well-tested C library, to me calling it natively is no different from relying on the built-in TCP driver, for example.

I am interested in hearing about the second part, though. Assuming that you need the native code for one reason or another, is it not better to have it in-process than in a separate process? I grant you that one does have to keep an eye on potentially long-running functions, for which there is always the explicit option of specifying dirty schedulers.

It depends :neutral_face:. When making this choice (in-process or out), you’d have to weigh up:

  • The stability of the native code
  • The extent to which it will need dirty scheduling
  • The runtime cost of running it out of proc (call frequency, size of data in/out)
  • The relative development costs of in-proc / out-of-proc

FWIW, I guess my personal default position would be to look at running it out of proc, maybe as a c_node, but it’s entirely possible that, considering the factors above, I’d switch to in-proc.
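
For anyone curious what the c_node route looks like, a minimal (untested) skeleton using the ei library is roughly the following; the node and cookie names are made up:

```c
#include <stdio.h>
#include <ei_connect.h>

int main(void)
{
    ei_cnode ec;
    ei_x_buff buf;
    erlang_msg msg;

    ei_init();  /* required before other ei calls on recent OTP */

    if (ei_connect_init(&ec, "cnode", "secretcookie", 0) < 0)
        return 1;

    /* Connect as a hidden node to a running Erlang node (epmd must be up). */
    int fd = ei_connect(&ec, "demo@localhost");
    if (fd < 0)
        return 2;

    ei_x_new(&buf);
    for (;;) {
        int res = ei_xreceive_msg(fd, &msg, &buf);
        if (res == ERL_TICK)
            continue;           /* heartbeat, ignore */
        if (res == ERL_ERROR)
            break;
        /* decode buf.buff with the ei_decode_* functions here */
    }
    ei_x_free(&buf);
    return 0;
}
```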

  • The stability of the native code

Oh yes. I wouldn’t last long in that project if I were forced to use a non-solid base.

  • The extent to which it will need dirty scheduling

It’s just a bit-flag difference, so it isn’t a big decision.
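
For reference, the flag in question sits in the NIF’s function table, so it really is selectable per function, which also answers question 5 above. A hypothetical sketch (sched_demo and the function names are invented):

```c
#include <erl_nif.h>

/* A function expected to run long: marked for a dirty CPU scheduler. */
static ERL_NIF_TERM heavy(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    /* ... long-running work, safe here because of the dirty flag ... */
    return enif_make_atom(env, "ok");
}

/* A quick function: runs on a normal scheduler. */
static ERL_NIF_TERM quick(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    return enif_make_atom(env, "ok");
}

static ErlNifFunc nif_funcs[] = {
    {"quick", 0, quick, 0},                           /* normal scheduler */
    {"heavy", 0, heavy, ERL_NIF_DIRTY_JOB_CPU_BOUND}  /* dirty CPU scheduler */
};

ERL_NIF_INIT(sched_demo, nif_funcs, NULL, NULL, NULL, NULL)
```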

  • The runtime cost of running it out of proc (call frequency, size of data in/out)

For me, it is the friction of writing an external C-node and the amount of wrapper boilerplate. But I have never done it, so perhaps the coding friction isn’t much.

  • The relative development costs of in-proc / out-of-proc

Is there that much of a difference in development? As with the previous answer, it seems to me that there is less boilerplate, if anything, in-proc. The out-of-proc version has all the advantages of micro-services (independent development and deployment, better monitorability) at the expense of speed.

Thanks for engaging and making me think aloud.

1 Like