Nerves RPi5 Hailo8 M.2 AI module support

Hi everyone,

I’d like this thread to serve as a common place to track the ongoing development of Hailo8 M.2 support on Nerves. I know some community members are actively working on this integration.

For those unfamiliar, the Hailo8 is an AI processor, offering high-performance, low-power machine learning acceleration. It connects via PCIe to the RPi5 and is ideal for applications requiring efficient neural network inference.

  • With the hailo-driver branch of the official Nerves Raspberry Pi 5 system repository (nerves_system_rpi5), the Hailo8 device is successfully recognized when connected via PCIe. I think it still needs a firmware file, though. (A quick way to check that the card is enumerating on the bus is sketched just after this list.)

  • Additionally, Gus is working on integrating HailoRT (Hailo’s runtime SDK) into Nerves. Here’s the repo: GitHub - gworkman/hailo_rpi5 at add_hailo
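
If anyone wants to sanity-check that the card is actually enumerating on the bus before chasing the firmware part, a quick sysfs read from the device’s IEx shell is enough. Just a sketch; the vendor ID below is from my notes, so double-check it against your own dmesg output.

```elixir
# List PCIe devices from sysfs and print each device's vendor ID.
# Hailo's vendor ID should show up once the card is detected
# (0x1e60 in my notes -- verify against your own dmesg output).
for dev <- File.ls!("/sys/bus/pci/devices") do
  vendor =
    ["/sys/bus/pci/devices", dev, "vendor"]
    |> Path.join()
    |> File.read!()
    |> String.trim()

  IO.puts("#{dev}  vendor=#{vendor}")
end
```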

7 Likes

What kind of applications are we talking about? Training models on this hardware, or using it to serve pre-trained models?

Mostly inference. I think it should be great for things like vision, robotics, etc. YOLOv8n runs at 55fps (~8ms inference time) on it, compared to a MacBook Air M3, which runs the same model at ~14ms.

2 Likes

Damn, really nice! It’s great that hardware for this has become so accessible; 5 years ago this kind of technology would have cost you an arm and a leg.

1 Like

Absolutely agree! I’ve just added a picture to the first post to highlight how compact the form factor is.

2 Likes

Any updates on this? I’m looking at doing a computer vision project (trying to set up cameras to detect certain activities) and wondering if this would be useful.

It exists and works. Kind of an early version but Paulo and Vittoria took the efforts Cocoa, @gus and I put into making the driver work, swore at it, cursed our names, made it work again, gave us credit and then implemented the Elixir parts.

Then they made a yolo model work with some help from @alvises.

They showed it all off in Stockholm recently.

Their repo:

Most news like this lands in the Nerves Newsletter, even when I don’t remember to track down all the places it has been discussed and post an update on the situation :slight_smile:

3 Likes

Any chance that the talk was recorded and will be published? I see the keynotes are available on YT, but no further recordings.

They usually release the keynotes early and then the talks on some schedule. I’ll check if they can push it up the queue, since it is a bit timely.

1 Like

The Hailo AI accelerators are now supported in Nerves out of the box on Raspberry Pi 5s with the AI HAT or the M.2 AI module. The nx_hailo repository has been updated to reference it, and there’s a short mention in the nerves_system_rpi5 release notes.
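
For anyone wiring this into a project, the system side is just the normal target-specific dep in mix.exs. A sketch below with illustrative version requirements only; use the nerves_system_rpi5 release whose notes mention the Hailo support, and add nx_hailo from wherever its README points.

```elixir
# Sketch of the target deps in a Nerves project's mix.exs (versions illustrative).
defp deps do
  [
    {:nerves, "~> 1.10", runtime: false},
    {:nerves_system_rpi5, "~> 0.2", runtime: false, targets: :rpi5}
    # plus {:nx_hailo, ...} -- see the nx_hailo README for the source and version
  ]
end
```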

5 Likes

I think they should be usable with a Pi 4 as well, though I think it requires a different adapter to get at the PCIe bus. I believe Seeed Studio had some charts about performance for that.

Probably not worth shipping by default though.

Just made this screencast showing how to run object detection models on Nerves + RPi5 + Hailo8L with the NxHailo library.

6 Likes

I recreated your demo from the last Nerves EU Meetup, using your livebooks as a reference. I am running Phoenix on top of it instead of Livebook to display the camera picture, and I am getting real-time images from the camera if I run it without inference.

But with inference (which takes around 75ms) the delay adds up too much to be of any use.

Any idea on how to debug this further? My setup is pretty much the same, calling NxHailo.Hailo.infer which calls the NIF which calls the C source.
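
For reference, this is roughly how I am measuring the inference time. A sketch only; the NxHailo.Hailo.infer/1 arity and argument here stand in for whatever my code actually passes.

```elixir
defmodule InferTiming do
  # Wraps a single call to the NIF-backed inference and logs how long it took.
  # NxHailo.Hailo.infer/1 is called with a placeholder argument here -- adapt
  # it to however your pipeline actually invokes it.
  def timed_infer(input_tensor) do
    {micros, detections} = :timer.tc(fn -> NxHailo.Hailo.infer(input_tensor) end)
    IO.puts("inference took #{div(micros, 1000)} ms")
    detections
  end
end
```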

Does this mean you are performing the inference before displaying the image? It sounds like it.

There are many ways of making these things more or less performant. I would try a fairly simple one which is to display the image immediately and do the inference in parallel. It may trail a bit but you get the image immediately. And Livebook can draw bounding boxes for you.

If you can pull the frames you use for inference from the camera at a smaller resolution than what you display, that would reduce the amount of processing. That also assumes the camera can keep up or deliver multiple streams.

The next thing I would look at is reducing any extra copies for the inference part. I believe evision can pull the data directly into an Nx data structure. Hopefully that is already being used. I haven’t looked at the details.
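
Roughly, the inference path could look something like this. Just a sketch: the NxHailo.Hailo.infer/1 call, the resolution and the message shape are placeholders for whatever you actually have.

```elixir
# Sketch of the inference path only: resize a smaller copy of the frame,
# convert it straight to an Nx tensor via evision, and run the Hailo call in
# its own task so displaying the full frame never waits on it.
# Goes inside the process/module that receives camera frames.
def start_inference(mat, reply_to) do
  Task.start(fn ->
    small = Evision.resize(mat, {320, 180})   # smaller copy just for the model
    tensor = Evision.Mat.to_nx(small)         # straight into Nx, no extra copies
    send(reply_to, {:detections, NxHailo.Hailo.infer(tensor)})
  end)
end
```

At 25 fps with ~75ms per inference you would also want to skip frames while a task is still running, otherwise they pile up.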

Hello lawik,

thank you for your reply. Yes, the LiveView receives a message with the camera frame (640x360@25fps) and processes it with the NIF. The detected objects are then drawn onto the image frame, base64 encoded and then assigned to the source in the LiveView.

For now I have implemented frame dropping, and this gets me around 11-13 fps.

I also tried displaying a frame unprocessed while the previous frame was still being processed, instead of dropping it, but this resulted in a flicker effect and was not watchable.

And yes, Evision.Mat.to_nx() is used before the Hailo code is called. I also timed the three steps, preprocessing (resize, convert to Nx, ...), inference and postprocessing (draw boxes), and inference takes around 70ms here.

I am not sure I fully understand your suggestion about displaying the unprocessed frame immediately and assigning the processed frame after inference has completed, as a newer frame would already have been displayed by then. With 25 fps I have at most 40ms for converting, inference and displaying a single frame.

Right now 10+ fps is a good starting point for me, but I will look more into the Hailo docs and sources, try different resolutions and check my Nx setup.

Edit: using the yolov8s model instead of yolov8m gives me around 18fps (I am using the Hailo-8L chip).

You could perform inference at ~14 fps with that level of performance, and you can still show the video at a higher frame rate.

But only grab a new frame for inference when the inference on the previous one has completed. If you use LiveView or another side channel to control the bounding boxes in the UI, you don’t have to draw them on the image: you can show the unprocessed image and update the bounding boxes at ~14 fps while the video rolls at a normal frame rate.

This means the bounding boxes shown are up to 70ms out of sync but I’m not sure that’s even noticeable. So don’t delay the image, only delay the inference result. Let me know if I’m not making sense :slight_smile:
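
In code the gating could look roughly like this. A sketch only; the message shapes, assigns and the NxHailo.Hailo.infer/1 call are placeholders for your setup, and the boxes would be overlaid in the template (e.g. as SVG rects) rather than drawn onto the frame.

```elixir
# Show every frame right away; only hand a frame to the Hailo when no
# inference is in flight, and keep detections in their own assign so the
# video never waits on them.
def handle_info({:camera_frame, mat}, socket) do
  socket = assign(socket, :frame, mat)

  socket =
    if socket.assigns[:inference_busy] do
      socket
    else
      parent = self()

      Task.start(fn ->
        tensor = Evision.Mat.to_nx(mat)
        send(parent, {:detections, NxHailo.Hailo.infer(tensor)})
      end)

      assign(socket, :inference_busy, true)
    end

  {:noreply, socket}
end

def handle_info({:detections, boxes}, socket) do
  # The boxes trail the video by one inference (~70ms) but the video itself never waits.
  {:noreply, assign(socket, detections: boxes, inference_busy: false)}
end
```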

Certainly better if you can run it faster. But I don’t know the inference bits particularly well.

Hello Lars,

sorry I did not reply earlier; to be honest, I did not fully understand it, and my RPi / Hailo AI project got interrupted by YASP - Yet Another Side Project :smiley:

I re-read your post just now, and suddenly it makes perfect sense to me. I’m not sure it fits a fast-moving/changing video, but I at least want to try out your idea and see how it performs. Regarding your idea of displaying a higher-resolution picture than the one used for inference: this could also be adjusted dynamically at runtime, e.g. if the performance drops below 10fps, I could switch to a lower resolution. I like that!
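
Something like this is roughly what I have in mind for the switching. Just a sketch; the resolutions and the 10fps threshold are made up.

```elixir
defmodule AdaptiveRes do
  # Step down to a smaller inference resolution when the measured rate drops
  # below ~10 fps; resolutions and threshold are placeholders.
  @resolutions [{640, 360}, {480, 270}, {320, 180}]

  def next_index(fps, index) when fps < 10 and index < length(@resolutions) - 1,
    do: index + 1

  def next_index(_fps, index), do: index

  def resolution(index), do: Enum.at(@resolutions, index)
end
```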

It will be some time until I can tinker with it again; I will post about it in the forum once I have finally tried it out.

Thank you again for the clarification and the ideas.

1 Like