Rockchip 3588 Nerves thread

I’m putting this in its own thread as that makes more sense.

The short backstory here is that I have an RPi 5 system with a Hailo-8 working, but the non-AI/video part is not quite where we want it. The Rockchip RK3588 processor, whose SBCs are often compared to the RPi 5, seems like a more capable beast. It has twice the number of processor cores, but more importantly it has a built-in NPU and a very capable VPU. It also has other specialized hardware which might be of interest, such as a GPU, JPEG encoding and decoding, a 48 MPx ISP with functions like [auto focus, HDR, RAW conversion, lens corrections], 8K HDMI output, 4K HDMI input, more audio functions than I’ll ever need, various IO, and then some. So I got a Radxa Rock 5T, which also comes in an industrial version, with the full intent of running Nerves on it in anger.

Getting this up and booting with Nerves was quite straightforward. There are two different kernel options:

  1. The manufacturer’s kernel, with most hardware working fine, but it is a custom older kernel (6.1) with many proprietary driver blobs. (The NPU driver itself is actually open source.) The issues I ran into with this one were getting WiFi working (the Radxa Rock 5T I got has a newer WiFi chip than the 6.1 kernel supports), and getting a GPU-accelerated browser for a kiosk mode using the Mali-G610 GPU. I backported the WiFi driver so that was fine, but getting a GPU-accelerated browser was not so easy.

  2. I then tested the other alternative: an open source 6.18 kernel with patches for the RK3588 made by Collabora, with Mesa3D graphics, the Rocket NPU driver and the Teflon TFLite delegate. I believe many of these, or maybe all (?), are now in mainline Linux. Anyway, getting Nerves booted was trouble-free, and I also got a GPU-accelerated Chromium running in kiosk mode fairly easily. Headphone sound is a bottomless mystery to me though, so that is not working yet.

    The problem with this is that the NPU and VPU rely on the open source drivers and kernel to work. Teflon is young and has basically just implemented the operations needed to run its MobileNet test (if I remember correctly). I’ve written more in another thread about trying to get these open source alternatives to work with Yolo 8; in the end I basically explored the NPU registers and poked them directly. (The LUT was surprising, as both Rocket and the official documentation had the number of LUT values wrong. Further, rather than being an actual lookup table, it is more like an interpolated lookup graph.) In the end I found that the Rocket NPU initialization does not seem to set up everything needed for all operations on the NPU, and poking registers directly was hardly a good solution. So not really a way forward.

At that point I thought I could take the original Rockchip open source NPU driver and patch it into something that would work with the open source 6.18 kernel. But it seems I can’t do that without also having to change the 6.18 kernel (which is adapted to Rocket, I presume). That actually makes sense, as Rockchip themselves also had to fork their kernel from mainline to get their combination working. But if I changed the 6.18 kernel, then other parts depending on it might accidentally break now or in the future. And all this was just for the NPU… I also wanted the VPU working, and the open source alternative is not fully functional there either. A better plan was needed.

So, back to square one. Instead of trying to wrestle the Rockchip NPU and VPU drivers to fit with the 6.18 open source kernel, the better plan was to keep the older Rockchip 6.1 kernel with the hardware working, and then somehow massage the Mali-G610 GPU driver blob into providing browsers with GPU acceleration.

Searching the internet, I discovered that the key issue had been narrowed down to some missing communication between the browser and the renderer. And someone had even made a fix! But after applying the fix the speed was just 20 fps in the browser. The hook fix did make the GPU work, but it did so by copying frames from the GPU to the CPU and then to memory for the renderer. Hence the low speed.

The better solution would be zero-copy frame handling, passing references instead. So far that seems to work. The browser is now GPU accelerated at around 60 fps. Video playback in the browser should be accelerated by the specialized VPU though, so that is next on the list.

So far so promising.

4 Likes

Nice to see someone actually experimenting with the RK3588. I created a Nerves system for the Rock 5B+ too, and since I’m mainly interested in the VPU, my only option for the kernel is Rockchip’s 6.1, since as far as I understood there are no VPU drivers in the mainline kernel.

I also had an issue with the WiFi, and I included an out-of-tree driver in the buildroot config. The problem is I still need to load the kernel module manually, otherwise the WiFi is not detected. Did you manage to actually make it work without manual intervention?

I hope to get some time to check the GPU & NPU.

1 Like

You can track mainline kernel progress here. VPU support is being worked on: 7.0-rc1 has h264 and h265 decoding enabled, VP8 and AV1 were done previously, but AV1 still needs some IOMMU work. VP9 is in progress. Below the table is a link to the Collabora WIP kernel with all the pending changes merged already, so for the time being it may be the best one to use.

The NPU also has some cleanup work pending.

Are any of your Nerves systems publicly available? I got a couple of Rock 5B+ boards, since my project needs the VPU for both encoding and decoding, as well as the NPU. I am planning to make a Nerves system that is updated as new things are merged, but it would be nice to have one as a starting point.

1 Like

It boots just fine with WiFi enabled. I have the login hardcoded out of laziness right now, but I also made a menu in the kiosk for searching for networks and adding, changing or saving WiFi credentials.
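In case it helps with the manual-modprobe problem: a minimal sketch of loading an out-of-tree module at firmware start, before VintageNet configures the interface. The application name and "my_wifi_driver" are placeholders, and whether you need this at all depends on the driver; building it into the kernel (`=y` in the defconfig) avoids the module load entirely.

```elixir
defmodule MyFirmware.Application do
  # Sketch only: "my_wifi_driver" is a placeholder module name.
  use Application
  require Logger

  @impl true
  def start(_type, _args) do
    # Load the out-of-tree WiFi module early, so the interface exists
    # by the time VintageNet tries to bring up wlan0.
    case System.cmd("modprobe", ["my_wifi_driver"], stderr_to_stdout: true) do
      {_out, 0} -> :ok
      {out, _code} -> Logger.warning("modprobe failed: #{out}")
    end

    children = []
    Supervisor.start_link(children, strategy: :one_for_one, name: MyFirmware.Supervisor)
  end
end
```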

I spent an entire week with Collabora, Rocket and Teflon, as I did want to use the most up-to-date open source alternative. But I want both an accelerated browser and a fully working NPU, and that just did not seem to be there yet. Hence the choice to stick with working hardware on the official Rockchip kernel and rather massage the missing bits to work with that.

Having a go at the VPU functionality now by installing GStreamer for testing. Hopefully I’ll also get it working through v4l2. We will see.
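For poking at the v4l2 side, a small sketch that can be handy from the IEx console (assuming v4l2-ctl from v4l-utils is included in the system) to see which video nodes the Rockchip kernel exposes and which ones belong to the VPU:

```elixir
# Enumerate every V4L2 device and print its driver info, to spot the
# VPU decoder/encoder nodes. Assumes v4l2-ctl is in the firmware image.
for dev <- Path.wildcard("/dev/video*") do
  {out, _status} = System.cmd("v4l2-ctl", ["--device", dev, "--info"], stderr_to_stdout: true)
  IO.puts("== #{dev} ==\n" <> out)
end
```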

The nerves system is available here: GitHub - gBillal/nerves_system_rock_5b_plus: Nerves base image for Radxa Rock 5B+

3 Likes

Thanks for the link.

After a week I got Yolo 8 working on the NPU by dropping Teflon and sidestepping Rocket for everything except initialization. I had working LUT activations, working multi-surfaces, working fused operations and interleaved outputs. But as that cleanup link says, there are hardcoded bits even in the initialization of the NPU, which forced many operations to the CPU. I only got 3-4 fps with that many operations forced to the CPU. The official NPU driver should get about 30-40 fps for the same model.

Hopefully Rocket and Teflon will get there, but I need this working soon, and there still seems to be some way to go.

I plan to share a few versions once I’ve tested them more: a clean boot one, a kiosk one, and one with all the hardware I’ve gotten working by then.

Thanks for that. There seems to be some parts there I haven’t gotten to yet so that should be helpful.

Did you get the headphone sound working? Mine currently just makes hissing sounds at me…

I just discovered that the headphone sound suddenly works. As I hadn’t actually tested for it lately, I can only assume the unexpected fix came along with the change of kernel. I’ll take it either way.

VPU also working now. 200+ frames per second transcoding some 4K h264 video to 1280x720 h265 format including tone mapping.
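For reference, a hedged sketch of what such a transcode test can look like when driven from Elixir. The mpp element names come from the gstreamer-rockchip plugin and may differ per build, and the file paths are placeholders:

```elixir
# Decode 4K h264, scale to 1280x720 (videoscale here; an RGA-accelerated
# scaler would keep it off the CPU), then encode h265.
# Element names and paths are assumptions, not a confirmed pipeline.
args = ~w(
  filesrc location=/data/sample_4k.mp4 ! qtdemux ! h264parse !
  mppvideodec ! videoscale ! video/x-raw,width=1280,height=720 !
  mpph265enc ! h265parse ! mp4mux ! filesink location=/data/out.mp4
)

{output, _status} = System.cmd("gst-launch-1.0", args, stderr_to_stdout: true)
IO.puts(output)
```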

The RK3588 has an ISP which, if I read the specs right, can do at least 2 x 30 pictures per second and apply various frame/picture treatments and conversions. It would be interesting to link that up for frame processing right after a video input and before AI processing.

The NPU is also working. Yolo v8s at 640x640 is currently at about 16 fps, but not really optimized and with Whisper (sometimes) running too. Still some cleaning, polishing and fanciful language needed. But progress for sure.

1 Like

Is this in the form of just buildroot or do you have a prototype Nerves system?

I’d love to see pre-release or current state systems. They don’t have to be tidy or have compiled releases. Whatever progress you have would be super to be able to look at :slight_smile:

I am very tempted to pull my Orange Pi 5 Plus off the shelf. It even has an HDMI input…

1 Like

I’ve got a Radxa Rock 5T (also with HDMI input) running Phoenix LiveView with a slightly patched WebKit/Cog for a kiosk-based GUI. I tried Chromium earlier with the 6.18 Collabora kernel and that seemed more responsive, but I thought Cog might be better suited for this. I might have been wrong.

  • It also has VLC for testing HDMI input later, plus the somewhat overlapping GStreamer for video work and video acceleration inside Cog.
  • A system monitor page for good measure at this prototyping stage.
  • WiFi has a page with the usual discover-networks-and-connect-with-password flow. The password is stored until manually deleted or changed.
  • The VPU got a test page with some downloaded video samples in various sizes and formats, and a simple GUI for transcoding with FPS reporting at the end.
  • And two buttons for testing sound on speakers and headphones respectively.

For testing Yolo and Whisper I figured a YouTube page would provide plenty of varied testing grounds. That is true, but DRM protection makes it a bit more complicated. Current status is buttons for activating Yolo and Whisper on the videos. Yolo draws the typical rectangle outlines with category and confidence text on the videos. I had to go to Yolo 8 with FP16, as the accuracy of the converted 8-bit model was off for some reason. Yolo dropped to about 10 fps. Whisper is confirmed to work (sometimes) in the background, but with quite a delayed start. It seems I’ve also broken the earlier scrolling transcription overlay somehow, so that is the focus right now. I want an almost-live transcription, so still some way to go on that.

Looking at that juicy ISP next I think, but that also seems less documented. But being able to process frames for contrast, color and sharpness before the AI stage seems quite attractive indeed. It might also be interesting to abuse the lens correction into making only the center of the frames sharp with gradual drop off towards the edges.

Later I will test and/or wrestle the I2C and PWM outputs with various sensors and stepper motor drivers, and maybe the MIPI camera and display. I’m not in the right location for that right now though.

This test setup just passed 1 GB in size, but if you want to have a look at this stage you are more than welcome. I’ve probably trampled all over normal Nerves conventions with sheer ignorant joy! Not sure if the setup is directly transferable to the Orange Pi version, but it should be close.

The shelf is no place for a perfectly good Orange Pi 5 Plus. :slight_smile:

1 Like

But do you have the separation of a Nerves system and a Nerves application/firmware project? I’d love to try all of it, but I’m most interested in having the foundation of a Nerves system :slight_smile:

The base layer and the application layer are separated. Setup/drivers for the GPU, NPU, VPU and VintageNet are in the base layer. There might be a few dead remnants from earlier Collabora and NPU-poking adventures. Adjust to taste.

I’ve packed the base layer into a zip file to make a small standalone package. Running an isolated test build now to make sure it is a clean standalone.

1 Like

Too much proprietary stuff for a git repo?

Not really, as this is just the test setup with somewhat representative test stand-ins for now. I’ve found an old git account I can use.

The base-layer-only build failed at first, so the clean separation wasn’t clean enough, it seems. And while I was at it I thought I would replace some Radxa-specific boot files with mainline buildroot/Linux sources, so there is less special Radxa sauce. That didn’t go too smoothly at first, and then Radxa had gone and changed their kernel under my feet, so the link was dead.

Anyway, I’ve tried to cut back to what it takes to build the basics from source and avoid blobs as much as possible. Those necessary are linked to. I did a test build and it boots up, so, classically, it Works On My Machine. I haven’t tested the GPU, NPU and VPU on this one, but the same setup does work on the larger test build. It can be found at GitHub - BadBeta/base_radxa_rock_5t_nerves_GPU_NPU_VPU: Base Elixir Nerves build for Radxa Rock 5T with RK3588 GPU NPU and VPU setup

1 Like

There were some leftovers from the NPU exploration adventure, and the changing of boot files messed up booting. Both should be fixed now in a fresh git push. Apart from flashing to the eMMC, which requires Rockchip tools, it should be the normal Nerves mix commands for building and flashing. (Although I’ve only tested with the eMMC, so here’s hoping!)

1 Like

Awesome. Thanks for sharing. Gotta find some time to see what it would take to get on the Orange Pi :slight_smile:

Added a confirmed working HDMI input today, and some info on how to boot from micro SD instead of eMMC.

1 Like

Updated with ISPs, a 40-pin IO setup with [I2C, SPI, PWM, ADC, serial, IO], a PDM microphone connector, MIPI cameras, a MIPI display, various camera modules compiled into the kernel, and more video encoding and decoding codecs.

I will not be able to check everything until I’m back at the electronics and have some time, but it does boot, and everything I can check works.

1 Like

With the exception of the cryptographic, compression and voice-detection hardware, I think most is covered.

I’ve started on an Elixir library to provide a layer on top of the hardware so I have access to all of it from Elixir land. Using Circuits is a given for some IO functions, but for things like the ISP, NPU and VPU it is less obvious how to make them Elixir-ish and user friendly. (GStreamer and ffmpeg haunt me!)

The ISPs each have a recipe you decide up front. From there, an ISP will apply that recipe to every raw Bayer image or video-stream frame you feed it. Thus, to use it later, just a call to ISP.apply(which_one) seems fine to me.

For the NPU it is more complicated, as there are so many different uses and models. But for inference calls on frames in a video stream, I think a pass-through pipe where the NPU just grabs a copy as often as it can makes sense. For the RGA, which does overlays among other things, there just needs to be a reference to the code to be used. The VPU output is basically just an output size and codec, so that can be inline.

Thus, taking a video source like a CSI camera, doing some ISP noise reduction and contrast, running NPU Yolo inference, adding a rectangle overlay over detected objects, and using the VPU to transcode to 1280x720 h265 output would look like:

Video.from({:v4l2, "/dev/video0"})
|> ISP.apply(1)
|> NPU.infer(Yolo8, objectpos_buffer)
|> RGA.overlay(rectangles, objectpos_buffer)
|> VPU.transcode("/data/out.mp4", :h265, {1280, 720})

I think this looks ok and readable while still saying what is going on. It also makes it possible to keep track of which accelerated hardware is being used. They have capacity limits so that helps to not accidentally use the same hardware twice. (All the above is accelerated hardware only. There should hardly be any CPU activity at all).

The ISP call could maybe be improved by aliasing the ISP number with a sensible name describing what it does, like ISP.apply(noisereduction_contrast), but then the coherence of that alias would have to be manually tracked to match the recipe used. It might end up misleading instead, and harder to keep track of used hardware.
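One hedged middle ground (module and names hypothetical): keep the slot number as the single source of truth in one module, so the alias and the recipe can only drift apart in one place:

```elixir
defmodule ISP.Recipes do
  # Hypothetical: slot numbers as loaded into the ISPs at startup.
  # The alias-to-slot mapping lives only here, so it is audited in one place.
  @slots %{noise_reduction_contrast: 1, hdr_tone_map: 2}

  def slot!(name), do: Map.fetch!(@slots, name)
end

# Usage keeps the readable name but still resolves to a visible slot number:
#   |> ISP.apply(ISP.Recipes.slot!(:noise_reduction_contrast))
```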

The various kinds of NPU models, with all their different inputs and outputs, are a bit of a headache to handle cleanly though. Or maybe I’m overthinking it?

1 Like

Wow, you are tooling this thing up!

Would that example pipeline just be passing memory pointers or controlling a NIF? Or are you actually commanding the devices from Elixir?

For pipelines like this I think Membrane is an answer. You can always build a Membrane pipeline on top of a good base library, but you can’t always build the tightest possible thing if you only have Membrane elements.

I think the Nx ecosystem has a fair number of references for how they do pointers and on-device stuff with NIFs and such to avoid copying back and forth. It has been a topic in ADBC + Explorer, in Pythonx + Nx and probably in EXLA/EMLX.

But essentially the Elixir code ends up passing around some kind of opaque reference that represents a device and memory location that is owned by a NIF typically.
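A minimal sketch of that shape, with hypothetical names: each stage is a NIF that accepts and returns an opaque resource, so Elixir only ever shuffles handles while the pixels stay in device memory owned by the native side.

```elixir
defmodule HW.Frame do
  # Hypothetical NIF stubs: a real module would load the native library in an
  # @on_load hook via :erlang.load_nif/2, which replaces these functions.
  # The "ref" each returns would be an opaque resource term wrapping a
  # device/memory handle owned by the C code, so no frame data is copied
  # into the BEAM.
  def capture(_device), do: :erlang.nif_error(:nif_not_loaded)
  def isp(_ref, _recipe_slot), do: :erlang.nif_error(:nif_not_loaded)
  def encode(_ref, _codec), do: :erlang.nif_error(:nif_not_loaded)
end

# Elixir orchestrates; only the opaque ref flows through the pipe:
#   HW.Frame.capture("/dev/video0") |> HW.Frame.isp(1) |> HW.Frame.encode(:h265)
```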

Telling things when to pull a frame and where to send it can be done pretty well in Elixir from what I’ve seen, and having access to that from Elixir is ideal. But being able to tell it to just go at full clip is probably a good thing to have as well.