YOLO - Real-Time Object Detection Simplified

Hello everyone! :waving_hand:

I’m excited to introduce my first Elixir library: YOLO, a library designed to make real-time object detection accessible and efficient within the Elixir ecosystem. Whether you’re working on a hobby project or a production-grade application, this library provides a simple way to integrate the power of YOLO (You Only Look Once) object detection.

What is YOLO?

YoloV8x

YOLO is a state-of-the-art system for detecting objects in images or videos. It is widely used for applications like monitoring, automation, and robotics due to its balance of speed and accuracy. This library lets developers use YOLO models seamlessly in Elixir, with a focus on ease of use and extensibility.

Key Features

  • Speed: Optimized for real-time performance: with the YoloV8n model, an image is processed into a list of detected objects in just 38ms on a MacBook Air M3, using EXLA and the companion library YoloFastNMS.
  • Ease of Use: Get started with just two function calls: one to load the model and one to detect objects.
  • Extensibility: Built around a YOLO.Model behaviour, supporting YOLOv8 models and paving the way for future models or custom extensions.
  • NIF Optimization: For those needing ultra-fast post-processing, an optional Rust NIF (YoloFastNMS) speeds up Non-Maximum Suppression by ~100x compared to the internal YOLO.NMS implementation written in Elixir with Nx.
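As a sketch of how the optional NIF plugs in (the `nms_fun` option is taken from the YoloFastNMS docs as I understand them; the default stays pure Elixir/Nx):

```elixir
# Default post-processing uses the built-in Elixir/Nx YOLO.NMS.
detections = YOLO.detect(model, image)

# With the optional yolo_fast_nms dependency added to mix.exs,
# swap in the Rust NIF for ~100x faster NMS:
detections = YOLO.detect(model, image, nms_fun: &YoloFastNMS.run/3)
```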

How to Get Started

  1. Begin by generating the ONNX model using the provided Python script.
  2. Install the library and call YOLO.load/1 to load the model.
  3. Load an image and perform object detection with a single call to YOLO.detect/3.
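Sketched in code, the three steps look roughly like this (file paths are placeholders, and the option names follow the library's README as I understand it):

```elixir
# 1. Export the ONNX model with the provided Python script, then load it:
model =
  YOLO.load(
    model_path: "models/yolov8n.onnx",
    classes_path: "models/yolov8n_classes.json"
  )

# 2. Load an image (here via Evision, the OpenCV bindings for Elixir)
image = Evision.imread("images/traffic.jpg")

# 3. Detect objects and map raw output to labeled detections
detections =
  model
  |> YOLO.detect(image)
  |> YOLO.to_detected_objects(model.classes)
```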

It’s that straightforward! :rocket:

Current Limitations and Future Plans

The current implementation supports YOLOv8 models with a fixed 640x640 input size (even though YOLOv8x6 supports 1280x1280 images) and a fixed 84x8400 output size: 80 classes from the COCO dataset across 8400 detection candidates.

The library is designed to be extensible through the YOLO.Model behaviour, allowing other YOLO versions or custom model implementations to be added in the near future.

One of the next goals is to support models with different input and output sizes. This update would allow the library to work with YOLO models trained on other datasets or even custom datasets, making it more flexible and useful.

Links

26 Likes

Amazing work. Will for sure try it out. Had plans already to get back into computer vision again and this seems like a great excuse.

1 Like

I promised myself to make a quick 5-10 min video showcasing how the YOLO library works… ended up with a 36-minute deep dive! Turns out I get a bit carried away when talking about object detection. :sweat_smile:

But hey, at least you get to see everything from basic usage to performance optimization, live demos, and future plans. Hope you find it useful despite my complete failure at being concise!

11 Likes

@alvises Is it allowed to use those models in commercial applications? I think YOLO models from ultralytics are GPL licensed, if I’m not mistaken.

There is a newer alternative with a more permissive licence, that might be of interest to you. Sadly it’s not so well documented, so it might be harder to integrate with.

2 Likes

Thanks for bringing this up - yeah, that’s the issue with Ultralytics licensing. I’ll be adding YOLOX model support (Apache 2 licensed) in the next release thanks to @aspett’s great PR. I’m just taking the time to study the YOLOX architecture details first.

1 Like

@alvises I wonder if the library can be used as a part of an API end-point in a Phoenix API application that can be hit by a front-end application like Angular for example? If so, what should be taken into consideration?

Sure, the library can be used for that. However, what you need to consider depends largely on the app’s use case and traffic. YOLO can take anywhere between 1ms and 1s per image depending on the model and hardware. While it’s optimized and can be accelerated, you’ll need to factor in request volume, hardware capabilities, and performance needs for your specific scenario.
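To make that concrete, a Phoenix endpoint could look something like the sketch below. The controller name, param names, and model storage are all hypothetical, not part of the library:

```elixir
# Hypothetical controller: accepts a multipart image upload and
# returns the detected objects as JSON.
defmodule MyAppWeb.DetectionController do
  use MyAppWeb, :controller

  def create(conn, %{"image" => %Plug.Upload{path: path}}) do
    # Load the model once at application startup (e.g. into
    # :persistent_term or a GenServer), never per request.
    model = :persistent_term.get(:yolo_model)

    detections =
      path
      |> Evision.imread()
      |> then(&YOLO.detect(model, &1))
      |> YOLO.to_detected_objects(model.classes)

    json(conn, %{detections: detections})
  end
end
```

Since inference can take up to a second, you may also want to run detections through a queue or a pool rather than directly in the request process.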

Could you share more details about the app’s purpose and expected usage?

@alvises thank you for the response.
Well, the initial change request is just to study a use case: detecting objects in an image uploaded from Angular (the team’s current frontend choice); the backend is mostly in Java.
I had an idea to smoothly introduce Elixir to the stack :sunglasses:
Their initial idea was to evaluate one of these 2 libraries:

So I started looking here and there to bring another brick to the stack and found:

Sure, I’ll have to ask all these questions you previously mentioned.
Ideally, I’d like to present a kind of POC to demonstrate how it is supposed to work in a basic scenario with a simple API request.
Actually I’m here :smile:

1 Like

Nice! Ping me if you need any help building the POC.

btw, in the next few weeks I hope to release an update that allows custom models, which could be handy for your proof of concept app :smiley:

2 Likes

Hey :waving_hand: Thanks for your library, it looks wonderful!

Can I use it to perform YOLO object segmentation? (to get accurate polygons of the object shape instead of a bounding box)

1 Like

Thanks for the idea! It’s definitely possible to extend the library to do segmentation, and other things like pose detection, object tracking etc. Can you please create a github issue in the repo suggesting to add segmentation?

1 Like

Support object segmentation · Issue #12 · poeticoding/yolo_elixir · GitHub :white_check_mark: :folded_hands:

1 Like

I’ve finally released YOLO v0.2.0 :tada:

In short:

  • YOLOX support
  • Model-agnostic postprocessing → AKA Custom Models!
  • Big performance improvement

4 Likes

I wouldn’t have guessed from the name that this is a more open YOLO, but from the announcement:

one of the main reasons I’m excited to add YOLOX support to the library is the increased licensing freedom. Ultralytics models come with licensing restrictions: they’re released under the AGPL license with an enterprise option.

So is the Nx.Defn change primarily a change in how you use Nx? A defn compiles to native code, but previously you used parts that can’t be as efficient? Or what gives? 100x is pretty massive :slight_smile:

1 Like

Yes, exactly! I was able to move part of the postprocessing to use Nx.Defn, which compiles to much faster native code.

The YOLO model outputs a huge tensor - typically {8400, 84}, where there are 8400 detection candidates, each with 4 bounding box coordinates plus 80 class probabilities. The postprocessing filters this down from 8400 candidates to maybe 40-50 actual detections (before the NMS step).

Previously I couldn’t use Nx.Defn for this filtering because the output size is dynamic - you don’t know ahead of time how many detections will pass the probability threshold, and Nx.Defn needs to know tensor shapes at compile time. For this reason, the data was going back to Elixir between multiple Nx calls, which was probably also a big part of the slowdown.

For this release, I figured out (with the big help of @polvalente :folded_hands:) how to work around that limitation and get the filtering logic into Nx.Defn. That’s where the big performance boost comes from - the difference between interpreted Nx operations and compiled native code.
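A minimal sketch of the fixed-shape idea described above (module, function, and threshold names are my own, not the library's internals): instead of slicing out a variable-length result, mask rejected candidates with `Nx.select`, so every tensor keeps a static shape and the whole function can be JIT-compiled with EXLA.

```elixir
defmodule FilterSketch do
  import Nx.Defn

  # preds: a {8400, 84} tensor of [x, y, w, h, class_0 .. class_79] rows.
  # Returns a tensor of the same fixed shape, with rows below the
  # probability threshold zeroed out; the caller drops zero rows
  # outside of defn, after the compiled pipeline has run.
  defn filter_candidates(preds, opts \\ []) do
    opts = keyword!(opts, prob_threshold: 0.25)

    # Best class probability per candidate: {8400}
    best_prob =
      preds
      |> Nx.slice_along_axis(4, 80, axis: 1)
      |> Nx.reduce_max(axes: [1])

    # Boolean mask with a fixed shape instead of a dynamic slice.
    mask = Nx.greater(best_prob, opts[:prob_threshold])

    # Keep accepted rows, zero the rest; shapes stay {8400, 84}.
    Nx.select(Nx.new_axis(mask, 1), preds, 0)
  end
end
```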

Here’s the conversation between Paulo and me about this: Slack

4 Likes

Since the YOLO library now finally supports custom models, I created a guide showing how to fine-tune YOLO models for your specific use cases. The guide demonstrates transforming a generic 80-class detector into a specialized system (using soccer match analysis as an example) and integrating it with the Elixir YOLO library.

Perfect for anyone looking to move beyond the standard COCO dataset and create domain-specific detectors for their applications.

7 Likes

I recorded a from-scratch walkthrough of fine-tuning a YOLOX model for car license-plate detection. Now that the YOLO Elixir library supports YOLOX models, this guide helps anyone who wants to build a custom YOLOX model and then bring it into the Elixir ecosystem.

We go end to end: setting up the environment and YOLOX, finding and preparing a public dataset, training, evaluating the metrics, running inference on dashcam footage, and comparing results with an Ultralytics YOLO11 model trained on the same data. It is a long, almost-live session, since YOLOX relies on older dependencies and I show the real troubleshooting and small script fixes needed to make it work. Once it is set up, performance is strong and on par with Ultralytics.

12 Likes