herisson
Running Facebook's segmentation model using Ortex
Hi everyone, I recently took on a personal challenge of running Meta’s Segment Anything Model in Elixir. Since it’s not supported by Bumblebee, I started using Ortex to run ONNX models.
Everything is mostly smooth, but my final masks are a bit distorted. Since it’s my first time using Nx/Ortex/ONNX, I’m struggling to figure out the problem. Could it be an issue with the ONNX model itself?
Here are the outputs I’m getting:
I’ve put my code into a gist so you can check it out.
If you have some time to take a look, I would appreciate it. I’ve run out of ideas.
Thanks!
Marked As Solved
herisson
Got it to work! Turns out the issue was with the decoder model. I exported it to ONNX myself and now it works just fine!
Also Liked
kip
I’ve made a modified version of your Livebook to use Image for the image pipeline. I think it simplifies some of the code.
I’ve also made it so the encoder and decoder are downloaded from Hugging Face using Req, so the Livebook now works standalone. So here’s the badge!
Thanks for taking the time to help me through my lack of understanding. It’s been a good learning experience.
kip
I think I can simplify some of the image pipeline using Image. Any chance you can put the decoder model somewhere I can access? (I’m not comfortable regenerating it myself.)
Update
I cloned the SAM repo and followed the instructions to generate an ONNX model after downloading the default vit_h checkpoint:
% python scripts/export_onnx_model.py --checkpoint sam_vit_h_4b8939.pth --model-type default --output sam.onnx
Loading model...
Exporting onnx model to sam.onnx...
Model has successfully been run with ONNXRuntime.
But that gives me a single ONNX file, not a separate encoder and decoder. What am I missing? (he says, clearly illustrating he knows nothing about this domain)
herisson
Yes, their examples aren’t very comprehensive in terms of explanations. From what I understand, SAM is a “two-stage model.”
The first stage takes an image and transforms it into something the next model can understand (image embeddings). This stage, which I refer to as the encoder or vision encoder, is common in many image processing tasks, not unique to SAM. That’s why it’s not included in their export script.
The second stage, which you exported, takes the image embeddings and other inputs to produce the mask (referred to as the decoder).
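To make the two stages concrete, here is a minimal sketch of how they could be chained with Ortex. Treat it as an illustration under assumptions, not the code from my gist: the file names are hypothetical, the encoder is assumed to take an already preprocessed {1, 3, 1024, 1024} float tensor and return a single output, and the decoder is assumed to come from SAM's scripts/export_onnx_model.py, whose input order I follow here. Check the input names and shapes of your own files before relying on it.

```elixir
# Assumes a Livebook or script where dependencies can be installed on the fly.
Mix.install([:ortex, :nx])

# Hypothetical local paths: point these at your own encoder and decoder files.
encoder = Ortex.load("sam_encoder.onnx")
decoder = Ortex.load("sam_decoder.onnx")

# Stage 1: image -> image embeddings.
# Assumption: the encoder wants a resized, padded and normalized f32 tensor of
# shape {1, 3, 1024, 1024}. Zeros stand in for a real image here.
image = Nx.broadcast(0.0, {1, 3, 1024, 1024})

# Assumption: the encoder has exactly one output (the embeddings).
{embeddings} = Ortex.run(encoder, image)
embeddings = Nx.backend_transfer(embeddings)

# Stage 2: embeddings + prompt -> masks.
# One foreground point (label 1) plus the padding point (label -1) that the
# exported decoder expects when no box prompt is given. Coordinates are in the
# 1024-scaled image frame, not the original image frame.
point_coords = Nx.tensor([[[512.0, 512.0], [0.0, 0.0]]], type: :f32)
point_labels = Nx.tensor([[1.0, -1.0]], type: :f32)

# No previous mask is supplied, so pass zeros and set the flag to 0.
mask_input = Nx.broadcast(0.0, {1, 1, 256, 256})
has_mask_input = Nx.tensor([0.0], type: :f32)

# Original image size as {height, width}; the decoder upscales masks to it.
orig_im_size = Nx.tensor([1024.0, 1024.0], type: :f32)

# Ortex passes inputs positionally, so the tuple order must match the model.
{masks, _iou_predictions, _low_res_masks} =
  Ortex.run(
    decoder,
    {embeddings, point_coords, point_labels, mask_input, has_mask_input, orig_im_size}
  )

# The decoder returns mask logits; thresholding at 0.0 gives a binary mask.
binary_mask = masks |> Nx.backend_transfer() |> Nx.greater(0.0)
```

The embeddings only depend on the image, so they can be computed once and reused for as many prompts as you like. Only the second `Ortex.run/2` call needs to be repeated.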
I can’t upload the decoder to a repository right now as I’m not at home, but you can find the vision encoder here. Since you’ve exported the decoder, you should be set with these two models!
Hope it makes things clearer.