herisson

Running Facebook's segmentation model using Ortex

Hi everyone, I recently took on a personal challenge of running Meta’s Segment Anything Model in Elixir. Since it’s not supported by Bumblebee, I started using Ortex to run ONNX models.

Everything is mostly smooth, but my final masks are a bit distorted. Since it’s my first time using Nx/Ortex/ONNX, I’m struggling to figure out the problem. Could it be an issue with the ONNX model itself?

Here are the outputs I’m getting:

I’ve put my code into a gist so you can check it out.

If you have some time to take a look, I would appreciate it. I’ve run out of ideas :frowning:

Thanks!

Marked As Solved

herisson

Got it to work! Turns out the issue came from the decoder model. I exported it to ONNX myself and now it works just fine!

Also Liked

kip

ex_cldr Core Team

I’ve made a modified version of your Livebook to use Image for the image pipeline. I think it simplifies some of the code.

I’ve also made it so the encoder and decoder are downloaded from hugging face using req so the Livebook now works standalone. So here’s the badge!

Run in Livebook

Thanks for taking the time to help me through my lack of understanding. It’s been a good learning experience.

kip

ex_cldr Core Team

I think I can simplify some of the image pipeline using Image - any chance you can put the decoder model somewhere I can access? (I’m not comfortable regenerating myself).

Update

I cloned the SAM repo and followed the instructions to generate an ONNX model after downloading the default vit_h checkpoint:

% python scripts/export_onnx_model.py --checkpoint sam_vit_h_4b8939.pth --model-type default --output sam.onnx
Loading model...
Exporting onnx model to sam.onnx...
Model has successfully been run with ONNXRuntime.

But that gives me a single ONNX file, not a separate encoder and decoder. What am I missing? (he says, clearly illustrating he knows nothing about this domain)

herisson

Yes, their examples aren’t very comprehensive in terms of explanations. From what I understand, SAM is a “two-stage model.”

The first stage takes an image and transforms it into something the next model can understand (image embeddings). This stage, which I refer to as the encoder or vision encoder, is common in many image processing tasks, not unique to SAM. That’s why it’s not included in their export script.

The second stage, which you exported, takes the image embeddings and other inputs to produce the mask (referred to as the decoder).

I can’t upload the decoder to a repository right now as I’m not at home, but you can find the vision encoder here. Since you’ve exported the decoder, you should be set with these two models!
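To make the two stages a bit more concrete, here is a rough sketch of how they might be chained with Ortex. It is not the code from the gist or the Livebook. The module name `SamSketch`, the model paths, and the assumption that the encoder's first output is the image embeddings are all hypothetical. The decoder input order (embeddings, point coordinates, point labels, mask input, has-mask flag, original image size) is what I understand the official export script to produce, so check it against your own decoder. The uniform coordinate scale is an approximation of SAM's resize to a 1024-pixel longest side.

```elixir
defmodule SamSketch do
  # Assumption: the encoder expects images resized so the longest side is 1024.
  @target 1024

  # Map an {x, y} pixel of the original image into the resized frame.
  # Approximation: SAM rounds the resized short side, which is ignored here.
  def scale_point({x, y}, {height, width}) do
    scale = @target / max(height, width)
    {x * scale, y * scale}
  end

  # Build the decoder inputs for a single foreground click.
  # Assumed order: image_embeddings, point_coords, point_labels,
  # mask_input, has_mask_input, orig_im_size.
  def decoder_inputs(embeddings, point, {height, width} = size) do
    {sx, sy} = scale_point(point, size)

    {
      embeddings,
      # One clicked point plus a padding point, since no box is given.
      Nx.tensor([[[sx, sy], [0.0, 0.0]]], type: :f32),
      # Label 1 = foreground click, -1 = padding point.
      Nx.tensor([[1.0, -1.0]], type: :f32),
      # No previous mask: an all-zero mask with the flag set to 0.
      Nx.broadcast(Nx.tensor(0.0, type: :f32), {1, 1, 256, 256}),
      Nx.tensor([0.0], type: :f32),
      Nx.tensor([height, width], type: :f32)
    }
  end

  # Stage 1 (encoder): preprocessed image -> image embeddings.
  # Stage 2 (decoder): embeddings + prompt -> mask logits.
  def segment(encoder_path, decoder_path, image, point, size) do
    encoder = Ortex.load(encoder_path)
    decoder = Ortex.load(decoder_path)

    # Assumption: the first encoder output holds the image embeddings.
    embeddings = encoder |> Ortex.run(image) |> elem(0)

    decoder
    |> Ortex.run(decoder_inputs(embeddings, point, size))
    |> elem(0)
    |> Nx.backend_transfer()
  end
end
```

Something like `SamSketch.segment("encoder.onnx", "sam.onnx", image, {x, y}, {height, width})` would then return the mask logits, assuming `image` is already a preprocessed tensor in the shape your encoder expects. If the masks look distorted, the decoder inputs and the coordinate scaling are the first things I would compare against the reference Python example.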

Hope it makes things clearer :slight_smile:
