Xenova
posted an update Jan 19, 2024
Last week, we released 🤗 Transformers.js v2.14, which added support for SAM (Segment Anything Model).

This means you can now generate high-quality segmentation masks for objects in a scene, directly in your browser! 🤯

Demo (+ source code): Xenova/segment-anything-web
Model: Xenova/slimsam-77-uniform

But how does this differ from Meta's original demo? 🤔 Didn't that also run in-browser?

Well, in their demo, the image embeddings are computed server-side, then sent to the client for decoding. Trying to do it all client-side with the original model would be completely impractical, taking minutes per image! 😵‍💫

That's where SlimSAM comes to the rescue! SlimSAM is a novel SAM compression method, able to shrink the model over 100x (637M → 5.5M params), while still achieving remarkable results!
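
As a quick sanity check on the "over 100x" figure, the ratio follows directly from the two parameter counts quoted above. The variable names are only for illustration:

```javascript
// Parameter counts quoted in the post, in millions.
const originalParamsM = 637;
const slimParamsM = 5.5;

// Compression ratio: how many times fewer parameters the compressed model has.
const compressionRatio = originalParamsM / slimParamsM;

console.log(`~${compressionRatio.toFixed(1)}x fewer parameters`); // ~115.8x fewer parameters
```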

The best part? You can get started in a few lines of JavaScript code, thanks to Transformers.js! 🔥

// npm i @xenova/transformers
import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';

// Load model and processor
const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');

// Prepare image and input points
const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
const input_points = [[[340, 250]]];

// Process inputs and perform mask generation
const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);

// Post-process masks
const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);

// Visualize the mask
const image = RawImage.fromTensor(masks[0][0].mul(255));
image.save('mask.png');
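
SAM typically returns several candidate masks per prompt. Here is a minimal sketch of picking the most promising one, assuming the model output also exposes predicted quality scores as `outputs.iou_scores` with a flat `data` array. `bestMaskIndex` is a hypothetical helper (not part of Transformers.js), and the scores below are illustrative values, not real model output:

```javascript
// Hypothetical helper: index of the highest value in a flat array of scores.
function bestMaskIndex(scores) {
  let best = 0;
  for (let i = 1; i < scores.length; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

// Illustrative scores only. With the real model you would pass
// outputs.iou_scores.data here (assumed to hold one score per candidate mask).
const exampleScores = new Float32Array([0.84, 0.98, 0.83]);
const bestIndex = bestMaskIndex(exampleScores);
console.log(bestIndex); // 1
```

That index could then be used to keep only the corresponding channel of `masks[0][0]` before visualizing it.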


I can't wait to see what you build with it! 🤗

How do I use a box prompt?

·

Coming soon!

Thanks for this great post!

About the demo:
1/ Where were the photos in the vector database scraped from?
2/ Is the free Supabase sufficient for this demo?

·

I assume this is for another demo? See here for more information.

Amazing

Hey, quick question on this. I've been playing around with it and loving it. If I wanted to take Meta's approach and compute the image embeddings server-side, would I be able to use the normal sam-vit-base on the server alongside Xenova/sam-vit-base on the frontend for decoding?

·

Hi there! I suppose you could do this with custom configs and by specifying the model_file_name (see here). Feel free to open an issue on GitHub and I'll be happy to try to provide example code. Alternatively, you can find the .onnx files here, then use onnxruntime-node on the server and onnxruntime-web on the client to load the models. You can use the SamProcessor (provided by Transformers.js) to do the pre- and post-processing (see the model card for example usage).
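
For the server-to-client split described above, the embeddings have to be shipped over the network somehow. The following is only a sketch of one possible binary layout, assuming the embeddings are available as a `Float32Array` plus a list of dimensions. `packTensor` and `unpackTensor` are hypothetical helpers, not part of Transformers.js or onnxruntime, and the tiny tensor in the usage lines is made up for illustration:

```javascript
// Hypothetical wire format: [rank, ...dims] as Uint32 values, followed by the Float32 data.
function packTensor(dims, data) {
  const header = Uint32Array.from([dims.length, ...dims]);
  const buffer = new ArrayBuffer(header.byteLength + data.byteLength);
  new Uint32Array(buffer, 0, header.length).set(header);
  new Float32Array(buffer, header.byteLength, data.length).set(data);
  return buffer;
}

// Reverse of packTensor: read the rank, then the dims, then view the rest as floats.
function unpackTensor(buffer) {
  const rank = new Uint32Array(buffer, 0, 1)[0];
  const dims = Array.from(new Uint32Array(buffer, 4, rank));
  const data = new Float32Array(buffer, 4 * (1 + rank));
  return { dims, data };
}

// Round trip with a tiny made-up tensor (real embeddings would be much larger).
const packed = packTensor([1, 2, 2], new Float32Array([0.5, 1, 1.5, 2]));
const restored = unpackTensor(packed);
console.log(restored.dims, restored.data.length);
```

On the server the packed buffer could be sent as the response body, and on the client the restored dimensions and data could be used to rebuild the tensor passed to the decoder.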

How do I give a text prompt as input? Thanks in advance.