codito

codito
ยท

AI & ML interests

None yet

Recent Activity

liked a dataset 6 months ago
nvidia/HelpSteer2
reacted to tomaarsen's post with ๐Ÿ”ฅ 6 months ago
๐Ÿ“ฃ Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x AND Static Embeddings for 500x speedups at 10-20% accuracy cost. 1๏ธโƒฃ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference. 2๏ธโƒฃ OpenVINO Backend: This backend uses Intel their OpenVINO instead, outperforming ONNX in some situations on CPU. Usage is as simple as `SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")`. Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later ๐Ÿ˜‰ ๐Ÿ”’ Another major new feature is Static Embeddings: think word embeddings like GLoVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways: 1๏ธโƒฃ via Model2Vec, a new technique for distilling any Sentence Transformer models into static embeddings. Either via a pre-distilled model with `from_model2vec` or with `from_distillation` where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed. 2๏ธโƒฃ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU. Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0 Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
View all activity

Organizations

None yet

codito's activity

reacted to tomaarsen's post with ๐Ÿ”ฅ 6 months ago
view post
Post
7127
๐Ÿ“ฃ Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x AND Static Embeddings for 500x speedups at 10-20% accuracy cost.

1๏ธโƒฃ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2๏ธโƒฃ OpenVINO Backend: This backend uses Intel their OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later ๐Ÿ˜‰

๐Ÿ”’ Another major new feature is Static Embeddings: think word embeddings like GLoVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1๏ธโƒฃ via Model2Vec, a new technique for distilling any Sentence Transformer models into static embeddings. Either via a pre-distilled model with from_model2vec or with from_distillation where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2๏ธโƒฃ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
  • 1 reply
ยท
reacted to m-ric's post with ๐Ÿ”ฅ 7 months ago
view post
Post
3398
๐Ÿ”ฅ ๐๐ฐ๐ž๐ง ๐ซ๐ž๐ฅ๐ž๐š๐ฌ๐ž๐ฌ ๐ญ๐ก๐ž๐ข๐ซ ๐Ÿ.๐Ÿ“ ๐Ÿ๐š๐ฆ๐ข๐ฅ๐ฒ ๐จ๐Ÿ ๐ฆ๐จ๐๐ž๐ฅ๐ฌ: ๐๐ž๐ฐ ๐’๐Ž๐“๐€ ๐Ÿ๐จ๐ซ ๐š๐ฅ๐ฅ ๐ฌ๐ข๐ณ๐ž๐ฌ ๐ฎ๐ฉ ๐ญ๐จ ๐Ÿ•๐Ÿ๐!

The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B.

And they didn't sleep: the performance is top of the game for each weight category!

๐Š๐ž๐ฒ ๐ข๐ง๐ฌ๐ข๐ ๐ก๐ญ๐ฌ:

๐ŸŒ All models have ๐Ÿญ๐Ÿฎ๐Ÿด๐—ธ ๐˜๐—ผ๐—ธ๐—ฒ๐—ป ๐—ฐ๐—ผ๐—ป๐˜๐—ฒ๐˜…๐˜ ๐—น๐—ฒ๐—ป๐—ด๐˜๐—ต

๐Ÿ“š Models pre-trained on 18T tokens, even longer than the 15T of Llama-3

๐Ÿ’ช The flagship ๐—ค๐˜„๐—ฒ๐—ป๐Ÿฎ.๐Ÿฑ-๐Ÿณ๐Ÿฎ๐—• ๐—ถ๐˜€ ~๐—ฐ๐—ผ๐—บ๐—ฝ๐—ฒ๐˜๐—ถ๐˜๐—ถ๐˜ƒ๐—ฒ ๐˜„๐—ถ๐˜๐—ต ๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿญ-๐Ÿฐ๐Ÿฌ๐Ÿฑ๐—•, ๐—ฎ๐—ป๐—ฑ ๐—ต๐—ฎ๐˜€ ๐—ฎ ๐Ÿฏ-๐Ÿฑ% ๐—บ๐—ฎ๐—ฟ๐—ด๐—ถ๐—ป ๐—ผ๐—ป ๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿญ-๐Ÿณ๐Ÿฌ๐—• ๐—ผ๐—ป ๐—บ๐—ผ๐˜€๐˜ ๐—ฏ๐—ฒ๐—ป๐—ฐ๐—ต๐—บ๐—ฎ๐—ฟ๐—ธ๐˜€.

๐Ÿ‡ซ๐Ÿ‡ท On top of this, it ๐˜๐—ฎ๐—ธ๐—ฒ๐˜€ ๐˜๐—ต๐—ฒ #๐Ÿญ ๐˜€๐—ฝ๐—ผ๐˜ ๐—ผ๐—ป ๐—บ๐˜‚๐—น๐˜๐—ถ๐—น๐—ถ๐—ป๐—ด๐˜‚๐—ฎ๐—น ๐˜๐—ฎ๐˜€๐—ธ๐˜€ so it might become my standard for French

๐Ÿ’ป Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeeSeek-Coder 33B-Instruct). Let's wait for their 32B to come out!

๐Ÿงฎ Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."

๐Ÿ“„ Technical report to be released "very soon"

๐Ÿ”“ All models have the most permissive license apache2.0, except the 72B models that have a custom license mentioning "you can use it for free EXCEPT if your product has over 100M users"

๐Ÿค— All models are available on the HF Hub! โžก๏ธ Qwen/qwen25-66e81a666513e518adb90d9e
  • 2 replies
ยท
reacted to gabrielmbmb's post with ๐Ÿ”ฅ 8 months ago
view post
Post
3590
Just dropped magpie-ultra-v0.1! The first open synthetic dataset generated with Llama 3.1 405B. Created with distilabel, it's our most advanced and compute-intensive pipeline to date. We made the GPUs of the cluster go brrrrr ๐Ÿš€

argilla/magpie-ultra-v0.1

Take it a look and tell us what you think! Probably, the models taking the most out of it are smol models ๐Ÿค— We will be improving the dataset in upcoming iterations!