david

quyet7779

AI & ML interests

None yet

quyet7779's activity

reacted to andito's post with 🔥 28 days ago

Post

3251

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook! 🚀
- SmolVLM can be fine-tuned on Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!

Check out more!
Demo: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb

liked 2 Spaces about 1 month ago

👁 Qwen2.5 Coder Demo • Running • 365

👕 Kolors Virtual Try-On • Running on CPU Upgrade • 6.26k

reacted to merve's post with 🔥 about 1 month ago

Post

4999

OmniVision-968M: a new local VLM for edge devices, fast & small but performant
💨 a new vision language model with 9x fewer image tokens, super efficient
📖 aligned with DPO for reducing hallucinations
⚡️ Apache 2.0 license 🔥

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model https://huggingface.co/NexaAIDev/omnivision-968M

4 replies

New activity in linhtran92/viet_bud500 3 months ago

Convert to wav file

#10 opened 3 months ago by quyet7779

updated a model 3 months ago

quyet7779/whisper-small-vi

Updated Sep 13

reacted to mrfakename's post with ❤️ 7 months ago

Post

9854

Introducing StyleTTS 2 detector, an audio classification model to detect StyleTTS 2 vs human-generated content!

Dual-licensed under MIT/Apache 2.0.

Model Weights: mrfakename/styletts2-detector
Spaces: mrfakename/styletts2-detector

2 replies

liked a model 10 months ago

ByteDance/SDXL-Lightning

Text-to-Image • Updated Apr 3 • 130k • 1.95k