
Talaviya Bhavik (talaviyabhavik)

AI & ML interests

LLM.. LLM.. LLM

Recent Activity

liked a dataset about 1 month ago
openreasoner/MATH-APS

Organizations

scikit-learn, Coolfox Labs

talaviyabhavik's activity

reacted to prithivMLmods's post with πŸ”₯ about 2 hours ago
o3-Mini and DeepSeek R1
Worked through some famous and unusual examples.

πŸ”₯Blog: https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1

Prompt : Using HTML, CSS, and JavaScript in a single HTML file to create a simulation of the solar system. Pay extreme attention to the UI to make it as intuitive as possible. Ensure that every planet appears as a sphere and is labeled with its corresponding name.

Example 1: o3-Mini; Example 2: DeepSeek R1

Q2: https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1#q2--web-solar-system-explorer
reacted to merve's post with πŸ”₯ 8 days ago
Oof, what a week! πŸ₯΅ So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal πŸ’¬
- We released SmolVLM -- the tiniest VLMs yet, in 256M and 500M sizes, with matching ColSmol retrieval models for multimodal RAG πŸ’—
- UI-TARS: new models by ByteDance to unlock agentic GUI control 🀯, in 2B, 7B, and 72B sizes
- Alibaba DAMO lab released VideoLLaMA3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a challenging new multimodal benchmark

LLMs πŸ“–
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, plus six distilled dense models, on par with o1, all MIT-licensed! 🀯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)

Audio πŸ—£οΈ
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, a new model for 3D asset generation from images
reacted to xiaotianhan's post with πŸš€πŸ‘ 10 months ago
πŸŽ‰ πŸŽ‰ πŸŽ‰ Happy to share our recent work. We noticed that image resolution plays an important role, both in improving multimodal large language model (MLLM) performance and in Sora-style any-resolution encoder-decoders. We hope this work helps lift the 224x224 resolution restriction in ViT.

ViTAR: Vision Transformer with Any Resolution (2403.18361)
reacted to merve's post with πŸ”₯ 10 months ago
reacted to merve's post with ❀️ 10 months ago
SegGPT is a vision generalist on image segmentation, quite like GPTs for computer vision ✨
It comes with the last release of transformers 🎁 Demo and more in this post!
SegGPT is an extension of Painter, where you speak to images with images: the model takes an image prompt, a transformed version of that prompt, and the actual image you want the same transform applied to, and is expected to output the transformed image.
SegGPT consists of a vanilla ViT with a decoder on top (linear, conv, linear).
The model is trained on diverse segmentation examples, where they provide example image-mask pairs, the actual input to be segmented, and the decoder head learns to reconstruct the mask output.
This generalizes pretty well!
The authors do not claim state-of-the-art results, as the model is mainly used for zero-shot and few-shot inference. They also do prompt tuning, where they freeze the parameters of the model and only optimize the image tensor (the input context).
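The prompt-tuning idea above can be sketched in plain PyTorch: freeze all model weights and run the optimizer only over the input tensor. This is a minimal toy illustration with a made-up linear model, not the actual SegGPT setup.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for a pretrained model (hypothetical, for illustration only)
model = torch.nn.Linear(8, 8)
for p in model.parameters():
    p.requires_grad_(False)  # freeze every model parameter

prompt = torch.zeros(8, requires_grad=True)  # the trainable input "context"
target = torch.ones(8)                       # desired model output

# Note: the optimizer only sees `prompt`, never the model's weights
opt = torch.optim.Adam([prompt], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(prompt), target)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```

After training, only `prompt` has changed; the frozen weights are untouched, which is exactly the trade-off prompt tuning makes: adapt the input context instead of the model.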
Thanks to πŸ€— transformers you can use this model easily!
See here https://huggingface.co/docs/transformers/en/model_doc/seggpt
I have built an app for you to try it out. I combined SegGPT with the Depth Anything model, so you don't have to upload image-mask prompts as your prompt pair πŸ€—
Try it here merve/seggpt-depth-anything
Also check out the collection merve/seggpt-660466a303bc3cd7559d271b