Ahmet

atasoglu

AI & ML interests

NLP, LLMs.

Recent Activity

liked a dataset 2 days ago
open-thoughts/OpenThoughts-114k
liked a model 2 days ago
open-thoughts/OpenThinker-7B
liked a model 6 days ago
deepseek-ai/Janus-Pro-1B

Organizations

Blog-explorers

atasoglu's activity

reacted to merve's post with 🚀 9 days ago
Post
2204
smolagents can see 🔥
we just shipped vision support to smolagents 🤗 agentic computers FTW

you can now:
💻 let the agent get images dynamically (e.g. agentic web browser)
📑 pass images at the init of the agent (e.g. chatting with documents, filling forms automatically, etc.)
with only a few LoC changed! 🤯
you can use transformers models locally (like Qwen2VL) OR plug in your favorite multimodal inference provider (gpt-4o, anthropic & co) 🤠

read our blog http://hf.co/blog/smolagents-can-see
reacted to merve's post with 🔥 10 days ago
Post
4700
Oof, what a week! 🥵 So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, where the decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging MM benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- tencent released Hunyuan3D-2, new 3D asset generation from images
upvoted an article 11 days ago
Article

Visual Document Retrieval Goes Multilingual

• 66
upvoted 2 articles 11 days ago
Article

Train 400x faster Static Embedding Models with Sentence Transformers

• 132
Article

Mastering Long Contexts in LLMs with KVPress

By nvidia
• 57