Kaviyarasan V

kaveeshwaran
AI & ML interests

I want to be an AI Developer


Organizations

None yet

kaveeshwaran's activity

New activity in huggingface/HuggingDiscussions 4 days ago

[FEEDBACK] Notifications

#6 opened almost 3 years ago by victor
replied to philschmid's post 4 days ago

🔹 Tech Enthusiast Style
"Whoa, controllable thinking? That’s not just smart — it’s brilliant. Hybrid reasoning just leveled up."

🔹 Developer Style
"Thinking tokens ON or OFF — finally, a model with a switch for my compute bill and creativity at the same time. Gemini 2.5 Flash just rewrote the rules."

🔹 Sassy & Fun
"Gemini 2.5 Flash said: Why think all the time? Take a break. Save money. Stay genius."

🔹 Minimal & Cool
"Multimodal. Million token input. Toggleable thought. Mind. Blown. 💥"

🔹 Product-Led Message
"Controllable cognition with multimodal scale? Gemini 2.5 Flash is the tool we didn’t know we needed. Now it's essential."

🔹 Futuristic Vibe
"AI that knows when to think and when to move fast? Welcome to the age of intelligent restraint."

reacted to m-ric's post with 🔥 4 days ago
New king of open VLMs: InternVL3 takes Qwen 2.5's crown! 👑

InternVL has been a wildly successful series of models, and the latest iteration has just taken back the crown thanks to its superior, natively multimodal vision training pipeline.

➡️ Most vision-language models (VLMs) these days are built like Frankenstein's monster: take a good text-only Large Language Model (LLM) backbone and stitch a vision transformer (ViT) on top of it. Then the training is sequential 🔢: 1. Freeze the LLM weights and train only the ViT so it learns to work with the LLM part, then 2. Unfreeze all weights and train everything jointly so the two parts work together.

💫 The Shanghai Lab decided to challenge this paradigm with an approach they call "native". For each of their model sizes, they still start from a good LLM (mostly the Qwen-2.5 series; did I tell you I'm a huge fan of Qwen? ❤️) and stitch on the ViT, but they don't freeze anything: they train all weights together on interleaved text and image understanding data in a single pre-training phase 🎨.
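
To make the contrast concrete, here is a minimal PyTorch-style sketch (not from the post; the module names are hypothetical stand-ins) of the frozen-then-unfrozen recipe versus the native everything-trains-together recipe:

```python
import torch.nn as nn

class TinyVLM(nn.Module):
    """Toy stand-in: an LLM backbone, a ViT tower, and a projector gluing them together."""
    def __init__(self):
        super().__init__()
        self.llm = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))  # stand-in LLM backbone
        self.vit = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))  # stand-in vision transformer
        self.projector = nn.Linear(64, 64)                                         # maps vision features into LLM space

def set_trainable(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

model = TinyVLM()

# "Frankenstein" recipe: stage 1 freezes the LLM and aligns only the ViT + projector,
# stage 2 unfreezes everything for joint training.
set_trainable(model.llm, False)
set_trainable(model.vit, True)
set_trainable(model.projector, True)
# ... stage-1 alignment training loop ...
set_trainable(model.llm, True)
# ... stage-2 joint training loop ...

# "Native" recipe as described above: nothing is ever frozen; all weights train together
# on interleaved text + image data in a single pre-training phase.
for part in (model.llm, model.vit, model.projector):
    set_trainable(part, True)
# ... single joint pre-training loop ...
```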

They claim it results in more seamless interactions between modalities. And the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents. 👑
reacted to samuellimabraz's post with 🔥 4 days ago
I recently had the opportunity to present at a Computer Vision Hangout, sharing my journey from autonomous drone competitions to fine-tuning Vision-Language Models.

I built an interactive presentation app! Here's a glimpse of the topics:

🚁 Black Bee Drones:
My first steps into CV with Latin America's first autonomous drone team. Covering classical CV techniques (filtering, edge detection), the IMAV 2023 mission (ArUco detection, line following with PID control), and links to demos for OpenCV basics and PID simulation.

🤖 Asimo Foundation:
Using MediaPipe for gesture control of a robotic arm in an educational project.

☕ CafeDL:
Building a small Deep Learning framework from scratch in Java (inspired by Keras, using ND4J) and training a CNN for a QuickDraw-like app.

🏢 Tech4Humans:
Real-world applications, including open-source signature detection and efficient fine-tuning of VLMs for document extraction.

Check out the interactive demos (also embedded in the main app):

1️⃣ CV Hangout App: The main presentation app showcasing my journey.
samuellimabraz/cv-hangout

2️⃣ OpenCV GUI: Real-time demo of CV techniques (filters, color filtering, ArUco) & AI models.
samuellimabraz/opencv-gui

3️⃣ Line Follow PID: Simulation of a PID controller for drone line-following.
samuellimabraz/line-follow-pid
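
As a companion to the PID demo above, here is a minimal, hypothetical sketch of the core idea: the error is the detected line's horizontal offset from the image centre, and the controller output is a steering correction (gains and numbers are illustrative, not taken from the repo).

```python
# Hypothetical minimal PID update loop for line following.
class PID:
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=0.8, ki=0.0, kd=0.2)    # gains are illustrative, not from the demo
line_x, frame_center = 180, 160      # detected line centroid vs. image centre, in pixels
steer = pid.update(error=line_x - frame_center, dt=1 / 30)  # correction at ~30 FPS
print(f"steering correction: {steer:.2f}")
```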

I hope these resources are helpful to someone on their CV learning journey!
reacted to Jaward's post with 🔥 18 days ago
reacted to BestWishYsh's post with 🔥 18 days ago