Kaviyarasan V

kaveeshwaran
AI & ML interests

I want to be an AI Developer


Organizations

None yet

kaveeshwaran's activity

New activity in huggingface/HuggingDiscussions 4 days ago

[FEEDBACK] Notifications

#6 opened almost 3 years ago by victor
replied to philschmid's post 4 days ago

🔹 Tech Enthusiast Style
"Whoa, controllable thinking? That’s not just smart — it’s brilliant. Hybrid reasoning just leveled up."

🔹 Developer Style
"Thinking tokens ON or OFF — finally, a model with a switch for my compute bill and creativity at the same time. Gemini 2.5 Flash just rewrote the rules."

🔹 Sassy & Fun
"Gemini 2.5 Flash said: Why think all the time? Take a break. Save money. Stay genius."

🔹 Minimal & Cool
"Multimodal. Million token input. Toggleable thought. Mind. Blown. 💥"

🔹 Product-Led Message
"Controllable cognition with multimodal scale? Gemini 2.5 Flash is the tool we didn’t know we needed. Now it's essential."

🔹 Futuristic Vibe
"AI that knows when to think and when to move fast? Welcome to the age of intelligent restraint."

reacted to m-ric's post with 🔥 4 days ago
New king of open VLMs: InternVL3 takes Qwen 2.5's crown! 👑

InternVL has been a wildly successful series of models, and the latest iteration has just taken back the crown thanks to its superior, natively multimodal vision training pipeline.

➡️ Most vision-language models (VLMs) these days are built like Frankenstein's monster: take a good text-only Large Language Model (LLM) backbone and stitch a vision transformer (ViT) on top of it. Then the training is sequential 🔢: 1. Freeze the LLM weights and train only the ViT so it learns to work with the LLM part, then 2. Unfreeze all weights and train everything jointly so the two parts work together.

💫 The Shanghai Lab decided to challenge this paradigm with an approach they call "native". For each of their model sizes, they still start from a good LLM (mostly the Qwen-2.5 series; did I tell you I'm a huge fan of Qwen? ❤️) and stitch on the ViT, but they don't freeze anything: they train all weights together on interleaved text and image understanding data in a single pre-training phase 🎨.
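
To make the contrast concrete, here is a minimal PyTorch-style sketch (not from the post; the module names are hypothetical stand-ins) of the frozen-then-unfrozen recipe versus the native everything-trains-together recipe:

```python
import torch.nn as nn

class TinyVLM(nn.Module):
    """Toy stand-in: an LLM backbone, a ViT tower, and a projector gluing them together."""
    def __init__(self):
        super().__init__()
        self.llm = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))  # stand-in LLM backbone
        self.vit = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))  # stand-in vision transformer
        self.projector = nn.Linear(64, 64)                                         # maps vision features into LLM space

def set_trainable(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

model = TinyVLM()

# "Frankenstein" recipe: stage 1 freezes the LLM and aligns only the ViT + projector,
# stage 2 unfreezes everything for joint training.
set_trainable(model.llm, False)
set_trainable(model.vit, True)
set_trainable(model.projector, True)
# ... stage-1 alignment training loop ...
set_trainable(model.llm, True)
# ... stage-2 joint training loop ...

# "Native" recipe as described above: nothing is ever frozen; all weights train together
# on interleaved text + image data in a single pre-training phase.
for part in (model.llm, model.vit, model.projector):
    set_trainable(part, True)
# ... single joint pre-training loop ...
```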

They claim it results in more seamless interactions between modalities. And the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents. 👑
reacted to samuellimabraz's post with 🔥 4 days ago
I recently had the opportunity to present at a Computer Vision Hangout, sharing my journey from autonomous drone competitions to fine-tuning Vision-Language Models.

I built an interactive presentation app! Here's a glimpse of the topics:

🚁 Black Bee Drones:
My first steps into CV with Latin America's first autonomous drone team. Covering classical CV techniques (filtering, edge detection), the IMAV 2023 mission (ArUco detection, line following with PID control), and links to demos for OpenCV basics and PID simulation.

🤖 Asimo Foundation:
Using MediaPipe for gesture control of a robotic arm in an educational project.

☕ CafeDL:
Building a small Deep Learning framework from scratch in Java (inspired by Keras, using ND4J) and training a CNN for a QuickDraw-like app.

🏢 Tech4Humans:
Real-world applications, including open-source signature detection and efficient fine-tuning of VLMs for document extraction.

Check out the interactive demos (also embedded in the main app):

1️⃣ CV Hangout App: The main presentation app showcasing my journey.
samuellimabraz/cv-hangout

2️⃣ OpenCV GUI: Real-time demo of CV techniques (filters, color filtering, ArUco) & AI models.
samuellimabraz/opencv-gui

3️⃣ Line Follow PID: Simulation of a PID controller for drone line-following.
samuellimabraz/line-follow-pid
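
As a companion to the PID demo above, here is a minimal, hypothetical sketch of the core idea: the error is the detected line's horizontal offset from the image centre, and the controller output is a steering correction (gains and numbers are illustrative, not taken from the repo).

```python
# Hypothetical minimal PID update loop for line following.
class PID:
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=0.8, ki=0.0, kd=0.2)    # gains are illustrative, not from the demo
line_x, frame_center = 180, 160      # detected line centroid vs. image centre, in pixels
steer = pid.update(error=line_x - frame_center, dt=1 / 30)  # correction at ~30 FPS
print(f"steering correction: {steer:.2f}")
```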

I hope these resources are helpful to someone on their CV learning journey!
reacted to Jaward's post with 🔥 18 days ago
reacted to BestWishYsh's post with 🔥 18 days ago