Dhruv Diddi

ddiddi

AI & ML interests

None yet

Recent Activity

published a Space about 11 hours ago

GetSoloTech/Solo-Whisper

liked a model 6 days ago

EmergentMethods/Phi-3-mini-4k-instruct-graph

commented on a paper 10 days ago

One-Minute Video Generation with Test-Time Training

ddiddi's activity

upvoted a paper 10 days ago

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published 11 days ago • 94

upvoted an article 29 days ago

Transformers.js v3: WebGPU support, new models & tasks, and more…

Article • Published Oct 22, 2024 • 73

upvoted a collection about 1 month ago

Gemma 3 Release

Collection • 24 items • Updated about 4 hours ago • 333

upvoted 7 papers about 1 month ago

LocAgent: Graph-Guided LLM Agents for Code Localization

Paper • 2503.09089 • Published Mar 12 • 9

Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol

Paper • 2503.05860 • Published Mar 7 • 9

AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models

Paper • 2503.08417 • Published Mar 11 • 8

"Principal Components" Enable A New Language of Images

Paper • 2503.08685 • Published Mar 11 • 12

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10 • 97

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published Mar 10 • 41

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 84

upvoted 2 papers 2 months ago

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Paper • 2501.18362 • Published Jan 30 • 22

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 118

upvoted 4 papers 3 months ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published Jan 22 • 91

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 382

GPS as a Control Signal for Image Generation

Paper • 2501.12390 • Published Jan 21 • 13

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Paper • 2501.12895 • Published Jan 22 • 60

upvoted 4 papers 4 months ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 99