sllhf (SLLHF)

pcuenq

posted an update 6 days ago

Post

2506

👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1 — Learning to Reason
Deepseek not only releases a top-notch reasoning model, but shows how to train them and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2 — Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3 — "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models get gold in Math olympiads and hard benchmarks. OpenAI releases strong open source models and Google releases the much anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4 — Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! 🥂

1 reply

·

victor

posted an update 24 days ago

Post

2968

Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥

https://huggingface.co/blog/nvidia/nemotron-3-nano-evaluation-recipe

m-ric

posted an update 3 months ago

Post

916

Tokenization is one of the most important processes in AI - yet many would like to kill it 💀

What's tokenization? The neural networks inside LLMs actually only process numbers, not text: tokenization is the process that makes text readable for them, by converting sentences into lists of numbers.

➡️ For instance, "This is tokenization" would be split into "This | is | token | ization", then each of the parts (tokens) are converted to IDs according to a predefined mapping: for instance "ization" could map to id 2438.
Thus "This is tokenization" can become 1335 | 135 | 2980 | 2438 => now the model can process the sentence!

Most tokenizers today use pre-specified mappings called "vocabularies", generally built about the compression algorithme Byte-Pair Encoding (BPE) that learns from a big corpuses of texts an optimized split to efficiently encode any text from the same distribution into a list token IDs.

🤨 Now, these current tokenizers have flaws.
For instance, the rigidity of their mapping creates losses ; the prime example being that a tokenizer designed for English (thus optimized for tokens like "has", "been", "clock", etc) will not have the right tokens to approach Burmese, thus being terribly inefficient at it.

Many alternative approaches have emerged as a result: for instance "tokenizer-free tokenizers". One that I really liked was "entropy-based": it monitors the stream of text, and trigger a split whenever the entropy increases too much, i.e. when something "surprising" happens.

But this great article argues that tokenizers are a lesser evil. Read and decide for yourself!
https://huggingface.co/blog/catherinearnett/in-defense-of-tokenizers

hfkwr

authored a paper 3 months ago

Understanding Reinforcement Learning for Model Training, and future directions with GRAPE

Paper • 2509.04501 • Published Sep 2, 2025 • 1

m-ric

posted an update 3 months ago

Post

4913

STOP EVERYTHING NOW - we might finally have a radical architecture improvement over Transformers!!! 🚨

A lone scientist just proposed Tiny Recursive Model (TRM), and it is literally the most impressive model that I've seen this year.

➡️ Tiny Recursive Model is 7M parameters
➡️ On ARC-AGI, it beats flagship models like Gemini-2.5-pro

Consider how wild this is: Gemini-2.5-pro must be over 10,000x bigger
and had 1,000 as many authors 😂 (Alexia is alone on the paper)

What's this sorcery?
In short: it's a very tiny Transformers, but it loops over itself at two different frequencies, updating two latent variables: one for the proposed answer and one for the reasoning.

@AlexiaJM started from the paper Hierarchical Reasoning Model, published a few months ago, that already showed breakthrough improvement on AGI for its small size (27M)

Hierarchical Reasoning Model had introduced one main feature:
🔎 Deep supervision
In their model, one part (here one layer) would run at high frequency, and another would be lower frequency, running only every n steps.

They had used a recurrent architecture, where these layers would repeat many times ; but to make it work they had to do many approximations, including not fully backpropagating the loss through all layers.

Alexia studied what was useful and what wasn't, and cleaned the architecture as follows :
Why use a recurrent architecture, when you can just make it a loop?
➡️ She made the network recursive, looping over itself

Why use 2 latent variables ?
➡️ She provides a crystal clear explanation : the one that changes frequently is the reasoning, the one that changes at low frequency is the proposed answer.
➡️ She runs ablation studies to validate that 2 is indeed optimal.

This new setup is a much more elegant way to process reasoning than generating huge chains of tokens as all flagship models currently do.

This might be the breakthrough we've been awaiting for so long!

4 replies

·

lysandre

posted an update 4 months ago

Post

7509

We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez !

v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025.

Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!

6 replies

·

ankits0052

authored 14 papers 5 months ago

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Paper • 2508.20453 • Published Aug 28, 2025 • 63

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

Paper • 2310.04445 • Published Oct 2, 2023

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

Paper • 2309.17002 • Published Sep 29, 2023 • 1

Enhancing Retrieval for ESGLLM via ESG-CID -- A Disclosure Content Index Finetuning Dataset for Mapping GRI and ESRS

Paper • 2503.10674 • Published Mar 10, 2025 • 1

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings

Paper • 2506.20609 • Published Jun 25, 2025

ProRefine: Inference-time Prompt Refinement with Textual Feedback

Paper • 2506.05305 • Published Jun 5, 2025 • 1

Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection

Paper • 2410.03904 • Published Oct 4, 2024

Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond

Paper • 2408.11338 • Published Aug 21, 2024

Harnessing Business and Media Insights with Large Language Models

Paper • 2406.06559 • Published Jun 2, 2024

Audio-visual fine-tuning of audio-only ASR models

Paper • 2312.09369 • Published Dec 14, 2023

SLLHF

AI & ML interests

Recent Activity

Understanding Reinforcement Learning for Model Training, and future directions with GRAPE

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

Exploring Domain-Specific Enhancements for a Neural Foley Synthesizer

Conformers are All You Need for Visual Speech Recogntion

Improving Data Efficiency via Curating LLM-Driven Rating Systems

LLM Unlearning via Loss Adjustment with Only Forget Data

Enhancing Retrieval for ESGLLM via ESG-CID -- A Disclosure Content Index Finetuning Dataset for Mapping GRI and ESRS

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings

ProRefine: Inference-time Prompt Refinement with Textual Feedback

Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection

Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond

Harnessing Business and Media Insights with Large Language Models

Audio-visual fine-tuning of audio-only ASR models

AI & ML interests

Recent Activity

Team members 134

sllhf's activity