Adrien Bufort

Forbu14

AI & ML interests

Deep learning, machine learning, reinforcement learning. @orange

Organizations

Orange

Forbu14's activity

reacted to clem's post with 🚀 about 1 month ago
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):

- There will be the first major public protest related to AI
- A big company will see its market cap divided by two or more because of AI
- At least 100,000 personal AI robots will be pre-ordered
- China will start to lead the AI race (as a consequence of leading the open-source AI race).
- There will be big breakthroughs in AI for biology and chemistry.
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face.

How my predictions for 2024 turned out:

- A hyped AI company will go bankrupt or get acquired for a ridiculously low price
✅ (Inflection, Adept AI, ...)

- Open-source LLMs will reach the level of the best closed-source LLMs
✅ with QwQ and dozens of others

- Big breakthroughs in AI for video, time-series, biology and chemistry
✅ for video, 🔴 for time-series, biology and chemistry

- We will talk much more about the cost (monetary and environmental) of AI
✅ Monetary, 🔴 Environmental (😢)

- A popular piece of media will be mostly AI-generated
✅ with NotebookLM by Google

- 10 million AI builders on Hugging Face, leading to no increase in unemployment
🔜 currently 7M AI builders on Hugging Face
liked a Space about 2 months ago
reacted to reach-vb's post with 🔥 4 months ago
Less than two days ago Kyutai Labs open-sourced Moshi, a ~7.6B-parameter on-device speech-to-speech foundation model, and Mimi, a SoTA streaming speech codec! 🔥

The release includes:

1. Moshiko & Moshika - Moshi finetuned on synthetic data (CC-BY license) (kyutai/moshi-v01-release-66eaeaf3302bef6bd9ad7acd)
2. Mimi - Streaming audio codec that processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps (CC-BY license) (kyutai/mimi)
3. Model checkpoints & Inference codebase written in Rust (Candle), PyTorch & MLX (Apache license) (https://github.com/kyutai-labs/moshi)
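
To put the Mimi numbers in the list above in perspective, here is a minimal arithmetic sketch. It only uses the three figures quoted in the post (24 kHz input, 12.5 Hz frame rate, 1.1 kbps); the function name and the dictionary keys are illustrative, not part of any Kyutai API.

```python
# Figures quoted in the post for the Mimi codec.
SAMPLE_RATE_HZ = 24_000   # input audio sample rate
FRAME_RATE_HZ = 12.5      # rate of the compressed representation
BITRATE_BPS = 1_100       # 1.1 kbps


def mimi_frame_stats(sample_rate_hz=SAMPLE_RATE_HZ,
                     frame_rate_hz=FRAME_RATE_HZ,
                     bitrate_bps=BITRATE_BPS):
    """Derive per-frame quantities from the quoted rates (hypothetical helper)."""
    return {
        # How many raw audio samples one compressed frame summarizes.
        "samples_per_frame": sample_rate_hz / frame_rate_hz,
        # Duration of audio covered by one frame, in milliseconds.
        "frame_ms": 1000.0 / frame_rate_hz,
        # Bit budget available to encode one frame.
        "bits_per_frame": bitrate_bps / frame_rate_hz,
    }


stats = mimi_frame_stats()
for name, value in stats.items():
    print(f"{name}: {value:g}")
```

Each frame therefore covers 80 ms of audio (1,920 samples) and has a budget of 88 bits. How those bits are split across codebooks is not stated in the post.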

How does Moshi work?

1. Moshi processes two audio streams: one for itself and one for the user, with the user's stream coming from audio input and Moshi's stream generated by the model.

2. Along with these audio streams, Moshi predicts text tokens for its speech, enhancing its generation quality.

3. The model uses a small Depth Transformer for codebook dependencies and a large 7B parameter Temporal Transformer for temporal dependencies.

4. The theoretical latency is 160ms, with a practical latency of around 200ms on an L4 GPU.
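
A rough consistency check on the latency figure, assuming (the post does not say this) that the theoretical 160 ms is counted in whole frames at the 12.5 Hz frame rate quoted for Mimi. The helper name is hypothetical.

```python
FRAME_RATE_HZ = 12.5           # Mimi frame rate quoted in the post
THEORETICAL_LATENCY_MS = 160   # theoretical latency quoted in the post


def latency_in_frames(latency_ms, frame_rate_hz):
    """Express a latency as a number of codec frames (illustrative helper)."""
    frame_ms = 1000.0 / frame_rate_hz  # duration of one frame in ms
    return latency_ms / frame_ms


print(latency_in_frames(THEORETICAL_LATENCY_MS, FRAME_RATE_HZ))
```

Under that assumption, the theoretical latency corresponds to two 80 ms frames; the extra ~40 ms measured on an L4 GPU would then be compute and I/O overhead.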

Model size & inference:

Moshiko and Moshika are 7.69B-parameter models

bf16 ~16GB VRAM
8-bit ~8GB VRAM
4-bit ~4GB VRAM
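
The VRAM figures above follow from the parameter count times the bytes per parameter. A minimal sketch, assuming the estimate covers weights only (activations, caches, and the Mimi codec would add overhead that the post does not quantify); the function name is illustrative.

```python
NUM_PARAMS = 7.69e9  # parameter count quoted in the post


def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

This gives roughly 15.4 GB, 7.7 GB and 3.8 GB, which the post rounds to ~16 GB, ~8 GB and ~4 GB.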

You can run inference via Candle 🦀, PyTorch and MLX - based on your hardware.

The Kyutai team, @adefossez, @lmz and the rest, are cracked AF. They're bringing some serious firepower to the open-source/science AI scene. Looking forward to what's next! 🐐
reacted to singhsidhukuldeep's post with 😔 4 months ago
Reflection-Llama-3.1-70B burst onto the scene, surprising everyone! It claimed to outperform others with its novel Reflection-Tuning technique, promising not just to match but to surpass the likes of Claude 3.5 and GPT-4o, leveraging its 70 billion parameters to redefine what open-source could achieve.

And now, everything is crumbling!

The model's performance metrics, especially its 99.2% accuracy on the grade-school math dataset GSM8K, have raised eyebrows. On paper it looked like a valedictorian, but judging by the open weights, it hardly performs like one.

When loaded in Transformers, the model behaves like Llama 3, not Llama 3.1.

The weights were released publicly, but they do not match the claims. The tuning has been restarted, and the author says the updated weights will be uploaded soon!

And the big one: the black-boxed API shared is not at all like the open weights. Even more, when pushed hard, the API endpoint claims to be an LLM by Anthropic!

But you might ask, didn't this model beat Anthropic Claude 3.5? Yes, it did.

So, did Claude 3.5 beat Claude 3.5? No: the benchmark is zero-shot, and the claim is that the reported results were obtained not zero-shot but with CoT/few-shot prompting!

And to top it all off, the idea of reflecting back is not new. But I don't think that's a big deal.

I took some time to look through everything, and now that it has been tested, this model looks to be worse than Llama 3.1 70B.

I still believe the Reflection-Tuning technique is promising. These are the papers discussing its efficacy:
- "Think Before You Speak: Training Language Models With Pause Tokens"
- "Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning"

PS: Matt Shumer/@mattshumer_ (Twitter Handle) (Reflection-Llama-3.1-70B creator) is a great researcher. Let's wait for his updated weights!

Great YT video: https://youtu.be/Xtr_Ll_A9ms

Hugging Face Clem Delangue 🤗?
Can you please help here if possible? This will be the pinnacle of open-source!