Aymeric Roucher

m-ric

AI & ML interests

MLE at Hugging Face 🤗. LLMs, Agents, RAG, Multimodal.

Posts 48

Post · 642
๐Ÿ“œ ๐Ž๐ฅ๐-๐ฌ๐œ๐ก๐จ๐จ๐ฅ ๐‘๐๐๐ฌ ๐œ๐š๐ง ๐š๐œ๐ญ๐ฎ๐š๐ฅ๐ฅ๐ฒ ๐ซ๐ข๐ฏ๐š๐ฅ ๐Ÿ๐š๐ง๐œ๐ฒ ๐ญ๐ซ๐š๐ง๐ฌ๐Ÿ๐จ๐ซ๐ฆ๐ž๐ซ๐ฌ!

Researchers from Mila and Borealis AI have just shown that simplified versions of good old Recurrent Neural Networks (RNNs) can match the performance of today's transformers.

They took a fresh look at LSTMs (from 1997!) and GRUs (from 2014), stripping these models down to their bare essentials to create "minLSTM" and "minGRU". The key changes (a quick code sketch follows the list):
❶ Removed the gates' dependence on previous hidden states
❷ Dropped the tanh that had been added to restrict the output range and avoid vanishing gradients
❸ Ensured outputs are time-independent in scale (not sure I understood that one well either, don't worry)

โšก๏ธ As a result, you can use a โ€œparallel scanโ€ algorithm to train these new, minimal RNNs, in parallel, taking 88% more memory but also making them 200x faster than their traditional counterparts for long sequences

🔥 The results are mind-blowing! Performance-wise, they go toe-to-toe with Transformers or Mamba.

And for Language Modeling, they need 2.5x fewer training steps than Transformers to reach the same performance! 🚀

🤔 Why does this matter?

By showing that much simpler models can match transformers, this challenges the narrative that we need ever more advanced architectures for better performance!

💬 François Chollet wrote in a tweet about this paper:

"The fact that there are many recent architectures coming from different directions that roughly match Transformers is proof that architectures aren't fundamentally important in the curve-fitting paradigm (aka deep learning)"

"Curve-fitting is about embedding a dataset on a curve. The critical factor is the dataset, not the specific hard-coded bells and whistles that constrain the curve's shape."

It's the Bitter Lesson by Rich Sutton striking again: you don't need fancy architectures, just scale up your model and data!

Read the paper 👉 Were RNNs All We Needed? (2410.01201)
Post · 820
๐Ÿ‡จ๐Ÿ‡ณโ›ต๏ธ ๅ‡บๆตท: Chinese AI is expanding globally

Fact: Chinese LLMs are heavily underrated; see for instance the recent, excellent DeepSeek-V2.5 or the Qwen models.

Luckily for us, @AdinaY just wrote an excellent blog post explaining the Chinese AI ecosystem!

My key takeaways:

Since Google, OpenAI and Anthropic models are not available in China, local companies are fighting for the market. And it's a really good market: AI has much higher penetration there than in the rest of the world, among both companies and individual users!

💰 But since DeepSeek heavily cut prices in May 2024, this has spiraled into a price war, creating a cut-throat environment with unsustainably low prices.

📋 On top of this, local regulation is stringent: models must be licensed by a local censor (the Cyberspace Administration of China), which for instance requires them to refuse to answer certain questions about the CCP. Although this is certainly simpler to implement than certain conditions of the European AI Act.

💸 If this wasn't enough, VC investment in AI is drying up: by mid-2024, Chinese AI startups had raised approximately $4.4 billion, versus $55B for US startups in Q2 2024 alone.

📱 To reach profitability, companies have shifted from foundation models to model + application, for instance PopAI from [01.AI](http://01.ai/), which has millions of users and high profitability.

โ›๏ธ They also try to drill down specific industries: but these niches are also getting crowded.

โžก๏ธ Since their home market is becoming both too crowded and unhospitable, Chinese companies are now going for international market, "Sailing abroad" following the expression consacred for Zheng He's legendary journey in 1500.

There, they'll have to adapt to different infrastructures and regulations, but they have bright prospects for growth!

Read her post 👉 https://huggingface.co/blog/AdinaY/chinese-ai-global-expansion