All HF Hub posts

MoritzLaurer posted an update 1 day ago
#phdone - I defended my PhD yesterday! A key lesson: it is amazing how open science and open source can empower beginners with limited resources:

I first learned about instruction-based classifiers like BERT-NLI 3-4 years ago, through the @HuggingFace ZeroShotClassificationPipeline. Digging deeper, I found it surprisingly easy to find new datasets, newer base models, and reusable fine-tuning scripts on the HF Hub to create my own zeroshot models, although I didn't know much about fine-tuning at the time.
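The NLI trick that powers that pipeline is easy to sketch: each candidate label is wrapped into a hypothesis, an NLI model scores entailment between the text and each hypothesis, and the scores are normalized into label probabilities. A minimal toy version (the `dummy_nli` scorer below is a hypothetical stand-in so the sketch runs without downloading a real checkpoint):

```python
import numpy as np

def zero_shot_classify(text, labels, nli_entailment):
    # Each label becomes an NLI hypothesis; the entailment score of
    # (text, hypothesis) is used as the label's unnormalized probability.
    hypotheses = [f"this text is about {label}" for label in labels]
    scores = np.array([nli_entailment(text, h) for h in hypotheses])
    probs = scores / scores.sum()  # single-label normalization
    return labels[int(np.argmax(probs))], probs

# Hypothetical toy "NLI model" that just counts word overlap, so the
# example runs without a real checkpoint.
def dummy_nli(premise, hypothesis):
    premise_words = set(premise.lower().split())
    return 1.0 + sum(w in premise_words for w in hypothesis.split())

label, probs = zero_shot_classify(
    "inflation is hurting the economy", ["economy", "sports"], dummy_nli)
```

In the real pipeline, the hypothesis template and an NLI checkpoint replace the toy scorer; the classifier needs no label-specific training, which is the whole appeal.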

Thanks to the community effect of the Hub, my models were downloaded hundreds of thousands of times after a few months. Seeing my research being useful for people motivated me to improve and upload newer models. Leaving my contact details in the model cards led to academic cooperation and consulting contracts (and eventually my job at HF).

That's the power of open science & open source: learning, sharing, improving, collaborating.

I mean every word in my thesis acknowledgments (screenshot). I'm very grateful to my supervisors @vanatteveldt @CasAndreu @KasperWelbers for their guidance; to @profAndreaRenda and @CEPS_thinktank for enabling me to work part-time during the first year; to @huggingface for creating awesome tools and an awesome platform; and to many others who are not active on social media.

Links to the full thesis and the collection of my most recent models are below.

PS: If someone happens to speak Latin, let me know if my diploma contains some hidden Illuminati code or something :D
YerbaPage posted an update 1 day ago
We propose MGDebugger, a hierarchical bottom-up LLM code debugger 🔥 that can fix bugs ranging from low-level syntax errors to high-level algorithmic flaws.

It achieves an ⭐️ 18.9% accuracy improvement over seed generations on HumanEval and a ⭐️ 97.6% repair success rate on HumanEvalFix.

Paper available at https://arxiv.org/abs/2410.01215.
Code and demo available at https://github.com/YerbaPage/MGDebugger.
m-ric posted an update about 13 hours ago
📜 Old-school RNNs can actually rival fancy transformers!

Researchers from Mila and Borealis AI have just shown that simplified versions of good old Recurrent Neural Networks (RNNs) can match the performance of today's transformers.

They took a fresh look at LSTMs (from 1997!) and GRUs (from 2014). They stripped these models down to their bare essentials, creating "minLSTM" and "minGRU". The key changes:
โถ Removed dependencies on previous hidden states in the gates
โท Dropped the tanh that had been added to restrict output range in order to avoid vanishing gradients
โธ Ensured outputs are time-independent in scale (not sure I understood that well either, don't worry)

⚡️ As a result, these new, minimal RNNs can be trained with a "parallel scan" algorithm: they take 88% more memory, but are 200x faster than their traditional counterparts on long sequences.
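For the curious, the recurrence that makes this possible is simple: because the minGRU gates depend only on the current input, each step has the form h_t = a_t * h_(t-1) + b_t, which unrolls into cumulative products and sums. A toy numpy sketch (not the paper's code; the authors run the scan in log-space for numerical stability):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def min_gru_sequential(x, Wz, Wh, h0):
    # minGRU: the gate z_t and candidate h_tilde_t depend only on x_t,
    # not on h_(t-1), which is what makes the scan trick possible.
    h, out = h0, []
    for t in range(x.shape[0]):
        z = sigmoid(x[t] @ Wz)
        h_tilde = x[t] @ Wh
        h = (1 - z) * h + z * h_tilde
        out.append(h)
    return np.stack(out)

def min_gru_parallel(x, Wz, Wh, h0):
    # h_t = a_t * h_(t-1) + b_t with a_t, b_t independent of h, so the
    # whole sequence reduces to cumulative products/sums (a parallel scan).
    z = sigmoid(x @ Wz)        # (T, d)
    a = 1 - z                  # decay coefficients
    b = z * (x @ Wh)           # input contributions
    A = np.cumprod(a, axis=0)  # A_t = product of a_1..a_t
    # Closed form: h_t = A_t * h0 + A_t * sum_{s<=t} b_s / A_s
    return A * h0 + A * np.cumsum(b / A, axis=0)
```

Both functions compute the same hidden states, but the second replaces the Python loop over time steps with vectorized cumulative operations, which is what lets GPUs parallelize training over the sequence dimension.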

🔥 The results are mind-blowing! Performance-wise, they go toe-to-toe with Transformers or Mamba.

And for language modeling, they need 2.5x fewer training steps than Transformers to reach the same performance! 🚀

🤔 Why does this matter?

By showing there are simpler models with similar performance to transformers, this challenges the narrative that we need advanced architectures for better performance!

💬 François Chollet wrote in a tweet about this paper:

โ€œThe fact that there are many recent architectures coming from different directions that roughly match Transformers is proof that architectures aren't fundamentally important in the curve-fitting paradigm (aka deep learning)โ€

โ€œCurve-fitting is about embedding a dataset on a curve. The critical factor is the dataset, not the specific hard-coded bells and whistles that constrain the curve's shape.โ€

It's the Bitter Lesson by Rich Sutton striking again: you don't need fancy architectures, just scale up your model and data!

Read the paper 👉 Were RNNs All We Needed? (2410.01201)
TuringsSolutions posted an update 1 day ago
Hyperdimensional Computing + Neural Network, tell your friends. To my knowledge, this is a completely novel implementation of HDC + neural networks, and it would be a direct competitor to Transformers. It is off-the-charts more computationally efficient than Transformers could ever hope to be (which is why I tested it in the first place), and it is far more similar to biological processes. My testing so far shows that it works surprisingly well. One surprise so far: adding an attention mechanism to the model does nothing at all, maybe a 1% performance increase. Weirdest thing. I guess Attention Is Not All You Need?
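For readers new to HDC, its core operations are easy to sketch: random high-dimensional bipolar vectors, binding via elementwise multiplication, and bundling via majority vote. This is a generic illustration of HDC basics, not the repo's actual code:

```python
import numpy as np

D = 10_000  # hypervector dimensionality
rng = np.random.default_rng(42)

def hypervector():
    # Random bipolar vector; any two are nearly orthogonal at high D.
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    # Binding associates two concepts; the result resembles neither input.
    return a * b

def bundle(*vs):
    # Bundling superimposes items; the result stays similar to each input.
    return np.sign(np.sum(vs, axis=0))

def similarity(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

color, shape = hypervector(), hypervector()
red, circle = hypervector(), hypervector()
# Encode the record {color: red, shape: circle} as a single hypervector.
record = bundle(bind(color, red), bind(shape, circle))
```

Unbinding the record with the `color` key (`bind(record, color)`) yields a vector far more similar to `red` than to `circle`; that cheap, noise-tolerant lookup is what makes HDC attractive as a building block next to neural networks.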


I made a Github repository for my Hyperdimensional Computing Neural Network: https://github.com/RichardAragon/HyperDimensionalComputingNeuralNetwork


I made a YouTube video showcasing the model and some of my experiments with it: https://youtu.be/Eg51o519zVM
louisbrulenaudet posted an update 1 day ago
My biggest release of the year: a series of 7 specialized embedding models for information retrieval within tax documents is now available for free on Hugging Face 🤗

These new models aim to offer an open-source alternative for in-domain semantic search over large text corpora, and will improve RAG systems and context addition for large language models.

Trained on more than 43 million tax tokens derived from semi-synthetic and raw-synthetic data, enriched by various methods (in particular MSFT's evol-instruct by @intfloat ), and corrected by humans, this project is the fruit of hundreds of hours of work and is the culmination of a global effort to open up legal technologies that has only just begun.

A big thank you to Microsoft for Startups for giving me access to state-of-the-art infrastructure to train these models, and to @julien-c, @clem 🤗, @thomwolf and the whole HF team for the inference endpoint API and the generous provision of Meta Llama-3.1-70B. Special thanks also to @tomaarsen for his invaluable advice on training embedding models and loss functions ❤️

Models are available on my personal HF page, in the Lemone-embed collection: louisbrulenaudet/lemone-embed-66fdc24000df732b395df29b
zamal posted an update 1 day ago
🚀 New Model Release: zamal/Molmo-7B-GPTQ-4bit 🚀

Hello lovely community,

The zamal/Molmo-7B-GPTQ-4bit model is now available for all! It has been heavily quantized, reducing its size by almost six times. It now occupies significantly less space and VRAM, making it perfect for deployment on resource-constrained devices without compromising performance.

Now we get:
• Efficient performance: maintains high accuracy despite the heavy quantization.
• Reduced size: the model is nearly six times smaller, optimizing storage and memory usage.
• Versatile application: ideal for integrating a powerful visual language model into various projects, particularly multimodal RAG chains.
Check it out!
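For a sense of where the size reduction comes from: a 4-bit weight takes a quarter of the space of a float16 one, since two values pack into a single byte. A toy packing illustration (the storage trick, not the GPTQ algorithm itself):

```python
import numpy as np

def pack_int4(vals):
    # vals: even-length array of ints in [0, 15]; each pair shares one byte.
    v = np.asarray(vals, dtype=np.uint8)
    return (v[0::2] << 4) | v[1::2]

def unpack_int4(packed):
    # Recover the high and low nibbles and re-interleave them.
    hi, lo = packed >> 4, packed & 0x0F
    return np.stack([hi, lo], axis=1).reshape(-1)

w = np.array([1, 15, 0, 7], dtype=np.uint8)
p = pack_int4(w)  # 2 bytes now hold 4 weights
```

GPTQ additionally chooses the quantized values to minimize layer output error, which is why accuracy holds up despite the aggressive compression.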

mpimentel posted an update 1 day ago
Just dropped a new blog post 🤗

"Comparing Open-source and Proprietary LLMs in Medical AI"

We put the biggest AI brains 🧠 to the test on medical tasks 🏥 using popular benchmarks such as MedQA.

Closed-source vs. open-source, costs decoded, and performance revealed! We need even more comprehensive benchmarks and evaluations of LLMs on medical tasks 🔬

Read more at: https://huggingface.co/blog/mpimentel/comparing-llms-medical-ai
m-ric posted an update 1 day ago
🇨🇳⛵️ 出海 ("sailing abroad"): Chinese AI is expanding globally

Fact: Chinese LLMs are heavily underrated; see for instance the recent releases of the excellent DeepSeek-V2.5 or the Qwen models.

Luckily for us, @AdinaY just wrote an excellent blog post explaining the Chinese AI ecosystem!

My key takeaways:

Since Google, OpenAI and Anthropic models are not available in China, local companies are fighting for the market. And it's a really good market: AI has much higher penetration there than in the rest of the world, among both companies and individual users!

💰 But since DeepSeek heavily cut prices in May 2024, this has spiraled into a price war, creating a cut-throat environment with unsustainably low prices.

📋 On top of this, local regulation is stringent: models must be licensed by a local censor (the Cyberspace Administration of China), which, for instance, requires models to refuse to answer certain questions about the CCP. Still, this is certainly simpler to implement than certain conditions of the European AI Act.

💸 If this wasn't enough, VC investment in AI is drying up: by mid-2024, Chinese AI startups had raised approximately $4.4 billion, vs. $55B for US startups in Q2 2024 alone.

📱 To reach profitability, companies have shifted from foundation models to model + application, for instance PopAI from [01.AI](http://01.ai/), with millions of users and high profitability.

⛏️ They are also trying to drill down into specific industries, but these niches are getting crowded too.

➡️ Since their home market is becoming both too crowded and inhospitable, Chinese companies are now going for international markets, "sailing abroad" after the expression consecrated by Zheng He's legendary 15th-century voyages.

There, they'll have to adapt to different infrastructures and regulations, but they have bright prospects for growth!

Read her post 👉 https://huggingface.co/blog/AdinaY/chinese-ai-global-expansion
singhsidhukuldeep posted an update 2 days ago
Good folks at @PyTorch have just released torchao, a game-changing library for native architecture optimization.

-- How torchao Works (They threw the kitchen sink at it...)

torchao leverages several advanced techniques to optimize PyTorch models, making them faster and more memory-efficient. Here's an overview of its key mechanisms:

Quantization

torchao employs various quantization methods to reduce model size and accelerate inference:

• Weight-only quantization: Converts model weights to lower-precision formats like int4 or int8, significantly reducing memory usage.
• Dynamic activation quantization: Quantizes activations on the fly during inference, balancing performance and accuracy.
• Automatic quantization: The autoquant function intelligently selects the best quantization strategy for each layer in a model.
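The core idea of weight-only quantization fits in a few lines: store weights as int8 plus a per-channel scale, and dequantize at matmul time. A toy numpy illustration of the concept, not torchao's actual implementation:

```python
import numpy as np

def quantize_weights(W):
    # Per-output-channel symmetric quantization to int8:
    # one float scale per row, weights rounded to [-127, 127].
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    Wq = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return Wq, scale

def linear_int8(x, Wq, scale, bias):
    # Weights are dequantized at matmul time; activations stay in float.
    return x @ (Wq.astype(np.float32) * scale).T + bias

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)).astype(np.float32)
bias = np.zeros(8, dtype=np.float32)
x = rng.normal(size=(4, 16)).astype(np.float32)
Wq, scale = quantize_weights(W)
err = float(np.abs(linear_int8(x, Wq, scale, bias) - (x @ W.T + bias)).max())
```

The int8 copy uses a quarter of float32's storage per weight, and the per-channel scales keep the output error small; torchao additionally fuses the dequantization into optimized kernels so the speedup is realized, not just the memory saving.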

Low-bit Datatypes

The library utilizes low-precision datatypes to speed up computations:

• float8: Enables float8 training for linear layers, offering substantial speedups for large models like LLaMA 3 70B.
• int4 and int8: Provide options for extreme compression of weights and activations.

Sparsity Techniques

torchao implements sparsity methods to reduce model density:

• Semi-sparse weights: Combine quantization with sparsity for compute-bound models.

KV Cache Optimization

For transformer-based models, torchao offers KV cache quantization, leading to significant VRAM reductions for long context lengths.

Integration with PyTorch Ecosystem

torchao seamlessly integrates with existing PyTorch tools:

• Compatible with torch.compile() for additional performance gains.
• Works with FSDP2 for distributed training scenarios.
• Supports most PyTorch models available on Hugging Face out of the box.

By combining these techniques, torchao enables developers to significantly improve the performance and efficiency of their PyTorch models with minimal code changes and accuracy impact.
m-ric posted an update 2 days ago
Emu3: Next-token prediction conquers multimodal tasks 🔥

This is the most important research in months: we're now very close to having a single architecture to handle all modalities. The folks at the Beijing Academy of Artificial Intelligence (BAAI) just released Emu3, a single model that handles text, images, and videos all at once.

What's the big deal?
🌟 Emu3 is the first model to truly unify all these different types of data (text, images, video) using just one simple trick: predicting the next token.
And it's only 8B, but really strong:
🖼️ For image generation, it's matching the best specialized models out there, like SDXL.
👁️ In vision tasks, it's outperforming top models like LLaVA-1.6-7B, which is a big deal for a model that wasn't specifically designed for this.
🎬 It's the first to nail video generation without using complicated diffusion techniques.

๐—›๐—ผ๐˜„ ๐—ฑ๐—ผ๐—ฒ๐˜€ ๐—ถ๐˜ ๐˜„๐—ผ๐—ฟ๐—ธ?
๐Ÿงฉ Emu3 uses a special tokenizer (SBER-MoVQGAN) to turn images and video clips into sequences of 4,096 tokens.
๐Ÿ”— Then, it treats everything - text, images, and videos - as one long series of tokens to predict.
๐Ÿ”ฎ During training, it just tries to guess the next token, whether that's a word, part of an image, or a video frame.
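The unified-sequence idea can be sketched in a few lines. This is a toy illustration, not Emu3's pipeline; the vocabulary sizes and the begin-of-image marker below are assumptions made for the example:

```python
TEXT_VOCAB = 32_000       # assumed text vocabulary size
VISUAL_VOCAB = 4_096      # assumed visual codebook size
BOI = TEXT_VOCAB + VISUAL_VOCAB  # hypothetical begin-of-image marker id

def to_unified_sequence(text_ids, image_codes):
    # Visual codes are offset past the text vocabulary so every token id
    # is unique within one shared vocabulary.
    visual_ids = [TEXT_VOCAB + c for c in image_codes]
    return text_ids + [BOI] + visual_ids

def next_token_targets(seq):
    # Plain language-modeling objective over the mixed sequence:
    # predict token t+1 from the tokens up to t.
    return list(zip(seq[:-1], seq[1:]))

seq = to_unified_sequence([5, 17, 256], [0, 3, 4095])
```

Once text and visual tokens live in one vocabulary like this, a single decoder-only transformer with a single loss covers generation and understanding across all modalities.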

๐—–๐—ฎ๐˜ƒ๐—ฒ๐—ฎ๐˜๐˜€ ๐—ผ๐—ป ๐˜๐—ต๐—ฒ ๐—ฟ๐—ฒ๐˜€๐˜‚๐—น๐˜๐˜€:
๐Ÿ‘‰ In image generation, Emu3 beats SDXL, but itโ€™s also much bigger (8B vs 3.5B). It would be more difficult to beat the real diffusion GOAT FLUX-dev.
๐Ÿ‘‰ In vision, authors also donโ€™t show a comparison against all the current SOTA models like Qwen-VL or Pixtral.

This approach is exciting because it's simple (next-token prediction) and scalable (it handles all sorts of data)!

Read the paper 👉 Emu3: Next-Token Prediction is All You Need (2409.18869)