The Fellowship is a network of exceptional people from different backgrounds who contribute to open-source machine learning 🧙‍♂️🦸‍♀️🦹🧝‍♂️
Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible: just look at the "T" in ChatGPT, which comes from the Transformer architecture openly shared by Google.
Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating.
With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization, powered by openness and collaboration, in the US and around the world.
This is incredibly exciting. Let's go, open science and open-source AI!
Very interesting security section by @yjernite @lvwerra @reach-vb @dvilasuero & the team replicating R1. Broadly applicable to most open-source models & some to APIs (but APIs have a lot more additional risks because you're not in control of the underlying system):
‼️ Sentence Transformers v4.0 is out! You can now train and finetune reranker models with multi-GPU training, bf16 support, loss logging, callbacks & much more. I also show that finetuning on your domain helps much more than you might think.
1️⃣ Reranker Training Refactor
Reranker models can now be trained using an extensive trainer with a lot of powerful features:
- Multi-GPU training (Data Parallelism (DP) and Distributed Data Parallelism (DDP))
- bf16 training support; loss logging
- Evaluation datasets + evaluation loss
- Improved callback support + an excellent Weights & Biases integration
- Gradient checkpointing, gradient accumulation
- Model card generation
- Resuming from a training checkpoint without performance loss
- Hyperparameter Optimization
and much more!
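Here's a minimal sketch of what the new flow looks like, with a tiny inline dataset and illustrative hyperparameters (the base model, data, and settings below are placeholders of my own, not from the release; see the blogpost and docs for full recipes):

```python
from datasets import Dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Any encoder checkpoint works as a starting point; ModernBERT is just an example here
model = CrossEncoder("answerdotai/ModernBERT-base", num_labels=1)

# Tiny illustrative (query, answer, label) dataset; the column named "label" is used as
# the target and the remaining columns form the text pair
train_dataset = Dataset.from_dict({
    "query": [
        "how many calories are in an egg",
        "how many calories are in an egg",
    ],
    "answer": [
        "A large egg contains roughly 70 to 80 calories.",
        "Eggs can be boiled, fried, or poached.",
    ],
    "label": [1.0, 0.0],
})

loss = BinaryCrossEntropyLoss(model)

args = CrossEncoderTrainingArguments(
    output_dir="models/my-reranker",   # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    bf16=True,                         # only if your GPU supports bf16
    logging_steps=100,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save_pretrained("models/my-reranker/final")
```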
Read my detailed blogpost to learn about the components that make up this new training approach: https://huggingface.co/blog/train-reranker
Notably, the release is fully backwards compatible: all deprecations are soft, meaning that they still work but emit a warning informing you how to upgrade.
2️⃣ New Reranker Losses - 11 new losses:
- 2 traditional losses: BinaryCrossEntropy and CrossEntropy
- 2 distillation losses: MSE and MarginMSE
- 2 in-batch negatives losses: MNRL (a.k.a. InfoNCE) and CMNRL
- 5 learning to rank losses: Lambda, p-ListMLE, ListNet, RankNet, ListMLE
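All of these plug into the same trainer via the `loss` argument; a quick sketch of swapping losses (class names follow the list above, but check the Loss Overview docs for the exact dataset format each loss expects):

```python
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.losses import (
    BinaryCrossEntropyLoss,        # traditional: (query, passage) pairs with 0/1 labels
    MultipleNegativesRankingLoss,  # in-batch negatives, a.k.a. MNRL / InfoNCE
    LambdaLoss,                    # learning to rank
)

model = CrossEncoder("answerdotai/ModernBERT-base", num_labels=1)

# Each loss wraps the model and is passed to CrossEncoderTrainer(loss=...); the required
# dataset columns differ per loss, so check the Loss Overview documentation for the format.
loss = MultipleNegativesRankingLoss(model)
```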
3️⃣ New Reranker Documentation
- New Training Overview, Loss Overview, API Reference docs
- 5 new, 1 refactored training examples docs pages
- 13 new, 6 refactored training scripts
- Migration guides (2.x -> 3.x, 3.x -> 4.x)
4️⃣ Blogpost
Alongside the release, I've written a blogpost where I finetune ModernBERT on a generic question-answer dataset. My finetunes easily outperform all general-purpose reranker models, even models 4x as big. Finetuning on your domain is definitely worth it: https://huggingface.co/blog/train-reranker
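Using such a finetuned reranker stays as simple as before; a minimal inference sketch, with a placeholder model ID standing in for your own finetune:

```python
from sentence_transformers import CrossEncoder

# "my-user/my-domain-reranker" is a placeholder; swap in your own finetuned model
model = CrossEncoder("my-user/my-domain-reranker")

query = "how many calories are in an egg"
passages = [
    "A large egg contains roughly 70 to 80 calories.",
    "Eggs are a good source of protein.",
    "The capital of France is Paris.",
]

# Score each (query, passage) pair; higher scores mean more relevant
scores = model.predict([(query, passage) for passage in passages])

# Or let the model rank the passages for the query directly
ranking = model.rank(query, passages)
print(scores)
print(ranking)
```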
🌐 Multimodal
> Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS)
> With IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS)
> SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants
> SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)
💬 LLMs
> NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset
> LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B
> Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
> Dataset: NVIDIA released new helpfulness dataset HelpSteer3
> Dataset: OpenManusRL is a new agent dataset based on the ReAct framework (OS)
> Open-R1 team released OlympicCoder, new competitive coder models in 7B and 32B
> Dataset: GeneralThought-430K is a new reasoning dataset (OS)
🖼️ Image Generation/Computer Vision
> Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥
> YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
> Stability AI released Stable Virtual Camera, a new novel view synthesis model
> Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model
> ByteDance released InfiniteYou, new realistic photo generation model
> StarVector is a new 8B model that generates SVG from images
> FlexWorld is a new model that expands 3D views (OS)
🎤 Audio
> Sesame released CSM-1B, new speech generation model (OS)
🤖 Robotics
> NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset
An implementation of T5 in PyTorch with the UL2 objective, optimized for GPGPU for both training and inference thanks to 13 different optimizations. The main one is a CUDA kernel we designed that extends Flash Attention by @tridao with RPE biases and supports other positional encodings (PE) such as RoPE, ALiBi or FIRE. The resulting kernel is 2x faster than an SDPA implementation. We also use Triton kernels to optimize certain parts of the architecture, such as the cross-entropy and RMSNorm layers.
The various kernels have been carefully built to be compatible with BF16 and torch.compile to go even faster and achieve efficient pretraining.
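For comparison, the stock PyTorch SDPA path can only express a T5-style RPE bias as a materialized additive mask; a generic sketch of that baseline (not the FAT5 kernel itself):

```python
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 8, 512, 64
device, dtype = "cuda", torch.bfloat16

q = torch.randn(batch, heads, seq, dim, device=device, dtype=dtype)
k = torch.randn(batch, heads, seq, dim, device=device, dtype=dtype)
v = torch.randn(batch, heads, seq, dim, device=device, dtype=dtype)

# T5-style relative position bias: one learned scalar per (head, relative-position
# bucket), materialized here as a full (1, heads, seq, seq) tensor for illustration.
rpe_bias = torch.randn(1, heads, seq, seq, device=device, dtype=dtype)

# SDPA adds a floating-point attn_mask to the attention logits before the softmax, so it
# can express the bias, but only through this full seq x seq tensor, which is part of
# what a fused Flash-Attention-with-bias kernel can avoid.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=rpe_bias)
print(out.shape)  # (batch, heads, seq, dim)
```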
This methodology enabled us to efficiently pretrain, as a proof of concept, a FAT5 with 147M parameters in French in a reasonable time (1,461 hours for 419B tokens), with limited resources (1 A100, i.e. a compute budget of ~€1,900) and a low carbon footprint (13.5 kg CO2 eq).
The model's weights are also available on Hugging Face: CATIE-AQ/FAT5-small. It's not very useful in practice, since it's a PoC and not an instruction-tuned model (that's planned for later).
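If you still want to play with it, loading should look roughly like this (a sketch: I'm assuming the custom FAT5 architecture ships its own modeling code on the Hub, hence trust_remote_code, so check the model card for the authoritative snippet):

```python
from transformers import AutoModel, AutoTokenizer

# Assumed to require trust_remote_code since FAT5 is a custom architecture;
# the CATIE-AQ/FAT5-small model card has the exact loading code.
tokenizer = AutoTokenizer.from_pretrained("CATIE-AQ/FAT5-small")
model = AutoModel.from_pretrained("CATIE-AQ/FAT5-small", trust_remote_code=True)
```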
All the code is available on GitHub if you want to pretrain your own model in your own language or for a specific domain: https://github.com/catie-aq/flashT5
Finally, note that this was a joint project with @BorisAlbar at hf.co/CATIE-AQ.
We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!
An assembly of 18 European companies, labs, and universities has banded together to launch 🇪🇺 EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc.
🇪🇺 15 Languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3️⃣ 3 model sizes: 210M, 610M, and 2.1B parameters - very, very useful sizes in my opinion
⚡️ Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
⚙️ Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
🏆 A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
📊 Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for Retrieval, Classification, and Regression (after finetuning for each task separately): EuroBERT punches way above its weight.
📜 Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.
The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!
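In the meantime, pulling down a base model for your own finetuning should look roughly like this (a sketch: I'm assuming the Hub IDs follow the EuroBERT/EuroBERT-210m naming and that the custom Llama-based encoder needs trust_remote_code, so double-check the model cards):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "EuroBERT/EuroBERT-210m"  # assumed naming; 610m and 2.1B variants as well
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Quick sanity check: masked-token prediction in any of the 15 languages
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (1, sequence_length, vocab_size)
```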
I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).
He shared an interesting insight: agentic capabilities might be more of an alignment problem than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question' - after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
As a thank you to the community, he shared 100 invite codes, first-come first-served: just use "HUGGINGFACE" to get access!