Article Improving performance with Arena Learning in post training By satpalsr • 24 days ago • 4
Article Perspectives for first principles prompt engineering By KnutJaegersberg • Aug 18 • 16
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression Paper • 2407.12077 • Published Jul 16 • 52
MADLAD400 Models Collection CTranslate2 compatible model files of MADLAD400 models • 4 items • Updated Jul 21 • 2
Article Formatting Datasets for Chat Template Compatibility By nroggendorff • Jun 28 • 7
Article EU Training Data Transparency: A Proposal for a Sufficiently Detailed Summary 📑📚🖼️🇪🇺 By yjernite • Jul 3 • 8
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 169
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17 • 48
Article 🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets By dvilasuero • Jun 4 • 69
Article Enhancing Image Model Dreambooth Training Through Effective Captioning: Key Observations By alvdansen • Jun 19 • 17
MobileCLIP Models + DataCompDR Data Collection MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated about 23 hours ago • 23
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28 • 148
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 136
Article Train custom AI models with the trainer API and adapt them to 🤗 By not-lain • Jun 29 • 33
Multilingual Instruction Tuning With Just a Pinch of Multilinguality Paper • 2401.01854 • Published Jan 3 • 10
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8 • 31
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs Paper • 2403.20041 • Published Mar 29 • 34
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 60
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
Efficiently Adapting Pretrained Language Models To New Languages Paper • 2311.05741 • Published Nov 9, 2023 • 11