BERT community

Activity Feed

AI & ML interests

This organization is maintained by the transformers team at Hugging Face and contains the historical (pre-"Hub") BERT checkpoints.

google-bert's activity

lysandreย 
posted an update 13 days ago
view post
Post
5438
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
  • 1 reply
ยท
julien-cย 
posted an update 3 months ago
view post
Post
10186
After some heated discussion ๐Ÿ”ฅ, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community ๐Ÿ”ฅ

cc: @reach-vb @pierric @victor and the HF team
ยท
julien-cย 
posted an update 3 months ago
view post
Post
3189
wow ๐Ÿ˜ฎ

INTELLECT-1 is the first collaboratively trained 10 billion parameter language model trained from scratch on 1 trillion tokens of English text and code.

PrimeIntellect/INTELLECT-1-Instruct
julien-cย 
posted an update 10 months ago
view post
Post
5213
Hey it was good meeting you yesterday @MaziyarPanahi ๐Ÿ”ฅ

thanks @mishig for setting this up

Let's make the Hub as useful as possible for the community โค๏ธ
  • 1 reply
ยท
osansevieroย 
posted an update 11 months ago
view post
Post
12133
Diaries of Open Source. Part 15 ๐Ÿค—

๐Ÿ•ต๏ธโ€โ™€๏ธIdefics 2 is out, a multimodal open-source model with very nice capabilities
Models, demo, and datasets: HuggingFaceM4/idefics2-661d1971b7c50831dd3ce0fe
Blog: https://hf.co/blog/idefics2

๐Ÿ’พSnowflake released snowflake-arctic-embed, a family of powerful small embedding models
Model: Snowflake/snowflake-arctic-embed-m
Blog: https://www.snowflake.com/blog/introducing-snowflake-arctic-embed-snowflakes-state-of-the-art-text-embedding-family-of-models/

โœจPile-T5, EleutherAI's T5 model trained on 2T tokens
Blog: https://blog.eleuther.ai/pile-t5/
Models: EleutherAI/pile-t5-65a76a0d0022dd270b385a66
GitHub: https://github.com/EleutherAI/improved-t5

๐Ÿค–CodeQwen1.5-7B base and chat models. Models trained on 3T tokens strong benchmark results for code generation, editing and SQL
Blog post: https://qwenlm.github.io/blog/codeqwen1.5/
Demo: Qwen/CodeQwen1.5-7b-Chat-demo
Models: Qwen/CodeQwen1.5-7B and Qwen/CodeQwen1.5-7B-Chat

Misc
๐Ÿฆ‰ DocOwl1.5: Unified Stucture Learning for OCR-free Document Understanding mPLUG/DocOwl
๐Ÿ‘€Cerule - a tiny Vision LM model Tensoic/Cerule-v0.1
ChemLLM - a LLM for chemistry and molecule science โš—๏ธhttps://hf.co/AI4Chem/ChemLLM-7B-Chat-1.5-DPO
Distil Whisper Large
๐Ÿ“New pdf/OCR datasets with 19 samples pixparse/pdf-document-ocr-datasets-660701430b0346f97c4bc628
๐Ÿ”ฅGretel AI high quality text-to-sql synthetic dataset gretelai/synthetic_text_to_sql
ยท
osansevieroย 
posted an update 11 months ago
view post
Post
11247
Diaries of Open Source. Part 14 ๐Ÿค—

๐Ÿ”ฅCohereForAI releases Command R+, an open 104B model with:
- Tool usage capabilities
- Specialized in RAGs
- Multilingual
It's one of the first models to surpass GPT-4 in the lmsys arena, check it out!
Model: CohereForAI/c4ai-command-r-plus
Official demo: https://hf.co/spaces/CohereForAI/c4ai-command-r-plus
Quantized: CohereForAI/c4ai-command-r-plus-4bit

๐ŸŽ‰Google releases a new version of their Gemma instruct models, with improved quality, nicer to converse, and a fancier RL algorithm. The model is similar to Llama 2 70B in the Chat Arena!
Models: google/gemma-release-65d5efbccdbb8c4202ec078b
Try it out in HuggingChat https://hf.co/chat/models/google/gemma-1.1-7b-it

๐Ÿช„VoiceCraft, a speech editing and TTS SOTA open model
Paper: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (2403.16973)
Model: pyp1/VoiceCraft

๐Ÿ’ปGoogle released CodeGemma, a family of code generation, completion, and chat models
Blog post: https://hf.co/blog/codegemma
Models: google/codegemma-release-66152ac7b683e2667abdee11
Report: https://storage.googleapis.com/deepmind-media/gemma/codegemma_report.pdf

Misc models:
๐Ÿฆ–T-Rex2, a very powerful object detection model for many applications https://github.com/IDEA-Research/T-Rex
๐Ÿ‘€ CT-RATE : A 3D dataset paired with text reports ibrahimhamamci/CT-RATE
๐Ÿ™Octopus v2: a Gemma-based model trained for Android API - extremely fast, better than Llama+RAG, great results NexaAIDev/Octopus-v2
  • 2 replies
ยท
julien-cย 
posted an update 11 months ago
view post
Post
6912
text-generation-inference (TGI) is now fully open-source again!

Along with text-embeddings-inference.

We just switched both of those repos' license back to Apache 2. ๐Ÿ”ฅ
osansevieroย 
posted an update 11 months ago
view post
Post
2300
Diaries of Open Source. Part 13 ๐Ÿค—

๐ŸคTwo different bitnet 1.5 open-source replications
Original paper: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (2402.17764)
1bitllm experiment: https://hf.co/blog/joey00072/experiments-with-bitnet-1-5
NousResearch experiment NousResearch/OLMo-Bitnet-1B

๐ŸฅณTiny and large multimodal models great for embeddings
GitHub: https://github.com/unum-cloud/uform
Encoders: https://hf.co/collections/unum-cloud/multimodal-encoders-660553903617c5297eb16838
ONNX weights: https://hf.co/collections/unum-cloud/uform-vl-english-large-onnx-66055a57c182d846f3bc1949

๐Ÿ“œ SMPLer-X: Expressive Human Pose and Shape Estimation
Project website: https://caizhongang.com/projects/SMPLer-X/
Demo: caizhongang/SMPLer-X
Paper: SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation (2309.17448)

๐Ÿง™GeoWizard: 3D Geometry Estimation
Project website: https://fuxiao0719.github.io/projects/geowizard/
Demo: lemonaddie/geowizard

Misc models and datasets
- Dolphin-2.8-mistral-7b-v0.2 cognitivecomputations/dolphin-2.8-mistral-7b-v02
- Hermes-2-Pro-11B, a self-frankenmerge 11B variant mattshumer/Hermes-2-Pro-11B
- Large conversational dataset based on Usenet data in the Italian language mii-community/UsenetArchiveIT-conversations
  • 3 replies
ยท
osansevieroย 
posted an update 11 months ago
view post
Post
3556
Diaries of Open Source. Part 12 ๐Ÿค—

๐Ÿš€Alibaba releases Qwen1.5-MoE-A2.7B, an interesting MoE with 2.7B activated parameters and 64 experts
Blog https://qwenlm.github.io/blog/qwen-moe/
Demo: Qwen/qwen1.5-MoE-A2.7B-Chat-demo
Models: https://hf.co/Qwen
GitHub: https://github.com/QwenLM/Qwen1.5

๐ŸŽตVoiceCraft, SOTA speech editing and text to speech
GitHub: https://github.com/jasonppy/VoiceCraft
Model: pyp1/VoiceCraft

๐Ÿ AI21Labs release Jamba, an SSM-Transformer, pretrained MoE which allows a large context window (256K) and high throughput
Blog https://www.ai21.com/blog/announcing-jamba
Model ai21labs/Jamba-v0.1

โœจ Berkeley releases Starling-LM-7B, an RLHF-ed model, and -RM-34B, a Yi-based reward model very good for its size
Starling Beta: Nexusflow/Starling-LM-7B-beta
Starling RM: Nexusflow/Starling-RM-34B

๐Ÿ–ฅ๏ธStability releases Stable Code Instruct 3B, an instruct model for code generation
Blog: https://stability.ai/news/introducing-stable-code-instruct-3b
Demo: stabilityai/stable-code-instruct-3b
Report: https://stability.ai/s/Stable_Code_TechReport_release.pdf

๐Ÿ“šCommon Corpus: the largest public domain dataset for training LLMs
Blog: https://hf.co/blog/Pclanglais/common-corpus
Dataset: https://hf.co/collections/PleIAs/common-corpus-65d46e3ea3980fdcd66a5613

Misc:
โšกGaLore: a very memory-efficient technique that allows pretraining models in consumer GPUs https://hf.co/blog/galore
Moirai
๐Ÿ“ˆMoirai, foundation models for time series forecasting https://hf.co/collections/Salesforce/moirai-10-r-models-65c8d3a94c51428c300e0742
๐Ÿ”ฅ Mistral-ORPO-Capybara-7K, a high-quality Mistral fine-tune using ORPO, a new alignment technique kaist-ai/mistral-orpo-capybara-7k
๐ŸคฏAPISR, an anime super-resolution upscaling model HikariDawn/APISR
ยท
osansevieroย 
posted an update 11 months ago
view post
Post
2082
Diaries of Open Source. Part 11 ๐Ÿš€

๐Ÿš€Databricks release DBRX, potentially the best open access model! A 132B Mixture of Experts with 36B active params and trained on 12 trillion tokens
Blog: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Base and instruct models: databricks/dbrx-6601c0852a0cdd3c59f71962
Demo: https://hf.co/spaces/databricks/dbrx-instruct

๐Ÿค1-bit and 2-bit quantization exploration using HQQ+
Blog post: https://mobiusml.github.io/1bit_blog/
Models: https://hf.co/collections/mobiuslabsgmbh/llama2-7b-hqq-6604257a96fc8b9c4e13e0fe
GitHub: https://github.com/mobiusml/hqq

๐Ÿ“šCosmopedia: a large-scale synthetic dataset for pre-training - it includes 25 billion tokens and 30 million files
Dataset: HuggingFaceTB/cosmopedia
Blog: https://hf.co/blog/cosmopedia

โญMini-Gemini: multi-modal VLMs, from 2B to 34B
Models: https://hf.co/collections/YanweiLi/mini-gemini-6603c50b9b43d044171d0854
Paper: Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (2403.18814)
GitHub: https://github.com/dvlab-research/MiniGemini

๐Ÿ”ฅVILA - On Pre-training for VLMs
Models: Efficient-Large-Model/vila-on-pre-training-for-visual-language-models-65d8022a3a52cd9bcd62698e
Paper: VILA: On Pre-training for Visual Language Models (2312.07533)

Misc
๐Ÿ‘€ FeatUp: a framework for image features at any resolution: mhamilton723/FeatUp FeatUp: A Model-Agnostic Framework for Features at Any Resolution (2403.10516)
๐ŸžColBERTus Maxiums, a colbertialized embedding model mixedbread-ai/mxbai-colbert-large-v1
๐Ÿ–Œ๏ธSemantic Palette, a new drawing paradigm ironjr/SemanticPalette
๐Ÿง‘โ€โš•๏ธHistoGPT, a vision model that generates accurate pathology reports marr-peng-lab/histogpt https://www.medrxiv.org/content/10.1101/2024.03.15.24304211v1
ยท
julien-cย 
posted an update 11 months ago
view post
Post
3183
Very glad to welcome @josefprusa , pioneer of 3D printing and open source hardware, founder of https://www.prusa3d.com/, to the HF Hub ๐Ÿ‘‹

AI applied to 3D printing could be big.
  • 1 reply
ยท
osansevieroย 
posted an update 12 months ago
view post
Post
1625
Diaries of Open Source. Part 10 ๐Ÿš€

๐ŸŒผMarigold-LCM: A super fast SOTA Depth Estimator
Demo: prs-eth/marigold-lcm
Original paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
Model: https://hf.co/prs-eth/marigold-lcm-v1-0

๐ŸŒŸQuiet-STaR: A self-teaching technique via internal monologue
Paper: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (2403.09629)
GitHub: https://github.com/ezelikman/quiet-star
Tweetutorial: https://twitter.com/ericzelikman/status/1768663835106513041

๐Ÿ–ผ๏ธ WebSight v0.2: A image-to-code dataset containing tailwind CSS, images in screenshots, and more!
Dataset: HuggingFaceM4/WebSight
Paper: Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset (2403.09029)
Blog: https://hf.co/blog/websight

๐Ÿ•ต๏ธAgent-FLAN - effective agent tuning for LLMs
Paper: Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models (2403.12881)
Model: internlm/Agent-FLAN-7b
Dataset: internlm/Agent-FLAN
Website: https://internlm.github.io/Agent-FLAN/

๐Ÿ”ฅHPT, a family of multimodal LLMs from HyperGAI
Blog post: https://hypergai.com/blog/introducing-hpt-a-family-of-leading-multimodal-llms
Model: HyperGAI/HPT
GitHub: https://github.com/hyperGAI/HPT

๐ŸŒModels and datasets around the world
- Tess-70B, a MiQu-70B fine-tune with high-quality data migtissera/Tess-70B-v1.6
- UNI, a model trained on 100 million pathology images from 100k+ slides MahmoodLab/UNI
- CONCH, a VLM trained on 1.17 million pathology image-text pairs MahmoodLab/CONCH
ยท
osansevieroย 
posted an update 12 months ago
view post
Post
3276
Diaries of Open Source. Part 9!

โฐAmazon releases Chronos, a family of models for time series
Base model: amazon/chronos-t5-large
Paper: Chronos: Learning the Language of Time Series (2403.07815)
Models: https://huggingface.co/collections/amazon/chronos-models-65f1791d630a8d57cb718444

๐Ÿ’กORPO Alignment: align without a reference model nor SFT!
Paper: ORPO: Monolithic Preference Optimization without Reference Model (2403.07691)
Models: kaist-ai/orpo-65efef87544ba100aef30013
GitHub: https://github.com/xfactlab/orpo

๐Ÿ‡บ๐Ÿ‡ณCohere releases 250M Wikipedia Embeddings in 300+ languages
Data: Cohere/wikipedia-2023-11-embed-multilingual-v3
Announcement: https://twitter.com/Nils_Reimers/status/1767891859207057618

๐ŸงฌSegmentNT: a LLM for annotating DNA at single nucleotide resolution
Models: InstaDeepAI/segmentnt-65eb4941c57808b4a3fe1319
GitHub repo: https://github.com/instadeepai/nucleotide-transformer
Paper: https://www.biorxiv.org/content/10.1101/2024.03.14.584712v1

๐Ÿš€DynamiCrafter: video generation models for interpolation and looping are out!
Project page: https://doubiiu.github.io/projects/DynamiCrafter/
GitHub: https://github.com/Doubiiu/DynamiCrafter
Demo: Doubiiu/DynamiCrafter_interp_loop

๐Ÿš€Stanford releases Anticipatory Music Transformer:
GitHub: https://github.com/jthickstun/anticipation/
Models: https://hf.co/stanford-crfm
Original blog announcement: https://crfm.stanford.edu/2023/06/16/anticipatory-music-transformer.html
  • 2 replies
ยท
osansevieroย 
posted an update 12 months ago
view post
Post
2535
Diaries of Open Source. Part 8!

๐ŸคฏCRM: Image-to-3D Textured Mesh
Demo: Zhengyi/CRM
Model: Zhengyi/CRM
Project page: https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/
Paper: CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model (2403.05034)

๐ŸคHalf Quadratic Quantization: super-fast quantization of very large models
Blog post: https://mobiusml.github.io/hqq_blog/
Colab: https://colab.research.google.com/drive/1cG_5R_u9q53Uond7F0JEdliwvoeeaXVN?usp=sharing
Repo: https://github.com/mobiusml/hqq

๐Ÿค—GemMoE -Gemma + MoE
Model: Crystalcareai/GemMoE-Base-Random
Collection: Crystalcareai/gemmoe-65f11f4922af97ebe9943591

๐Ÿ‘€VeCLIP and MOFI, new 0-shot and image retrieval models by Apple, are now open-source!
GitHub: https://github.com/apple/ml-veclip/ and https://github.com/apple/ml-mofi
VeCLIP paper: From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions (2310.07699)
MOFI paper: MOFI: Learning Image Representations from Noisy Entity Annotated Images (2306.07952)

โšกSPIN: Recipe for alignment with very little data
Collection: argilla/dibt-prompt-collective-spin-65ef59062518776024395fc3
Tweetutorial: https://twitter.com/argilla_io/status/1767608154697699455

๐Ÿ‘€ViT Prisma - an interoperability library for vision models
GitHub: https://github.com/soniajoseph/ViT-Prisma

โ˜•OpenLRM: full model and training code are open-sourced
Codebase: https://github.com/3DTopia/OpenLRM
Demo: zxhezexin/OpenLRM
Models: https://huggingface.co/zxhezexin

โš—๏ธOxford releases an extensive PEFT evaluation for bio models
Model: NTaylor/bio-mobilebert-mimic-mp-lora
GitHub: https://github.com/nlpie-research/efficient-ml
Paper: Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks (2402.10597)

๐ŸŒData and models around the world
Hermes 2 Pro 7B: an upgraded Nous Hermes 2 model with strong function calling and JSON capabilities NousResearch/Hermes-2-Pro-Mistral-7B
Navarasa-2.0โ€Š: Gemma fine-tuned in 15 indian language Telugu-LLM-Labs/navarasa-65f5e6ffdf29f02c6d7767ce
ยท
osansevieroย 
posted an update 12 months ago
view post
Post
1895
Diaries of Open Source. Part 7!

๐Ÿ”ฅSakana releases Evolutionary Model Merge
Blog post: https://sakana.ai/evolutionary-model-merge/
Paper: Evolutionary Optimization of Model Merging Recipes (2403.13187)
Models and demo: https://hf.co/SakanaAI

๐ŸžMixedBread releases new SoTA sentence embedding model
Announcement: https://www.mixedbread.ai/blog/mxbai-embed-large-v1
Model: mixedbread-ai/mxbai-embed-large-v1

๐ŸŽฅVideoMamba, a Mamba-based model for video understanding
Blog: https://hf.co/blog/vladbogo/video-mamba
Demo: OpenGVLab/VideoMamba
Model: OpenGVLab/VideoMamba

๐Ÿ” MathVerse, a visual math benchmark for multimodal LLMs
Paper page: https://mathverse-cuhk.github.io/
Dataset: AI4Math/MathVerse
Paper: MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? (2403.14624)

๐Ÿง GraphWiz, a family of instruct-tuned LLMs to solve graph problems
Repos: https://hf.co/GraphWiz
Paper: GraphWiz: An Instruction-Following Language Model for Graph Problems (2402.16029)

๐Ÿช†NLLB-SigLIP-MRL: a combination of NLLB and SigLIP trained with Matryoshka representation learning
Model: visheratin/nllb-siglip-mrl-large
Tweet: https://twitter.com/visheratin/status/1766643219909984734?s=46

๐ŸงHDM and ProciGen: Template-free reconstruction of human-object interactions
Paper page: https://virtualhumans.mpi-inf.mpg.de/procigen-hdm/
Demo: xiexh20/HDM-interaction-recon
Models: xiexh20/HDM-models

๐ŸŒŽModels and data around the world
EagleX 7B, multi-lingual RNN-based model https://hf.co/spaces/recursal/EagleX-7B-1.7T-Gradio-Demo
Tamil LLM mervinpraison/tamil-large-language-model-7b-v1.0
  • 2 replies
ยท
osansevieroย 
posted an update 12 months ago
view post
Post
1922
Diaries of Open Source. Part 6!

๐ŸŽ๏ธxAI releases Grok-1, a 314B MoE
Blog: https://x.ai/blog/grok-os
GH repo: https://github.com/xai-org/grok-1
Model: xai-org/grok-1

๐Ÿ•บMusicLang, a model for controllable music generation
Demo: musiclang/musiclang-predict
GH repo: https://github.com/musiclang/musiclang_predict

๐Ÿ”ฌBioT5: a family of models for biology and chemical text tasks
Base model: QizhiPei/biot5-base
Model for molecule captioning and design: QizhiPei/biot5-base-mol2text and QizhiPei/biot5-base-text2mol
GH Repo: https://github.com/QizhiPei/BioT5
Paper: BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations (2310.07276)

๐ŸคCheck out the AQLM and QMoE official weights from ISTA-DAS lab
Org: https://hf.co/ISTA-DASLab
Papers: Extreme Compression of Large Language Models via Additive Quantization (2401.06118) and QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (2310.16795)

๐Ÿš€Community releases
Einstein-v4-7B, a Mistral fine-tune on high-quality data Weyaxi/Einstein-v4-7B
IL-7B, a Misttral fine-tune merge for rheumatology cmcmaster/il_7b
Caselaw Access Project, a collaboration to digitalize 40 million US court decisions from 6.7 million cases from 360 years https://hf.co/datasets/TeraflopAI/Caselaw_Access_Project

๐ŸŒData and models around the world
HPLT Monolingual, a dataset of 75 languages with over 40TB of data HPLT/hplt_monolingual_v1_2
OpenLLM Turkish Benchmarks & Leaderboard malhajar/openllmturkishleadboard-datasets-65e5854490a87c0f2670ec18 and malhajar/OpenLLMTurkishLeaderboard
Occiglot, a collaborative effort for European LLMs with an initial release of 7B models for French, German, Spanish, and Italian occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01
Guftagoo, a Hindi+Hinglish multi-turn conversational dataset Tensoic/gooftagoo
AryaBhatta-Orca-Maths-Hindi dataset GenVRadmin/Aryabhatta-Orca-Maths-Hindi
  • 1 reply
ยท
osansevieroย 
posted an update 12 months ago
view post
Post
Diaries of Open Source. Part 5!

๐ŸคฏContextual KTO Mistral PairRM: this model combines iterative KTO, SnorkelAI DPO dataset, Allenai PairRM for ranking, Mistral for the base model, and is a very strong model with Claude 3 quality on AlpacaEval 2.0
Final model: ContextualAI/Contextual_KTO_Mistral_PairRM
Dataset: snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
Leaderboard: https://tatsu-lab.github.io/alpaca_eval/
Base model: mistralai/Mistral-7B-Instruct-v0.2

๐Ÿค tinyBenchmarks: Quick and cheap LLM evaluation!
Code: https://github.com/felipemaiapolo/tinyBenchmarks
Paper: tinyBenchmarks: evaluating LLMs with fewer examples (2402.14992)
Data: tinyBenchmarks/tinyMMLU

๐ŸŽจTransformers.js 2.16 includes StableLM, speaker verification and diarization, and better chat templating. Try some fun demos!
- Xenova/video-object-detection
- Xenova/cross-encoder-web
- Xenova/the-tokenizer-playground

๐Ÿดโ€โ˜ ๏ธ Abascus Liberated-Qwen1.5-72B, a Qwen 72B-based model that strongly follows system prompts
Model: abacusai/Liberated-Qwen1.5-72B

๐Ÿ‘€Design2Code: benchmark of webpage screenshots to code
Data: SALT-NLP/Design2Code
Project https://salt-nlp.github.io/Design2Code/
Paper Design2Code: How Far Are We From Automating Front-End Engineering? (2403.03163)

๐ŸŒŽData and models around the world
- One of the biggest Italian datasets https://hf.co/datasets/manalog/UsenetArchiveIT
- IndicLLMSuite: argest Pre-training and Instruction Fine-tuning dataset collection across 22 Indic languages ai4bharat/indicllmsuite-65ee7d225c337fcfa0991707
- Hebrew-Gemma-11B, the best base Hebrew model yam-peleg/Hebrew-Gemma-11B
- Komodo-7B, a family of multiple Indonesian languages LLMs Yellow-AI-NLP/komodo-7b-base

You can find the previous part at https://huggingface.co/posts/osanseviero/127895284909100
osansevieroย 
posted an update 12 months ago
view post
Post
Diaries of Open Source. Part 4!

๐ŸŒCohere and Cohere4AI release Command-R, a 35B model that is multilingual, RAG-optimized, and can manage tools!
Model: CohereForAI/c4ai-command-r-v01
Blog post: https://txt.cohere.com/command-r/

๐Ÿง‘โ€๐ŸณStarChat2: A powerful code model that is conversational
Try it out: HuggingFaceH4/starchat2-playground
Repos: HuggingFaceH4/starchat2-15b-65f068417b330fafad751fce
Training code: https://github.com/huggingface/alignment-handbook/tree/main/recipes/starchat2-15b

๐ŸฒYi-9B: trained on 3 trillion tokens, this english-chinese LLM is quite good and with a very nice detailed report!
Model: 01-ai/Yi-9B
Paper: Yi: Open Foundation Models by 01.AI (2403.04652)

๐Ÿ‹DeepSeek-VL, 1.3B and 7B VLMs
Paper: DeepSeek-VL: Towards Real-World Vision-Language Understanding (2403.05525)
Large model: deepseek-ai/deepseek-vl-7b-chat

โœ๏ธWriter releases OmniACT: a dataset for multimodal agents for desktop and web.
Dataset: Writer/omniact
Paper: OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web (2402.17553)

๐ŸŽApple releases MobileCLIP: fast image-text models! https://github.com/apple/ml-mobileclip

๐Ÿฆ™๐Ÿ’ชLlamaGym - fine-tune LLM agents with RL in just a few lines of code! https://github.com/KhoomeiK/LlamaGym

๐Ÿ–ผ๏ธNew multimodal leaderboard ConTextual https://huggingface.co/blog/leaderboard-contextual

๐ŸŽ Design2Code: benchmark for multimodal LLMs for automating front-end development.
Dataset SALT-NLP/Design2Code
Paper Design2Code: How Far Are We From Automating Front-End Engineering? (2403.03163)
Project https://salt-nlp.github.io/Design2Code/

You can find the previous part at https://huggingface.co/posts/osanseviero/633758457910104
julien-cย 
posted an update 12 months ago
osansevieroย 
posted an update 12 months ago
view post
Post
Diaries of Open Source. Part 3! OS goes to the moon!

๐Ÿ’ป OpenCodeInterpreter, a family of very powerful code generation models
Models: m-a-p/opencodeinterpreter-65d312f6f88da990a64da456
Paper: OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement (2402.14658)
Demo m-a-p/OpenCodeInterpreter_demo

๐Ÿ”ท๐Ÿ”ถZephyr 7B Gemma, Gemma fine-tuned with the Zephyr recipe
Model: HuggingFaceH4/zephyr-7b-gemma-v0.1
Demo: HuggingFaceH4/zephyr-7b-gemma-chat
GH Repo: https://github.com/huggingface/alignment-handbook

๐Ÿช†The MixedBread folks released a 2D Matryoshka text embedding model, which means you can dynamically change the embedding size and layer counts
Model: mixedbread-ai/mxbai-embed-2d-large-v1
Release blog post: https://www.mixedbread.ai/blog/mxbai-embed-2d-large-v1

๐Ÿ‹Microsoft released Orca Math, which includes 200K grade school math problems
Dataset: microsoft/orca-math-word-problems-200k

๐ŸฅทIBM silently released Merlinite, a cool model trained on Mixtral-generated synthetic data using a novel LAB method https://huggingface.co/ibm/merlinite-7b

๐ŸŒš Moondream2 - a small vision language model to run on-device!
Model: vikhyatk/moondream2
Demo: vikhyatk/moondream2

๐Ÿ™๏ธCityDreamer: 3D City Generation
Demo: hzxie/city-dreamer
Repo: https://github.com/hzxie/city-dreamer
Model: hzxie/city-dreamer

๐ŸŒML in all languages
Sailor, a family of South-East Asian languages models sail/sailor-language-models-65e19a749f978976f1959825
Samvaad dataset, which includes 140k QA pairs in Hindi, Bengali, Marathi, Tamil, Telugu, Oriya, Punjabi, and Gujarati GenVRadmin/Samvaad-Mixed-Language-2

You can see the previous part at https://huggingface.co/posts/osanseviero/674644082063278
  • 1 reply
ยท