David Berenstein

davidberenstein1957

AI & ML interests

Everything data

Recent Activity

liked a model 1 day ago

vikhyatk/moondream2

liked a Space 1 day ago

dslim/NER

liked a model 2 days ago

HuggingFaceTB/SmolVLM-500M-Instruct

Articles

Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK

Nov 21, 2024 • 35

How to build a custom text classifier without days of human labeling

Oct 17, 2024 • 55

How to optimize your data labelling project with custom interfaces

Oct 16, 2024 • 18

To what extent are we responsible for our content and how to create safer Spaces?

Aug 30, 2024 • 4

Data Is Better Together: A Look Back and Forward

Jun 20, 2024 • 19

Organizations

davidberenstein1957's activity

upvoted an article 3 days ago

Article

Mastering Long Contexts in LLMs with KVPress

3 days ago • 49

upvoted a collection 5 days ago

Follow The Money

Collection

https://docs.google.com/presentation/d/1heWC_K_vqWmK5W4Un1aK_wY-aywmjmp6di6vPAn3bns/edit?usp=sharing • 4 items • Updated 4 days ago • 1

upvoted 2 articles 5 days ago

Article

Yay! Organizations can now publish blog Articles

5 days ago • 29

Article

Fine-tune ModernBERT for RAG with Synthetic Data

5 days ago • 28

upvoted an article 9 days ago

Article

Gradio spaces are the perfect agent tools!

9 days ago • 12

upvoted a paper 9 days ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published 11 days ago • 47

upvoted an article 10 days ago

Article

Train 400x faster Static Embedding Models with Sentence Transformers

11 days ago • 121

upvoted an article 13 days ago

Article

Mastering Tensor Dimensions in Transformers

13 days ago • 39

upvoted an article 15 days ago

Article

Beyond Image Preferences - Rich Human Feedback for Text-to-Image Generation

16 days ago • 13

upvoted an article 18 days ago

Article

Crowd-sourced Open Preference Dataset for Text-to-Image Generation

18 days ago • 18

upvoted an article 22 days ago

Article

Fine-tune a SmolLM on domain-specific synthetic data from a LLM

23 days ago • 31

upvoted an article 24 days ago

Article

Fine-tune ModernBERT for text classification using synthetic data

27 days ago • 26

upvoted a collection 28 days ago

QVQ-72B-Preview

Collection

5 items • Updated Dec 24, 2024 • 7

upvoted 5 collections about 1 month ago

SmolLM2

Collection

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated Dec 22, 2024 • 207

Synthetic Data Generator

Collection

A collection of tools and datasets related to no-code Synthetic Data Generation. • 19 items • Updated 5 days ago • 7

Smol but mighty

Collection

A collection of smoll but mighty models • 10 items • Updated 4 days ago • 4

Gradio WebRTC Cookbook ⚡️

Collection

Collection of real-time voice and video demos built with gradio-webrtc custom component • 8 items • Updated Dec 10, 2024 • 17

Lora Land - 27 High-Quality LoRA Adapters

Collection

27 Fine-tuned LoRA Adapters using Mistral-7B. Try them here: https://predibase.com/lora-land • 27 items • Updated Apr 26, 2024 • 4

upvoted a paper about 2 months ago

Self-Instruct: Aligning Language Model with Self Generated Instructions

Paper • 2212.10560 • Published Dec 20, 2022 • 9

upvoted an article about 2 months ago

Article

🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs

Dec 4, 2024 • 76

Articles

Fine-tune ModernBERT for RAG with Synthetic Data

Fine-tune a SmolLM on domain-specific synthetic data from a LLM

Fine-tune ModernBERT for text classification using synthetic data

Introducing the Synthetic Data Generator - Build Datasets with Natural Language

Open Preference Dataset for Text-to-Image Generation by the 🤗 Community

Let’s make a generation of amazing image generation models
