
David Berenstein

davidberenstein1957

AI & ML interests

Everything data

Recent Activity

liked a model 1 day ago
vikhyatk/moondream2
liked a Space 1 day ago
dslim/NER
liked a model 2 days ago
HuggingFaceTB/SmolVLM-500M-Instruct

Articles

Organizations

Hugging Face, SomosNLP, Tools, Webhooks Explorers (BETA), Argilla, Blog-explorers, Hugging Face TB Research, Argilla Explorers, distilabel-internal-testing, Data Is Better Together, Social Post Explorers, argilla-internal-testing, Dataset Viber, Argilla Warehouse, Dataset Tools, Uplimit, Data Is Better Together Contributor, FeeL (Feedback Loop), AI Blueprint

davidberenstein1957's activity

posted an update 5 days ago
reacted to AdinaY's post with 🚀 5 days ago
What happened yesterday in the Chinese AI community? 🚀

T2A-01-HD 👉 https://hailuo.ai/audio
MiniMax's Text-to-Audio model, now in Hailuo AI, offers 300+ voices in 17+ languages and instant emotional voice cloning.

Trae 👉 https://www.trae.ai/
A new coding tool by ByteDance for professional developers, supporting English & Chinese with free access to Claude 3.5 and GPT-4 for a limited time.

DeepSeek-R1 Series 👉 deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
Open-source reasoning models with MIT license by DeepSeek.

Kimi k1.5 👉 https://github.com/MoonshotAI/Kimi-k1.5 | https://kimi.ai/
An o1-level multi-modal model by Moonshot AI, utilizing reinforcement learning with long and short chain-of-thought and supporting up to 128k tokens.

And today…

Hunyuan 3D-2.0 👉 tencent/Hunyuan3D-2
A SoTA 3D synthesis system for high-res textured assets by Tencent Hunyuan, with open weights and code!

Stay tuned for more updates 👉 https://huggingface.co/zh-ai-community
reacted to fdaudens's post with ❤️ 5 days ago
Reminder: Don’t. Use. ChatGPT. As. A. Calculator. Seriously. 🤖

Loved listening to @sasha on Hard Fork; it really made me think.

A few takeaways that hit home:
- Individual culpability only gets you so far. The real priority: demanding accountability and transparency from companies.
- Evaluate if generative AI is the right tool for certain tasks (like search) before using it.

Curious about the full conversation? https://www.nytimes.com/2025/01/17/podcasts/hardfork-tiktok-rednote-environment.html. Give it a listen, it’s worth it! 🌍
reacted to fdaudens's post with 👍 5 days ago
🔍 From instruction-following to creative storytelling, dive into 2024's most impactful AI datasets! These gems are shaping everything from scientific research to video understanding.

Check it out: huggingface/open-source-ai-year-in-review-2024
reacted to meg's post with 🔥 5 days ago
💫...And we're live!💫 Seasonal newsletter from ethicsy folks at Hugging Face, exploring the ethics of "AI Agents"
https://huggingface.co/blog/ethics-soc-7
Our analyses found:
- There's a spectrum of "agent"-ness
- *Safety* is a key issue, leading to many other value-based concerns
Read for details & what to do next!
With @evijit, @giadap, and @sasha
posted an update 9 days ago
reacted to burtenshaw's post with 🚀 9 days ago
reacted to ariG23498's post with 🚀 9 days ago
reacted to Tonic's post with 🔥 10 days ago
🙋🏻‍♂️ Hey there folks,

Facebook AI just released JASCO models that make music stems.

You can try it out here: Tonic/audiocraft

hope you like it
reacted to mlabonne's post with 🤗🔥 10 days ago
🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
reacted to burtenshaw's post with 🚀🔥 11 days ago
We’re launching a FREE and CERTIFIED course on Agents!

We're thrilled to announce the launch of the Hugging Face Agents course on Learn! This interactive, certified course will guide you through building and deploying your own AI agents.

Here's what you'll learn:

- Understanding Agents: We'll break down the fundamentals of AI agents, showing you how they use LLMs to perceive their environment (observations), reason about it (thoughts), and take actions. Think of a smart assistant that can book appointments, answer emails, or even write code based on your instructions.
- Building with Frameworks: You'll dive into popular agent frameworks like LangChain, LlamaIndex and smolagents. These tools provide the building blocks for creating complex agent behaviors.
- Real-World Applications: See how agents are used in practice, from automating SQL queries to generating code and summarizing complex documents.
- Certification: Earn a certification by completing the course modules, implementing a use case, and passing a benchmark assessment. This proves your skills in building and deploying AI agents.
Audience

This course is designed for anyone interested in the future of AI. Whether you're a developer, data scientist, or simply curious about AI, this course will equip you with the knowledge and skills to build your own intelligent agents.

Enroll today and start building the next generation of AI agent applications!

https://bit.ly/hf-learn-agents
posted an update 12 days ago
replied to davanstrien's post 16 days ago

Open collaboration is key for democratising AI.

reacted to davanstrien's post with 🤝❤️🚀 16 days ago
The data-is-better-together/fineweb-c dataset is growing!

This week a few more languages have got 1,000 annotations for the educational quality of data from HuggingFaceFW/fineweb-2.

Why should you care?

The quality of pre-training data can have a big impact on the performance of downstream language models trained on that data (HuggingFaceFW/blogpost-fineweb-v1).

Being able to filter by educational quality is one way of improving the quality of the data you use for training an LLM. Very importantly, this approach can also reduce the amount of data needed for pretraining.

Why not use an LLM?

LLMs can be used to annotate educational quality for a subset of data. This data can then be used to train a smaller encoder-only model to label the full dataset. However, this may not work well for languages other than English. This is where fineweb-c (community) comes in.

The community is annotating the educational quality of fineweb2 data. Currently 114 languages have some annotations. These annotations will enable a number of things:

- Evaluate whether an LLM can label the educational quality for texts in that language well
- Directly be used for training quality classifiers
- Help discover other rules and heuristics for refining fineweb2 further for different languages.

This week the following languages were done:

Swedish thanks to: @Lauler @AntonVic @ohallstrom @bjarlestam @menbom @Ekgren @apsod

Ukrainian thanks to: @hannayukhymenko @robinhad @realPivo @RabotiahovDmytro @reciprocate

Assamese thanks to: @moyoor97 @Arpanjyoti @nawaf-helmi123 @pahigogoi1 @aelhence @kishorekashyap

Want to learn more? https://huggingface.co/blog/davanstrien/fineweb2-community

Contribute yourself here: data-is-better-together/fineweb-c
posted an update 22 days ago
posted an update 27 days ago