Data Is Better Together Contributor

Recent Activity

data-is-better-together-contributor's activity

sayakpaul 
posted an update 1 day ago
davanstrien 
posted an update 5 days ago
Introducing FineWeb-C 🌐🎓, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu, the community is labelling the educational quality of texts in many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c
fdaudens 
posted an update 5 days ago
🔍 From instruction-following to creative storytelling, dive into 2024's most impactful AI datasets! These gems are shaping everything from scientific research to video understanding.

Check it out: huggingface/open-source-ai-year-in-review-2024
prithivMLmods 
posted an update 6 days ago
Qwen2VL Models: Vision and Language Processing 🍉

📍FT: [ LaTeX OCR, Math Parsing, Text Analogy OCRTest ]

❄️Demo: prithivMLmods/Qwen2-VL-2B. The demo includes the Qwen2VL 2B Base Model.

🎯The Space documents the content of the input image as standardized plain text. It includes formatting tools with over 30 font styles, file format support for PDF and DOCX, text alignment, font size adjustment, and line spacing control.

📄PDFs are rendered using the ReportLab toolkit.

🧵Models :
+ prithivMLmods/Qwen2-VL-OCR-2B-Instruct
+ prithivMLmods/Qwen2-VL-Ocrtest-2B-Instruct
+ prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct

🚀Sample Document :
+ https://drive.google.com/file/d/1Hfqqzq4Xc-3eTjbz-jcQY84V5E1YM71E/view?usp=sharing

📦Collection :
+ prithivMLmods/vision-language-models-67639f790e806e1f9799979f

.
.
.
@prithivMLmods 🤗
burtenshaw 
posted an update 6 days ago
People are flexing their end of year stats, so I made this app to show hub stats in a tidy design!

Thanks @Ameeeee and @jfcalvo for the feature from Argilla!
burtenshaw/recap
davidberenstein1957 
posted an update 6 days ago
prithivMLmods 
posted an update 7 days ago
🎄 Here Before - Xmas🎅✨

🧑🏻‍🎄Models
+ [ Xmas 2D Illustration ] : strangerzonehf/Flux-Xmas-Illustration-LoRA
+ [ Xmas 3D Art ] : strangerzonehf/Flux-Xmas-3D-LoRA
+ [ Xmas Chocolate ] : strangerzonehf/Flux-Xmas-Chocolate-LoRA
+ [ Xmas Isometric Kit ] : strangerzonehf/Flux-Xmas-Isometric-Kit-LoRA
+ [ Xmas Realpix ] : strangerzonehf/Flux-Xmas-Realpix-LoRA
+ [ Xmas Anime ] : strangerzonehf/Flux-Anime-Xmas-LoRA

❄️Collections
+ [ Xmas Art ] : strangerzonehf/christmas-pack-6758b199487adafaddb68f82
+ [ Stranger Zone Collection ] : prithivMLmods/stranger-zone-collections-org-6737118adcf2cb40d66d0c7e

🥶Page
+ [ Stranger Zone ] : https://huggingface.co/strangerzonehf


.
.
.
@prithivMLmods 🤗
fdaudens 
posted an update 7 days ago
🤝 Want to share your AI models while protecting your work? Licenses are key!

Fascinating to see that nearly 60% of models on the Hub use Apache & MIT licenses.

Explore the viz here: huggingface/open-source-ai-year-in-review-2024
AtAndDev 
posted an update 7 days ago
@s3nh Hey man, check your Discord! Got some news.
sayakpaul 
posted an update 7 days ago
In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.
fdaudens 
posted an update 8 days ago
Did a fun experiment: What are the main themes emerging from the 100+ Nieman Journalism Lab predictions for 2025?

I used natural language processing to cluster and map them — really helps spot patterns that weren't obvious when reading predictions one by one. So what will shape journalism next year? A lot of AI and US politics (surprise!), but there's also this horizontal axis that spans from industry strategies to deep reflections on how to talk to the public.

Click any dot to explore the original prediction. What themes surprise/interest you the most?

👉 fdaudens/nieman_lab_2025_predictions_visualization

P.S.: I discovered that Nieman Lab's content is under a Creative Commons license!
nataliaElv 
posted an update 8 days ago
If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!

https://www.youtube.com/watch?v=_-ORB4WAVGU
davidberenstein1957 
posted an update 9 days ago
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: a simple step-by-step process makes dataset creation a non-technical breeze, so anyone can create datasets and models in minutes without writing any code.

Blog: https://huggingface.co/blog/synthetic-data-generator
Space: argilla/synthetic-data-generator
fdaudens 
posted an update 10 days ago
prithivMLmods 
posted an update 11 days ago
alielfilali01 
posted an update 12 days ago
Unpopular opinion: Open Source takes courage!

Not everyone is brave enough to release what they have done (the way they've done it) into the wild to be judged!
It really requires a high level of "knowing wth you are doing"! It's kind of a superpower!

Cheers to the heroes here who see this!
fdaudens 
posted an update 13 days ago