nyuuzyou PRO

Organizations

Social Post Explorers, AI Starter Pack

nyuuzyou's activity

reacted to AdinaY's post with 🔥 about 17 hours ago
posted an update about 17 hours ago
✈️ Thanks for the interest shown in the FlightAware Photos dataset (nyuuzyou/flightaware). Seeing its potential, I'm working on expanding it to over 1 million images soon.

---

🎨 Introducing the PaintBerri Hand-Drawn Art Dataset - nyuuzyou/paintberri

A collection of 68,860 digital hand-drawn artworks featuring:

- Unique images sourced directly from the paintberri.com online art community.
- Rich metadata including creator-provided titles, descriptions, and timestamps.
- Image dimensions, thumbnail URLs, and NSFW content flags.
- Creator IDs (where available) and unique short identifiers for each piece.

This dataset offers a distinct visual archive capturing diverse styles and subjects from an active online drawing community, suitable for image classification and image-to-text tasks. Opt-out is available for creators wishing to remove their work.
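
As a rough illustration of how the metadata described above might be used before training, here is a minimal sketch that drops NSFW-flagged or very small images. The field names (`id`, `title`, `width`, `height`, `nsfw`) and the sample records are assumptions made up for this example; check the dataset card for the actual schema.

```python
from typing import Any, Dict, Iterable, List


def filter_artworks(
    records: Iterable[Dict[str, Any]],
    min_width: int = 256,
    min_height: int = 256,
    allow_nsfw: bool = False,
) -> List[Dict[str, Any]]:
    """Keep records that pass the NSFW and minimum-size checks.

    The keys "nsfw", "width", and "height" are hypothetical names,
    not confirmed field names of the dataset.
    """
    kept = []
    for rec in records:
        # Skip flagged content unless the caller explicitly allows it.
        if rec.get("nsfw", False) and not allow_nsfw:
            continue
        # Skip images smaller than the requested minimum dimensions.
        if rec.get("width", 0) < min_width or rec.get("height", 0) < min_height:
            continue
        kept.append(rec)
    return kept


# Made-up sample records, for illustration only.
sample = [
    {"id": "a1", "title": "Fox sketch", "width": 800, "height": 600, "nsfw": False},
    {"id": "b2", "title": "Untitled", "width": 800, "height": 600, "nsfw": True},
    {"id": "c3", "title": "Tiny doodle", "width": 64, "height": 64, "nsfw": False},
]

safe = filter_artworks(sample)
```

With the sample above, only the first record passes both checks; passing `allow_nsfw=True` would also keep the second.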
reacted to abidlabs's post with ❤️❤️ about 17 hours ago
JOURNEY TO 1 MILLION DEVELOPERS

5 years ago, we launched Gradio as a simple Python library to let researchers at Stanford easily demo computer vision models with a web interface.

Today, Gradio is used by >1 million developers each month to build and share AI web apps. This includes some of the most popular open-source projects of all time, like Automatic1111, Fooocus, Oobabooga’s Text WebUI, Dall-E Mini, and LLaMA-Factory.

How did we get here? How did Gradio keep growing in the very crowded field of open-source Python libraries? I get this question a lot from folks who are building their own open-source libraries. This post distills some of the lessons that I have learned over the past few years:

1. Invest in good primitives, not high-level abstractions
2. Embed virality directly into your library
3. Focus on a (growing) niche
4. Your only roadmap should be rapid iteration
5. Maximize ways users can consume your library's outputs

1. Invest in good primitives, not high-level abstractions

When we first launched Gradio, we offered only one high-level class (gr.Interface), which created a complete web app from a single Python function. We quickly realized that developers wanted to create other kinds of apps (e.g. multi-step workflows, chatbots, streaming applications), but as we started listing out the apps users wanted to build, we realized what we needed to do:

Read the rest here: https://x.com/abidlabs/status/1907886
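
For readers unfamiliar with the `gr.Interface` class mentioned in the post, here is a minimal sketch of wrapping a single Python function in a web app. The `greet` function and the `build_demo` helper are hypothetical names chosen for this example, and Gradio is assumed to be installed wherever `build_demo` is called.

```python
def greet(name: str) -> str:
    """The single Python function that the interface wraps."""
    return f"Hello, {name}!"


def build_demo():
    """Build a text-in, text-out Gradio app around greet().

    Gradio is imported here rather than at module level so that
    greet() stays usable even where Gradio is not installed.
    """
    import gradio as gr

    return gr.Interface(fn=greet, inputs="text", outputs="text")


# To serve the app locally (requires Gradio):
#     build_demo().launch()
```

The string shortcuts `"text"` stand in for Gradio's textbox component on both the input and output side.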
reacted to clem's post with 🔥 3 days ago
Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible—just look at the “T” in ChatGPT, which comes from the Transformer architecture openly shared by Google.

Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating.

With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization—powered by openness and collaboration, in the US and around the world.

This is incredibly exciting. Let’s go, open science and open-source AI!
posted an update 5 days ago
✈️ FlightAware Photos Dataset - nyuuzyou/flightaware

Collection of approximately 197,718 aviation photographs featuring:
- High-quality aircraft images across multiple sizes and formats
- Comprehensive metadata including aircraft registrations, types, and photographer information
- View counts, ratings, and submission timestamps for each photo
- Rich classification data preserving original titles, descriptions, and photographer badges

This dataset offers a unique visual archive of aircraft spanning commercial, military, and private aviation, captured by FlightAware's community of photographers and shared under the CC BY-NC-SA 3.0 license.
replied to their post 6 days ago

A lot of popular repositories from major companies haven't gotten Xet support yet, so we just have to wait and see.

reacted to Keltezaa's post with 🔥 7 days ago
Dear HF Staff and Pro Users,

Why did you remove the "Regen" feature from the ZeroGPU feature?
Is this an error or intended?

I am now limited to 13 images per 24 hours using my Space.
When I upgraded to Pro, it was exclusively for the 5x more usage and the faster regen.

The reason I spent my hard-earned money on your site was this feature.
This is totally unacceptable.

########
Other Pro Users please reply and tag others
IF YOU AGREE or DISAGREE.
########
@Always-cheating, @anonymous111110987654321, @Arshili, @bedspirit, @blackedguy, @John6666, @DavidBaloches, @E-07, @f-14, @mindfulpeoples, @multimodalart
reacted to clem's post with 🤗 7 days ago
What's this cool purple banner haha 😶😶😶
reacted to JLouisBiz's post with 🤗 8 days ago
I would like to recommend that everyone consider paying $9 for access through Hugging Face; their services provide so many benefits, it's worth both our attention and gratitude.

Click and go:
https://huggingface.co/subscribe/pro
posted an update 8 days ago
posted an update 9 days ago
📚 Archive of Our Own (AO3) Dataset - nyuuzyou/archiveofourown

Collection of approximately 12.6 million fanfiction works (from 63.2M processed IDs) featuring:
- Full text content from diverse fandoms across television, film, books, anime, and more
- Comprehensive metadata including warnings, relationships, characters, and tags
- Multilingual content with works in 40+ languages, though English is predominant
- Rich classification data preserving author-created folksonomy and content categorization

P.S. This is the most expensive dataset I've created so far! And also, thank you all for the 100 followers on Hugging Face!
posted an update 11 days ago
I am planning to release *something big* this week, but in the meantime I was bored, so I quickly made a small dataset in as-is format.

📱 Sponsr.ru Dataset - nyuuzyou/sponsr

Collection of 44,138 posts from Sponsr.ru, a Russian content subscription platform, featuring:
- Comprehensive metadata including project details, post information, and pricing
- Detailed content categorization with images, videos, and text formats
- Monolingual Russian content from diverse creator projects
reacted to OFT's post with 😔 12 days ago
Today I decided to cancel my PRO subscription for Hugging Face. I had a lot of fun with it, but with the current changes to the API and allowed limits, I think it isn't worth it anymore. So I just turned everything off and cancelled my subscription. It feels like one of those movie scenes where you see an old computer lab and someone putting big white sheets over it and closing the door behind him. I am not going, I am not gone, but watching through the glass window of the door that I just closed.
reacted to Quazim0t0's post with 🤗 20 days ago
Thank you to the Open LLM Leaderboard's team for offering it to the community for as long as they did. I only recently joined HF, and it provided a lot of incentive and information to make better models.

Will always remember getting to #112 :D

Anyone have a solid way to test my models privately? Please let me know!

reacted to BrigitteTousi's post with 🚀 23 days ago
reacted to JingzeShi's post with 🚀 26 days ago
posted an update 26 days ago
🐴 Fimfiction.net Writings Dataset - nyuuzyou/fimfiction

Collection of 815,740+ stories from Fimfiction.net featuring:
- Full story content from diverse fanfiction authors across the platform
- Complete metadata including titles, unique identifiers, and publication details
- Rich structural information preserving story formatting and author notes
- English-language content with diverse writing styles and narrative approaches
reacted to clem's post with ❤️ 26 days ago
posted an update about 1 month ago
🌐 Public MediaWiki Collection Dataset - nyuuzyou/wikis

Collection of 1.66M+ articles from 930 public MediaWiki instances featuring:

- Full article content from diverse public wikis across the internet
- Complete metadata including templates, categories, and section structure
- Rich structural information preserving wiki organization and links
- Multilingual content across 35+ languages including English, Chinese, Spanish, and more
- Regional language variants including US/UK English, Brazilian Portuguese, and Traditional/Simplified Chinese

Key contents:
- 1,662,448 wiki articles with full text
- Extensive metadata including templates, categories, sections
- Internal wikilinks and external reference information
- Cross-domain knowledge spanning multiple topics and fields