62 5 22

nyuuzyou PRO

nyuuzyou

https://ducks.party/donate

AI & ML interests

None yet

Recent Activity

new activity about 20 hours ago

nyuuzyou/paintberri:🚩 Report: Copyright infringement

reacted to AdinaY's post with 🔥 1 day ago

Dolphin 🐬 an open ASR model released by DataOceanAI, one of the biggest AI data provider in China 🔥 ✨ Supports 40 Eastern languages & 22 Chinese dialects ✨ Apache2.0 ✨ With 21.2M hours of data (7.4M open data) Model: https://huggingface.co/DataoceanAI/dolphin-base https://huggingface.co/DataoceanAI/dolphin-small Paper: https://huggingface.co/papers/2503.20212

posted an update 1 day ago

✈️ Thanks for the interest shown in the FlightAware Photos dataset (https://huggingface.co/datasets/nyuuzyou/flightaware). Seeing its potential, I'm working on expanding it to over 1 million images soon. --- 🎨 Introducing the PaintBerri Hand-Drawn Art Dataset - https://huggingface.co/datasets/nyuuzyou/paintberri A collection of 68,860 digital hand-drawn artworks featuring: Unique images sourced directly from the paintberri.com online art community. Rich metadata including creator-provided titles, descriptions, and timestamps. Image dimensions, thumbnail URLs, and NSFW content flags. Creator IDs (where available) and unique short identifiers for each piece. This dataset offers a distinct visual archive capturing diverse styles and subjects from an active online drawing community, suitable for image classification and image-to-text tasks. Opt-out is available for creators wishing to remove their work.

View all activity

Organizations

nyuuzyou's activity

New activity in nyuuzyou/paintberri about 20 hours ago

🚩 Report: Copyright infringement

#6 opened about 21 hours ago by

no-mad

reacted to AdinaY's post with 🔥 1 day ago

Post

2317

Dolphin 🐬 an open ASR model released by DataOceanAI, one of the biggest AI data provider in China 🔥

✨ Supports 40 Eastern languages & 22 Chinese dialects
✨ Apache2.0
✨ With 21.2M hours of data (7.4M open data)

Model:
DataoceanAI/dolphin-base
DataoceanAI/dolphin-small
Paper:
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages (2503.20212)

1 reply

posted an update 1 day ago

Post

496

✈️ Thanks for the interest shown in the FlightAware Photos dataset ( nyuuzyou/flightaware). Seeing its potential, I'm working on expanding it to over 1 million images soon.

---

🎨 Introducing the PaintBerri Hand-Drawn Art Dataset - nyuuzyou/paintberri

A collection of 68,860 digital hand-drawn artworks featuring:

Unique images sourced directly from the paintberri.com online art community.
Rich metadata including creator-provided titles, descriptions, and timestamps.
Image dimensions, thumbnail URLs, and NSFW content flags.
Creator IDs (where available) and unique short identifiers for each piece.

This dataset offers a distinct visual archive capturing diverse styles and subjects from an active online drawing community, suitable for image classification and image-to-text tasks. Opt-out is available for creators wishing to remove their work.

reacted to abidlabs's post with ❤️❤️ 1 day ago

Post

1625

JOURNEY TO 1 MILLION DEVELOPERS

5 years ago, we launched Gradio as a simple Python library to let researchers at Stanford easily demo computer vision models with a web interface.

Today, Gradio is used by >1 million developers each month to build and share AI web apps. This includes some of the most popular open-source projects of all time, like Automatic1111, Fooocus, Oobabooga’s Text WebUI, Dall-E Mini, and LLaMA-Factory.

How did we get here? How did Gradio keep growing in the very crowded field of open-source Python libraries? I get this question a lot from folks who are building their own open-source libraries. This post distills some of the lessons that I have learned over the past few years:

1. Invest in good primitives, not high-level abstractions
2. Embed virality directly into your library
3. Focus on a (growing) niche
4. Your only roadmap should be rapid iteration
5. Maximize ways users can consume your library's outputs

1. Invest in good primitives, not high-level abstractions

When we first launched Gradio, we offered only one high-level class (gr.Interface), which created a complete web app from a single Python function. We quickly realized that developers wanted to create other kinds of apps (e.g. multi-step workflows, chatbots, streaming applications), but as we started listing out the apps users wanted to build, we realized what we needed to do:

Read the rest here: https://x.com/abidlabs/status/1907886

updated a dataset 1 day ago

nyuuzyou/paintberri

Updated 1 day ago • 126 • 3

New activity in nyuuzyou/paintberri 1 day ago

Remove this databank

#1 opened 3 days ago by

tossaway

🚩 Report: Copyright infringement

#2 opened 3 days ago by

tossaway

published a dataset 1 day ago

nyuuzyou/paintberri

Updated 1 day ago • 126 • 3

reacted to clem's post with 🔥 4 days ago

Post

3815

Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible—just look at the “T” in ChatGPT, which comes from the Transformer architecture openly shared by Google.

Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating.

With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization—powered by openness and collaboration, in the US and around the world.

This is incredibly exciting. Let’s go, open science and open-source AI!

5 replies

New activity in nyuuzyou/flightaware 5 days ago

[bot] Conversion to Parquet

#1 opened 5 days ago by

parquet-converter

posted an update 5 days ago

Post

1418

✈️ FlightAware Photos Dataset - nyuuzyou/flightaware

Collection of approximately 197,718 aviation photographs featuring:
- High-quality aircraft images across multiple sizes and formats
- Comprehensive metadata including aircraft registrations, types, and photographer information
- View counts, ratings, and submission timestamps for each photo
- Rich classification data preserving original titles, descriptions, and photographer badges

This dataset offers a unique visual archive of aircraft spanning commercial, military, and private aviation captured by FlightAware's community of photographers under CC BY-NC-SA 3.0 license.