AI & ML interests

computer-vision, image-processing, machine-learning, deep-learning

kornia's activity

prithivMLmods 
posted an update 3 days ago
Uploaded some domain-specific image classification models for content moderation, which is essentially the practice of monitoring and filtering user-generated content on platforms. They are based on SigLIP-2 Base Patch16 with newly initialized trainable parameters. 🥠

+ Age-Classification-SigLIP2 : prithivMLmods/Age-Classification-SigLIP2
[ Age range classification from 0 to 65+ years ]
+ Facial-Emotion-Detection-SigLIP2 : prithivMLmods/Facial-Emotion-Detection-SigLIP2
[ Designed to classify different facial emotions ]
+ Hand-Gesture-2-Robot : prithivMLmods/Hand-Gesture-2-Robot
[ Human Hand Gesture Classification for Robot Control ]
+ Mature-Content-Detection : prithivMLmods/Mature-Content-Detection
[ Mature [adult] or neutral content categories ]
+ Human-Action-Recognition : prithivMLmods/Human-Action-Recognition
[ Human actions including clapping, sitting, running, and more ]
+ Mirage-Photo-Classifier : prithivMLmods/Mirage-Photo-Classifier
[ Whether an image is real or AI-generated (fake) ]
+ Food-101-93M : prithivMLmods/Food-101-93M
[ Classify food images into one of 101 popular dishes ]
+ Hand-Gesture-19 : prithivMLmods/Hand-Gesture-19
[ Classify hand gesture images into different categories ]
+ Trash-Net : prithivMLmods/Trash-Net
[ Classification of trash into six distinct categories ]
+ Gender-Classifier-Mini : prithivMLmods/Gender-Classifier-Mini
[ Classify images based on gender [Male / Female] ]

🎡Collections :

+ SigLIP2 Content Filters : prithivMLmods/siglip2-content-filters-models-67f001055ec2bed56ca41f6d
+ SigLIP2 Content Filters Datasets [ Deepfake ] : prithivMLmods/siglip2-content-filters-datasets-67ef86ff9c92afd92e0747ed
AtAndDev 
posted an update 3 days ago
Llama 4 is out...
prithivMLmods 
posted an update 4 days ago
ChatGPT-4o’s image generation goes wild for a week—featuring everything from Studio Ghibli-style art and image colorization to style intermixing. Here are some examples showcasing the generation of highly detailed images from freestyle design templates. Want to know more? Check out the blog 🚀

🔗Blog : https://huggingface.co/blog/prithivMLmods/chatgpt-4o-image-gen
ZennyKenny 
posted an update 9 days ago
A few new Russian-language synthetic datasets. The labelling is good, but some of the syntax and grammar is not great.

Great for Russian-language classification models, probably not great for fine-tuning Russian-language text generation.

- Virtual Assistant Query / Responses: ZennyKenny/ru_virtual_assistant_chatgpt_distill
- LLM Query / Responses: ZennyKenny/russian_llm_response_chatgpt_distill

Crazy how much language drift is still an issue, especially given that Russian constitutes nearly 5% of the content on the internet.
awacke1 
posted an update 9 days ago
AI Vision & SFT Titans 🌟 Turns PDFs into text, snaps pics, and births AI art.

awacke1/TorchTransformers-Diffusion-CV-SFT

1. OCR a grocery list or train a titan while sipping coffee? ☕
2. Camera Snap 📷: Capture life’s chaos—your cat’s face or that weird receipt. Proof you’re a spy!
3. OCR 🔍: PDFs beg for mercy as GPT-4o extracts text.
4. Image Gen 🎨: Prompt “neon superhero me”
5. PDF 📄: Double-page OCR, single-page sniping.

Build Titans 🌱: Train tiny AI models. 💪Characters🧑‍🎨: Craft quirky heroes.
🎥

prithivMLmods 
posted an update 11 days ago
Luna, the single-speaker text-to-speech model, features a Radio & Atcosim-style sound with a female voice. It offers authentic radio podcast noise and empathetic speech generation, and is fine-tuned from Orpheus, a state-of-the-art Llama-based speech generation model. 🎙️

+ Model : prithivMLmods/Llama-3B-Mono-Luna
+ Collection : prithivMLmods/clean-radio-mono-voice-67e76fe1b3a87cc3bccef803
+ Reference ft : https://github.com/canopyai/Orpheus-TTS
+ Base Model : canopylabs/orpheus-3b-0.1-ft

I also tried some other clean-voice single-speaker models based on Orpheus. If you're interested, check out the collection.

🔉Try the Mono Luna demo here: http://colab.research.google.com/drive/1K0AAIOKDE5XE0znxXaiiUJvPSpFveteK
ZennyKenny 
posted an update 14 days ago
Besides being the coolest-named benchmark in the game, HellaSwag is an important measurement of здравый смысл (or common sense) in LLMs.

- More on HellaSwag: https://github.com/rowanz/hellaswag

I spent the afternoon benchmarking YandexGPT Pro 4th Gen, one of the Russian tech giant's premier models.

- Yandex HF Org: yandex
- More on Yandex models: https://yandex.cloud/ru/docs/foundation-models/concepts/yandexgpt/models

The eval notebook is available on GitHub and the resulting dataset is already on the HF Hub!

- Eval Notebook: https://github.com/kghamilton89/ai-explorer/blob/main/yandex-hellaswag/hellaswag-assess.ipynb
- Eval Dataset: ZennyKenny/yandexgptpro_4th_gen-hellaswag

And of course, everyone wants to see the results, so have a look at them in the context of other zero-shot experiments that I was able to find!
prithivMLmods 
posted an update 14 days ago
Dropping some new Journey Art and Realism adapters for Flux.1-Dev, including Thematic Arts, 2021 Memory Adapters, Thread of Art, Black of Art, and more. For more details, visit the model card on Stranger Zone HF 🤗

+ Black-of-Art-Flux : strangerzonehf/Black-of-Art-Flux
+ Thread-of-Art-Flux : strangerzonehf/Thread-of-Art-Flux
+ 2021-Art-Flux : strangerzonehf/2021-Art-Flux
+ 3d-Station-Toon : strangerzonehf/3d-Station-Toon
+ New-Journey-Art-Flux : strangerzonehf/New-Journey-Art-Flux
+ Casual-Pencil-Pro : strangerzonehf/Casual-Pencil-Pro
+ Realism-H6-Flux : strangerzonehf/Realism-H6-Flux

- Repository Page : strangerzonehf

Recommended dimensions and inference settings: a resolution of 1280 x 832 (3:2 aspect ratio) gives the best quality, while 1024 x 1024 (1:1 aspect ratio) is the default option. For inference, 30 to 35 steps are recommended.
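
As a quick check on the numbers above, here is a minimal sketch in plain Python that reduces each resolution to its simplest ratio. The helper name `aspect_ratio` and the dictionary layout are illustrative assumptions, not taken from the model cards. Note that 1280 x 832 reduces to 20:13 (about 1.54), so "3:2" is a close approximation rather than an exact ratio.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce a resolution to its simplest width:height ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


# Settings quoted in the post; this dictionary layout is illustrative only.
RECOMMENDED = {"width": 1280, "height": 832, "steps": range(30, 36)}
DEFAULT = {"width": 1024, "height": 1024}

print(aspect_ratio(RECOMMENDED["width"], RECOMMENDED["height"]))  # (20, 13)
print(aspect_ratio(DEFAULT["width"], DEFAULT["height"]))  # (1, 1)
```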
prithivMLmods 
posted an update 16 days ago
Dropping downstream models that use newly initialized parameters and weights ([classifier.bias & weights]) to support domain-specific image classification. Based on siglip2-base-patch16-224 and DomainNet (single-domain, multi-source adaptation), with Fashion-MNIST and more for experimental testing. 🧤☄️

Fashion-Mnist : prithivMLmods/Fashion-Mnist-SigLIP2
Mnist-Digits : prithivMLmods/Mnist-Digits-SigLIP2
Multisource-121 : prithivMLmods/Multisource-121-DomainNet
Painting-126 : prithivMLmods/Painting-126-DomainNet
Sketch-126 : prithivMLmods/Sketch-126-DomainNet
Clipart-126 : prithivMLmods/Clipart-126-DomainNet

Models are trained with different parameter settings for experimental purposes only, with the intent of further development. Refer to the model page below for instructions on running it with Transformers 🤗.

Collection : prithivMLmods/domainnet-0324-67e0e3c934c03cc40c6c8782

Citations : SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786 & Moment Matching for Multi-Source Domain Adaptation : https://arxiv.org/pdf/1812.01754

merve 
posted an update 19 days ago
So many open releases at Hugging Face this past week 🤯 recapping them all here ⤵️ merve/march-21-releases-67dbe10e185f199e656140ae

👀 Multimodal
> Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS)
> with IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS)
> SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants
> SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)

💬 LLMs
> NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset
> LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B
> Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
> Dataset: NVIDIA released new helpfulness dataset HelpSteer3
> Dataset: OpenManusRL is a new agent dataset based on ReAct framework (OS)
> Open-R1 team released OlympicCoder, new competitive coder model in 7B and 32B
> Dataset: GeneralThought-430K is a new reasoning dataset (OS)

🖼️ Image Generation/Computer Vision
> Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥
> YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
> Stability AI released Stable Virtual Camera, a new novel view synthesis model
> Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model
> ByteDance released InfiniteYou, new realistic photo generation model
> StarVector is a new 8B model that generates svg from images
> FlexWorld is a new model that expands 3D views (OS)

🎤 Audio
> Sesame released CSM-1B, new speech generation model (OS)

🤖 Robotics
> NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset

*OS ones have Apache 2.0 or MIT license
prithivMLmods 
posted an update 20 days ago
Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis 🔥🗣️

👉GitHub [ Demo ] : https://github.com/PRITHIVSAKTHIUR/Orpheus-TTS-Edge

Demo supporting both text-to-speech and text-to-llm responses in speech.

> voice: tara, dan, emma, josh
> emotion: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>.
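
A minimal sketch of how the voices and emotion tags listed above might be combined into a single prompt string. The `voice: text` layout and the helper name `build_prompt` are assumptions for illustration only; check the Orpheus-TTS repository for the format the model actually expects.

```python
import re

# Voices and emotion tags as listed in the post.
VOICES = {"tara", "dan", "emma", "josh"}
EMOTION_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}


def build_prompt(voice: str, text: str) -> str:
    """Validate the voice and any <tag> markers, then join them (assumed layout)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    # Reject any angle-bracket tag that is not in the listed emotion set.
    for tag in re.findall(r"<(\w+)>", text):
        if tag not in EMOTION_TAGS:
            raise ValueError(f"unknown emotion tag: <{tag}>")
    return f"{voice}: {text}"


print(build_prompt("tara", "That was unexpected <gasp> but I love it."))
```

Validating up front keeps a misspelled tag from being read aloud as literal text, which is the kind of failure that is otherwise easy to miss.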

🥠Orpheus-3b-0.1-ft
Model Page: canopylabs/orpheus-3b-0.1-ft

🥠Orpheus-3b-0.1-ft
Colab Inference Notebook: https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing

🥠Finetune [ orpheus-3b-0.1-pretrained ]
Resource: https://github.com/canopyai/Orpheus-TTS/tree/main/finetune

🥠Model-releases:
https://canopylabs.ai/model-releases
AtAndDev 
posted an update 24 days ago
There seem to be multiple paid apps shared here that are based on models on hf, but some ppl sell their wrappers as "products" and promote them here. For a long time, hf was the best and only platform to do oss model stuff, but with the recent AI website builders anyone can create a product (really crappy ones btw) and try to sell it with no contribution to oss stuff. Please don't do this, or try finetuning the models you use...
Sorry for filling y'all's feed with this bs but yk...
prithivMLmods 
posted an update 26 days ago
Hey Guys! One Small Announcement 🤗
Stranger Zone now accepts LoRA requests!

✍️Request : https://huggingface.co/spaces/strangerzonehf/Request-LoRA [ or ] https://huggingface.co/spaces/strangerzonehf/Request-LoRA/discussions/1

Page : strangerzonehf

Describe the artistic properties by posting sample images or links to similar images in the request discussion. If the adapters you're asking for are truly creative and safe for work, I'll train and upload the LoRA to the Stranger Zone repo!

Thank you!
AtAndDev 
posted an update 28 days ago
Gemma 3 seems to be really good at human preference. Just waiting for ppl to see it.
prithivMLmods 
posted an update 28 days ago
Gemma-3-4B : Image and Video Inference 🖼️🎥

🧤Space: prithivMLmods/Gemma-3-Multimodal
🥠Git : https://github.com/PRITHIVSAKTHIUR/Gemma-3-Multimodal

@gemma3 : {Tag + Space_+ 'prompt'}
@video-infer : {Tag + Space_+ 'prompt'}

+ Gemma3-4B : google/gemma-3-4b-it
+ By default, it runs : prithivMLmods/Qwen2-VL-OCR-2B-Instruct
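
The tag convention above can be sketched as a small router: a message that starts with `@gemma3` or `@video-infer`, followed by a space and a prompt, is routed by its tag, and anything else falls through to the default model. This is a hypothetical illustration, not the Space's actual code; the function name `route_message` and the returned route labels are assumptions.

```python
TAGS = ("@gemma3", "@video-infer")
# Default model named in the post.
DEFAULT_MODEL = "prithivMLmods/Qwen2-VL-OCR-2B-Instruct"


def route_message(message: str) -> tuple[str, str]:
    """Return (route, prompt) for a chat message."""
    stripped = message.strip()
    for tag in TAGS:
        # A tag only counts when it is followed by a space and a prompt.
        if stripped.startswith(tag + " "):
            return tag.lstrip("@"), stripped[len(tag):].strip()
    # No recognized tag: hand the whole message to the default model.
    return "default", stripped


print(route_message("@gemma3 describe the image"))
print(route_message("read the text in this scan"))
```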

Gemma 3 Technical Report : https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
not-lain 
posted an update 28 days ago
prithivMLmods 
posted an update 29 days ago
awacke1 
posted an update 30 days ago
Introducing the MIT-licensed ML model specialized fine-tuner app "SFT Tiny Titans" 🚀

Demo video with source.

Download, train, SFT, and test your models, easy as 1-2-3!
URL: awacke1/TorchTransformers-NLP-CV-SFT
ZennyKenny 
posted an update about 1 month ago
It took me a while, but I've finally got it working: ZennyKenny/note-to-text

Using a Meta LLaMa checkpoint from Unsloth and some help from the HF community, you can capture handwritten notes and convert them into digital format in just a few seconds.

Really exciting times for AI builders on Hugging Face.
prithivMLmods 
posted an update about 1 month ago