AI & ML interests

vision, multimedia, gradio, accessibility & cool demos

TeamTonic's activity

prithivMLmods
posted an update about 18 hours ago
Dropping an entire collection of Style Intermixing Adapters on StrangerZone HF, including Realism, Anime, Sketch, Texture-Rich 3D Experimentals, Automotive Concept Images, and LoRA models based on Flux.1, SD 3.5 Turbo/Large, and Stable Diffusion XL 🎨

╰┈➤ Collection :
➜ sketch : strangerzonehf/sketch-fav-675ba869c7ceaec7e652ee1c
➜ sketch2 : strangerzonehf/q-series-sketch-678e3503bf3a661758429717
➜ automotive : strangerzonehf/automotive-3d-675bb31a491d8c264d45d843
➜ texture 3d : strangerzonehf/flux-3dxl-engine-674833c14a001d5b1fdb5139
➜ super 3d : strangerzonehf/super-3d-engine-6743231d69f496df97addd2b
➜ style mix : strangerzonehf/mixer-engine-673582c9c5939d8aa5bf9533
➜ realism : strangerzonehf/realism-engine-67343495b6daf0fbdb904cc1

╰┈➤ The Entire Collection :
➜ flux.1 : prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
➜ flux-ultimate-lora-collection : strangerzonehf/Flux-Ultimate-LoRA-Collection
➜ sd 3.5 large / turbo : prithivMLmods/sd-35-large-lora-671b39d7bc2e7f71a446b163
➜ sdxl : prithivMLmods/sdxl-dev-models-667803a6d5ac75b59110e527

╰┈➤ Pages :
➜ page 1 : strangerzonehf
➜ page 2 : @prithivMLmods
➜ demo : prithivMLmods/FLUX-LoRA-DLC

🤗
hesamation
posted an update about 21 hours ago
OpenAI just released a 34-page practical guide to building agents.

Here are 10 things it teaches us:

1➜ agents are different from workflows: they are complete autonomous systems that perform tasks on your behalf. many applications use LLMs in workflows, but that alone doesn't make them agents.

2➜ use them for tricky stuff: complex decision making, dynamic rules, unstructured data

3➜ core recipe: each agent has three main components: a Model (the brain), Tools, and Instructions on how to behave (see the sketch after this list)

4➜ choose the right brain: set up evals to get a baseline performance, use a smart model to see what's possible, then gradually downgrade the model for cost and speed

5➜ tools are key: choose well-defined and tested tools. an agent needs tools to retrieve data and context, and to take actions.

6➜ instructions matter A LOT: be super clear telling the agent its goals, steps, and rules. Vague instructions = unpredictable agent. Be explicit.

7➜ start simple, then scale: often a single agent with several tools is enough. don't jump to complex multi-agent systems immediately.

8➜ if you use multi-agent systems: you can have a "manager" agent directing traffic to specialist agents, or have agents hand off tasks to each other.

9➜ guardrails are a MUST: check user input for weird stuff, make sure the agent isn't about to do something risky, filter out private info, block harmful content. Don't let it run wild.

10➜ build and plan for humans: start small, test, improve. always have a plan for when the agent gets stuck or is about to do something high-risk.
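
Not from the guide itself, just a minimal sketch of the model + tools + instructions recipe with an input guardrail and a human-handoff fallback; `call_llm` and both tools are hypothetical placeholders to swap for your own model call and integrations:

```python
# Toy agent loop: model (brain) + tools + instructions, plus a guardrail.
# `call_llm`, `search_docs`, and `file_ticket` are hypothetical placeholders.

def call_llm(instructions: str, history: list[dict], tools: dict) -> dict:
    """Placeholder for a chat-completion call that may return a tool request."""
    raise NotImplementedError("wire up your model provider here")

def search_docs(query: str) -> str:        # tool: retrieve data and context
    return f"top results for {query!r}"

def file_ticket(summary: str) -> str:      # tool: take an action
    return f"ticket created: {summary}"

TOOLS = {"search_docs": search_docs, "file_ticket": file_ticket}
INSTRUCTIONS = (
    "You are a support agent. Goal: resolve the user's issue. "
    "Steps: search the docs first; file a ticket only if unresolved. "
    "Rules: never reveal private data; ask a human before risky actions."
)

def run_agent(user_msg: str, max_turns: int = 5) -> str:
    if "credit card" in user_msg.lower():            # input guardrail
        return "Sorry, I can't handle payment details."
    history = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        reply = call_llm(INSTRUCTIONS, history, TOOLS)
        if reply.get("tool"):                        # model requested a tool
            result = TOOLS[reply["tool"]](reply["args"])
            history.append({"role": "tool", "content": result})
            continue
        return reply["content"]                      # final answer
    return "Escalating to a human - I couldn't resolve this."

print(run_agent("Please update my credit card number"))  # guardrail trips, no model call
```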

Download: https://t.co/fJaCkgf7ph
ZennyKenny
posted an update 1 day ago
Submitted my first dataset for the Reasoning Datasets Competition! ZennyKenny/TRON-dataset-v.1.0

This dataset is designed to post-train Metareasoning agents: agents whose job is to decide quickly (and, importantly, cheaply) whether a query warrants launching a full reasoning job or can be handled with a simple completions call (see the toy routing sketch below).
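
As a toy illustration (not the dataset's actual generation pipeline), the routing decision looks roughly like this; `cheap_score` stands in for whatever lightweight classifier you might post-train on TRON-style data:

```python
# Toy metareasoning router: a cheap check decides whether a query needs a
# full reasoning model or a plain completion. Everything here is a placeholder.

def cheap_score(query: str) -> float:
    """Hypothetical lightweight difficulty scorer; here a crude keyword heuristic."""
    hard_markers = ["prove", "step by step", "optimize", "trade-off"]
    return sum(m in query.lower() for m in hard_markers) / len(hard_markers)

def route(query: str, threshold: float = 0.25) -> str:
    if cheap_score(query) >= threshold:
        return "reasoning-model"       # worth launching the expensive job
    return "completions-model"         # a simple completion is enough

print(route("What's the capital of France?"))            # completions-model
print(route("Prove the bound and optimize the loop"))    # reasoning-model
```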

There's still plenty of time to join the competition! https://www.bespokelabs.ai/blog/reasoning-datasets-competition

The generation notebook (linked in the dataset) is open source and pretty well generalized, if I do say so myself, so you can use it to make your own Metareasoning datasets.

Shoutout to @onekq for his inspiring comment on this topic.
Nymbo
posted an update 2 days ago
gen z boss and a o3-mini
gen z boss and a o3-mini
prithivMLmods
posted an update 2 days ago
Try out the Multimodal OCR demo, featuring implementations of models including RolmOCR and Qwen2VL OCR. The use case showcases image-text-to-text conversion, plus video understanding support for the RolmOCR model! 🚀

🤗 Multimodal OCR Space : prithivMLmods/Multimodal-OCR

📦 The models implemented in this Space are:
+ Qwen2VL OCR : prithivMLmods/Qwen2-VL-OCR-2B-Instruct [ or ]
+ Qwen2VL OCR2 : prithivMLmods/Qwen2-VL-OCR2-2B-Instruct
+ RolmOCR : reducto/RolmOCR

Qwen2VL OCR supports only image-text-to-text in the space.
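
If you'd rather run the OCR model outside the Space, a rough Transformers sketch looks like this; the exact prompt and preprocessing may differ slightly, so treat the Space code and model card as the reference:

```python
# Sketch: image-text-to-text OCR with the Qwen2-VL OCR checkpoint.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "prithivMLmods/Qwen2-VL-OCR-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("receipt.png")  # any image containing text
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Extract all the text in this image."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])
```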
hesamation
posted an update 3 days ago
ZennyKenny
posted an update 9 days ago
hesamation
posted an update 9 days ago
Google published a 69-page whitepaper on Prompt Engineering and its best practices, a must-read if you are using LLMs in production:
> zero-shot, one-shot, few-shot
> system prompting
> chain-of-thought (CoT)
> ReAct
> code prompting
> best practices

LINK: https://www.kaggle.com/whitepaper-prompt-engineering
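
For a quick taste, here are toy prompt strings for zero-shot, few-shot, and chain-of-thought (my own examples, not the whitepaper's):

```python
# Zero-shot: just the task.
zero_shot = "Classify the sentiment of: 'The battery died after an hour.'"

# Few-shot: a couple of worked examples before the real input.
few_shot = """Classify the sentiment.
Review: 'Loved the screen.' -> positive
Review: 'Shipping took forever.' -> negative
Review: 'The battery died after an hour.' ->"""

# Chain-of-thought: ask the model to reason before answering.
chain_of_thought = ("A store had 23 apples, used 20 for lunch, and bought 6 more. "
                    "How many are left? Let's think step by step.")
```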
prithivMLmods
posted an update 12 days ago
Loaded some domain-specific downstream image-classification models for content moderation (the practice of monitoring and filtering user-generated content on platforms), based on SigLIP-2 Base Patch16 with newly initialized trainable parameters. 🥠

+ Age-Classification-SigLIP2 : prithivMLmods/Age-Classification-SigLIP2
[ Age range classification from 0 to 65+ years ]
+ Facial-Emotion-Detection-SigLIP2 : prithivMLmods/Facial-Emotion-Detection-SigLIP2
[ Designed to classify different facial emotions ]
+ Hand-Gesture-2-Robot : prithivMLmods/Hand-Gesture-2-Robot
[ Human Hand Gesture Classification for Robot Control ]
+ Mature-Content-Detection : prithivMLmods/Mature-Content-Detection
[ Mature [adult] or neutral content categories ]
+ Vit-Mature-Content-Detection : prithivMLmods/Vit-Mature-Content-Detection
[ Mature [adult] or neutral content categories ft. ViT]
+ Human-Action-Recognition : prithivMLmods/Human-Action-Recognition
[ Human actions including clapping, sitting, running, and more ]
+ Mirage-Photo-Classifier : prithivMLmods/Mirage-Photo-Classifier
[ Whether an image is real or AI-generated (fake) ]
+ Food-101-93M : prithivMLmods/Food-101-93M
[ Classify food images into one of 101 popular dishes ]
+ Hand-Gesture-19 : prithivMLmods/Hand-Gesture-19
[ Classify hand gesture images into different categories ]
+ Trash-Net : prithivMLmods/Trash-Net
[ Classification of trash into six distinct categories ]
+ Gender-Classifier-Mini : prithivMLmods/Gender-Classifier-Mini
[ Classify images based on gender [Male / Female] ]

🎡 Collections :

+ SigLIP2 Content Filters : prithivMLmods/siglip2-content-filters-models-67f001055ec2bed56ca41f6d
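
A minimal usage sketch with one of the checkpoints above via the Transformers image-classification pipeline; the filename, the 0.85 threshold, and the label check are illustrative assumptions, so verify the label names in the model's id2label config:

```python
from transformers import pipeline

# Content-moderation example with one of the SigLIP2 classifiers above.
classifier = pipeline("image-classification",
                      model="prithivMLmods/Mature-Content-Detection")

preds = classifier("user_upload.jpg")     # [{'label': ..., 'score': ...}, ...]
top = preds[0]
# Label strings come from the model's id2label config - check the model card.
if "neutral" not in top["label"].lower() and top["score"] > 0.85:
    print("flag for review:", top)
else:
    print("allow:", top)
```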
AtAndDev
posted an update 12 days ago
Llama 4 is out...
hesamation
posted an update 13 days ago
The best researchers from Yale, Stanford, Google DeepMind, and Microsoft laid out everything we know about agents in a 264-page paper [book].

Here are some of their key findings:

They build a mapping of different agent components, such as perception, memory, and world modelling, to different regions of the human brain and compare them:

- brain is much more energy-efficient
- no genuine experience in agents
- brain learns continuously, agent is static

An agent is broken down into:
- Perception: the agent's input mechanism. can be improved with multi-modality, feedback mechanisms (e.g., human corrections), etc.
- Cognition: learning, reasoning, planning, memory. LLMs are key in this part.
- Action: agent's output and tool use.

Agentic memory is represented as:
- Sensory memory, the short-term holding of inputs, which is not emphasized much in agents.
- Short-term memory, which is the LLM context window.
- Long-term memory, which is external storage such as RAG or knowledge graphs (a small sketch of this split follows below).

The memory in agents can be improved and researched in terms of:
- increasing the amount of stored information
- how to retrieve the most relevant info
- combining context-window memory with external memory
- deciding what to forget or update in memory
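
A minimal sketch of that short-term (context window) vs. long-term (external store) split, with naive keyword lookup standing in for real RAG-style retrieval:

```python
from collections import deque

class AgentMemory:
    def __init__(self, window: int = 8):
        self.short_term = deque(maxlen=window)   # rolling "context window"
        self.long_term: list[str] = []           # external store (RAG, KG, ...)

    def remember(self, item: str, persist: bool = False) -> None:
        self.short_term.append(item)
        if persist:                              # deciding what to keep long-term
            self.long_term.append(item)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # naive keyword match standing in for embedding/graph retrieval
        hits = [m for m in self.long_term if query.lower() in m.lower()]
        return hits[:k] + list(self.short_term)

mem = AgentMemory()
mem.remember("user prefers metric units", persist=True)
mem.remember("last question was about battery life")
print(mem.recall("units"))
```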

The agent must simulate or predict the future states of the environment for planning and decision-making.

AI world models are much simpler than humans', which rely on causal reasoning (cause and effect) and physical intuition.

LLM world models are mostly implicit and embedded.

EMOTIONS are a deep aspect of humans, helping them with social interactions, decision-making, or learning.

Agents must understand emotions to better interact with us.

But rather than encoding the feeling of emotions, agents only model them at a surface level.

Perception is the process by which an agent receives and interprets raw data from its surroundings.

READ PAPER: Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems (2504.01990)
prithivMLmods
posted an update 13 days ago
ChatGPT-4o's image generation has gone wild for a week, featuring everything from Studio Ghibli-style art and image colorization to style intermixing. Here are some examples showcasing the generation of highly detailed images from freestyle design templates. Want to know more? Check out the blog 🚀

🔗 Blog : https://huggingface.co/blog/prithivMLmods/chatgpt-4o-image-gen
hesamation
posted an update 17 days ago
What, How, Where, and How Well? This paper reviews test-time scaling methods and all you need to know about them:
> parallel, sequential, hybrid, internal scaling
> how to scale (SFT, RL, search, verification)
> metrics and evals of test-time scaling

🔗 paper: What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models (2503.24235)

If you want to learn what inference-time compute scaling is, @rasbt has a great blog post on that:
https://magazine.sebastianraschka.com/p/state-of-llm-reasoning-and-inference-scaling
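
As a concrete example of the parallel flavor, best-of-N sampling with a verifier looks roughly like this; `generate` and `verify` are placeholders for a sampler and a reward/verifier model:

```python
import random

def generate(prompt: str) -> str:
    return f"candidate answer {random.randint(0, 999)}"   # stand-in sampler

def verify(prompt: str, answer: str) -> float:
    return random.random()                                # stand-in verifier / reward model

def best_of_n(prompt: str, n: int = 8) -> str:
    # Parallel test-time scaling: spend more compute by sampling N candidates
    # and keeping the one the verifier scores highest.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: verify(prompt, a))

print(best_of_n("Solve: 17 * 24"))
```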
ZennyKenny
posted an update 17 days ago
A few new Russian-language synthetic datasets. The labelling is good, but some of the syntax and grammar is not great.

Great for Russian-language classification models, probably not great for fine-tuning Russian-language text generation.

- Virtual Assistant Query / Responses: ZennyKenny/ru_virtual_assistant_chatgpt_distill
- LLM Query / Responses: ZennyKenny/russian_llm_response_chatgpt_distill

Crazy how much language drift is still an issue, especially given that Russian constitutes nearly 5% of the content on the internet.
hesamation
posted an update 18 days ago
prithivMLmods
posted an update 19 days ago
Luna, the single-speaker text-to-speech model, features a Radio & Atcosim-style sound with a female voice. It offers authentic radio-podcast noise and empathetic speech generation, and is fine-tuned from Orpheus, a state-of-the-art Llama-based speech generation model. 🎙️

+ Model : prithivMLmods/Llama-3B-Mono-Luna
+ Collection : prithivMLmods/clean-radio-mono-voice-67e76fe1b3a87cc3bccef803
+ Reference ft : https://github.com/canopyai/Orpheus-TTS
+ Base Model : canopylabs/orpheus-3b-0.1-ft

I also tried some other clean-voice single-speaker models based on Orpheus. If you're interested, check out the collection.

🔉 Try the Mono Luna demo here: http://colab.research.google.com/drive/1K0AAIOKDE5XE0znxXaiiUJvPSpFveteK
ZennyKenny
posted an update 23 days ago
Besides being the coolest-named benchmark in the game, HellaSwag is an important measurement of здравый смысл (or common sense) in LLMs.

- More on HellaSwag: https://github.com/rowanz/hellaswag

I spent the afternoon benchmarking YandexGPT Pro 4th Gen, one of the Russian tech giant's premier models.

- Yandex HF Org: yandex
- More on Yandex models: https://yandex.cloud/ru/docs/foundation-models/concepts/yandexgpt/models

The eval notebook is available on GitHub and the resulting dataset is already on the HF Hub!

- Eval Notebook: https://github.com/kghamilton89/ai-explorer/blob/main/yandex-hellaswag/hellaswag-assess.ipynb
- Eval Dataset: ZennyKenny/yandexgptpro_4th_gen-hellaswag

And of course, everyone wants to see the numbers, so have a look at the results in the context of other zero-shot experiments that I was able to find!
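
For reference, a stripped-down zero-shot HellaSwag accuracy loop looks something like this; the linked notebook is the real implementation, and `score_ending` is a placeholder for however your model scores a candidate ending (e.g. mean log-likelihood):

```python
from datasets import load_dataset

def score_ending(context: str, ending: str) -> float:
    raise NotImplementedError("return the model's score for context + ending")

val = load_dataset("Rowan/hellaswag", split="validation")
n = 200                                            # small sample for a quick check
correct = 0
for ex in val.select(range(n)):
    scores = [score_ending(ex["ctx"], e) for e in ex["endings"]]
    correct += int(scores.index(max(scores)) == int(ex["label"]))
print("zero-shot accuracy:", correct / n)
```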
prithivMLmods
posted an update 23 days ago
Dropping some new Journey Art and Realism adapters for Flux.1-Dev, including Thematic Arts, 2021 Memory Adapters, Thread of Art, Black of Art, and more. For more details, visit the model card on Stranger Zone HF 🤗

+ Black-of-Art-Flux : strangerzonehf/Black-of-Art-Flux
+ Thread-of-Art-Flux : strangerzonehf/Thread-of-Art-Flux
+ 2021-Art-Flux : strangerzonehf/2021-Art-Flux
+ 3d-Station-Toon : strangerzonehf/3d-Station-Toon
+ New-Journey-Art-Flux : strangerzonehf/New-Journey-Art-Flux
+ Casual-Pencil-Pro : strangerzonehf/Casual-Pencil-Pro
+ Realism-H6-Flux : strangerzonehf/Realism-H6-Flux

- Repository Page : strangerzonehf

The best dimensions and inference settings for optimal results are as follows: a resolution of 1280 x 832 with a 3:2 aspect ratio is recommended for the best quality, while 1024 x 1024 with a 1:1 aspect ratio serves as the default option. For inference, 30-35 steps are recommended for optimal output.
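
A diffusers sketch using those settings with one of the adapters above; the prompt is arbitrary, so follow each model card for trigger words:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("strangerzonehf/New-Journey-Art-Flux")

image = pipe(
    "journey art, misty alpine village at dawn",
    width=1280, height=832,          # 3:2 aspect ratio, recommended for best quality
    num_inference_steps=30,          # 30-35 steps recommended
    guidance_scale=3.5,
).images[0]
image.save("journey_art.png")
```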
prithivMLmods
posted an update 25 days ago
Dropping downstream-task models with newly initialized parameters and weights (classifier.bias & weights) that support domain-specific image classification. Based on siglip2-base-patch16-224 and DomainNet (single-domain, multi-source adaptation), with Fashion-MNIST & more for experimental testing. 🧤☄️

Fashion-Mnist : prithivMLmods/Fashion-Mnist-SigLIP2
Mnist-Digits : prithivMLmods/Mnist-Digits-SigLIP2
Multisource-121 : prithivMLmods/Multisource-121-DomainNet
Painting-126 : prithivMLmods/Painting-126-DomainNet
Sketch-126 : prithivMLmods/Sketch-126-DomainNet
Clipart-126 : prithivMLmods/Clipart-126-DomainNet

Models are trained with different parameter settings for experimental purposes only, with the intent of further development. Refer to the model page below for instructions on running it with Transformers 🤗.
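
A rough starting point with plain Transformers classes (each model card is authoritative on preprocessing details):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

repo = "prithivMLmods/Fashion-Mnist-SigLIP2"
processor = AutoImageProcessor.from_pretrained(repo)
model = AutoModelForImageClassification.from_pretrained(repo)

image = Image.open("sample.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
```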

Collection : prithivMLmods/domainnet-0324-67e0e3c934c03cc40c6c8782

Citations : SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (https://arxiv.org/pdf/2502.14786) & Moment Matching for Multi-Source Domain Adaptation (https://arxiv.org/pdf/1812.01754)

prithivMLmods
posted an update 29 days ago
Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis 🔥🗣️

👉 GitHub [ Demo ] : https://github.com/PRITHIVSAKTHIUR/Orpheus-TTS-Edge

The demo supports both text-to-speech and text-to-LLM responses in speech.

> voice: tara, dan, emma, josh
> emotion: <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, <gasp>.

🥠 Orpheus-3b-0.1-ft
Model Page: canopylabs/orpheus-3b-0.1-ft

🥠 Orpheus-3b-0.1-ft
Colab Inference Notebook: https://colab.research.google.com/drive/1KhXT56UePPUHhqitJNUxq63k-pQomz3N?usp=sharing

🥠 Finetune [ orpheus-3b-0.1-pretrained ]
Resource: https://github.com/canopyai/Orpheus-TTS/tree/main/finetune

🥠 Model-releases:
https://canopylabs.ai/model-releases