DeathGodlike

AI & ML interests

None yet


Organizations

None yet

DeathGodlike's activity

reacted to s-emanuilov's post with πŸ”₯ about 2 hours ago
Tutorial πŸ’₯ Training a non-English reasoning model with GRPO and Unsloth

I wanted to share my experiment with training reasoning models in languages other than English/Chinese.

Using Llama 3.1 8B as base, GRPO trainer from trl, and Unsloth optimizations, I got a working prototype in Bulgarian after ~5 hours on an L40S GPU. The approach should work for any language where the base model has some pre-training coverage.
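For readers who want the shape of it before opening the tutorial, here is a minimal sketch of that setup. This is not the tutorial's exact code: it assumes trl's GRPOTrainer/GRPOConfig and Unsloth's FastLanguageModel APIs, and the reward function and one-example dataset are hypothetical placeholders.

```python
# Minimal sketch, not the tutorial's exact code. Assumes trl's GRPOTrainer /
# GRPOConfig and Unsloth's FastLanguageModel; the reward and dataset below
# are hypothetical placeholders.
from unsloth import FastLanguageModel  # import Unsloth first so its patches apply
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Load the base model with Unsloth optimizations (4-bit keeps it on one GPU).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)

def cyrillic_reward(completions, **kwargs):
    # Hypothetical reward: fraction of Cyrillic characters per completion,
    # nudging the model to reason in Bulgarian instead of English.
    return [sum("\u0400" <= ch <= "\u04ff" for ch in c) / max(len(c), 1)
            for c in completions]

# One placeholder prompt ("What is 17 * 24? Think step by step.").
prompt_dataset = Dataset.from_dict(
    {"prompt": ["Колко Π΅ 17 * 24? ΠœΠΈΡΠ»ΠΈ стъпка ΠΏΠΎ стъпка."]}
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[cyrillic_reward],
    args=GRPOConfig(output_dir="llama-3.1-bg-grpo", max_steps=500),
    train_dataset=prompt_dataset,
)
trainer.train()
```

In practice you would combine a language reward like this with correctness and formatting rewards, so the model is not rewarded for Bulgarian gibberish.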

Full code and tutorial here: https://unfoldai.com/reasoning-in-a-non-english-language/

The model itself: s-emanuilov/LLMBG-Llama-3.1-8B-BG-Reasoning-v0.1

I hope this helps anyone looking to build reasoning models in their language.
reacted to ginipick's post with πŸ”₯ 1 day ago
🌟 3D Llama Studio - AI 3D Generation Platform

πŸ“ Project Overview
3D Llama Studio is an all-in-one AI platform that generates high-quality 3D models and stylized images from text or image inputs.

✨ Key Features

Text/Image to 3D Conversion 🎯

Generate 3D models from detailed text descriptions or reference images
Intuitive user interface

Text to Styled Image Generation 🎨

Customizable image generation settings
Adjustable resolution, generation steps, and guidance scale
Supports both English and Korean prompts

πŸ› οΈ Technical Features

Gradio-based web interface
Dark theme UI/UX
Real-time image generation and 3D modeling

πŸ’« Highlights

User-friendly interface
Real-time preview
Random seed generation
High-resolution output support (up to 2048x2048)

🎯 Applications

Product design
Game asset creation
Architectural visualization
Educational 3D content

πŸ”— Try It Now!
Experience 3D Llama Studio:

ginigen/3D-LLAMA

#AI #3DGeneration #MachineLearning #ComputerVision #DeepLearning
reacted to frimelle's post with πŸ‘ 5 days ago
Seeing AI develop has been a wild ride, from trying to explain why we'd bother to generate a single sentence with a *neural network* to explaining that AI is not a magic, all-knowing box. The recent weeks and months have been a lot of talking about how AI works; to policy makers, to other developers, but also and mainly friends and family without a technical background.

Yesterday, the first provisions of the EU AI Act came into force, and one of the key highlights is the set of AI literacy requirements for organisations deploying AI systems. This isn't just a box-ticking exercise. Ensuring that employees and stakeholders understand AI systems is crucial for fostering responsible and transparent AI development. From recognising biases to understanding model limitations, AI literacy empowers individuals to engage critically with these technologies and make informed decisions.

In the context of Hugging Face, AI literacy has many facets: allowing more people to contribute to AI development, providing courses and documentation that keep access open, and building accessible AI tools that empower users to better understand how AI systems function. This isn't just a regulatory milestone; it’s an opportunity to foster a culture where AI literacy becomes foundational, enabling stakeholders to recognise biases, assess model limitations, and engage critically with technology.

Embedding these principles into daily practice, and eventually extending our learnings in AI literacy to the general public, is essential for building trustworthy AI that aligns with societal values.
reacted to sometimesanotion's post with πŸ”₯ 6 days ago
I'm just saving today's 14B parameter chart, because big things are about to hit. Lamarck v0.7 has been surpassed by at least two models I know of, and in ways that promise good things to come for the whole scene. I am taking my time to enjoy the progress, and Lamarck v0.8 will come when it's clearly keeping up and keeping its flavor.

There is no one best model for everyone, regardless of these rankings. I aim to make Lamarck good at coding, translating, and rigorously critiquing rhetoric and logic. Always check out the authors' notes on models to see if their intent is close to your use case!
reacted to sometimesanotion's post with πŸš€ 6 days ago
**Update** Either I had some wrong numbers plugged in when estimating benchmark numbers from the comparator, or the benchmark changed. Virtuoso Small v2 at a 41.07 average is still very impressive, especially for writing draft copy for business purposes, while Lamarck remains a chatty generalist-reasoning model.

I've felt confident that 14B Qwen finetunes and merges could break the 42.0 average, and Arcee **came close** with https://huggingface.co/arcee-ai/Virtuoso-Small-2. Congratulations to @arcee-ai !

Just two months ago, it was easy to think that 14B had plateaued, that you could have high IFEVAL or high MUSR/MATH/GPQA at 14B, but not both. That barrier is completely shattered. I see a pathway to even better, and Virtuoso Small 2 is a big part of why. Very impressive work. This community would expect no less from Arcee.

Just look at this graph! Keep in mind, my merges here build on the first Virtuoso Small, and *-DS merges build on DeepSeek R1. There are some impressive merges in the pipe!
reacted to Abhaykoul's post with πŸ‘€ 9 days ago
πŸ”₯ THE WAIT IS OVER... HAI-SER IS HERE! πŸ”₯

Yo fam, this ain't just another AI dropβ€”this is the FUTURE of emotional intelligence! πŸš€

Introducing HAI-SER, powered by Structured Emotional Reasoning (SER), the next-level AI that doesn’t just understand your wordsβ€”it feels you, analyzes your emotions, and helps you navigate life’s toughest moments. πŸ’‘

πŸ’₯ What makes HAI-SER a game-changer?
πŸ”Ή Emotional Vibe Check – Gets the mood, energy, and what’s really going on 🎭
πŸ”Ή Mind-State Analysis – Breaks down your thoughts, beliefs, and patterns 🀯
πŸ”Ή Root Cause Deep-Dive – Unpacks the WHY behind your emotions πŸ’‘
πŸ”Ή Impact Check – Sees how it’s affecting your life and mental health πŸ’”
πŸ”Ή Safety Check – Prioritizes your well-being and crisis management 🚨
πŸ”Ή Healing Game Plan – Custom strategies to help you bounce back πŸ’ͺ
πŸ”Ή Growth Potential – Turns struggles into opportunities for self-improvement πŸ“ˆ
πŸ”Ή How to Approach – Teaches you and others how to communicate and heal 🀝
πŸ”Ή Personalized Response – Not just generic adviceβ€”real talk, tailored to YOU πŸ’―

No more robotic AI responses. No more surface-level advice. HAI-SER gets deep, analyzing emotions with precision and giving real, actionable support.

This ain’t just AIβ€”this is your digital therapist, life coach, and hype squad all in one. Whether it’s mental health, career struggles, relationships, or personal growth, HAI-SER has your back.

πŸš€ The future of emotionally intelligent AI is HERE.
Are you ready? πŸ”₯πŸ’―

HelpingAI/HAI-SER
reacted to AdinaY's post with πŸ”₯ 12 days ago
reacted to fdaudens's post with πŸ”₯❀️ 13 days ago
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5Mβ€”nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. πŸš€

The most popular community model? @bartowski 's DeepSeek-R1-Distill-Qwen-32B-GGUF version β€” 1M downloads alone.
reacted to davanstrien's post with πŸ‘€ 13 days ago
🌍 Big step for multilingual AI data!

The Hugging Face community has rated educational content in languages spoken by 1.6 billion people! New additions:
β€’ Japanese
β€’ Italian
β€’ Old High German

Learn more and contribute: https://huggingface.co/blog/davanstrien/fineweb2-community

These ratings can help enhance training data for major world languages.
reacted to onekq's post with πŸ”₯ 19 days ago
πŸ‹DeepSeek πŸ‹ is the real OpenAI 😯
reacted to alibabasglab's post with πŸ‘ 19 days ago
reacted to tomaarsen's post with πŸ”₯❀️ 26 days ago
🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! Including 2 fully open models: training scripts, datasets, metrics.

We apply our recipe to train 2 Static Embedding models that we release today! Alongside them, we release:
2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
πŸ“œ my training scripts, using the Sentence Transformers library
πŸ“Š my Weights & Biases reports with losses & metrics
πŸ“• my list of 30 training and 13 evaluation datasets

The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107,500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: No Transformer blocks, no attention, not even a matrix multiplication. Super speed!
πŸ“ No maximum sequence length! Embed texts at any length (note: longer texts may embed worse)
πŸ“ Linear instead of exponential complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
πŸͺ† Matryoshka support: allow you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% perf. decrease for English Similarity tasks)

Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings

The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.

Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1
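For reference, the models load through the standard Sentence Transformers API. A minimal sketch (the truncate_dim=256 value is just one illustration of the Matryoshka support mentioned above; adjust to taste):

```python
from sentence_transformers import SentenceTransformer

# Static embedding models load like any other Sentence Transformers model.
# truncate_dim exploits the Matryoshka property to shrink embeddings 4x
# (e.g. 1024 -> 256 dimensions) with minimal quality loss.
model = SentenceTransformer(
    "sentence-transformers/static-retrieval-mrl-en-v1",
    truncate_dim=256,
)
embeddings = model.encode(["How fast are static embedding models on CPU?"])
print(embeddings.shape)  # (1, 256)
```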
reacted to prithivMLmods's post with πŸ‘πŸ€— 2 months ago
Milestone for Flux.1 Dev πŸ”₯

πŸ’’The Flux.1 Dev model has crossed 1️⃣0️⃣,0️⃣0️⃣0️⃣ creative public adapters! 🎈
πŸ”— https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev

πŸ’’This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges

πŸ’’ Here’s the 10,000th public adapter: 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006

πŸ’’ Page :
+ https://huggingface.co/strangerzonehf

πŸ’’ Collection :
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
reacted to openfree's post with πŸ‘ 4 months ago
MixGen3 is an innovative image generation service that utilizes LoRA (Low-Rank Adaptation) models. Its key features include:

Integration of various LoRA models: Users can explore and select multiple LoRA models through a gallery.
Combination of LoRA models: Up to three LoRA models can be combined to express unique styles and content.
User-friendly interface: An intuitive interface allows for easy model selection, prompt input, and image generation.
Advanced settings: Various options are provided, including image size adjustment, random seed, and advanced configurations.

Main applications of MixGen3:

Content creation
Design and illustration
Marketing and advertising
Education and learning

Value of MixGen3:

Enhancing creativity
Time-saving
Collaboration possibilities
Continuous development

Expected effects:

Increased content diversity
Lowered entry barrier for creation
Improved creativity
Enhanced productivity

MixGen3 is bringing a new wave to the field of image generation by leveraging the advantages of LoRA models. Users can experience the service for free at
https://openfree-mixgen3.hf.space

contacts: [email protected]
reacted to singhsidhukuldeep's post with πŸ‘€ 4 months ago
While Google's Transformer might have introduced "Attention is all you need," Microsoft and Tsinghua University are here with the DIFF Transformer, stating, "Sparse-Attention is all you need."

The DIFF Transformer outperforms traditional Transformers in scaling properties, requiring only about 65% of the model size or training tokens to achieve comparable performance.

The secret sauce? A differential attention mechanism that amplifies focus on relevant context while canceling out noise, leading to sparser and more effective attention patterns.

How?
- It uses two separate softmax attention maps and subtracts them.
- It employs a learnable scalar Ξ» for balancing the attention maps.
- It implements GroupNorm for each attention head independently.
- It is compatible with FlashAttention for efficient computation.
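To make that concrete, here is a toy sketch of the differential attention computation (my paraphrase, not the paper's official code; the per-head GroupNorm is approximated with a parameter-free LayerNorm):

```python
import torch
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam):
    # q*, k*: (batch, heads, seq, d); v: (batch, heads, seq, d_v)
    d = q1.size(-1)
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d**0.5, dim=-1)
    # Subtracting the second map with learnable weight lambda cancels
    # attention "noise" common to both maps, sparsifying the result.
    out = (a1 - lam * a2) @ v
    # Stand-in for the paper's independent per-head GroupNorm.
    return F.layer_norm(out, out.shape[-1:])

# Tiny smoke test with random tensors.
b, h, n, d = 2, 8, 16, 32
q1, k1, q2, k2, v = (torch.randn(b, h, n, d) for _ in range(5))
print(diff_attention(q1, k1, q2, k2, v, lam=0.8).shape)  # (2, 8, 16, 32)
```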

What do you get?
- Superior long-context modeling (up to 64K tokens).
- Enhanced key information retrieval.
- Reduced hallucination in question-answering and summarization tasks.
- More robust in-context learning, less affected by prompt order.
- Mitigation of activation outliers, opening doors for efficient quantization.

Extensive experiments show DIFF Transformer's advantages across various tasks and model sizes, from 830M to 13.1B parameters.

This innovative architecture could be a game-changer for the next generation of LLMs. What are your thoughts on DIFF Transformer's potential impact?
reacted to Felladrin's post with πŸ‘ 4 months ago
MiniSearch is celebrating its 1st birthday! πŸŽ‰

Exactly one year ago, I shared the initial version of this side-project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting for you too!

HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space
reacted to nyuuzyou's post with ❀️ 4 months ago
πŸŽ“ Introducing Doc4web.ru Documents Dataset - nyuuzyou/doc4web

Dataset highlights:
- 223,739 documents from doc4web.ru, a document hosting platform for students and teachers
- Primarily in Russian, with some English and potentially other languages
- Each entry includes: URL, title, download link, file path, and content (where available)
- Contains original document files in addition to metadata
- Data reflects a wide range of educational topics and materials
- Licensed under Creative Commons Zero (CC0) for unrestricted use

The dataset can be used for analyzing educational content in Russian, text classification tasks, and information retrieval systems. It's also valuable for examining trends in educational materials and document sharing practices in the Russian-speaking academic community. The inclusion of original files allows for in-depth analysis of various document formats and structures.
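For anyone who wants to poke at it, a minimal sketch using the standard datasets API (the exact column names are assumptions based on the field list above; check the dataset card):

```python
from datasets import load_dataset

# Stream the dataset to avoid downloading 200k+ documents up front.
ds = load_dataset("nyuuzyou/doc4web", split="train", streaming=True)
for row in ds.take(3):
    # "title"/"url" are assumed from the field list above; verify on the card.
    print(row.get("title"), "->", row.get("url"))
```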