Meta has unveiled its Llama 4 🦙 family of models, featuring native multimodality and a mixture-of-experts architecture. Two models are available now, with a third on the way:
Models🤗: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release
- 🧠 Native Multimodality - Process text and images in a unified architecture
- 🔍 Mixture-of-Experts - The first Llama models to use MoE, activating only a fraction of parameters per token for much better efficiency
- 📏 Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)
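The MoE point above is why "17B active parameters" matters: a router selects a few experts per token, so per-token compute scales with active parameters rather than the full total. A minimal, illustrative top-k routing sketch (toy dimensions and a generic router, not Llama 4's actual implementation):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=1):
    """Route one token through a top-k mixture-of-experts layer.

    x: (d,) token hidden state
    gate_w: (d, n_experts) router weights
    experts: list of callables, each mapping (d,) -> (d,)
    Only top_k experts run per token, so active compute stays small
    even when total parameter count is large.
    """
    logits = x @ gate_w                # router score per expert
    top = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 4 experts, each a small linear map; only 1 runs per token.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.normal(size=(d, n_experts))
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: W @ x for W in expert_mats]
y = moe_forward(rng.normal(size=d), gate_w, experts, top_k=1)
print(y.shape)  # (8,)
```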
🔹 Llama 4 Scout
- 17B active parameters (109B total)
- 16 experts architecture
- 10M context window
- Fits on a single H100 GPU
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1
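The single-H100 claim for Scout assumes low-precision weights (the HF release blog discusses int4 quantization). A back-of-envelope check of weight memory, assuming 4 bits per parameter and ignoring activations and KV cache:

```python
def weight_gb(n_params_billion, bits_per_param):
    """Approximate weight memory in GB for a model of the given size."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

scout_total_b = 109  # Llama 4 Scout total parameters (billions)
print(f"bf16: {weight_gb(scout_total_b, 16):.1f} GB")  # ~218 GB: needs multiple GPUs
print(f"int4: {weight_gb(scout_total_b, 4):.1f} GB")   # ~54.5 GB: under an H100's 80 GB
```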
🔹 Llama 4 Maverick
- 17B active parameters (400B total)
- 128 experts architecture
- Fits on a single DGX H100 node (8x H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- Elo score of 1417 on LMArena, currently the second-best model on the arena
🔹 Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16 experts architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks