Mohammed Hamdy

mmhamdy

AI & ML interests

TechBio | AI4Sci | NLP | Reinforcement Learning

Recent Activity

posted an update 6 days ago

What inspired the Transformer architecture in the "Attention Is All You Need" paper? And how were various ideas combined to create this groundbreaking model?

In this lengthy article, I explore the story and the origins of some of the ideas introduced in the paper. We'll cover everything from the fundamental attention mechanism that lies at its heart to the surprisingly simple explanation for its name, Transformer.

💡 Examples of ideas explored in the article:
✅ What was the inspiration for the attention mechanism?
✅ How did we go from attention to self-attention?
✅ Did the team have any other names in mind for the model?
and more...

I aim to tell the story of Transformers as I would have wanted to read it, and hopefully in a way that appeals to others interested in the details of this fascinating idea. This narrative draws from video interviews, lectures, articles, tweets/X posts, and some digging into the literature. I have done my best to be accurate, but errors are possible. If you find inaccuracies or have any additions, please do reach out, and I will gladly make the necessary updates.

Read the article: https://huggingface.co/blog/mmhamdy/pandemonium-the-transformers-story

published an article 6 days ago

Pandemonium: The Transformers Story

published an article 10 days ago

Osirian AI: A Call For The Resurrection And Reuse Of Deep Learning Models.

mmhamdy's activity

liked a model 19 days ago

sesame/csm-1b

Text-to-Speech • Updated 20 days ago • 78.8k • 1.8k

liked a Space 22 days ago

The Distill Template

Craft Beautiful Blogs

liked 2 models about 1 month ago

ElectricAlexis/NotaGen

Updated Feb 26 • 129

microsoft/wham

Updated Feb 21 • 341 • 250

liked a Space about 1 month ago

The Ultra-Scale Playbook

The ultimate guide to training LLM on large GPU Clusters

liked a model 3 months ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated about 12 hours ago • 1.89M • 3.94k

liked a dataset 3 months ago

HuggingFaceH4/MATH-500

Viewer • Updated Nov 15, 2024 • 500 • 60.5k • 136

liked a model 4 months ago

answerdotai/ModernBERT-base

Fill-Mask • Updated Jan 15 • 3.26M • 816

liked a Space 4 months ago

Scaling test-time compute

Enhance math problem solving by scaling test-time compute

liked a model 4 months ago

CohereForAI/c4ai-command-r7b-12-2024

Text Generation • Updated Feb 20 • 9.44k • 379

liked a Space 4 months ago

Discussion Forum

liked a dataset 4 months ago

CohereForAI/Global-MMLU

Viewer • Updated 16 days ago • 602k • 18.8k • 116

liked a Space 4 months ago

Language Leads Dashboard

View and search languages by lead status

liked 3 datasets 4 months ago

zjunlp/Mol-Instructions

Updated Mar 3, 2024 • 1.08k • 52

AI-MO/NuminaMath-CoT

Viewer • Updated Nov 25, 2024 • 860k • 5.01k • 438

HuggingFaceTB/smoltalk

Viewer • Updated Feb 10 • 2.2M • 6.3k • 318

liked a dataset 6 months ago

KbsdJames/Omni-MATH

Viewer • Updated Oct 12, 2024 • 4.43k • 2.84k • 93

liked a model 8 months ago

HuggingFaceTB/SmolLM-135M-Instruct

Text Generation • Updated Sep 4, 2024 • 34k • 112

liked a model 10 months ago

fireworks-ai/llama-3-firefunction-v2

Text Generation • Updated Jun 18, 2024 • 1.37k • 145

liked a Space 10 months ago

FineWeb: decanting the web for the finest text data at scale

Generate high-quality web text data for LLM training