I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy
"The power and beauty of reinforcement learning: rather than explicitly teaching the model how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies." The DeepSeek researchers are so based 🔥 They had an "aha moment"; a key takeaway from this is to always try out new ideas from first principles.
Minimal single-script implementation of knowledge distillation for LLMs. It uses GPT-2 (124M) as the student and GPT-2 Medium (355M) as the teacher, distilling via reverse Kullback-Leibler (KL) divergence, trained on a small chunk of OpenWebText.
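As a rough illustration of the objective, here's a minimal NumPy sketch of reverse KL between a student and teacher distribution (hypothetical standalone version for clarity; the actual script would compute this over PyTorch logits and backprop only through the student):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over the last axis.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reverse_kl(student_logits, teacher_logits, temperature=1.0):
    # Reverse KL: KL(q_student || p_teacher) = sum_i q_i * (log q_i - log p_i).
    # Minimizing this w.r.t. the student is mode-seeking: the student is
    # penalized for placing mass where the teacher has little.
    q = softmax(student_logits, temperature)
    p = softmax(teacher_logits, temperature)
    return np.sum(q * (np.log(q) - np.log(p)), axis=-1)

# Identical logits give ~0 divergence; disagreement gives a positive value.
same = reverse_kl(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 3.0]))
diff = reverse_kl(np.array([5.0, 0.0, 0.0]), np.array([0.0, 0.0, 5.0]))
```

In training, this scalar (averaged over token positions) is the loss; the teacher's logits are detached so gradients flow only into the 124M student.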