I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy
Implemented, from first principles, the recently proposed Dynamic Tanh (DyT) as an alternative to LayerNorm. Specifically, we trained a nanoGPT (0.8M params) on Tiny Shakespeare with conventional LayerNorm, RMSNorm, and Dynamic Tanh, then compared performance. Observed performance seems to match the LayerNorm/RMSNorm baselines and is stable for α between 0.5 and 1.5; DyT might outperform if trained longer. Code: https://github.com/Jaykef/ai-algorithms/blob/main/Dynamic_Tanh.ipynb Background music by 周子珺
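For context, the layer itself is tiny: DyT(x) = γ · tanh(αx) + β, with a learnable scalar α and per-channel γ, β, used as a drop-in replacement for LayerNorm. A minimal sketch of that formulation (class name and init value here are illustrative, see the notebook for the exact code):

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Dynamic Tanh: drop-in replacement for LayerNorm.

    Computes gamma * tanh(alpha * x) + beta, where alpha is a learnable
    scalar and gamma/beta are learnable per-channel vectors. Minimal
    sketch; alpha0 = 0.5 is an illustrative init in the stable range.
    """
    def __init__(self, dim: int, alpha0: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1) * alpha0)  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))         # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))         # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.tanh(self.alpha * x) + self.beta
```

Because there are no mean/variance statistics to compute, swapping it in for nn.LayerNorm in nanoGPT is a one-line change per block.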
Before 2020, most of the AI field was open and collaborative. For me, that was the key factor that accelerated scientific progress and made the impossible possible—just look at the “T” in ChatGPT, which comes from the Transformer architecture openly shared by Google.
Then came the myth that AI was too dangerous to share, and companies started optimizing for short-term revenue. That led many major AI labs and researchers to stop sharing and collaborating.
With OAI and sama now saying they're willing to share open weights again, we have a real chance to return to a golden age of AI progress and democratization—powered by openness and collaboration, in the US and around the world.
This is incredibly exciting. Let’s go, open science and open-source AI!
Implemented a custom multimodal GRPO trainer that scales down to small VLMs and supports both CPU and GPU with vLLM + Flash Attention. Used SmolVLM-256M-Instruct as the reference and reward model. It wasn't trained for long, btw, but still shows some sparks of "thinking" :) Code: https://github.com/Jaykef/ai-algorithms/blob/main/grpo_multimodal_reasoner.ipynb
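The core of GRPO is simple: sample a group of completions per prompt, score them with the reward model, and normalize each reward against its own group's mean and std instead of using a learned critic. A rough sketch of that objective (function names and hyperparameters are illustrative, not the exact notebook code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages, the core idea of GRPO.

    rewards: (num_prompts, group_size) rewards for G sampled completions
    per prompt. Each completion's advantage is its reward normalized
    against its own group, so no value function/critic is needed.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_loss(logp_new, logp_old, advantages, clip_eps=0.2, kl_ref=None, beta=0.04):
    """Clipped PPO-style surrogate over the group-relative advantages,
    plus an optional KL penalty against the frozen reference model."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    if kl_ref is not None:
        loss = loss + beta * kl_ref.mean()
    return loss
```

For the multimodal case the only real change is that the prompt fed to the policy and reference model carries image tokens; the group-relative math is unchanged.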
Finally, the ground truth is available to all: AlexNet's original source code. Context: AlexNet had a historic win in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), reducing the top-5 error rate from 26% (previous best) to 15.3%. It's a deep CNN with 8 layers (5 convolutional + 3 fully connected) that pioneered the use of ReLU activations for faster training, dropout for regularization, and GPU acceleration for large-scale learning. This moment marked the beginning of the deep learning revolution, inspiring architectures like VGG, ResNet, and modern transformers. Code: https://github.com/computerhistory/AlexNet-Source-Code
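To make the architecture description concrete, here's the 8-layer topology in modern PyTorch. A single-tower sketch for illustration only; the original 2012 code splits channels across two GPUs and adds local response normalization:

```python
import torch.nn as nn

class AlexNet(nn.Module):
    """AlexNet topology: 5 conv + 3 fully connected layers, ReLU
    activations, dropout. Expects 3x227x227 input."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)           # (B, 256, 6, 6)
        return self.classifier(x.flatten(1))
```

Tiny by today's standards (~60M params), yet it set the template every later vision architecture iterated on.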
Nvidia brings Blue (a Star Wars-style droid) to life 🤯, super cute with flawless dexterity and a droid voice. It's the result of their collaborative research with Google DeepMind and Disney, revealed as part of their new open-source physics engine for robotics simulation, Newton, which enables robots to learn how to complete complex tasks with greater precision.
This is the most exciting of this week's releases for me: Gemini Robotics, a SOTA generalist Vision-Language-Action model that brings intelligence to the physical world. It comes with a verifiable real-world-knowledge Embodied Reasoning QA benchmark. The cool part is that the model can be specialized for new tasks via fast adaptation, and those adaptations can be transferred to new robot embodiments like humanoids. Looking forward to the model and data on hf, it's about time I go full physical :) Technical Report: https://storage.googleapis.com/deepmind-media/gemini-robotics/gemini_robotics_report.pdf
Super interesting paper! Proposes convolutional recurrent neural networks (CRNNs) that learn to produce traveling waves in their hidden state in response to visual stimuli, enabling the transfer and integration of spatial information across neural connections. In other words, they show that neural networks can exhibit wave-like dynamics that blend and process visual information over time; it's cool seeing a union of AI and physics in this way. Paper: https://arxiv.org/pdf/2502.06034 Code: https://github.com/KempnerInstitute/traveling-waves-integrate
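Roughly, the setup looks like this: the hidden state is a spatial map updated by a purely local convolution, so the only way distant pixels can communicate is through activity propagating (wave-like) across time steps. A toy sketch of that idea (names and sizes are mine, not the paper's code):

```python
import torch
import torch.nn as nn

class ConvRNN(nn.Module):
    """Toy convolutional RNN with a spatial hidden state.

    The recurrent update is a local 3x3 convolution, so information can
    move at most one pixel per step; waves traveling across the hidden
    map are how distant locations integrate information over time.
    Purely illustrative; see the repo for the actual architectures.
    """
    def __init__(self, channels: int = 16):
        super().__init__()
        self.encode = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.recur = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, steps: int = 20) -> torch.Tensor:
        # x: (batch, 1, H, W) static visual stimulus
        inp = self.encode(x)
        h = torch.zeros_like(inp)
        states = []
        for _ in range(steps):
            h = torch.tanh(self.recur(h) + inp)  # local update -> wave dynamics
            states.append(h)
        return torch.stack(states, dim=1)  # (batch, steps, C, H, W)
```

Visualizing one channel of the returned states over the step axis is where the wave-like propagation shows up.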