SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 5 days ago
Wow, impressive 340B model by nvidia with a nice permissive license! 🚀 The technical report is full of insights and seems to use a different learning rate schedule than cosine, probably a variant of WSD. Hope to get more info on that! 👀 nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911
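For context, a WSD (Warmup-Stable-Decay) schedule holds the learning rate constant after warmup and only decays it over a final fraction of training, unlike a cosine schedule that is tied to a fixed total duration. Below is a minimal sketch of such a schedule; the function name, phase fractions, and learning-rate values are illustrative assumptions, not details from the Nemotron-4 report.

```python
def wsd_lr(step, total_steps, peak_lr=1e-3, min_lr=1e-5,
           warmup_frac=0.01, decay_frac=0.1):
    """Warmup-Stable-Decay: linear warmup, constant plateau,
    then linear decay over the final fraction of training.
    All hyperparameters here are illustrative, not from the report."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    decay_steps = max(1, int(total_steps * decay_frac))
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Linear warmup from 0 toward the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable plateau: constant peak learning rate.
        return peak_lr
    # Final decay phase: linear interpolation from peak_lr to min_lr.
    frac = (step - decay_start) / decay_steps
    return peak_lr + frac * (min_lr - peak_lr)

# Example over a 10,000-step run:
print(wsd_lr(0, 10_000))      # early warmup: small lr
print(wsd_lr(5_000, 10_000))  # plateau: 0.001
print(wsd_lr(9_999, 10_000))  # end of decay: approaches min_lr
```

One appeal of this family of schedules is that the plateau can be extended without retuning: only the short decay phase needs to be rerun to get a fully trained checkpoint at a new, longer duration.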
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28, 2024