Experimenting primarily with 7B-12B parameter text completion models. Not all models here are intended for direct use; most are meant for research and/or educational purposes.
I have recently been reading the paper "Why Warmup the Learning Rate? Underlying Mechanisms and Improvements" by Dayal Singh Kalra and Maissam Barkeshli (https://arxiv.org/abs/2406.09405), and was struck by how learning-rate "warmup" is analogous to simulated annealing. Taking the physical analogy further, warmup acts as a stochastic process that knocks the system out of its current local minimum, making it easier to transition toward new minima. It works because it temporarily reduces "fit" and therefore "friction".
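For concreteness, here is a minimal sketch of what linear warmup looks like in practice; the toy model, `warmup_steps`, and base learning rate are illustrative placeholders, not values taken from the paper.

```python
# Minimal linear warmup sketch (hyperparameters are assumptions, not from the paper).
import torch

model = torch.nn.Linear(16, 16)  # toy stand-in for a 7B-12B LM
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps = 1000  # assumed warmup length

def lr_lambda(step: int) -> float:
    # Ramp the LR linearly from ~0 up to the base LR over warmup_steps,
    # then hold it flat (a real schedule would usually decay afterwards).
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(5):  # training loop placeholder
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr())
```

In the annealing picture, the early low-LR steps keep updates small while the ramp gradually injects larger, noisier steps that can carry the weights out of a poor early basin.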