Jim Lai

grimjim

AI & ML interests

Experimenting primarily with 7B-12B parameter text completion models. Not all models are intended for direct use; some are released for research and/or educational purposes.

Recent Activity

updated a model about 1 hour ago
grimjim/kunoichi-lemon-royale-v2-graft-32K-7B
published a model about 1 hour ago
grimjim/kunoichi-lemon-royale-v2-graft-32K-7B
updated a model 2 days ago
grimjim/Magnolia-v3a-12B

Organizations

Social Post Explorers, Hugging Face Discord Community, Debased AI, Anthracite, Anthracite Core

grimjim's activity

posted an update 8 days ago
I have recently been looking at a paper titled "Why Warmup the Learning Rate? Underlying Mechanisms and Improvements", by Dayal Singh Kalra and Maissam Barkeshli, and was struck by how "warmup" is analogous to simulated annealing.
https://arxiv.org/abs/2406.09405
Taking the physical analogy further, "warmup" is a stochastic process that knocks the system out of its current local minimum, allowing an easier transition toward new minima. It works because it reduces "fit" and therefore "friction".
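
For anyone unfamiliar with the term, here is a minimal sketch of what a learning-rate warmup schedule usually looks like. It assumes a plain linear ramp; the function name `linear_warmup_lr` and the default values for `base_lr` and `warmup_steps` are hypothetical, chosen for illustration, and are not taken from the paper.

```python
def linear_warmup_lr(step, base_lr=1e-3, warmup_steps=100):
    """Return the learning rate for a given optimizer step.

    Assumption: a simple linear ramp up to base_lr over warmup_steps,
    then a constant rate. Real schedules often add a decay phase.
    """
    if warmup_steps <= 0:
        return base_lr
    # Fraction of the warmup completed, capped at 1.0 once warmup ends.
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale


# Learning rates for the first 200 steps: rising, then flat.
schedule = [linear_warmup_lr(s) for s in range(200)]
```

The schedule starts with small steps and only reaches the full learning rate once the ramp has finished.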
New activity in grimjim/Magnolia-v5a-12B about 1 month ago
New activity in grimjim/PAlign-PAPI-personality_prompt.json-cleaned about 1 month ago

Add task category and link to paper

#2 opened about 1 month ago by nielsr