GammaCorpus (CoT) Collection • The GammaCorpus Dataset Collection for CoT (Chain of Thought) • 1 item • Updated about 11 hours ago
Large Language Models Think Too Fast To Explore Effectively • Paper • 2501.18009 • Published 5 days ago
Video Generation models Collection • The domain of video generation is booming. Here is a list of selected open-access text-to-video (T2V) models. • 14 items • Updated Aug 27, 2024
Geneva Collection • The Geneva Model Collection - Fine-tuned from Mistral Nemo Instruct 2407 (12B) with GammaCorpus v2. • 7 items • Updated about 11 hours ago
story writing favourites Collection • Models I personally liked for generating stories in the past. Not a recommendation; many of these are outdated. • 17 items • Updated Nov 11, 2024
Zurich (14B) Collection • The Zurich 14B Model Collection - Fine-tuned from Qwen 2.5 14B Instruct with GammaCorpus v2. • 7 items • Updated about 11 hours ago
Zurich (7B) Collection • The Zurich 7B Model Collection - Fine-tuned from Qwen 2.5 7B Instruct with GammaCorpus v2. • 7 items • Updated about 11 hours ago
Zurich Collection • The Zurich Model Collection - Fine-tuned from Qwen 2.5 7B Instruct and 14B Instruct with GammaCorpus v2. • 20 items • Updated about 11 hours ago
SFT Models Collection • We train a series of SFT models on the high-quality SFT dataset of RLHFlow for research purposes. • 6 items • Updated Nov 3, 2024
RLHFLow Reward Models Collection • Reward models trained with the RLHFlow codebase (https://github.com/RLHFlow/RLHF-Reward-Modeling/) • 5 items • Updated Aug 21, 2024
Online RLHF Collection • Datasets, code, and models for online RLHF (i.e., iterative DPO) • 19 items • Updated Jun 12, 2024
Mixture-of-preference-reward-modeling Collection • The mixture of preference datasets used for reward modeling. • 2 items • Updated Apr 29, 2024
Standard-format-preference-dataset Collection • We collect open-source preference datasets and process them into a standard format. • 14 items • Updated May 8, 2024
RLHFlow MATH Process Reward Model Collection • This is a collection of datasets and models for process reward modeling. • 15 items • Updated Nov 9, 2024