Reward-Calibration
HINT-lab/llama3-8b-final-ppo-c-v0.3
Text Generation
• 8B • Updated
HINT-lab/mistral-7b-hermes-crm-skywork
7B • Updated
HINT-lab/mistral-7b-hermes-cdpo-v0.2
Text Generation
• 7B • Updated
HINT-lab/mistral-7b-ppo-clean-hermes
Text Generation
• 7B • Updated
HINT-lab/mistral-7b-ppo-hermes-v0.3
Text Generation
• 7B • Updated
HINT-lab/mistral-7b-ppo-m-hermes
Text Generation
• 7B • Updated
HINT-lab/llama3-8b-cdpo-v0.2
Text Generation
• 8B • Updated
HINT-lab/llama3-8b-final-ppo-v0.3
Text Generation
• 8B • Updated
HINT-lab/mistral-7b-hermes-rm-skywork
7B • Updated
HINT-lab/llama3-8b-final-ppo-m-v0.3
Text Generation
• 8B • Updated
HINT-lab/llama3-8b-crm-final-v0.1
8B • Updated
HINT-lab/llama3-8b-final-ppo-clean-v0.1
Text Generation
• 8B • Updated
HINT-lab/mistral-7b-hermes-dpo-v0.2
Text Generation
• 7B • Updated
HINT-lab/mistral-7b-ppo-c-hermes
Text Generation
• 7B • Updated
HINT-lab/llama3-8b-dpo-v0.2
Text Generation
• 8B • Updated