bikalnetomi/rlhf-ppo-llama31-8B-Reward-model-lora-r64-bikal

Generated from Trainer

Inference Endpoints

1 contributor

History: 4 commits

Latest commit: Update README.md (acafd13, verified, 2 months ago)