# Model Card for llm-course-hw2-reward-model
This model is a fine-tuned version of HuggingFaceTB/SmolLM2-135M-Instruct on the HumanLLMs/Human-Like-DPO-Dataset. It is trained to predict a human-preference score for a language model's answer.
## Training procedure
The model was trained on the Human-Like-DPO-Dataset for one epoch with batch_size=4 and learning_rate=3e-4. The final training loss was 0.0275.
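TRL's RewardTrainer optimizes the Bradley-Terry pairwise loss, -log σ(r_chosen − r_rejected), over chosen/rejected answer pairs. The sketch below (plain Python; the function name is illustrative, not part of TRL's API) shows how a reported loss value relates to the reward margin between a preferred and a dispreferred answer:

```python
import math

def reward_pairwise_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Bradley-Terry pairwise loss used in reward modeling:
    -log(sigmoid(r_chosen - r_rejected))."""
    margin = chosen_reward - rejected_reward
    # Numerically stable form of -log(sigmoid(margin)):
    # -log(1 / (1 + exp(-margin))) = log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# A large positive margin drives the loss toward 0. For reference, a
# final loss of 0.0275 corresponds to an average reward margin of
# roughly 3.6 between chosen and rejected answers.
print(reward_pairwise_loss(8.70, 1.51))
```

With the two example scores from this card (8.70 vs. 1.51), the margin of 7.19 yields a per-pair loss well below the reported average, as expected for a cleanly separated pair.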
## Examples
The following dialogue:

> **User:** Do you have a favorite hobby and why do you enjoy it?
>
> **Assistant:** I'm afraid I don't have personal preferences or hobbies, as I'm an artificial intelligence designed to provide information and assist with inquiries in a strictly professional capacity. My purpose is to process and provide accurate information, and I do not possess personal experiences or emotions. However, I can provide information on various hobbies and activities that individuals may enjoy, along with their benefits and characteristics. If you have a specific hobby or interest in mind, I'd be happy to provide more information or insights.

receives a score of 1.51,
while the following:

> **User:** What's something you're really good at that might surprise people?
>
> **Assistant:** That's a great question! 😊 You know, I'm actually really good at generating puns! I know, I know, it's a bit of a "punderful" skill, but I've got a knack for coming up with some pretty corny (and occasionally impressively clever) puns on the fly. My friends are always surprised when I whip out a good one mid-conversation. Want to hear one? 🤔

receives a score of 8.70.
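Scores like these can be computed by loading the model as a sequence-classification head with a single logit, which is the usual convention for reward models trained with TRL's RewardTrainer. The snippet below is a minimal sketch under that assumption, not the card's documented API:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "xiryss/llm-course-hw2-reward-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

messages = [
    {"role": "user",
     "content": "What's something you're really good at that might surprise people?"},
    {"role": "assistant",
     "content": "That's a great question! I'm actually really good at generating puns!"},
]

# Render the dialogue with the tokenizer's chat template, then read the
# single-logit reward head as the score.
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits[0].item()
print(f"reward score: {score:.2f}")
```

A higher score indicates a more human-like answer, following the examples above.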
## Framework versions
- TRL: 0.15.2
- Transformers: 4.47.0
- Pytorch: 2.5.1+cu121
- Datasets: 3.3.1
- Tokenizers: 0.21.0
## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```