Model Card for efromomr/llm-course-hw2-dpo

Model Details

Model Description

This model is an aligned version of HuggingFaceTB/SmolLM-135M-Instruct, trained with Direct Preference Optimization (DPO).
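
For readers unfamiliar with DPO, the per-pair objective can be sketched as follows. This is a minimal illustration of the standard DPO loss, not the training code used for this model: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions, since the card does not state the hyperparameters. Inputs are the summed log-probabilities of the chosen and rejected responses under the trained policy and under the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (beta=0.1 is an assumed value)."""
    # Implicit rewards: beta times the log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# With identical policy and reference the margin is zero and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss falls as the policy raises the chosen response's log-probability relative to the reference and lowers the rejected one's.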

Reward accuracy on the training dataset is 99.89.
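
In DPO training, reward accuracy is commonly defined as the share of preference pairs in which the chosen response receives a higher implicit reward than the rejected one. The sketch below assumes the figure above follows that definition and is expressed as a percentage; the card confirms neither. The helper name `reward_accuracy` and the reward values are made up for illustration.

```python
def reward_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response has the higher reward."""
    pairs = list(zip(chosen_rewards, rejected_rewards))
    if not pairs:
        raise ValueError("need at least one preference pair")
    return sum(c > r for c, r in pairs) / len(pairs)


# Toy, made-up rewards: three of the four pairs are ranked correctly.
print(reward_accuracy([0.4, 1.2, -0.1, 0.3], [0.1, 0.5, 0.2, -0.6]))  # 0.75
```

A value near 1.0 on the training set shows only that the model separates the pairs it was trained on; it says nothing about held-out preference data.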

Example of usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the GPU when one is available; otherwise fall back to the CPU.
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
MODEL_ID = "efromomr/llm-course-hw2-dpo"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
check_model = AutoModelForCausalLM.from_pretrained(MODEL_ID).to(DEVICE).eval()

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "What's your morning routine like?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(DEVICE)

# Sample a reply; with do_sample=True the output differs between runs.
generated_ids = check_model.generate(**model_inputs, max_new_tokens=256, do_sample=True)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

#Hey, I'm excited to start my morning! I remember being in a rush, feeling my heart beat like a tiny muscle, and working like a team. So, I started with breakfast, so was my whole day!

#I chose chia seeds because of their crunchy texture and the protein they's got so easy to digest. Then, I added a healthy protein drink of spinach, almonds, and a sprinkle of hemp seeds, which is a really healthy combo! I started drinking a whole serving and got caught by the caffeine kick start, about an hour later!

#And finally, I started reading this good article on breakfast habits, so I set a goal (5 servings a day would be a good goal for me 😊). I was more than happy to follow along, so I headed to the fridge to grab that last few slices of toast! 🗺️

#As for coffee, I was blown away! It was a good 5, kinda right! My coffee was great with my pancakes, too.

#And that's it! You're out of the coffee rush.

Model size: 135M params
Tensor type: F32 (Safetensors)