---
tags:
  - causal-lm
  - transformers
  - finetuned
  - instruction-following
  - dpo
license: apache-2.0
datasets:
  - agentlans/crash-course
  - Intel/orca_dpo_pairs
language:
  - en
base_model:
  - HuggingFaceTB/SmolLM2-135M-Instruct
---

# SmolLM2-135M-Instruct-Plus

This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct), aiming to pack as much knowledge as possible into a small 135M-parameter model.

⚠️ Treat this model as a creative text generator. Without further fine-tuning it produces wildly inaccurate answers, so do not trust its output without verification.

## Model Details

### Intended Uses

Intended for research, experimentation, and educational purposes where a small instruction-following model is desired.
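
A minimal usage sketch with 🤗 Transformers is shown below. The repository id `agentlans/SmolLM2-135M-Instruct-Plus` and the generation settings are assumptions; adjust them for your setup.

```python
# Minimal generation sketch (repo id and sampling settings are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/SmolLM2-135M-Instruct-Plus"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# SmolLM2 is a chat model, so format the prompt through the chat template.
messages = [{"role": "user", "content": "Explain photosynthesis in one paragraph."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

# A mild repetition penalty helps with the repetitive-output limitation noted below.
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```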

### Limitations

- **Hallucinations:** Prone to generating incorrect information due to its small size.
- **Repetitive output:** May produce repetitive text.

## Training Details

Both the SFT and DPO stages share common settings: the liger_kernel booster, LoRA fine-tuning, a custom model, BF16 compute, a batch size of 2, and a cosine learning-rate schedule with a learning rate of 5e-5. RSLoRA is enabled with a rank of 16 and an alpha of 32.
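
As a rough illustration, the shared adapter settings correspond approximately to a PEFT `LoraConfig` like the one below. This is a sketch, not the exact training configuration; any unspecified details (e.g. target modules) are assumptions.

```python
# Sketch of the shared LoRA/RSLoRA adapter settings (assumed mapping to PEFT).
from peft import LoraConfig

shared_lora = LoraConfig(
    r=16,              # LoRA rank
    lora_alpha=32,     # LoRA alpha
    use_rslora=True,   # rank-stabilized LoRA scaling
    lora_dropout=0.0,  # overridden per stage (0 for SFT, 0.95 for DPO)
    task_type="CAUSAL_LM",
)
```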

The stages differ mainly in dataset and stage-specific settings: SFT uses CrashCourse_120K with packing enabled and a LoRA dropout of 0, while DPO uses orca_pairs with packing disabled and a LoRA dropout of 0.95.
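
For illustration only, the stage-specific settings could be expressed with TRL-style training arguments roughly as follows. The training framework and all arguments not listed above are assumptions, not the actual recipe.

```python
# Sketch of the per-stage settings (assumed TRL-style configuration, not the exact recipe).
from trl import SFTConfig, DPOConfig

common = dict(
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    bf16=True,
)

# SFT stage: CrashCourse_120K, packing enabled, LoRA dropout 0.
sft_args = SFTConfig(output_dir="sft-out", packing=True, **common)

# DPO stage: orca_pairs, packing disabled, LoRA dropout 0.95.
dpo_args = DPOConfig(output_dir="dpo-out", **common)
```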

## Evaluation

The model provides coherent and creative answers but is often factually incorrect. Thorough evaluation is recommended before any deployment.