End of training


Files changed (13)

README.md +68 -0
adapter_config.json +18 -0
blocks_4_hook_resid_post.pt +3 -0
blocks_manifest.json +5 -0
special_tokens_map.json +24 -0
tokenizer.json +0 -0
tokenizer_config.json +216 -0
trainable_param.json +0 -0
trainable_param_2025-02-27-08:13.json +0 -0
trainable_param_2025-02-27-08:17.json +0 -0
trainable_param_2025-02-27-08:35.json +0 -0
trainable_param_20250227-0810.json +0 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,68 @@

+---
+base_model: EleutherAI/pythia-70m-deduped
+library_name: transformers
+model_name: PYTHIA-FT-ORPO-ISAERFT
+tags:
+- generated_from_trainer
+- smol-course
+- module_1
+- isaerft
+licence: license
+---
+# Model Card for PYTHIA-FT-ORPO-ISAERFT
+This model is a fine-tuned version of [EleutherAI/pythia-70m-deduped](https://huggingface.co/EleutherAI/pythia-70m-deduped).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+
+# Prompt sent as a single user turn.
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+# Build a text-generation pipeline for this checkpoint on the GPU.
+generator = pipeline("text-generation", model="AMindToThink/PYTHIA-FT-ORPO-ISAERFT", device="cuda")
+# Generate up to 128 new tokens and return only the newly generated text.
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
+## Training procedure
+This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).
+### Framework versions
+- TRL: 0.15.1
+- Transformers: 4.49.0
+- PyTorch: 2.6.0
+- Datasets: 2.21.0
+- Tokenizers: 0.21.0
+## Citations
+Cite ORPO as:
+```bibtex
+@article{hong2024orpo,
+    title        = {{ORPO: Monolithic Preference Optimization without Reference Model}},
+    author       = {Jiwoo Hong and Noah Lee and James Thorne},
+    year         = 2024,
+    eprint       = {arXiv:2403.07691}
+}
+```
+Cite TRL as:
+```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```

adapter_config.json ADDED

	@@ -0,0 +1,18 @@

+{
+  "auto_mapping": null,
+  "base_model_name_or_path": null,
+  "depth": -1,
+  "hidden_size": null,
+  "inference_mode": false,
+  "is_prompt_learning": false,
+  "lora_r": null,
+  "peft_type": "ISAERFT",
+  "revision": null,
+  "target_hooks": [
+    [
+      "pythia-70m-deduped-res-sm",
+      "blocks.4.hook_resid_post"
+    ]
+  ],
+  "task_type": "CAUSAL_LM"
+}
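
The `target_hooks` entry above pairs what appears to be an SAE release name with a hook point written in the `blocks.<index>.<hook>` style. As a minimal sketch, assuming that reading is right, the pairs and the block index can be pulled out with the standard library alone. The helper names `target_hooks` and `layer_index` are hypothetical and are not part of any ISAERFT or PEFT API; the config text is copied from the diff.

```python
import json

# Config text copied from the adapter_config.json diff above.
ADAPTER_CONFIG = """
{
  "auto_mapping": null,
  "base_model_name_or_path": null,
  "depth": -1,
  "hidden_size": null,
  "inference_mode": false,
  "is_prompt_learning": false,
  "lora_r": null,
  "peft_type": "ISAERFT",
  "revision": null,
  "target_hooks": [["pythia-70m-deduped-res-sm", "blocks.4.hook_resid_post"]],
  "task_type": "CAUSAL_LM"
}
"""


def target_hooks(config_text):
    """Return the (release, hook_name) pairs listed under "target_hooks"."""
    config = json.loads(config_text)
    return [(release, hook_name) for release, hook_name in config["target_hooks"]]


def layer_index(hook_name):
    """Extract the block index, assuming the 'blocks.<index>.<hook>' layout."""
    prefix, index, _rest = hook_name.split(".", 2)
    if prefix != "blocks":
        raise ValueError(f"unexpected hook name: {hook_name!r}")
    return int(index)


hooks = target_hooks(ADAPTER_CONFIG)
for release, hook_name in hooks:
    print(release, hook_name, layer_index(hook_name))
```

Note that `base_model_name_or_path` is `null` in this config, so a loader would have to take the base model from elsewhere, for example the `base_model` field in the README front matter.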

blocks_4_hook_resid_post.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b7346db737452a0198e12e05d532be53936021a7d4858cf981aac59901abab24
+size 132593
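
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of parsing such a pointer and checking a downloaded file against it follows; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this illustration.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b7346db737452a0198e12e05d532be53936021a7d4858cf981aac59901abab24
size 132593
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and normalise the oid and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check raw file bytes against the size and SHA-256 recorded in a pointer."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['hash_algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

To verify a local copy, read the file in binary mode and pass the bytes to `matches_pointer` together with the parsed pointer.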

blocks_manifest.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "block_names": [
+    "blocks_4_hook_resid_post"
+  ]
+}
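
In this commit the manifest entry `blocks_4_hook_resid_post` matches both the weight file `blocks_4_hook_resid_post.pt` and the hook name `blocks.4.hook_resid_post` with its dots replaced by underscores. The sketch below assumes that this naming convention holds in general, which the commit does not state; `weight_files` and `hook_to_block_name` are hypothetical helpers.

```python
import json

# Manifest text copied from the diff above.
MANIFEST = '{"block_names": ["blocks_4_hook_resid_post"]}'


def weight_files(manifest_text, suffix=".pt"):
    """Map each manifest entry to a weight file name (assumed '<name>.pt')."""
    names = json.loads(manifest_text)["block_names"]
    return [name + suffix for name in names]


def hook_to_block_name(hook_name):
    """Assumed convention: a hook name with dots replaced by underscores."""
    return hook_name.replace(".", "_")


files = weight_files(MANIFEST)
print(files)
print(hook_to_block_name("blocks.4.hook_resid_post"))
```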

special_tokens_map.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|endoftext|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
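
All four roles in this map resolve to the same string, `<|endoftext|>`, but the entries are not uniform: `pad_token` is a bare string while the others are objects with a `content` field. A minimal sketch of normalising both shapes is shown below; `token_strings` is a hypothetical helper and the JSON is copied from the diff with whitespace condensed.

```python
import json

# Map copied from the special_tokens_map.json diff above (whitespace condensed).
SPECIAL_TOKENS_MAP = """
{
  "bos_token": {"content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "eos_token": {"content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "pad_token": "<|endoftext|>",
  "unk_token": {"content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false}
}
"""


def token_strings(map_text):
    """Return {role: token string}, accepting both object and bare-string entries."""
    raw = json.loads(map_text)
    return {
        role: value["content"] if isinstance(value, dict) else value
        for role, value in raw.items()
    }


tokens = token_strings(SPECIAL_TOKENS_MAP)
for role, token in sorted(tokens.items()):
    print(role, token)
```

Because the padding token and the end-of-sequence token are identical here, code that masks padding by token value alone would also mask genuine end-of-sequence tokens; an attention mask avoids that ambiguity.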

tokenizer.json ADDED