dhmeltzer/llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_contrast_1024_r_64_alpha_16



dhmeltzer committed on Sep 5, 2023

Commit 2ab2de6

1 Parent(s): 89f423a

Upload model


Files changed (2)

README.md +14 -68
adapter_model.safetensors +3 -0

README.md CHANGED

@@ -1,75 +1,21 @@
 ---
-base_model: dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged
-tags:
-- generated_from_trainer
-model-index:
-- name: llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_contrast_1024_r_64_alpha_16
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_contrast_1024_r_64_alpha_16
-This model is a fine-tuned version of [dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged](https://huggingface.co/dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.6234
-- Rewards/chosen: 0.0858
-- Rewards/rejected: -0.1898
-- Rewards/accuracies: 0.6574
-- Rewards/margins: 0.2756
-- Logps/rejected: -198.1188
-- Logps/chosen: -205.4868
-- Logits/rejected: 0.7931
-- Logits/chosen: 0.8315
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 32
-- eval_batch_size: 32
-- seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 128
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 1
-### Training results
-| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
-|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6867        | 0.1   | 19   | 0.6390          | 0.0633         | -0.1318          | 0.6451             | 0.1951          | -197.8286      | -205.5991    | 0.7774          | 0.8133        |
-| 0.6727        | 0.21  | 38   | 0.6384          | 0.0354         | -0.2285          | 0.6529             | 0.2639          | -198.3123      | -205.7386    | 0.8054          | 0.8432        |
-| 0.6577        | 0.31  | 57   | 0.6391          | -0.0114        | -0.2258          | 0.6406             | 0.2145          | -198.2988      | -205.9725    | 0.7954          | 0.8346        |
-| 0.6609        | 0.42  | 76   | 0.6344          | -0.3737        | -0.6175          | 0.6417             | 0.2438          | -200.2571      | -207.7841    | 0.7818          | 0.8194        |
-| 0.6536        | 0.52  | 95   | 0.6285          | -0.1130        | -0.3816          | 0.6652             | 0.2687          | -199.0778      | -206.4805    | 0.7958          | 0.8350        |
-| 0.654         | 0.62  | 114  | 0.6342          | 0.0007         | -0.2311          | 0.6484             | 0.2318          | -198.3250      | -205.9122    | 0.7917          | 0.8303        |
-| 0.6435        | 0.73  | 133  | 0.6258          | 0.0462         | -0.2234          | 0.6562             | 0.2696          | -198.2865      | -205.6845    | 0.7949          | 0.8332        |
-| 0.6508        | 0.83  | 152  | 0.6234          | 0.0858         | -0.1898          | 0.6574             | 0.2756          | -198.1188      | -205.4868    | 0.7931          | 0.8315        |
-| 0.6361        | 0.94  | 171  | 0.6269          | 0.1007         | -0.1655          | 0.6618             | 0.2662          | -197.9971      | -205.4121    | 0.7975          | 0.8353        |
 ### Framework versions
-- Transformers 4.32.1
-- Pytorch 2.0.1+cu118
-- Datasets 2.14.4
-- Tokenizers 0.13.3

 ---
+library_name: peft
 ---
 ## Training procedure
+The following `bitsandbytes` quantization config was used during training:
+- quant_method: bitsandbytes
+- load_in_8bit: False
+- load_in_4bit: True
+- llm_int8_threshold: 6.0
+- llm_int8_skip_modules: None
+- llm_int8_enable_fp32_cpu_offload: False
+- llm_int8_has_fp16_weight: False
+- bnb_4bit_quant_type: nf4
+- bnb_4bit_use_double_quant: True
+- bnb_4bit_compute_dtype: bfloat16
 ### Framework versions
+- PEFT 0.5.0
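The removed card lists a peak `learning_rate` of 0.0002 with `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.03`. The sketch below reproduces the linear-warmup-then-half-cosine shape that, to our understanding, `transformers`' `get_cosine_schedule_with_warmup` applies with its default `num_cycles=0.5`, taking the warmup length as `ceil(total_steps × warmup_ratio)`. The card never states the total step count, so `TOTAL_STEPS = 182` is an assumption extrapolated from the results table (step 171 was logged at epoch 0.94).

```python
import math

# Values from the removed model card. TOTAL_STEPS is an assumption
# extrapolated from the results table (step 171 logged at epoch 0.94).
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03
TOTAL_STEPS = 182


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up to a whole step."""
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int, peak_lr: float, total_steps: int, n_warmup: int) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0 at total_steps."""
    if step < n_warmup:
        return peak_lr * step / max(1, n_warmup)
    progress = (step - n_warmup) / max(1, total_steps - n_warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    n_warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
    for step in (0, n_warmup, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:3d}: lr = {cosine_lr(step, PEAK_LR, TOTAL_STEPS, n_warmup):.2e}")
```

Under the assumed step count, warmup covers only the first 6 optimizer steps, so almost the whole run is spent in the cosine decay. Relatedly, the card's `total_train_batch_size` of 128 equals 32 × 4 (per-device batch times gradient-accumulation steps), which suggests a single training device, though the card does not say so directly.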

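The evaluation metrics in the removed card (`Rewards/chosen`, `Rewards/rejected`, `Rewards/margins`, `Rewards/accuracies`) match the names logged by TRL's `DPOTrainer`, although the card does not name the trainer. In the standard sigmoid DPO formulation, each implicit reward is β times the gap between the policy's and the reference model's log-probability of a completion, the margin is chosen minus rejected, the loss is `-log σ(margin)`, and accuracy is the fraction of pairs whose chosen reward is strictly higher. The sketch below computes these batch averages in plain Python; β = 0.1 and the example log-probabilities are hypothetical, since the card states neither.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DPOBatchMetrics:
    loss: float
    rewards_chosen: float
    rewards_rejected: float
    rewards_margins: float
    rewards_accuracies: float


def dpo_metrics(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # hypothetical value; the card does not state beta
) -> DPOBatchMetrics:
    """Sigmoid DPO loss and reward statistics averaged over preference pairs.

    Each argument holds one summed log-probability per pair.
    """
    n = len(policy_chosen)
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # -log(sigmoid(m)) = softplus(-m), written in an overflow-safe form.
    losses = [max(-m, 0.0) + math.log1p(math.exp(-abs(m))) for m in margins]
    return DPOBatchMetrics(
        loss=sum(losses) / n,
        rewards_chosen=sum(chosen) / n,
        rewards_rejected=sum(rejected) / n,
        rewards_margins=sum(margins) / n,
        rewards_accuracies=sum(c > r for c, r in zip(chosen, rejected)) / n,
    )


# (Rewards/chosen, Rewards/rejected, Rewards/margins) copied from the
# removed card's training-results table, one tuple per logged step.
LOGGED_REWARDS = [
    (0.0633, -0.1318, 0.1951),
    (0.0354, -0.2285, 0.2639),
    (-0.0114, -0.2258, 0.2145),
    (-0.3737, -0.6175, 0.2438),
    (-0.1130, -0.3816, 0.2687),
    (0.0007, -0.2311, 0.2318),
    (0.0462, -0.2234, 0.2696),
    (0.0858, -0.1898, 0.2756),
    (0.1007, -0.1655, 0.2662),
]

if __name__ == "__main__":
    # Hypothetical batch of two pairs, for illustration only.
    print(dpo_metrics([-205.0, -210.0], [-198.0, -196.0], [-205.5, -209.0], [-197.0, -197.5]))
    for chosen, rejected, margin in LOGGED_REWARDS:
        print(f"chosen - rejected = {chosen - rejected:.4f}, logged margin = {margin:.4f}")
```

Because the margin is a mean of per-pair differences, it equals mean chosen reward minus mean rejected reward, and every logged row agrees to within rounding. The loss, by contrast, is a mean of a convex function of the margin, so it is at least `-log σ(mean margin)`: for the reported evaluation margin of 0.2756 that lower bound is about 0.565, below the reported loss of 0.6234.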
adapter_model.safetensors ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:794d5feb4c159bec3d0214ace93797b952c97e25ec89f2c112ac604479f61284
+size 639691872
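The LFS pointer records an adapter of 639,691,872 bytes. As a rough sanity check, the sketch below estimates the size of a LoRA adapter for LLaMA-7B. None of its inputs come from this commit: it assumes rank r = 64 (as `r_64` in the repository name suggests), LoRA on all seven linear projections of each of the 32 decoder layers, the standard LLaMA-7B dimensions (hidden size 4096, MLP size 11008), and float32 storage. A LoRA pair on a `d_in → d_out` layer adds `r × (d_in + d_out)` parameters.

```python
# Hypothetical reconstruction: layer shapes, target modules, rank, and dtype
# are assumptions, not values read from the adapter's configuration.
HIDDEN = 4096           # LLaMA-7B hidden size
INTERMEDIATE = 11008    # LLaMA-7B MLP size
N_LAYERS = 32
RANK = 64               # inferred from "r_64" in the repository name
BYTES_PER_PARAM = 4     # float32
LFS_SIZE = 639_691_872  # "size" field of the Git LFS pointer

# (in_features, out_features) of each assumed LoRA target in one decoder layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is r x d_in and B is d_out x r."""
    return r * (d_in + d_out)


def adapter_params(targets: dict, n_layers: int, r: int) -> int:
    """Total LoRA parameters when every layer wraps the same target modules."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in targets.values())
    return per_layer * n_layers


if __name__ == "__main__":
    n = adapter_params(TARGETS, N_LAYERS, RANK)
    tensor_bytes = n * BYTES_PER_PARAM
    print(f"LoRA parameters:       {n:,}")                        # 159,907,840
    print(f"float32 tensor bytes:  {tensor_bytes:,}")             # 639,631,360
    print(f"remaining file bytes:  {LFS_SIZE - tensor_bytes:,}")  # 60,512
```

Under these assumptions the tensors account for all but about 60 KB of the file, which is plausibly the safetensors header for the 448 tensor entries (32 layers × 7 modules × 2 matrices). A 16-bit copy of the same adapter would be roughly half the recorded size, and targeting only `q_proj` and `v_proj` would be far smaller, so the pointer's size is consistent with the all-projection float32 guess. PEFT's `adapter_config.json` for this adapter (not part of this commit) would be the authoritative record.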