Built with Axolotl

See axolotl config

axolotl version: 0.4.1

base_model: EleutherAI/pythia-160m-deduped
load_in_8bit: 
datasets:
  - path: vicgalle/alpaca-gpt4
    type: alpaca
  - path: llamafactory/alpaca_gpt4_en
    type: alpaca
  - path: cognitivecomputations/dolphin
    name: flan1m-alpaca-uncensored
    type: alpaca
    shards: 10

dataset_prepared_path: ds-mega-alpaca
#dataset_shard_num: 10
chat_template: inst
val_set_size: 0.001
adapter: lora
lora_model_dir: 
sequence_len: 2048
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - query_key_value
lora_target_linear: 
lora_fan_in_fan_out: true  # pythia/GPTNeoX lora specific
lora_modules_to_save:
  - embed_in
  - embed_out
  - lm_head
lora_on_cpu: false
# ReLoRA configuration
# # Must use either 'lora' or 'qlora' adapter, and does not support fsdp or deepspeed
# relora_steps: # Number of steps per ReLoRA restart
# relora_warmup_steps: # Number of per-restart warmup steps
# relora_anneal_steps: # Number of anneal steps for each relora cycle
# relora_prune_ratio: # threshold for optimizer magnitude when pruning
# relora_cpu_offload:  # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings
relora_steps: 600
relora_warmup_steps: 10
relora_cpu_offload: true 
wandb_project: pythia
wandb_entity:
wandb_watch:
wandb_name: pythia-160m-dolphin-extended
wandb_log_model:
output_dir: ./outputs/lora-alpaca-pythia-160m-dolphin-extended
gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
learning_rate: 0.0004
lr_scheduler: cosine_with_restarts
#cosine_min_lr_ratio: 0.1
train_on_inputs: false
group_by_length: false
#bf16: auto
#fp16: true
#tf32: false
float16: true
flash_attn: 
xformers_attention: true
optimizer: paged_adamw_8bit
gpu_memory_limit: 8GiB
hub_model_id: jtatman/pythia-160m-dolphin-extended
early_stopping_patience: 10
#resume_from_checkpoint:  outputs/lora-alpaca-pythia-160m-dolphin-extended/checkpoint-11400
auto_resume_from_checkpoints: true
local_rank:
weight_decay: 0.0
#evals_per_epoch: 4
eval_steps: 200
logging_steps: 1
save_steps: 200
save_total_limit: 5
warmup_steps: 100
tokens:
  - "[INST]"
  - "[/INST]"
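
The LoRA settings in the config above (lora_r: 16, lora_alpha: 16, a single target module query_key_value) imply a scaling factor and a fairly small adapter. The sketch below works through that arithmetic. The model dimensions (hidden size 768, 12 layers) are assumptions based on the published Pythia-160M architecture, not values stated in this card, and the count leaves out the modules listed under lora_modules_to_save, which are trained in full.

```python
# Assumed Pythia-160M dimensions (not stated in this card).
HIDDEN_SIZE = 768
NUM_LAYERS = 12

# Values copied from the axolotl config above.
LORA_R = 16
LORA_ALPHA = 16


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_params_per_linear(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


# GPT-NeoX fuses Q, K and V into one projection: hidden -> 3 * hidden.
qkv_in, qkv_out = HIDDEN_SIZE, 3 * HIDDEN_SIZE
per_layer = lora_params_per_linear(qkv_in, qkv_out, LORA_R)
total = per_layer * NUM_LAYERS

print(f"scaling = {lora_scaling(LORA_ALPHA, LORA_R)}")
print(f"LoRA parameters per layer = {per_layer:,}")
print(f"LoRA parameters in total  = {total:,}")
```

With lora_alpha equal to lora_r the scaling factor is 1.0, so the low-rank update is added to the frozen weights without rescaling.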

pythia-160m-dolphin-extended

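This model is a fine-tuned version of EleutherAI/pythia-160m-deduped on the datasets listed in the axolotl config above (vicgalle/alpaca-gpt4, llamafactory/alpaca_gpt4_en, and cognitivecomputations/dolphin). It achieves the following results on the evaluation set: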

  • Loss: 6.6729

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

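Per the axolotl config above, training used vicgalle/alpaca-gpt4, llamafactory/alpaca_gpt4_en, and the flan1m-alpaca-uncensored configuration of cognitivecomputations/dolphin (with shards: 10), all in alpaca format. The evaluation set is a held-out fraction of 0.001 of this data (val_set_size: 0.001). More information needed.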

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0004
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 1
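
The total_train_batch_size above follows from the per-device batch size and the gradient accumulation steps. A minimal sketch of that arithmetic is given here, assuming a single training device (the card does not state the device count, though 1 × 16 = 16 is consistent with one). The steps-per-epoch figure is extrapolated from the last logged row of the training results (step 11800 at epoch 0.9840) and is only approximate.

```python
def effective_batch_size(micro_batch: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer step."""
    return micro_batch * grad_accum * num_devices


def estimate_steps_per_epoch(step: int, epoch_fraction: float) -> int:
    """Extrapolate optimizer steps in one epoch from a logged (step, epoch) pair."""
    return round(step / epoch_fraction)


# num_devices=1 is an assumption; the other values come from this card.
batch = effective_batch_size(micro_batch=1, grad_accum=16)
steps = estimate_steps_per_epoch(step=11800, epoch_fraction=0.9840)

print(f"effective batch size: {batch}")
print(f"estimated optimizer steps per epoch: {steps}")
print(f"estimated training sequences per epoch: {steps * batch}")
```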

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 25.9906 | 0.0001 | 1 | 29.5342 |
| 21.1303 | 0.0167 | 200 | 20.2350 |
| 16.5026 | 0.0334 | 400 | 18.4930 |
| 17.2725 | 0.0500 | 600 | 16.3395 |
| 11.9697 | 0.0667 | 800 | 12.1401 |
| 11.3783 | 0.0834 | 1000 | 11.8383 |
| 12.8084 | 0.1001 | 1200 | 12.9667 |
| 9.4119 | 0.1167 | 1400 | 9.8787 |
| 10.3527 | 0.1334 | 1600 | 10.0560 |
| 9.3545 | 0.1501 | 1800 | 9.7355 |
| 8.9165 | 0.1668 | 2000 | 9.1513 |
| 8.5467 | 0.1835 | 2200 | 8.2025 |
| 7.9152 | 0.2001 | 2400 | 7.6616 |
| 7.3362 | 0.2168 | 2600 | 7.5699 |
| 7.9374 | 0.2335 | 2800 | 7.4818 |
| 7.838 | 0.2502 | 3000 | 7.4635 |
| 7.5731 | 0.2668 | 3200 | 7.4899 |
| 7.8289 | 0.2835 | 3400 | 7.3594 |
| 7.8906 | 0.3002 | 3600 | 8.0934 |
| 7.7318 | 0.3169 | 3800 | 7.5812 |
| 7.2089 | 0.3335 | 4000 | 7.4839 |
| 7.202 | 0.3502 | 4200 | 7.4486 |
| 6.9493 | 0.3669 | 4400 | 7.3208 |
| 7.1492 | 0.3836 | 4600 | 7.2469 |
| 7.3443 | 0.4003 | 4800 | 7.1378 |
| 7.7056 | 0.4169 | 5000 | 7.1385 |
| 55.0553 | 0.4336 | 5200 | 50.0135 |
| 7.1868 | 0.4503 | 5400 | 6.9898 |
| 6.5803 | 0.4670 | 5600 | 6.9559 |
| 8.6171 | 0.4836 | 5800 | 7.9075 |
| 7.1373 | 0.5003 | 6000 | 6.9280 |
| 6.7077 | 0.5170 | 6200 | 6.8797 |
| 7.0026 | 0.5337 | 6400 | 6.8635 |
| 6.6797 | 0.5504 | 6600 | 6.8178 |
| 6.8067 | 0.5670 | 6800 | 6.7893 |
| 6.5979 | 0.5837 | 7000 | 6.8106 |
| 6.7283 | 0.6004 | 7200 | 6.7998 |
| 7.0015 | 0.6171 | 7400 | 6.7705 |
| 6.1182 | 0.6337 | 7600 | 6.7592 |
| 6.7919 | 0.6504 | 7800 | 6.7446 |
| 6.4523 | 0.6671 | 8000 | 6.7260 |
| 6.765 | 0.6838 | 8200 | 6.7135 |
| 6.4625 | 0.7004 | 8400 | 6.7099 |
| 6.79 | 0.7171 | 8600 | 6.7070 |
| 6.6101 | 0.7338 | 8800 | 6.7017 |
| 6.7541 | 0.7505 | 9000 | 6.6964 |
| 6.7777 | 0.7672 | 9200 | 6.6901 |
| 7.2082 | 0.7838 | 9400 | 6.6869 |
| 6.4263 | 0.8005 | 9600 | 6.6875 |
| 6.1944 | 0.8172 | 9800 | 6.6803 |
| 6.7745 | 0.8339 | 10000 | 6.6865 |
| 6.6746 | 0.8505 | 10200 | 6.6756 |
| 6.6319 | 0.8672 | 10400 | 6.6941 |
| 6.6657 | 0.8839 | 10600 | 6.6764 |
| 6.8516 | 0.9006 | 10800 | 6.6776 |
| 6.6391 | 0.9173 | 11000 | 6.6749 |
| 6.5763 | 0.9339 | 11200 | 6.6729 |
| 6.585 | 0.9506 | 11400 | 6.6694 |
| 6.2999 | 0.9673 | 11600 | 6.6722 |
| 6.8343 | 0.9840 | 11800 | 6.6729 |
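
Validation loss is often easier to interpret as perplexity. The sketch below converts the final row of the table above, assuming the reported loss is a mean per-token cross-entropy in nats (the usual convention for Hugging Face Trainer logs, but not confirmed by this card). It also flags the row at step 5200, where the loss jumps far above its neighbours. The helper names and the spike threshold are illustrative choices, not part of the training setup.

```python
import math

# (step, validation_loss) pairs copied from the table above.
rows = [
    (5000, 7.1385),
    (5200, 50.0135),
    (5400, 6.9898),
    (11800, 6.6729),
]


def perplexity(loss: float) -> float:
    """exp(loss); meaningful only if loss is mean cross-entropy per token in nats."""
    return math.exp(loss)


def find_spikes(points, factor: float = 2.0):
    """Return steps whose loss exceeds `factor` times the previous row's loss."""
    spikes = []
    for (_, prev_loss), (step, loss) in zip(points, points[1:]):
        if loss > factor * prev_loss:
            spikes.append(step)
    return spikes


final_ppl = perplexity(rows[-1][1])
print(f"final validation perplexity ~ {final_ppl:.0f}")
print(f"spike steps: {find_spikes(rows)}")
```

Under that assumption the final validation loss of 6.6729 corresponds to a perplexity of roughly 790.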

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Evaluation Results

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| Open LLM Leaderboard | N/A | none | 5 | rouge2_max | 16.4873 | ± 1.0172 |
| - winogrande | 1 | none | 5 | acc | 0.5120 | ± 0.0224 |
| - gsm8k | 3 | strict-match | 5 | exact_match | 0.0060 | ± 0.0035 |
| - hellaswag | 1 | none | 10 | acc | 0.3520 | ± 0.0214 |
| - mmlu | N/A | none | 0 | acc | 0.2533 | ± 0.0039 |
| | | none | 5 | rouge2_acc | 0.1920 | ± 0.0176 |
| | | none | 5 | rougeL_acc | 0.3860 | ± 0.0218 |
| | | flexible-extract | 5 | exact_match | 0.0220 | ± 0.0066 |
| | | strict-match | 5 | exact_match | 0.0060 | ± 0.0035 |
| | | none | 5 | rougeL_diff | -0.7765 | ± 1.0034 |
| | | none | 5 | rouge1_acc | 0.3700 | ± 0.0216 |
| | | none | 5 | rouge1_diff | -1.5564 | ± 1.0223 |
| | | none | 5 | acc_norm | 0.3180 | ± 0.0145 |
| | | none | 5 | bleu_diff | -0.6500 | ± 0.6421 |
| | | none | 5 | rouge1_max | 36.3550 | ± 0.9462 |
| | | none | 5 | acc | 0.2664 | ± 0.0036 |
| | | none | 5 | rougeL_max | 33.8798 | ± 0.9367 |
| | | none | 5 | bleu_max | 15.2292 | ± 0.6714 |
| | | none | 5 | bleu_acc | 0.4360 | ± 0.0222 |
| | | none | 5 | rouge2_diff | -3.3178 | ± 0.9477 |
| - mmlu | N/A | none | 0 | acc | 0.2533 | ± 0.0039 |
| - humanities | N/A | none | 5 | acc | 0.2408 | ± 0.0075 |
| - other | N/A | none | 5 | acc | 0.2443 | ± 0.0080 |
| - social_sciences | N/A | none | 5 | acc | 0.2538 | ± 0.0081 |
| - stem | N/A | none | 5 | acc | 0.2740 | ± 0.0079 |
| - truthfulqa | N/A | none | 0 | rouge2_max | 16.4873 | ± 1.0172 |
| | | none | 0 | rouge2_acc | 0.1920 | ± 0.0176 |
| | | none | 0 | rougeL_acc | 0.3860 | ± 0.0218 |
| | | none | 0 | rougeL_diff | -0.7765 | ± 1.0034 |
| | | none | 0 | rouge1_acc | 0.3700 | ± 0.0216 |
| | | none | 0 | rouge1_diff | -1.5564 | ± 1.0223 |
| | | none | 0 | bleu_diff | -0.6500 | ± 0.6421 |
| | | none | 0 | rouge1_max | 36.3550 | ± 0.9462 |
| | | none | 0 | acc | 0.3435 | ± 0.0137 |
| | | none | 0 | rougeL_max | 33.8798 | ± 0.9367 |
| | | none | 0 | bleu_max | 15.2292 | ± 0.6714 |
| | | none | 0 | bleu_acc | 0.4360 | ± 0.0222 |
| | | none | 0 | rouge2_diff | -3.3178 | ± 0.9477 |
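
To judge whether the accuracy figures differ from guessing, each can be compared with its chance level in units of the reported standard error. The sketch below does this for three rows of the table above. The chance levels (0.5 for the two-option WinoGrande task, 0.25 for the four-option HellaSwag and MMLU tasks) are assumptions about the task formats, not values from this card.

```python
# task: (accuracy, stderr, assumed chance level); accuracy and stderr
# are copied from the table above, chance levels are assumptions.
results = {
    "winogrande": (0.5120, 0.0224, 0.50),
    "hellaswag": (0.3520, 0.0214, 0.25),
    "mmlu": (0.2533, 0.0039, 0.25),
}


def z_score(acc: float, stderr: float, chance: float) -> float:
    """Distance of the accuracy from chance, in standard errors."""
    return (acc - chance) / stderr


for task, (acc, stderr, chance) in results.items():
    z = z_score(acc, stderr, chance)
    print(f"{task:<12} acc={acc:.4f} chance={chance:.2f} z={z:+.2f}")
```

Under these assumptions the WinoGrande and MMLU accuracies fall within one standard error of chance, while the HellaSwag accuracy sits more than four standard errors above it.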