Built with Axolotl

See axolotl config

axolotl version: 0.4.1

adapter: lora
auto_find_batch_size: true
base_model: katuni4ka/tiny-random-falcon-40b
bf16: auto
chat_template: llama3
dataloader_num_workers: 12
dataset_prepared_path: null
datasets:
- data_files:
  - ec111cc653f30e9a_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/ec111cc653f30e9a_train_data.json
  type:
    field_input: ''
    field_instruction: instruction
    field_output: output
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
early_stopping_threshold: 0.0001
eval_max_new_tokens: 128
eval_steps: 1200
eval_strategy: null
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 2
gradient_checkpointing: false
group_by_length: false
hub_model_id: mrferr3t/0c6b4c13-e227-46cd-b496-04f63f164db1
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0004
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1200
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_steps: 
micro_batch_size: 32
mlflow_experiment_name: /tmp/ec111cc653f30e9a_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: /workspace/hub_repo/last-checkpoint
s2_attention: null
sample_packing: false
save_steps: 1200
saves_per_epoch: 0
sequence_len: 512
special_tokens:
  pad_token: <|endoftext|>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.05
wandb_entity: null
wandb_mode: 
wandb_name: 2732f256-9976-4b7d-b49e-d254f0bbccf7
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 2732f256-9976-4b7d-b49e-d254f0bbccf7
warmup_steps: 100
weight_decay: 0.0
xformers_attention: null
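
The `datasets` entry above defines a custom prompt format: each JSON record supplies its `instruction` field as the prompt (via `format`/`no_input_format`) and its `output` field as the completion. A minimal sketch of that mapping, using a hypothetical record rather than one from the actual dataset:

```python
# Hypothetical record; the real rows live in ec111cc653f30e9a_train_data.json.
record = {
    "instruction": "Translate 'hello' into French.",
    "output": "bonjour",
}

prompt_template = "{instruction}"   # `format` / `no_input_format` in the config
# `system_format` is "{system}", but `system_prompt` is empty, so no system text is added.

prompt = prompt_template.format(instruction=record["instruction"])
target = record["output"]           # completion the LoRA adapter is trained to produce
print(prompt, "->", target)
```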

0c6b4c13-e227-46cd-b496-04f63f164db1

This model is a fine-tuned version of katuni4ka/tiny-random-falcon-40b on the dataset specified in the configuration above. It achieves the following results on the evaluation set:

  • Loss: 10.0345
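
A minimal usage sketch, not part of the original card: it assumes the PEFT and Transformers versions listed under Framework versions below and loads this LoRA adapter on top of the base model from the config.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "katuni4ka/tiny-random-falcon-40b"
adapter_id = "mrferr3t/0c6b4c13-e227-46cd-b496-04f63f164db1"

# trust_remote_code mirrors the training config above
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Example prompt; the dataset's actual instructions are not documented here.
inputs = tokenizer("Write a short greeting.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```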

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0004
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64 (micro batch size × gradient accumulation steps; see the sketch after this list)
  • optimizer: AdamW (8-bit, via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 100
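
The effective batch size follows directly from the values above; a one-line check (single training process assumed, since the config sets no distributed options):

```python
micro_batch_size = 32            # train_batch_size above
gradient_accumulation_steps = 2  # from the config
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 64
```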

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0.0009 | 1 | 11.1313 |
| 20.5675 | 2.1680 | 2400 | 10.1603 |
| 20.2989 | 4.3360 | 4800 | 10.1175 |
| 20.239 | 6.5041 | 7200 | 10.0924 |
| 20.2067 | 8.6721 | 9600 | 10.0801 |
| 20.1885 | 10.8401 | 12000 | 10.0730 |
| 20.1748 | 13.0081 | 14400 | 10.0671 |
| 20.1654 | 15.1762 | 16800 | 10.0625 |
| 20.1595 | 17.3442 | 19200 | 10.0591 |
| 20.1544 | 19.5122 | 21600 | 10.0572 |
| 20.1493 | 21.6802 | 24000 | 10.0559 |
| 20.1461 | 23.8482 | 26400 | 10.0547 |
| 20.1428 | 26.0163 | 28800 | 10.0523 |
| 20.1405 | 28.1843 | 31200 | 10.0509 |
| 20.1372 | 30.3523 | 33600 | 10.0501 |
| 20.1354 | 32.5203 | 36000 | 10.0496 |
| 20.1335 | 34.6883 | 38400 | 10.0487 |
| 20.1309 | 36.8564 | 40800 | 10.0478 |
| 20.1297 | 39.0244 | 43200 | 10.0468 |
| 20.1295 | 41.1924 | 45600 | 10.0454 |
| 20.1259 | 43.3604 | 48000 | 10.0444 |
| 20.1234 | 45.5285 | 50400 | 10.0427 |
| 20.1211 | 47.6965 | 52800 | 10.0423 |
| 20.1206 | 49.8645 | 55200 | 10.0420 |
| 20.1183 | 52.0325 | 57600 | 10.0409 |
| 20.1177 | 54.2005 | 60000 | 10.0401 |
| 20.1165 | 56.3686 | 62400 | 10.0394 |
| 20.1141 | 58.5366 | 64800 | 10.0391 |
| 20.1138 | 60.7046 | 67200 | 10.0383 |
| 20.1124 | 62.8726 | 69600 | 10.0375 |
| 20.1136 | 65.0407 | 72000 | 10.0376 |
| 20.1108 | 67.2087 | 74400 | 10.0365 |
| 20.1106 | 69.3767 | 76800 | 10.0366 |
| 20.1107 | 71.5447 | 79200 | 10.0363 |
| 20.1099 | 73.7127 | 81600 | 10.0358 |
| 20.1089 | 75.8808 | 84000 | 10.0355 |
| 20.1083 | 78.0488 | 86400 | 10.0353 |
| 20.1081 | 80.2168 | 88800 | 10.0352 |
| 20.1081 | 82.3848 | 91200 | 10.0351 |
| 20.1075 | 84.5528 | 93600 | 10.0349 |
| 20.1065 | 86.7209 | 96000 | 10.0348 |
| 20.1074 | 88.8889 | 98400 | 10.0346 |
| 20.1066 | 91.0569 | 100800 | 10.0346 |
| 20.1056 | 93.2249 | 103200 | 10.0345 |
| 20.1065 | 95.3930 | 105600 | 10.0346 |
| 20.1062 | 97.5610 | 108000 | 10.0345 |
| 20.1061 | 99.7290 | 110400 | 10.0345 |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
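
A small environment check, not from the original card, that prints the locally installed versions of the packages listed above:

```python
import importlib.metadata as metadata

for package in ("peft", "transformers", "torch", "datasets", "tokenizers"):
    print(package, metadata.version(package))
```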