See axolotl config

axolotl version: `0.4.1`

```yaml
adapter: lora
auto_find_batch_size: true
base_model: katuni4ka/tiny-random-falcon-40b
bf16: auto
chat_template: llama3
dataloader_num_workers: 12
dataset_prepared_path: null
datasets:
- data_files:
  - ec111cc653f30e9a_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/ec111cc653f30e9a_train_data.json
  type:
    field_input: ''
    field_instruction: instruction
    field_output: output
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
early_stopping_threshold: 0.0001
eval_max_new_tokens: 128
eval_steps: 1200
eval_strategy: null
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 2
gradient_checkpointing: false
group_by_length: false
hub_model_id: mrferr3t/0c6b4c13-e227-46cd-b496-04f63f164db1
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0004
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1200
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_steps:
micro_batch_size: 32
mlflow_experiment_name: /tmp/ec111cc653f30e9a_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: /workspace/hub_repo/last-checkpoint
s2_attention: null
sample_packing: false
save_steps: 1200
saves_per_epoch: 0
sequence_len: 512
special_tokens:
  pad_token: <|endoftext|>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.05
wandb_entity: null
wandb_mode:
wandb_name: 2732f256-9976-4b7d-b49e-d254f0bbccf7
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 2732f256-9976-4b7d-b49e-d254f0bbccf7
warmup_steps: 100
weight_decay: 0.0
xformers_attention: null
```
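To reproduce the run, a config like the one above can be passed to axolotl's training entry point. A minimal launch sketch, assuming axolotl 0.4.1 and accelerate are installed and the config is saved as `config.yaml` (a hypothetical filename):

```python
# Launch axolotl training from Python by shelling out to the documented
# CLI: `accelerate launch -m axolotl.cli.train config.yaml`.
# "config.yaml" is a hypothetical name for the config shown above.
import subprocess

subprocess.run(
    ["accelerate", "launch", "-m", "axolotl.cli.train", "config.yaml"],
    check=True,
)
```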
# 0c6b4c13-e227-46cd-b496-04f63f164db1
This model is a fine-tuned version of [katuni4ka/tiny-random-falcon-40b](https://huggingface.co/katuni4ka/tiny-random-falcon-40b) on the `ec111cc653f30e9a_train_data.json` dataset described in the axolotl config above. It achieves the following results on the evaluation set:
- Loss: 10.0345
## Model description
More information needed
## Intended uses & limitations
More information needed
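In lieu of documented usage, here is a minimal inference sketch, assuming the repository holds a standard PEFT LoRA adapter for the base model named in the config (the example instruction is made up):

```python
# Sketch: load the base model, attach the LoRA adapter, and generate.
# Prompts follow the config's bare '{instruction}' template.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "katuni4ka/tiny-random-falcon-40b"
adapter_id = "mrferr3t/0c6b4c13-e227-46cd-b496-04f63f164db1"

base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)

prompt = "Write a haiku about autumn."  # hypothetical instruction
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```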
## Training and evaluation data
More information needed
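The axolotl config above does pin down the expected record shape: each JSON record carries an `instruction` and an `output` field, and the prompt template is the bare `'{instruction}'`. A hypothetical record for illustration (the actual contents of `ec111cc653f30e9a_train_data.json` are not published):

```python
# Hypothetical record shape inferred from the config's field mappings
# (field_instruction: instruction, field_output: output).
record = {
    "instruction": "Explain what a LoRA adapter is in one sentence.",
    "output": "A LoRA adapter adds small trainable low-rank matrices to a frozen model.",
}

# With format '{instruction}', the model is prompted with the instruction
# alone and trained to produce the output as the completion; 5% of records
# are held out for evaluation (val_set_size: 0.05).
prompt = "{instruction}".format(**record)
```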
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0004
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: 8-bit AdamW (bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 100
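The total train batch size reported above follows from the per-device batch size and gradient accumulation:

```python
# How the reported total_train_batch_size of 64 is derived.
micro_batch_size = 32            # per-device train batch size
gradient_accumulation_steps = 2  # gradients accumulated over 2 micro-batches
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 64
```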
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 0.0009 | 1 | 11.1313 |
| 20.5675 | 2.1680 | 2400 | 10.1603 |
| 20.2989 | 4.3360 | 4800 | 10.1175 |
| 20.239 | 6.5041 | 7200 | 10.0924 |
| 20.2067 | 8.6721 | 9600 | 10.0801 |
| 20.1885 | 10.8401 | 12000 | 10.0730 |
| 20.1748 | 13.0081 | 14400 | 10.0671 |
| 20.1654 | 15.1762 | 16800 | 10.0625 |
| 20.1595 | 17.3442 | 19200 | 10.0591 |
| 20.1544 | 19.5122 | 21600 | 10.0572 |
| 20.1493 | 21.6802 | 24000 | 10.0559 |
| 20.1461 | 23.8482 | 26400 | 10.0547 |
| 20.1428 | 26.0163 | 28800 | 10.0523 |
| 20.1405 | 28.1843 | 31200 | 10.0509 |
| 20.1372 | 30.3523 | 33600 | 10.0501 |
| 20.1354 | 32.5203 | 36000 | 10.0496 |
| 20.1335 | 34.6883 | 38400 | 10.0487 |
| 20.1309 | 36.8564 | 40800 | 10.0478 |
| 20.1297 | 39.0244 | 43200 | 10.0468 |
| 20.1295 | 41.1924 | 45600 | 10.0454 |
| 20.1259 | 43.3604 | 48000 | 10.0444 |
| 20.1234 | 45.5285 | 50400 | 10.0427 |
| 20.1211 | 47.6965 | 52800 | 10.0423 |
| 20.1206 | 49.8645 | 55200 | 10.0420 |
| 20.1183 | 52.0325 | 57600 | 10.0409 |
| 20.1177 | 54.2005 | 60000 | 10.0401 |
| 20.1165 | 56.3686 | 62400 | 10.0394 |
| 20.1141 | 58.5366 | 64800 | 10.0391 |
| 20.1138 | 60.7046 | 67200 | 10.0383 |
| 20.1124 | 62.8726 | 69600 | 10.0375 |
| 20.1136 | 65.0407 | 72000 | 10.0376 |
| 20.1108 | 67.2087 | 74400 | 10.0365 |
| 20.1106 | 69.3767 | 76800 | 10.0366 |
| 20.1107 | 71.5447 | 79200 | 10.0363 |
| 20.1099 | 73.7127 | 81600 | 10.0358 |
| 20.1089 | 75.8808 | 84000 | 10.0355 |
| 20.1083 | 78.0488 | 86400 | 10.0353 |
| 20.1081 | 80.2168 | 88800 | 10.0352 |
| 20.1081 | 82.3848 | 91200 | 10.0351 |
| 20.1075 | 84.5528 | 93600 | 10.0349 |
| 20.1065 | 86.7209 | 96000 | 10.0348 |
| 20.1074 | 88.8889 | 98400 | 10.0346 |
| 20.1066 | 91.0569 | 100800 | 10.0346 |
| 20.1056 | 93.2249 | 103200 | 10.0345 |
| 20.1065 | 95.3930 | 105600 | 10.0346 |
| 20.1062 | 97.5610 | 108000 | 10.0345 |
| 20.1061 | 99.7290 | 110400 | 10.0345 |
### Framework versions
- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1