See axolotl config

axolotl version: `0.4.1`

```yaml
adapter: lora
auto_find_batch_size: true
base_model: katuni4ka/tiny-random-falcon-40b
bf16: auto
chat_template: llama3
dataloader_num_workers: 12
dataset_prepared_path: null
datasets:
- data_files:
  - ec111cc653f30e9a_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/ec111cc653f30e9a_train_data.json
  type:
    field_input: ''
    field_instruction: instruction
    field_output: output
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
early_stopping_threshold: 0.0001
eval_max_new_tokens: 128
eval_steps: 1200
eval_strategy: null
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 2
gradient_checkpointing: false
group_by_length: false
hub_model_id: mrferr3t/0c6b4c13-e227-46cd-b496-04f63f164db1
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0004
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1200
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_steps:
micro_batch_size: 32
mlflow_experiment_name: /tmp/ec111cc653f30e9a_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: /workspace/hub_repo/last-checkpoint
s2_attention: null
sample_packing: false
save_steps: 1200
saves_per_epoch: 0
sequence_len: 512
special_tokens:
  pad_token: <|endoftext|>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.05
wandb_entity: null
wandb_mode:
wandb_name: 2732f256-9976-4b7d-b49e-d254f0bbccf7
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 2732f256-9976-4b7d-b49e-d254f0bbccf7
warmup_steps: 100
weight_decay: 0.0
xformers_attention: null
```
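To reproduce the run, a config like the one above can be passed to axolotl's training entry point. A minimal launch sketch, assuming axolotl 0.4.1 and accelerate are installed and the config is saved as `config.yaml` (a hypothetical filename):

```python
# Launch axolotl training from Python by shelling out to the documented
# CLI: `accelerate launch -m axolotl.cli.train config.yaml`.
# "config.yaml" is a hypothetical name for the config shown above.
import subprocess

subprocess.run(
    ["accelerate", "launch", "-m", "axolotl.cli.train", "config.yaml"],
    check=True,
)
```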
# 0c6b4c13-e227-46cd-b496-04f63f164db1
This model is a fine-tuned version of [katuni4ka/tiny-random-falcon-40b](https://huggingface.co/katuni4ka/tiny-random-falcon-40b) on the `ec111cc653f30e9a_train_data.json` dataset described in the axolotl config above. It achieves the following results on the evaluation set:
- Loss: 10.0345
## Model description
More information needed
## Intended uses & limitations
More information needed
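In lieu of documented usage, here is a minimal inference sketch, assuming the repository holds a standard PEFT LoRA adapter for the base model named in the config (the example instruction is made up):

```python
# Sketch: load the base model, attach the LoRA adapter, and generate.
# Prompts follow the config's bare '{instruction}' template.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "katuni4ka/tiny-random-falcon-40b"
adapter_id = "mrferr3t/0c6b4c13-e227-46cd-b496-04f63f164db1"

base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)

prompt = "Write a haiku about autumn."  # hypothetical instruction
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```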
## Training and evaluation data
More information needed
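The axolotl config above does pin down the expected record shape: each JSON record carries an `instruction` and an `output` field, and the prompt template is the bare `'{instruction}'`. A hypothetical record for illustration (the actual contents of `ec111cc653f30e9a_train_data.json` are not published):

```python
# Hypothetical record shape inferred from the config's field mappings
# (field_instruction: instruction, field_output: output).
record = {
    "instruction": "Explain what a LoRA adapter is in one sentence.",
    "output": "A LoRA adapter adds small trainable low-rank matrices to a frozen model.",
}

# With format '{instruction}', the model is prompted with the instruction
# alone and trained to produce the output as the completion; 5% of records
# are held out for evaluation (val_set_size: 0.05).
prompt = "{instruction}".format(**record)
```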
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0004
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: 8-bit AdamW (bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 100
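The total train batch size reported above follows from the per-device batch size and gradient accumulation:

```python
# How the reported total_train_batch_size of 64 is derived.
micro_batch_size = 32            # per-device train batch size
gradient_accumulation_steps = 2  # gradients accumulated over 2 micro-batches
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 64
```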
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 0.0009 | 1 | 11.1313 |
| 20.5675 | 2.1680 | 2400 | 10.1603 |
| 20.2989 | 4.3360 | 4800 | 10.1175 |
| 20.239 | 6.5041 | 7200 | 10.0924 |
| 20.2067 | 8.6721 | 9600 | 10.0801 |
| 20.1885 | 10.8401 | 12000 | 10.0730 |
| 20.1748 | 13.0081 | 14400 | 10.0671 |
| 20.1654 | 15.1762 | 16800 | 10.0625 |
| 20.1595 | 17.3442 | 19200 | 10.0591 |
| 20.1544 | 19.5122 | 21600 | 10.0572 |
| 20.1493 | 21.6802 | 24000 | 10.0559 |
| 20.1461 | 23.8482 | 26400 | 10.0547 |
| 20.1428 | 26.0163 | 28800 | 10.0523 |
| 20.1405 | 28.1843 | 31200 | 10.0509 |
| 20.1372 | 30.3523 | 33600 | 10.0501 |
| 20.1354 | 32.5203 | 36000 | 10.0496 |
| 20.1335 | 34.6883 | 38400 | 10.0487 |
| 20.1309 | 36.8564 | 40800 | 10.0478 |
| 20.1297 | 39.0244 | 43200 | 10.0468 |
| 20.1295 | 41.1924 | 45600 | 10.0454 |
| 20.1259 | 43.3604 | 48000 | 10.0444 |
| 20.1234 | 45.5285 | 50400 | 10.0427 |
| 20.1211 | 47.6965 | 52800 | 10.0423 |
| 20.1206 | 49.8645 | 55200 | 10.0420 |
| 20.1183 | 52.0325 | 57600 | 10.0409 |
| 20.1177 | 54.2005 | 60000 | 10.0401 |
| 20.1165 | 56.3686 | 62400 | 10.0394 |
| 20.1141 | 58.5366 | 64800 | 10.0391 |
| 20.1138 | 60.7046 | 67200 | 10.0383 |
| 20.1124 | 62.8726 | 69600 | 10.0375 |
| 20.1136 | 65.0407 | 72000 | 10.0376 |
| 20.1108 | 67.2087 | 74400 | 10.0365 |
| 20.1106 | 69.3767 | 76800 | 10.0366 |
| 20.1107 | 71.5447 | 79200 | 10.0363 |
| 20.1099 | 73.7127 | 81600 | 10.0358 |
| 20.1089 | 75.8808 | 84000 | 10.0355 |
| 20.1083 | 78.0488 | 86400 | 10.0353 |
| 20.1081 | 80.2168 | 88800 | 10.0352 |
| 20.1081 | 82.3848 | 91200 | 10.0351 |
| 20.1075 | 84.5528 | 93600 | 10.0349 |
| 20.1065 | 86.7209 | 96000 | 10.0348 |
| 20.1074 | 88.8889 | 98400 | 10.0346 |
| 20.1066 | 91.0569 | 100800 | 10.0346 |
| 20.1056 | 93.2249 | 103200 | 10.0345 |
| 20.1065 | 95.3930 | 105600 | 10.0346 |
| 20.1062 | 97.5610 | 108000 | 10.0345 |
| 20.1061 | 99.7290 | 110400 | 10.0345 |
### Framework versions
- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1