Built with Axolotl

See axolotl config

axolotl version: 0.4.1

adapter: lora
base_model: unsloth/gemma-2b-it
batch_size: 8
bf16: true
chat_template: tokenizer_default_fallback_alpaca
datasets:
- data_files:
  - 88cfea977fe74782_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/88cfea977fe74782_train_data.json
  type:
    field_instruction: smiles
    field_output: molt5
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
evals_per_epoch: 1
flash_attention: true
gpu_memory_limit: 80GiB
gradient_checkpointing: true
group_by_length: true
hub_model_id: willtensora/f0d6caa9-89a9-4666-9a6d-c8cda2015281
hub_strategy: checkpoint
learning_rate: 0.0002
logging_steps: 10
lora_alpha: 256
lora_dropout: 0.1
lora_r: 128
lora_target_linear: true
lr_scheduler: cosine
micro_batch_size: 1
model_type: AutoModelForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resize_token_embeddings_to_32x: false
sample_packing: false
saves_per_epoch: 2
sequence_len: 2048
tokenizer_type: GemmaTokenizerFast
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.1
wandb_entity: ''
wandb_mode: online
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: default
warmup_ratio: 0.05
xformers_attention: true
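
Below is a minimal sketch (with a hypothetical record, not taken from the dataset) of how one JSON record is turned into a prompt/completion pair by the custom dataset format above: `field_instruction: smiles` fills `{instruction}`, and `field_output: molt5` is the target text the model learns to produce.

```python
# Minimal sketch of the dataset field mapping configured above.
# The record contents are hypothetical placeholders.
record = {
    "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O",            # hypothetical SMILES input
    "molt5": "A hypothetical molecule description.",  # hypothetical target text
}

system_prompt = ""  # system_prompt: ''
prompt = "{instruction}".format(instruction=record["smiles"])  # format / no_input_format
completion = record["molt5"]  # the completion the model is trained on

print(system_prompt + prompt)
print(completion)
```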

f0d6caa9-89a9-4666-9a6d-c8cda2015281

This model is a LoRA fine-tune of unsloth/gemma-2b-it on the custom JSON dataset described in the config above (88cfea977fe74782_train_data.json). It achieves the following results on the evaluation set:

  • Loss: 1.9427

Model description

More information needed

Intended uses & limitations

More information needed
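
As a usage sketch (not an official example), the published LoRA adapter can be loaded on top of the base model with Transformers and PEFT. The SMILES prompt below is a hypothetical placeholder for the kind of input the adapter was trained on.

```python
# Usage sketch: load the base model and apply the published LoRA adapter with PEFT.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/gemma-2b-it"
adapter_id = "willtensora/f0d6caa9-89a9-4666-9a6d-c8cda2015281"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Hypothetical SMILES prompt; the adapter maps SMILES inputs to the molt5 target field.
prompt = "CC(=O)OC1=CC=CC=C1C(=O)O"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```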

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 8
  • total_eval_batch_size: 8
  • optimizer: AdamW (8-bit, bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 12
  • num_epochs: 100
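
The total batch sizes above follow from the per-device settings; a quick arithmetic check (assuming one gradient-accumulation step, consistent with batch_size: 8 and micro_batch_size: 1 in the config):

```python
# Arithmetic check of the effective batch size from the settings above.
micro_batch_size = 1         # per-device train batch size
num_devices = 8              # multi-GPU
grad_accumulation_steps = 1  # assumed: batch_size (8) / (micro_batch_size * num_devices) = 1

total_train_batch_size = micro_batch_size * num_devices * grad_accumulation_steps
print(total_train_batch_size)  # 8, matching total_train_batch_size above
```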

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 0.05 | 1 | 3.5172 |
| 1.0375 | 1.0 | 20 | 0.9186 |
| 0.5932 | 2.0 | 40 | 0.9190 |
| 0.4433 | 3.0 | 60 | 0.9756 |
| 0.3115 | 4.0 | 80 | 0.9780 |
| 0.2432 | 5.0 | 100 | 1.0348 |
| 0.219 | 6.0 | 120 | 1.1386 |
| 0.1868 | 7.0 | 140 | 1.0399 |
| 0.1624 | 8.0 | 160 | 1.2174 |
| 0.2109 | 9.0 | 180 | 1.1489 |
| 0.1223 | 10.0 | 200 | 1.2047 |
| 0.1149 | 11.0 | 220 | 1.2123 |
| 0.114 | 12.0 | 240 | 1.2854 |
| 0.0914 | 13.0 | 260 | 1.3633 |
| 0.0823 | 14.0 | 280 | 1.2355 |
| 0.0901 | 15.0 | 300 | 1.2453 |
| 0.093 | 16.0 | 320 | 1.3146 |
| 0.077 | 17.0 | 340 | 1.4159 |
| 0.0797 | 18.0 | 360 | 1.3376 |
| 0.0839 | 19.0 | 380 | 1.4419 |
| 0.0506 | 20.0 | 400 | 1.3841 |
| 0.0582 | 21.0 | 420 | 1.3847 |
| 0.0644 | 22.0 | 440 | 1.3697 |
| 0.0524 | 23.0 | 460 | 1.4068 |
| 0.0602 | 24.0 | 480 | 1.3840 |
| 0.0597 | 25.0 | 500 | 1.4276 |
| 0.0371 | 26.0 | 520 | 1.5041 |
| 0.0448 | 27.0 | 540 | 1.4607 |
| 0.0494 | 28.0 | 560 | 1.4608 |
| 0.042 | 29.0 | 580 | 1.5975 |
| 0.0334 | 30.0 | 600 | 1.4700 |
| 0.0403 | 31.0 | 620 | 1.5470 |
| 0.043 | 32.0 | 640 | 1.5968 |
| 0.0349 | 33.0 | 660 | 1.5662 |
| 0.0412 | 34.0 | 680 | 1.6331 |
| 0.0263 | 35.0 | 700 | 1.6191 |
| 0.0249 | 36.0 | 720 | 1.6646 |
| 0.0365 | 37.0 | 740 | 1.4995 |
| 0.0176 | 38.0 | 760 | 1.7255 |
| 0.0426 | 39.0 | 780 | 1.5561 |
| 0.0174 | 40.0 | 800 | 1.6246 |
| 0.0259 | 41.0 | 820 | 1.7055 |
| 0.0182 | 42.0 | 840 | 1.6314 |
| 0.013 | 43.0 | 860 | 1.5924 |
| 0.0194 | 44.0 | 880 | 1.7000 |
| 0.0194 | 45.0 | 900 | 1.6371 |
| 0.0171 | 46.0 | 920 | 1.7760 |
| 0.0094 | 47.0 | 940 | 1.7117 |
| 0.0061 | 48.0 | 960 | 1.7486 |
| 0.004 | 49.0 | 980 | 1.7964 |
| 0.003 | 50.0 | 1000 | 1.8029 |
| 0.0047 | 51.0 | 1020 | 1.7653 |
| 0.0033 | 52.0 | 1040 | 1.7602 |
| 0.0028 | 53.0 | 1060 | 1.7846 |
| 0.0091 | 54.0 | 1080 | 1.7363 |
| 0.0009 | 55.0 | 1100 | 1.7427 |
| 0.0005 | 56.0 | 1120 | 1.7763 |
| 0.0003 | 57.0 | 1140 | 1.8004 |
| 0.0004 | 58.0 | 1160 | 1.8191 |
| 0.0004 | 59.0 | 1180 | 1.8343 |
| 0.0004 | 60.0 | 1200 | 1.8433 |
| 0.0002 | 61.0 | 1220 | 1.8534 |
| 0.0003 | 62.0 | 1240 | 1.8619 |
| 0.0003 | 63.0 | 1260 | 1.8702 |
| 0.0002 | 64.0 | 1280 | 1.8774 |
| 0.0002 | 65.0 | 1300 | 1.8829 |
| 0.0003 | 66.0 | 1320 | 1.8894 |
| 0.0003 | 67.0 | 1340 | 1.8937 |
| 0.0001 | 68.0 | 1360 | 1.8985 |
| 0.0001 | 69.0 | 1380 | 1.9014 |
| 0.0003 | 70.0 | 1400 | 1.9057 |
| 0.0 | 71.0 | 1420 | 1.9103 |
| 0.0001 | 72.0 | 1440 | 1.9126 |
| 0.0003 | 73.0 | 1460 | 1.9165 |
| 0.0002 | 74.0 | 1480 | 1.9191 |
| 0.0002 | 75.0 | 1500 | 1.9210 |
| 0.0003 | 76.0 | 1520 | 1.9238 |
| 0.0001 | 77.0 | 1540 | 1.9273 |
| 0.0002 | 78.0 | 1560 | 1.9279 |
| 0.0002 | 79.0 | 1580 | 1.9301 |
| 0.0002 | 80.0 | 1600 | 1.9313 |
| 0.0003 | 81.0 | 1620 | 1.9321 |
| 0.0001 | 82.0 | 1640 | 1.9346 |
| 0.0 | 83.0 | 1660 | 1.9355 |
| 0.0004 | 84.0 | 1680 | 1.9356 |
| 0.0 | 85.0 | 1700 | 1.9385 |
| 0.0003 | 86.0 | 1720 | 1.9385 |
| 0.0001 | 87.0 | 1740 | 1.9396 |
| 0.0002 | 88.0 | 1760 | 1.9398 |
| 0.0001 | 89.0 | 1780 | 1.9407 |
| 0.0001 | 90.0 | 1800 | 1.9418 |
| 0.0002 | 91.0 | 1820 | 1.9418 |
| 0.0002 | 92.0 | 1840 | 1.9414 |
| 0.0003 | 93.0 | 1860 | 1.9418 |
| 0.0 | 94.0 | 1880 | 1.9427 |
| 0.0002 | 95.0 | 1900 | 1.9436 |
| 0.0003 | 96.0 | 1920 | 1.9425 |
| 0.0002 | 97.0 | 1940 | 1.9429 |
| 0.0003 | 98.0 | 1960 | 1.9430 |
| 0.0001 | 99.0 | 1980 | 1.9433 |
| 0.0002 | 100.0 | 2000 | 1.9427 |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1