---
license: apache-2.0
tags:
- generated_from_trainer
- google/gemma
- PyTorch
- transformers
- trl
- peft
- tensorboard
base_model: google/gemma-2b
widget:
- example_title: Pirate!
  messages:
  - role: system
    content: You are a pirate chatbot who always responds with Arr!
  - role: user
    content: "There's a llama on my lawn, how can I get rid of him?"
  output:
    text: >-
      Arr! 'Tis a puzzlin' matter, me hearty! A llama on yer lawn be a rare
      sight, but I've got a plan that might help ye get rid of 'im. Ye'll need
      to gather some carrots and hay, and then lure the llama away with the
      promise of a tasty treat. Once he's gone, ye can clean up yer lawn and
      enjoy the peace and quiet once again. But beware, me hearty, for there
      may be more llamas where that one came from! Arr!
model-index:
- name: gemma-2b-chat
  results: []
datasets:
- HuggingFaceH4/deita-10k-v0-sft
language:
- en
pipeline_tag: text-generation
---

# Model Card for gemma-2b-chat

**gemma-2b-chat** is a language model trained to act as a helpful assistant. It is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b), trained with `SFTTrainer` on the publicly available dataset [HuggingFaceH4/deita-10k-v0-sft](https://huggingface.co/datasets/HuggingFaceH4/deita-10k-v0-sft).

## Training Procedure

The training code used to create this model was generated by [Menouar/LLM-FineTuning-Notebook-Generator](https://huggingface.co/spaces/Menouar/LLM-FineTuning-Notebook-Generator).

## Training hyperparameters

The following hyperparameters were used during training:

- output_dir: temp_gemma-2b-chat
- overwrite_output_dir: True
- do_train: False
- do_eval: False
- do_predict: False
- evaluation_strategy: no
- prediction_loss_only: False
- per_device_train_batch_size: 3
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 2
- eval_accumulation_steps: None
- eval_delay: 0
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 0.3
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_dir: temp_gemma-2b-chat/runs/Mar11_17-14-25_f4965e0005f4
- logging_strategy: steps
- logging_first_step: False
- logging_steps: 10
- logging_nan_inf_filter: True
- save_strategy: epoch
- save_steps: 500
- save_total_limit: None
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- eval_steps: None
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- run_name: temp_gemma-2b-chat
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- metric_for_best_model: None
- greater_is_better: None
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True)
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- report_to: ['tensorboard']
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: None
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_token: None
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- include_inputs_for_metrics: False
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- push_to_hub_token: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- distributed_state: Distributed environment: NO, Num processes: 1, Process index: 0, Local process index: 0, Device: cuda
- _n_gpu: 1
- __cached__setup_devices: cuda:0
- deepspeed_plugin: None
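
As a rough illustration of how a few of these settings interact, the following standard-library Python sketch computes the effective batch size (`per_device_train_batch_size` × `gradient_accumulation_steps` × number of processes) and the learning-rate curve implied by `lr_scheduler_type: cosine` with `warmup_ratio: 0.1` and `warmup_steps: 0`. It is intended to mirror the linear-warmup-then-half-cosine formula used by the `transformers` cosine scheduler, but it is not the library's code. The helper names are hypothetical, and the total step count (`total = 1000`) is a placeholder, because this card does not report how many optimizer steps the run took.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 3
GRADIENT_ACCUMULATION_STEPS = 2
NUM_PROCESSES = 1  # "Num processes: 1" in distributed_state
LEARNING_RATE = 2e-05
WARMUP_RATIO = 0.1


def effective_batch_size(per_device: int, grad_accum: int, num_processes: int) -> int:
    """Number of training sequences that contribute to each optimizer step."""
    return per_device * grad_accum * num_processes


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length derived from warmup_ratio (used when warmup_steps is 0), rounded up."""
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int, total_steps: int, base_lr: float, warmup: int) -> float:
    """Linear warmup from 0 to base_lr, then a half-cosine decay to 0 at total_steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    batch = effective_batch_size(
        PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_PROCESSES
    )
    print("effective batch size:", batch)  # prints 6

    # HYPOTHETICAL step count: the real value depends on the dataset size
    # after SFTTrainer preprocessing, which is not reported here.
    total = 1000
    warm = warmup_steps(total, WARMUP_RATIO)
    for step in (0, warm // 2, warm, (warm + total) // 2, total):
        print(f"step {step:>4}: lr = {cosine_lr(step, total, LEARNING_RATE, warm):.3e}")
```

With these settings each optimizer step sees 6 sequences, and the learning rate peaks at 2e-05 once the first 10% of steps have elapsed before decaying toward zero by the end of the single epoch.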