pygemma-2b-ultra / README.md
Menouar's picture
Update README.md
50a336c verified
metadata
license: other
tags:
  - generated_from_trainer
  - google/gemma
  - PyTorch
  - transformers
  - trl
  - peft
  - tensorboard
model-index:
  - name: pygemma-2b-ultra
    results: []
datasets:
  - Vezora/Tested-143k-Python-Alpaca
language:
  - en
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
base_model: google/gemma-2b
widget:
  - example_title: Compute Sum
    messages:
      - role: system
        content: >
          Welcome to PyGemma, your AI-powered Python assistant. I'm here to help
          you answer common questions about the Python programming language.
          Let's dive into Python!

          Please note: - I strive to provide accurate and reliable information,
          but I'm not perfect. If you notice any inaccuracies, please let me
          know. - I'm committed to maintaining a safe and respectful
          environment. Please avoid using toxic or harmful language. - I'm
          continuously learning and improving. If you notice any inconsistencies
          in my responses, I appreciate your patience and feedback.
      - role: user
        content: Create a function to calculate the sum of a sequence of integers.
inference:
  parameters:
    temperature: 0.1
    max_new_tokens: 1024
    top_p: 0.95
    top_k: 50
pipeline_tag: text-generation

Model Card for pygemma-2b-ultra:

🐍💬🤖

pygemma-2b-ultra is a language model that is trained to act as Python assistant. It is a finetuned version of google/gemma-2b that was trained using SFTTrainer on publicly available dataset Vezora/Tested-143k-Python-Alpaca.

Training Metrics

The training metrics can be found on TensorBoard.

Training hyperparameters

The following hyperparameters were used during the training:

  • output_dir: peft-lora-model

  • overwrite_output_dir: True

  • do_train: False

  • do_eval: False

  • do_predict: False

  • evaluation_strategy: no

  • prediction_loss_only: False

  • per_device_train_batch_size: 2

  • per_device_eval_batch_size: None

  • per_gpu_train_batch_size: None

  • per_gpu_eval_batch_size: None

  • gradient_accumulation_steps: 4

  • eval_accumulation_steps: None

  • eval_delay: 0

  • learning_rate: 2e-05

  • weight_decay: 0.0

  • adam_beta1: 0.9

  • adam_beta2: 0.999

  • adam_epsilon: 1e-08

  • max_grad_norm: 0.3

  • num_train_epochs: 1

  • max_steps: -1

  • lr_scheduler_type: cosine

  • lr_scheduler_kwargs: {}

  • warmup_ratio: 0.1

  • warmup_steps: 0

  • log_level: passive

  • log_level_replica: warning

  • log_on_each_node: True

  • logging_dir: peft-lora-model/runs/Mar22_07-15-06_2013f91d074a

  • logging_strategy: steps

  • logging_first_step: False

  • logging_steps: 10

  • logging_nan_inf_filter: True

  • save_strategy: epoch

  • save_steps: 500

  • save_total_limit: None

  • save_safetensors: True

  • save_on_each_node: False

  • save_only_model: False

  • no_cuda: False

  • use_cpu: False

  • use_mps_device: False

  • seed: 42

  • data_seed: None

  • jit_mode_eval: False

  • use_ipex: False

  • bf16: True

  • fp16: False

  • fp16_opt_level: O1

  • half_precision_backend: auto

  • bf16_full_eval: False

  • fp16_full_eval: False

  • tf32: None

  • local_rank: 0

  • ddp_backend: None

  • tpu_num_cores: None

  • tpu_metrics_debug: False

  • debug: []

  • dataloader_drop_last: False

  • eval_steps: None

  • dataloader_num_workers: 0

  • dataloader_prefetch_factor: None

  • past_index: -1

  • run_name: peft-lora-model

  • disable_tqdm: False

  • remove_unused_columns: True

  • label_names: None

  • load_best_model_at_end: False

  • metric_for_best_model: None

  • greater_is_better: None

  • ignore_data_skip: False

  • fsdp: []

  • fsdp_min_num_params: 0

  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}

  • fsdp_transformer_layer_cls_to_wrap: None

  • accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True)

  • deepspeed: None

  • label_smoothing_factor: 0.0

  • optim: adamw_torch_fused

  • optim_args: None

  • adafactor: False

  • group_by_length: False

  • length_column_name: length

  • report_to: ['tensorboard']

  • ddp_find_unused_parameters: None

  • ddp_bucket_cap_mb: None

  • ddp_broadcast_buffers: None

  • dataloader_pin_memory: True

  • dataloader_persistent_workers: False

  • skip_memory_metrics: True

  • use_legacy_prediction_loop: False

  • push_to_hub: False

  • resume_from_checkpoint: None

  • hub_model_id: None

  • hub_strategy: every_save

  • hub_token: None

  • hub_private_repo: False

  • hub_always_push: False

  • gradient_checkpointing: True

  • gradient_checkpointing_kwargs: {'use_reentrant': False}

  • include_inputs_for_metrics: False

  • fp16_backend: auto

  • push_to_hub_model_id: None

  • push_to_hub_organization: None

  • push_to_hub_token: None

  • mp_parameters:

  • auto_find_batch_size: False

  • full_determinism: False

  • torchdynamo: None

  • ray_scope: last

  • ddp_timeout: 1800

  • torch_compile: False

  • torch_compile_backend: None

  • torch_compile_mode: None

  • dispatch_batches: None

  • split_batches: None

  • include_tokens_per_second: False

  • include_num_input_tokens_seen: False

  • neftune_noise_alpha: None

  • distributed_state: Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda

  • _n_gpu: 1

  • __cached__setup_devices: cuda:0

  • deepspeed_plugin: None