--- inference: false license: mit base_model: microsoft/phi-2 tags: - axolotl - generated_from_trainer model-index: - name: Phasmid-2_v2 results: [] datasets: - PygmalionAI/PIPPA - HuggingFaceH4/no_robots --- ``` _ (`-. ('-. .-. ('-. .-') _ .-') _ .-') _ ( (OO )( OO ) / ( OO ).-. ( OO ).( '.( OO )_ ( ( OO) ) _.` \,--. ,--. / . --. /(_)---\_),--. ,--.) ,-.-') \ .'_ (__...--''| | | | | \-. \ / _ | | `.' | | |OO),`'--..._) | / | || .| |.-'-' | |\ :` `. | | | | \| | \ ' | |_.' || | \| |_.' | '..`''.)| |'.'| | | |(_/| | ' | | .___.'| .-. | | .-. |.-._) \| | | | ,| |_.'| | / : | | | | | | | | | |\ /| | | |(_| | | '--' / `--' `--' `--' `--' `--' `-----' `--' `--' `--' `-------' ``` [Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
See axolotl config axolotl version: `0.3.0` ```yaml base_model: microsoft/phi-2 model_type: PhiForCausalLM tokenizer_type: AutoTokenizer is_llama_derived_model: false trust_remote_code: true load_in_8bit: false load_in_4bit: false strict: false datasets: - path: SE6446/SE6446_phasmid_ds type: completion hub_model_id: SE6446/Phasmid-2_v2 hub_strategy: every_save use_auth_token: true dataset_prepared_path: /phasmid-2-ds-path val_set_size: 0.05 output_dir: ./phasmid-sft-out sequence_len: 2048 sample_packing: true pad_to_sequence_len: adapter: lora_model_dir: lora_r: lora_alpha: lora_dropout: lora_target_linear: lora_fan_in_fan_out: wandb_project: wandb_entity: wandb_watch: wandb_name: wandb_log_model: gradient_accumulation_steps: 1 micro_batch_size: 1 num_epochs: 4 optimizer: adamw_torch adam_beta2: 0.95 adam_epsilon: 0.00001 max_grad_norm: 1.0 lr_scheduler: cosine learning_rate: 0.0003 train_on_inputs: false group_by_length: true bf16: true fp16: false tf32: true gradient_checkpointing: early_stopping_patience: resume_from_checkpoint: local_rank: logging_steps: 1 xformers_attention: flash_attention: warmup_steps: 100 evals_per_epoch: 4 saves_per_epoch: 1 debug: deepspeed: weight_decay: 0.1 fsdp: fsdp_config: resize_token_embeddings_to_32x: true special_tokens: bos_token: "<|endoftext|>" eos_token: "<|endoftext|>" unk_token: "<|endoftext|>" pad_token: "<|endoftext|>" ```

# Phasmid-2_v2 This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on a mix of no_robots and the PIPPA dataset. It achieves the following results on the evaluation set: - Loss: 2.2924 ## Model description Phasmid-2 has been trained on intructional data and thus can perform far better at instruction following than phi-2. However I have not extensively tested the model. ## Intended uses & limitations This model is little more than a side project and I shall treat it as such. Phasmid-2 (due to it's size), can still suffer from problematic hallucinations and poor information. No effort was made to reduce potentially toxic responses, as such you should train this model further if you require it to do so. ## Inference Ensure that eniops is installed ``` pip install einops ``` Phi doesn't like device_map = auto, therefore you should specify as like the following: 1. FP16 / Flash-Attention / CUDA: ```python model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2", torch_dtype="auto", flash_attn=True, flash_rotary=True, fused_dense=True, device_map="cuda", trust_remote_code=True) ``` 2. FP16 / CUDA: ```python model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2", torch_dtype="auto", device_map="cuda", trust_remote_code=True) ``` 3. FP32 / CUDA: ```python model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2", torch_dtype=torch.float32, device_map="cuda", trust_remote_code=True) ``` 4. FP32 / CPU: ```python model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2", torch_dtype=torch.float32, device_map="cpu", trust_remote_code=True) ``` And then use the following snippet ```python tokenizer = AutoTokenizer.from_pretrained("SE6446/Phasmid-1_5-V0_1", trust_remote_code=True, torch_dtype="auto") inputs = tokenizer('''SYSTEM: You are a helpful assistant. Please answer truthfully and politely. {custom_prompt}\n USER: {{userinput}}\n ASSISTANT: {{character name if applicable}}:''', return_tensors="pt", return_attention_mask=False) outputs = model.generate(**inputs, max_length=200) text = tokenizer.batch_decode(outputs)[0] print(text) ``` it should generate after "ASSISTANT:". ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 0.0003 - train_batch_size: 1 - eval_batch_size: 1 - seed: 42 - optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05 - lr_scheduler_type: cosine - lr_scheduler_warmup_steps: 100 - num_epochs: 4 ### Training results | Training Loss | Epoch | Step | Validation Loss | |:-------------:|:-----:|:-----:|:---------------:| | 2.3313 | 0.0 | 1 | 2.1374 | | 2.5755 | 0.25 | 1319 | 2.5281 | | 2.4864 | 0.5 | 2638 | 2.5314 | | 2.0961 | 0.75 | 3957 | 2.4697 | | 2.6547 | 1.0 | 5276 | 2.4213 | | 2.1235 | 1.24 | 6595 | 2.3926 | | 1.8875 | 1.49 | 7914 | 2.3233 | | 0.9059 | 1.74 | 9233 | 2.2590 | | 2.2046 | 1.99 | 10552 | 2.1985 | | 1.1938 | 2.23 | 11871 | 2.2555 | | 1.1425 | 2.48 | 13190 | 2.2393 | | 0.6688 | 2.73 | 14509 | 2.2237 | | 1.1111 | 2.98 | 15828 | 2.2126 | | 0.651 | 3.21 | 17147 | 2.2859 | | 0.8669 | 3.46 | 18466 | 2.2914 | | 0.4149 | 3.71 | 19785 | 2.2924 | ### Framework versions - Transformers 4.37.0.dev0 - Pytorch 2.0.1+cu118 - Datasets 2.16.1 - Tokenizers 0.15.0