---
license: other
language:
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - llama
  - decapoda-research-7b-hf
  - prompt answering
  - peft
---

## Model Card for Model ID

This repository contains a LLaMA-7B model further fine-tuned on conversations and question-answering prompts.
This model is a fine-tuned version of [chainyo/alpaca-lora-7b](https://huggingface.co/chainyo/alpaca-lora-7b) on a conversations dataset.

⚠️ **I used [LLaMA-7b-hf](https://huggingface.co/decapoda-research/llama-7b-hf) as the base model, so this model is for research purposes only (see the [license](https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/LICENSE)).**

## Model Details

### Model Description

The decapoda-research/llama-7b-hf model was fine-tuned on conversations and question-answering prompts.

- **Developed by:** [More Information Needed]
- **Shared by:** [More Information Needed]
- **Model type:** Causal LM
- **Language(s) (NLP):** English, multilingual
- **License:** Research
- **Finetuned from model:** decapoda-research/llama-7b-hf

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]

## Uses

The model can be used for prompt answering.

### Direct Use

The model can be used for prompt answering.

### Downstream Use

Generating text and prompt answering.

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

# Usage

## Creating prompt

The model was trained on the following kind of prompt:

```python
def generate_prompt(instruction: str, input_ctxt: str = None) -> str:
    if input_ctxt:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_ctxt}

### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""
```

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("Sandiago21/llama-7b-hf-prompt-answering")
model = LlamaForCausalLM.from_pretrained(
    "Sandiago21/llama-7b-hf-prompt-answering",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

generation_config = GenerationConfig(
    temperature=0.2,
    top_p=0.75,
    top_k=40,
    num_beams=4,
    max_new_tokens=128,
)

model.eval()
if torch.__version__ >= "2":
    model = torch.compile(model)
```

### Example of Usage

```python
instruction = "What is the capital city of Greece and with which countries does Greece border?"
input_ctxt = None  # For some tasks, you can provide an input context to help the model generate a better response.

prompt = generate_prompt(instruction, input_ctxt)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
    )

response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
print(response)
>>> The capital city of Greece is Athens and it borders Albania, Macedonia, Bulgaria and Turkey.
```
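Note that the decoded sequence contains the prompt followed by the completion. A minimal post-processing sketch, assuming the `### Response:` marker produced by `generate_prompt` above, to keep only the answer:

```python
# Keep only the generated answer by splitting on the prompt's response marker.
# Assumes `response` was decoded as in the example above.
answer = response.split("### Response:")[-1].strip()
print(answer)
```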
## Training Details

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 2
- mixed_precision_training: Native AMP

### Framework versions

- Transformers 4.28.1
- Pytorch 2.0.0+cu117
- Datasets 2.12.0
- Tokenizers 0.12.1

### Training Data

The decapoda-research/llama-7b-hf model was fine-tuned on conversations and question-answering data.

### Training Procedure

The decapoda-research/llama-7b-hf model was further trained and fine-tuned on question-answering and prompt data for 1 epoch (approximately 10 hours of training on a single GPU).

## Model Architecture and Objective

The model is based on the decapoda-research/llama-7b-hf model, with fine-tuned adapters trained on top of the base model on conversations and question-answering data.
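For reference, here is a minimal sketch of how the hyperparameters above could map onto a `transformers` `Trainer` run with a PEFT LoRA adapter. The dataset, the LoRA settings (`r`, `lora_alpha`, `target_modules`, `lora_dropout`), and the output path are illustrative placeholders, not values taken from this card.

```python
from peft import LoraConfig, get_peft_model
from transformers import LlamaForCausalLM, LlamaTokenizer, Trainer, TrainingArguments

base_model_id = "decapoda-research/llama-7b-hf"  # base model named in this card

tokenizer = LlamaTokenizer.from_pretrained(base_model_id)
model = LlamaForCausalLM.from_pretrained(base_model_id)

# LoRA settings are placeholders; the card does not list the adapter configuration.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# These arguments mirror the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="./llama-7b-hf-prompt-answering",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective train batch size of 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_steps=50,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,  # native AMP mixed precision
)

# trainer = Trainer(model=model, args=training_args, train_dataset=..., tokenizer=tokenizer)
# trainer.train()
```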