---
license: other
language:
- en
library_name: transformers
pipeline_tag: conversational
---

## Model Card for Model ID

Finetuned decapoda-research/llama-13b-hf on conversations and question answering data.

## Model Details

### Model Description

The decapoda-research/llama-13b-hf model was finetuned on conversation and question answering prompts.

- **Developed by:** [More Information Needed]
- **Shared by:** [More Information Needed]
- **Model type:** Causal LM
- **Language(s) (NLP):** English, multilingual
- **License:** Research
- **Finetuned from model:** decapoda-research/llama-13b-hf

## Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]

## Uses

The model can be used for prompt answering.

### Direct Use

The model can be used for prompt answering.

### Downstream Use

Generating text and prompt answering.

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

```
from transformers import LlamaTokenizer, LlamaForCausalLM
from peft import PeftModel

MODEL_NAME = "decapoda-research/llama-13b-hf"

# Load the tokenizer for the base model
tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME, add_eos_token=True)
tokenizer.pad_token_id = 0

# Load the base model in 8-bit and attach the finetuned adapters
model = LlamaForCausalLM.from_pretrained(MODEL_NAME, load_in_8bit=True, device_map="auto")
model = PeftModel.from_pretrained(model, "Sandiago21/public-ai-model")
```

### Example of Usage

```
from transformers import GenerationConfig

PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhich is the capital city of Greece and with which countries does Greece border?\n\n### Input:\nQuestion answering\n\n### Response:\n"""

DEVICE = "cuda"

# Tokenize the prompt and move it to the GPU
inputs = tokenizer(
    PROMPT,
    return_tensors="pt",
)
input_ids = inputs["input_ids"].to(DEVICE)

generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.2,
)

print("Generating Response ... ")
generation_output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=256,
)
for s in generation_output.sequences:
    print(tokenizer.decode(s))
```

## Training Details

### Training Data

The decapoda-research/llama-13b-hf model was finetuned on conversation and question answering data.

### Training Procedure

The decapoda-research/llama-13b-hf model was further trained and finetuned on question answering and prompt data.

## Model Architecture and Objective

The model is based on decapoda-research/llama-13b-hf, with finetuned adapters trained on top of the base model on conversation and question answering data.
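
This card states that adapters were finetuned on top of the base model but does not document the training setup. The code below is a minimal, hypothetical sketch of how such adapters could be trained with `peft` (LoRA) on a causal LM; the LoRA rank, alpha, target modules, hyperparameters, and the tiny in-line dataset are illustrative assumptions, not the actual configuration used for this checkpoint.

```
from datasets import Dataset
from transformers import (
    LlamaTokenizer,
    LlamaForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

MODEL_NAME = "decapoda-research/llama-13b-hf"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME, add_eos_token=True)
tokenizer.pad_token_id = 0

# Load the base model in 8-bit and prepare it for adapter training
# (newer peft versions name this helper prepare_model_for_kbit_training)
model = LlamaForCausalLM.from_pretrained(MODEL_NAME, load_in_8bit=True, device_map="auto")
model = prepare_model_for_int8_training(model)

# Hypothetical LoRA configuration; the actual rank, alpha, and target modules
# used for this checkpoint are not documented in this card
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Tiny illustrative dataset; the real conversation / question answering data
# is not released with this card
examples = [
    {"text": "### Instruction:\nWhich is the capital city of Greece?\n\n### Response:\nAthens."},
]
train_dataset = Dataset.from_list(examples).map(
    lambda sample: tokenizer(sample["text"], truncation=True, max_length=512)
)

trainer = Trainer(
    model=model,
    train_dataset=train_dataset,
    args=TrainingArguments(
        output_dir="llama-13b-conversation-adapters",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=3e-4,
        fp16=True,
        logging_steps=10,
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # avoid cache warnings during training
trainer.train()
```

After training, the adapters could be pushed to the Hub and loaded with `PeftModel.from_pretrained` exactly as shown in the "How to Get Started with the Model" section above.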