---
license: other
language:
- en
library_name: transformers
pipeline_tag: conversational
---

## Model Card for Model ID

Finetuned decapoda-research/llama-13b-hf on conversations and question answering data.

## Model Details

### Model Description

The decapoda-research/llama-13b-hf model was finetuned on conversation and question answering prompts.

- **Developed by:** [More Information Needed]
- **Shared by:** [More Information Needed]
- **Model type:** Causal LM
- **Language(s) (NLP):** English, multilingual
- **License:** Research
- **Finetuned from model:** decapoda-research/llama-13b-hf

## Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]

## Uses

The model can be used for prompt answering.

### Direct Use

The model can be used for prompt answering.

### Downstream Use

Generating text and prompt answering.

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

```
from transformers import LlamaTokenizer, LlamaForCausalLM
from peft import PeftModel

MODEL_NAME = "decapoda-research/llama-13b-hf"

# Load the tokenizer for the base model
tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME, add_eos_token=True)
tokenizer.pad_token_id = 0

# Load the base model in 8-bit and attach the finetuned adapters
model = LlamaForCausalLM.from_pretrained(MODEL_NAME, load_in_8bit=True, device_map="auto")
model = PeftModel.from_pretrained(model, "Sandiago21/public-ai-model")
```

### Example of Usage

```
from transformers import GenerationConfig

PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhich is the capital city of Greece and with which countries does Greece border?\n\n### Input:\nQuestion answering\n\n### Response:\n"""

DEVICE = "cuda"

# Tokenize the prompt and move it to the GPU
inputs = tokenizer(
    PROMPT,
    return_tensors="pt",
)
input_ids = inputs["input_ids"].to(DEVICE)

generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.2,
)

print("Generating Response ... ")
generation_output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=256,
)
for s in generation_output.sequences:
    print(tokenizer.decode(s))
```

## Training Details

### Training Data

The decapoda-research/llama-13b-hf model was finetuned on conversation and question answering data.

### Training Procedure

The decapoda-research/llama-13b-hf model was further trained and finetuned on question answering and prompt data.

## Model Architecture and Objective

The model is based on decapoda-research/llama-13b-hf, with finetuned adapters trained on top of the base model on conversation and question answering data.
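
This card states that adapters were finetuned on top of the base model but does not document the training setup. The code below is a minimal, hypothetical sketch of how such adapters could be trained with `peft` (LoRA) on a causal LM; the LoRA rank, alpha, target modules, hyperparameters, and the tiny in-line dataset are illustrative assumptions, not the actual configuration used for this checkpoint.

```
from datasets import Dataset
from transformers import (
    LlamaTokenizer,
    LlamaForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

MODEL_NAME = "decapoda-research/llama-13b-hf"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME, add_eos_token=True)
tokenizer.pad_token_id = 0

# Load the base model in 8-bit and prepare it for adapter training
# (newer peft versions name this helper prepare_model_for_kbit_training)
model = LlamaForCausalLM.from_pretrained(MODEL_NAME, load_in_8bit=True, device_map="auto")
model = prepare_model_for_int8_training(model)

# Hypothetical LoRA configuration; the actual rank, alpha, and target modules
# used for this checkpoint are not documented in this card
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Tiny illustrative dataset; the real conversation / question answering data
# is not released with this card
examples = [
    {"text": "### Instruction:\nWhich is the capital city of Greece?\n\n### Response:\nAthens."},
]
train_dataset = Dataset.from_list(examples).map(
    lambda sample: tokenizer(sample["text"], truncation=True, max_length=512)
)

trainer = Trainer(
    model=model,
    train_dataset=train_dataset,
    args=TrainingArguments(
        output_dir="llama-13b-conversation-adapters",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=3e-4,
        fp16=True,
        logging_steps=10,
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # avoid cache warnings during training
trainer.train()
```

After training, the adapters could be pushed to the Hub and loaded with `PeftModel.from_pretrained` exactly as shown in the "How to Get Started with the Model" section above.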