---
license: mit
language:
- it
- en
library_name: transformers
tags:
- sft
- it
- gemma
- chatml
---
|
|
|
# Model Information
|
|
|
VolareQuantized is a compact version of the [Volare](https://huggingface.co/MoxoffSpA/Volare) model, optimized for efficiency.
|
|
|
It is offered in two distinct configurations: a 4-bit version and an 8-bit version, each designed to maintain the model's effectiveness while significantly reducing its size and computational requirements.
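
To check which quantized files the repository actually provides, you can list them programmatically. Below is a minimal sketch using `huggingface_hub`; the `.gguf` filenames it prints depend on what is uploaded to the repo:

```python
from huggingface_hub import list_repo_files

# Print the GGUF files shipped in the repo, e.g. the 4-bit and 8-bit variants
for filename in list_repo_files("MoxoffSpA/VolareQuantized"):
    if filename.endswith(".gguf"):
        print(filename)
```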
|
|
|
- It is trained both on publicly available datasets, such as [SQUAD-it](https://huggingface.co/datasets/squad_it), and on datasets created in-house.
|
- It is designed to understand and maintain context, making it well suited for Retrieval Augmented Generation (RAG) tasks and applications requiring contextual awareness (see the prompt-construction sketch after this list).
|
- It is quantized to a 4-bit version and an 8-bit version following the procedure described in [llama.cpp](https://github.com/ggerganov/llama.cpp).
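
As a sketch of the RAG-style usage mentioned above, the snippet below builds a question-plus-context prompt in the same "Domanda/contesto" format used in the Usage section. The retrieved passages here are placeholder strings standing in for the output of whatever retriever you use:

```python
# Hypothetical retriever output: replace with passages from your own retriever
retrieved_passages = [
    "La Torre di Pisa è un campanile del XII secolo, famoso per la sua inclinazione.",
    "La torre è alta circa 56 metri.",
]

question = "Quanto è alta la torre di Pisa?"
context = "\n".join(retrieved_passages)

# Same "Domanda/contesto" prompt format as in the Usage section below
prompt = f"Domanda: {question}, contesto: {context}"
```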
|
|
|
# Evaluation
|
|
|
We evaluated the model on the same test sets used for the [Open Ita LLM Leaderboard](https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard).
|
|
|
| hellaswag_it acc_norm | arc_it acc_norm | m_mmlu_it 5-shot acc | Average | F1    |
|:----------------------|:----------------|:---------------------|:--------|:------|
| 0.6474                | 0.4671          | 0.5521               | 0.555   | 69.82 |
|
|
|
|
|
## Usage
|
|
|
You need to download the .gguf model file first.
|
|
|
If you want to run the model on CPU, install these dependencies:
|
|
|
```bash
pip install llama-cpp-python huggingface_hub
```
|
|
|
If you want to use the GPU instead, install with cuBLAS support enabled:
|
|
|
```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install huggingface_hub llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```
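
After either install, a quick sanity check that the bindings import correctly (assuming the package exposes `__version__`, which recent releases do):

```python
import llama_cpp

# Should print the installed llama-cpp-python version without errors
print(llama_cpp.__version__)
```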
|
|
|
Then use the following code to generate a response to a prompt.
|
|
|
```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="MoxoffSpA/VolareQuantized",
    filename="Volare-ggml-Q4_K_M.gguf"
)

# Set n_gpu_layers to the number of layers to offload to GPU.
# Set it to 0 if no GPU acceleration is available on your system.
llm = Llama(
    model_path=model_path,
    n_ctx=2048,      # The max sequence length to use; longer sequences require more resources
    n_threads=8,     # The number of CPU threads to use; tailor it to your system
    n_gpu_layers=0   # The number of layers to offload to GPU, if GPU acceleration is available
)

# Simple inference example
question = """Quanto è alta la torre di Pisa?"""
context = """
La Torre di Pisa è un campanile del XII secolo, famoso per la sua inclinazione. Alta circa 56 metri.
"""

prompt = f"Domanda: {question}, contesto: {context}"

output = llm(
    f"[INST] {prompt} [/INST]",  # Prompt
    max_tokens=128,              # Maximum number of new tokens to generate
    stop=["\n"],                 # Stop generation at the first newline
    echo=True,                   # Echo the prompt back in the output
    temperature=0.1,
    top_p=0.95
)

# Print the generated output
print(output['choices'][0]['text'])
```
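
llama-cpp-python also provides a higher-level chat-completion API. Below is a minimal sketch that reuses the `llm` and `prompt` objects from the example above; whether this GGUF embeds its own chat template is an assumption, and llama-cpp-python falls back to a default format if it does not:

```python
# Same request through the OpenAI-style chat-completion API
chat_output = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": prompt}
    ],
    max_tokens=128,
    temperature=0.1,
)

print(chat_output["choices"][0]["message"]["content"])
```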
|
|
|
## Bias, Risks and Limitations
|
|
|
VolareQuantized and its original model have not been aligned to human preferences for safety within an RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base model are also unknown; however, it is likely to have included a mix of web data and technical sources like books and code.
|
|
|
## Links to resources
|
|
|
- SQUAD-it dataset: https://huggingface.co/datasets/squad_it
- Gemma-7b model: https://huggingface.co/google/gemma-7b
- Open Ita LLM Leaderboard: https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard
|
|
|
## Original version

The non-quantized version of the model is available here:

https://huggingface.co/MoxoffSpA/Volare
|
|
|
## The Moxoff Team
|
|
|
Jacopo Abate, Marco D'Ambra, Luigi Simeone, Gianpaolo Francesco Trotta
|
|