---
license: other
datasets:
- raicrits/YouTube_RAI_dataset
language:
- it
pipeline_tag: text2text-generation
tags:
- LLM
- Italian
- LoRa
- Classification
- LLama3
- Topics
library_name: peft
---

# Model Card for raicrits/Llama3_ChangeOfTopic


LoRA adapters for [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), obtained through a finetuning process (using the LoRA technique) aimed at making the model capable of detecting
a change of topic in a given text.


### Model Description


The model resulting from applying the adapters in this repository to the base model [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) is optimized for the
specific task of detecting a change of topic in a given text. Given a text, the model answers with "1" if it detects a change of topic and with "0" otherwise.
The training was done on the chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/datasets/raicrits/YouTube_RAI_dataset).
Because of the finetuning process, it is important to respect the prompt template (shown in the usage example below) in order to get good results.


- **Developed by:** Stefano Scotta ([email protected])
- **Model type:** LLM finetuned for the specific task of detecting a change of topic in a given text
- **Language(s) (NLP):** Italian
- **License:** unknown
- **Finetuned from model [optional]:** [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)


## Uses

The model can be used to check whether or not a change of topic occurs in a given text.



## Bias, Risks, and Limitations

As with any other LLM, it is possible that the model generates content that does not correspond to reality, as well as wrong, biased, offensive, or otherwise inappropriate answers.


## How to Get Started with the Model

 **Usage:**
Use the code below to get started with the model.
 ``` python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
lora_id = "raicrits/Llama3_ChangeOfTopic"

# Load the base model in 8-bit to reduce memory usage.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto")
model = PeftModel.from_pretrained(base_model, lora_id)


tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

messages = [
    {"role": "system", "content": "You are an AI assistant able to detect change of topics in given texts."},
    {"role": "user", "content": f"""Analyze the following text written in italian and in case you detect a change of topic answer just with "1", otherwise, if the topic remains the same within all the given text answer just "0". do not add further text.
    
Text: {'<text>'}"""}  # replace '<text>' with the Italian text to analyze
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt").to(model.device)

with torch.no_grad():        
    outputs = model.generate(
        input_ids,
        max_new_tokens=1,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.2
        )
    response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=False))
```
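
Since the model is trained to answer with a single "1" or "0", the decoded response can be mapped to a boolean. A minimal sketch continuing from the snippet above (the `answer` and `topic_changed` names are illustrative, not part of the original code):

``` python
# Interpret the single-token answer; assumes the model replies with "1" or "0"
# as described in the model description.
answer = tokenizer.decode(response, skip_special_tokens=True).strip()
topic_changed = answer == "1"
print("Change of topic detected" if topic_changed else "No change of topic detected")
```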

## Training Details

### Training Data

Chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/datasets/raicrits/YouTube_RAI_dataset).
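
A minimal sketch for loading the train split with the `datasets` library (the exact column names and the preprocessing applied for finetuning are not documented here, so this only shows how to access the data):

``` python
from datasets import load_dataset

# Load the train split of the dataset used for finetuning.
train_split = load_dataset("raicrits/YouTube_RAI_dataset", split="train")
print(train_split)
```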

### Training Procedure

The fine-tuning was done using the [LoRA](https://arxiv.org/abs/2106.09685) approach.

**Training setting:**
- train epochs=1
- learning_rate=2e-05
- mixed precision training: int8

**LoRA configuration:**
- r=8
- lora_alpha=16
- target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]
- lora_dropout=0.1
- bias="none"
- task_type=CAUSAL_LM
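
A minimal sketch of how the hyperparameters listed above could be expressed with the `peft` library (the original training script is not published, so this is an illustration of the stated configuration, not the actual code):

``` python
from peft import LoraConfig, TaskType

# LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
```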



## Environmental Impact


Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 1 NVIDIA A100/40GB
- **Hours used:** 45
- **Cloud Provider:** Private Infrastructure
- **Carbon Emitted:** 4.86kg eq. CO2

## Model Card Authors

Stefano Scotta ([email protected])

## Model Card Contact

[email protected]