raicrits
/

BERT_ChangeOfTopic

+---
+license: unknown
+datasets:
+- raicrits/YouTube_RAI_dataset
+language:
+- it
+pipeline_tag: text-classification
+tags:
+- LLM
+- Italian
+- Classification
+- BERT
+- Topics
+library_name: transformers
+---
+---
+# Model Card raicrits/BERT_ChangeOfTopic
+[bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) finetuned to detect a change of topic in a given text.
+### Model Description
+The model is finetuned for the specific task of detecting a change of topic in a given text. Given a text, the model answers "1" if it detects a change of topic and "0" otherwise.
+Training used the chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/datasets/raicrits/YouTube_RAI_dataset).
+- **Developed by:** Stefano Scotta ([email protected])
+- **Model type:** LLM finetuned on the specific task of detecting a change of topic in a given text
+- **Language(s) (NLP):** Italian
+- **License:** unknown
+- **Finetuned from model:** [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
+## Uses
+The model can be used to check whether a change of topic occurs in a given text.
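One possible way to apply such a check to a longer transcript is to slide an overlapping window over the text and classify each window separately. The sketch below is only an illustration under stated assumptions: `sliding_windows` and `find_topic_changes` are hypothetical helper names, the window size and step (counted in words) are arbitrary choices rather than the settings used for training, and `classify` stands for any callable that returns 1 or 0 for a piece of text, such as a wrapper around the model.

```python
from typing import Callable, List, Tuple


def sliding_windows(words: List[str], size: int, step: int) -> List[Tuple[int, str]]:
    """Return (start_index, window_text) pairs that cover the word list."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    windows = []
    start = 0
    while True:
        windows.append((start, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # the last window reached the end of the text
        start += step
    return windows


def find_topic_changes(text: str,
                       classify: Callable[[str], int],
                       size: int = 100,
                       step: int = 50) -> List[int]:
    """Return the start indices of the windows that `classify` labels as 1."""
    words = text.split()
    if not words:
        return []
    return [start
            for start, window in sliding_windows(words, size, step)
            if classify(window) == 1]


# Stand-in classifier for demonstration only; a real one would call the model.
def dummy_classify(window: str) -> int:
    return 1 if "politica" in window and "sport" in window else 0


demo_text = " ".join(["politica"] * 4 + ["sport"] * 4)
flagged = find_topic_changes(demo_text, dummy_classify, size=4, step=2)
print(flagged)
```

Overlapping windows make it less likely that a topic boundary falls exactly between two windows and is missed, at the cost of classifying more text.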
+## How to Get Started with the Model
+Use the code below to get started with the model.
+ **Usage:**
+ ``` python
+import numpy as np
+import torch
+from transformers import AutoTokenizer
+
+# Run on the GPU when one is available, otherwise on the CPU.
+device_bert = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+# Load the finetuned model, which was saved as a whole pickled object.
+# Note: recent PyTorch versions may also need weights_only=False here.
+model_bert = torch.load('/opt/data/AI4MEDIA/LLMProject/models/bert_multi_CT_30sec_shift10_weight_loss', map_location=device_bert)
+model_bert = model_bert.to(device_bert)
+model_bert.eval()  # disable dropout for inference
+
+tokenizer_bert = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')
+
+# Tokenize the input text, padding or truncating it to 256 tokens.
+encoded_dict = tokenizer_bert(
+    '<text>',
+    add_special_tokens=True,
+    max_length=256,
+    truncation=True,
+    padding='max_length',
+    return_attention_mask=True,
+    return_tensors='pt',
+)
+input_ids = encoded_dict['input_ids'].to(device_bert)
+input_mask = encoded_dict['attention_mask'].to(device_bert)
+
+with torch.no_grad():
+    output = model_bert(input_ids,
+                        token_type_ids=None,
+                        attention_mask=input_mask)
+
+# Pick the class with the highest logit: 1 = change of topic, 0 = no change.
+logits = output.logits.detach().cpu().numpy()
+pred_flat = np.argmax(logits, axis=1).flatten()
+print(pred_flat[0])
+```
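The snippet above prints only the predicted class. If a confidence score is also useful, the two logits can be turned into probabilities with a softmax. The following is a minimal sketch in plain Python, not part of the original model code: `softmax` and `interpret_logits` are hypothetical helpers, the `threshold` parameter is an illustrative tuning knob, and index 1 is assumed to be the "change of topic" class, in line with the model description.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret_logits(logits: List[float], threshold: float = 0.5) -> Tuple[int, float]:
    """Return (label, probability of a topic change) for one example.

    Assumes index 1 is the "change of topic" class; `threshold` is a
    hypothetical parameter, not something defined by the model card.
    """
    prob_change = softmax(logits)[1]
    label = 1 if prob_change >= threshold else 0
    return label, prob_change


# Example with made-up logits for a single input text.
label, prob = interpret_logits([0.3, 1.2])
print(label, round(prob, 3))
```

With the default threshold of 0.5 the label matches the arg-max decision used above (apart from exact ties); raising the threshold would trade missed topic changes for fewer false alarms.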
+## Training Details
+### Training Data
+Chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/datasets/raicrits/YouTube_RAI_dataset).
+### Training Procedure
+**Training setting:**
+- train epochs: 18
+- learning rate: 2e-05
+## Environmental Impact
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** 1 NVIDIA A100 (40 GB)
+- **Hours used:** 20
+- **Cloud Provider:** Private Infrastructure
+- **Carbon Emitted:** 2.38 kg CO2eq
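The arithmetic behind figures like these can be sketched as energy (power times time) multiplied by the carbon intensity of the electricity supply. The snippet below only illustrates that calculation: `estimate_emissions_kg` is a hypothetical helper, and the power draw and carbon intensity are placeholder assumptions, not the values used to obtain the reported 2.38 kg.

```python
def estimate_emissions_kg(power_watts: float,
                          hours: float,
                          intensity_kg_per_kwh: float) -> float:
    """Estimate emissions as energy used (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * intensity_kg_per_kwh


# Values taken from the list above.
reported_kg = 2.38
hours_used = 20
# Implied emissions per GPU-hour from the reported figures.
per_hour_kg = reported_kg / hours_used

# Placeholder assumptions for illustration only (not from the model card).
assumed_power_watts = 250.0   # hypothetical average GPU power draw
assumed_intensity = 0.4       # hypothetical kg CO2eq per kWh

estimate_kg = estimate_emissions_kg(assumed_power_watts, hours_used, assumed_intensity)
print(f"implied per GPU-hour: {per_hour_kg:.3f} kg CO2eq")
print(f"estimate under the placeholder assumptions: {estimate_kg:.2f} kg CO2eq")
```

Replacing the placeholders with the measured power draw and the local grid's carbon intensity gives an estimate that can be compared against the reported value.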
+## Model Card Authors
+Stefano Scotta ([email protected])
+## Model Card Contact
+[email protected]