---
license: mit
datasets:
- Superar/Puntuguese
language:
- pt
pipeline_tag: token-classification
tags:
- humor
- puns
- pun-location
---

# "Não é medo, é recheio": Sequence Labeling for Pun Location and Detection in Portuguese

This repository contains models fine-tuned for Pun Location in Portuguese, trained on the [Puntuguese](https://huggingface.co/datasets/Superar/Puntuguese) dataset. The following models are available:

- `GlorIA-1.3B-all`
- `GlorIA-1.3B-positive`
- `albertina-900m-ptbr-all`
- `albertina-900m-ptbr-positive`
- `albertina-900m-ptpt-all`
- `albertina-900m-ptpt-positive`

The `*-all` models were fine-tuned with all the data from the training portion of Puntuguese, including negative examples. Meanwhile, the `*-positive` models were trained only on texts that contain at least one pun sign.

We make all of the models' checkpoints available, so we encourage you to browse the files and pick the checkpoint best suited to your needs.
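To see which checkpoints exist without clicking through the file browser, the repository's file list can be grouped by model folder. The sketch below assumes the `<model>/checkpoint-<step>/...` layout implied by the `subfolder` value shown in the "How to use" section. `group_checkpoints` and `list_checkpoints` are hypothetical helper names, and the sample paths are illustrative, not an actual listing of this repository. `list_repo_files` comes from the `huggingface_hub` package.

```python
import re
from collections import defaultdict

# Assumed layout: "<model-name>/checkpoint-<step>/<file>"
CHECKPOINT_RE = re.compile(r"^(?P<model>[^/]+)/checkpoint-(?P<step>\d+)/")


def group_checkpoints(paths):
    """Map each model folder to the sorted checkpoint steps found in `paths`."""
    steps = defaultdict(set)
    for path in paths:
        match = CHECKPOINT_RE.match(path)
        if match:
            steps[match["model"]].add(int(match["step"]))
    return {model: sorted(found) for model, found in steps.items()}


def list_checkpoints(repo_id="Superar/Portuguese-Pun-Location"):
    """Fetch the file list from the Hub (needs network access) and group it."""
    from huggingface_hub import list_repo_files  # imported lazily

    return group_checkpoints(list_repo_files(repo_id))


# Illustrative paths only, not the real contents of the repository.
sample_paths = [
    "README.md",
    "albertina-900m-ptbr-positive/checkpoint-1000/config.json",
    "albertina-900m-ptbr-positive/checkpoint-500/config.json",
    "GlorIA-1.3B-all/checkpoint-500/config.json",
]
print(group_checkpoints(sample_paths))
```

Calling `list_checkpoints()` with network access would apply the same grouping to the real file list.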

## How to use

To load a model, use the `AutoModelForTokenClassification.from_pretrained()` method with the `subfolder` argument. The models label individual tokens, so they need the token-classification head rather than a sequence-classification one.

For example, to load checkpoint 500 of `albertina-900m-ptbr-positive`, use the following code:

```python
from transformers import AutoModelForTokenClassification

# `subfolder` selects the model variant and the checkpoint inside the repository
model = AutoModelForTokenClassification.from_pretrained(
    'Superar/Portuguese-Pun-Location',
    subfolder='albertina-900m-ptbr-positive/checkpoint-500',
)
```

This loads the model weights saved in that checkpoint folder.
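Since every checkpoint is addressed by a `<model>/checkpoint-<step>` subfolder, a small helper can build that string and catch typos in the model name before any download starts. This is only a sketch: `checkpoint_subfolder` and `MODEL_NAMES` are hypothetical names introduced here, and the helper cannot tell whether a given step was actually saved.

```python
# Model folders listed in this README.
MODEL_NAMES = (
    "GlorIA-1.3B-all",
    "GlorIA-1.3B-positive",
    "albertina-900m-ptbr-all",
    "albertina-900m-ptbr-positive",
    "albertina-900m-ptpt-all",
    "albertina-900m-ptpt-positive",
)


def checkpoint_subfolder(model_name, step):
    """Build the `subfolder` value for a model name and checkpoint step."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"Unknown model {model_name!r}; expected one of {MODEL_NAMES}")
    if not isinstance(step, int) or step <= 0:
        raise ValueError("step must be a positive integer")
    return f"{model_name}/checkpoint-{step}"


print(checkpoint_subfolder("albertina-900m-ptbr-positive", 500))
# albertina-900m-ptbr-positive/checkpoint-500
```

The returned string can be passed directly as the `subfolder` argument of `from_pretrained()`.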

## How to cite

```bibtex
@inproceedings{gameiro_etal:epia2024,
  title = {Sequence Labeling for Pun Location and Detection in {{Portuguese}}},
  booktitle = {Proceedings of 23rd {{EPIA}} Conference on Artificial Intelligence, {{EPIA}} 2024},
  author = {Gameiro, Patr{\'{\i}}cia and In{\'a}cio, Marcio and Gon{\c c}alo Oliveira, Hugo and Alves, Ana},
  year = {2024},
  pages = {In press},
  address = {Viana do Castelo, Portugal}
}
```