---
license: mit
datasets:
- Superar/Puntuguese
language:
- pt
pipeline_tag: token-classification
tags:
- humor
- puns
- pun-location
---

# "Não é medo, é recheio": Sequence Labeling for Pun Location and Detection in Portuguese

This repository contains models fine-tuned for pun location in Portuguese, trained on the [Puntuguese](https://huggingface.co/datasets/Superar/Puntuguese) dataset.

Several models are available:

- `GlorIA-1.3B-all`
- `GlorIA-1.3B-positive`
- `albertina-900m-ptbr-all`
- `albertina-900m-ptbr-positive`
- `albertina-900m-ptpt-all`
- `albertina-900m-ptpt-positive`

The `*-all` models were fine-tuned on all the data in the training portion of Puntuguese, including negative examples. The `*-positive` models were trained only on texts that contain at least one pun sign.

All checkpoints of every model are available, so we encourage you to browse the files and pick the checkpoint most suitable for your use case.

## How to use

Pun location is a token-level task, so load a model with `AutoModelForTokenClassification.from_pretrained()` and use the `subfolder` argument to select a model and checkpoint. For example, to load checkpoint 500 of `albertina-900m-ptbr-positive`:

```python
from transformers import AutoModelForTokenClassification

# `subfolder` selects the model directory and checkpoint inside the repository.
model = AutoModelForTokenClassification.from_pretrained(
    'Superar/Portuguese-Pun-Location',
    subfolder='albertina-900m-ptbr-positive/checkpoint-500',
)
```

This loads the weights stored in that checkpoint folder.

## How to cite

```bibtex
@inproceedings{gameiro_etal:epia2024,
  title = {Sequence Labeling for Pun Location and Detection in {{Portuguese}}},
  booktitle = {Proceedings of 23rd {{EPIA}} Conference on Artificial Intelligence, {{EPIA}} 2024},
  author = {Gameiro, Patr{\'{\i}}cia and In{\'a}cio, Marcio and Gon{\c c}alo Oliveira, Hugo and Alves, Ana},
  year = {2024},
  pages = {In press},
  address = {Viana do Castelo, Portugal}
}
```
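
## Inference sketch

Building on the loading snippet in "How to use", the following is a minimal sketch of how per-token predictions could be mapped back to words. It rests on several assumptions that this card does not confirm: that tokenizer files are stored in the same checkpoint subfolder, that the label names can be read from `model.config.id2label`, and that keeping the label of each word's first sub-token is an acceptable aggregation (a common convention, not necessarily the one used by the authors). The helper names `word_level_labels` and `locate_puns` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

REPO_ID = 'Superar/Portuguese-Pun-Location'
SUBFOLDER = 'albertina-900m-ptbr-positive/checkpoint-500'


def word_level_labels(word_ids: Sequence[Optional[int]],
                      token_label_ids: Sequence[int],
                      id2label: Dict[int, str]) -> List[str]:
    """Keep the predicted label of the first sub-token of every word."""
    labels: List[str] = []
    previous = None
    for word_id, label_id in zip(word_ids, token_label_ids):
        # Special tokens have word_id None; later sub-tokens repeat the word id.
        if word_id is not None and word_id != previous:
            labels.append(id2label[label_id])
        previous = word_id
    return labels


def locate_puns(words: List[str]) -> List[str]:
    """Return one predicted label per input word (downloads the checkpoint)."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # Assumption: the tokenizer is saved alongside the checkpoint weights.
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, subfolder=SUBFOLDER)
    model = AutoModelForTokenClassification.from_pretrained(
        REPO_ID, subfolder=SUBFOLDER)
    model.eval()

    encoding = tokenizer(words, is_split_into_words=True, return_tensors='pt')
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, num_tokens, num_labels)
    token_label_ids = logits.argmax(dim=-1)[0].tolist()
    # word_ids() maps each token position to the index of its source word.
    return word_level_labels(
        encoding.word_ids(0), token_label_ids, model.config.id2label)
```

Calling `locate_puns('Não é medo, é recheio'.split())` would return one label string per word. The actual label names depend on the checkpoint's configuration, and other checkpoints may need different tokenizer settings.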