cardiffnlp/roberta-large-tweet-topic-multi-2020
This model is a fine-tuned version of roberta-large on the tweet_topic_multi dataset. It was fine-tuned on the train_2020 split and validated on the test_2021 split of tweet_topic.
The fine-tuning script can be found here. The model achieves the following results on the test_2021 set:
- F1 (micro): 0.7323655694132079
- F1 (macro): 0.5794562917377284
- Accuracy: 0.4937462775461584
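For readers less familiar with multi-label metrics, the following is a minimal sketch of how micro F1, macro F1, and accuracy are commonly computed when each tweet can carry several topics. It is not the evaluation script used for this model: the toy labels are invented, the function names are hypothetical, and it assumes that "Accuracy" means exact-match (subset) accuracy, which this card does not state explicitly.

```python
def f1(tp, fp, fn):
    # F1 = 2*TP / (2*TP + FP + FN); treated as 0 when the label never occurs.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def counts(y_true, y_pred, label):
    # True positives, false positives, false negatives for one label column.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t[label] and p[label])
    fp = sum(1 for t, p in pairs if not t[label] and p[label])
    fn = sum(1 for t, p in pairs if t[label] and not p[label])
    return tp, fp, fn


def multilabel_scores(y_true, y_pred):
    n_labels = len(y_true[0])
    per_label = [counts(y_true, y_pred, k) for k in range(n_labels)]
    # Micro: pool the counts over all labels, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in per_label) for i in range(3))
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in per_label) / n_labels
    # Assumed definition: a tweet counts as correct only if every label matches.
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {"f1_micro": micro, "f1_macro": macro, "accuracy": exact}


# Toy data (invented): 4 tweets, 3 topic labels, one row per tweet.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
scores = multilabel_scores(y_true, y_pred)
print(scores)  # f1_micro = 0.75, f1_macro ≈ 0.556, accuracy = 0.5
```

On this toy data the rare third label is never predicted, which pulls macro F1 well below micro F1. A gap of that kind between the two scores is typical when some topics are infrequent or hard to predict.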
Usage
import math
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def sigmoid(x):
    # Map a raw logit to an independent per-label probability.
    return 1 / (1 + math.exp(-x))


tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/roberta-large-tweet-topic-multi-2020")
model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/roberta-large-tweet-topic-multi-2020", problem_type="multi_label_classification")
model.eval()
class_mapping = model.config.id2label

with torch.no_grad():
    text = "#NewVideo Cray Dollas- Water- Ft. Charlie Rose- (Official Music Video)- {{URL}} via {@YouTube@} #watchandlearn {{USERNAME}}"
    tokens = tokenizer(text, return_tensors='pt')
    output = model(**tokens)
    # output[0] holds the logits; [0] selects the single input in the batch.
    flags = [sigmoid(s) > 0.5 for s in output[0][0].detach().tolist()]
    # Keep every topic whose probability exceeds 0.5.
    topic = [class_mapping[n] for n, i in enumerate(flags) if i]
print(topic)
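The thresholding step above can be factored into a small helper that also returns the per-topic probabilities. This is a minimal sketch, not part of the original model card: `decode_topics`, the placeholder label names, and the logit values are all hypothetical, and the helper works on plain Python lists so it can be tried without downloading the model.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def decode_topics(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs above the threshold, most probable first."""
    scored = [(id2label[i], sigmoid(s)) for i, s in enumerate(logits)]
    kept = [(label, p) for label, p in scored if p > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical label names and logits, standing in for model.config.id2label
# and one row of logits per tweet.
id2label = {0: "topic_a", 1: "topic_b", 2: "topic_c"}
batch_logits = [[2.0, -1.0, 0.5], [-3.0, -2.0, -0.1]]

decoded = [decode_topics(row, id2label) for row in batch_logits]
for row in decoded:
    print([(label, round(p, 3)) for label, p in row])
```

With the real model, each row of `output.logits.tolist()` and `model.config.id2label` would take the place of the hypothetical values. Because the labels are scored independently, a tweet can receive no topic at all (as the second row does here); lowering `threshold` trades precision for recall.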
Reference
@inproceedings{dimosthenis-etal-2022-twitter,
title = "{T}witter {T}opic {C}lassification",
author = "Antypas, Dimosthenis and
Ushio, Asahi and
Camacho-Collados, Jose and
Neves, Leonardo and
Silva, Vitor and
Barbieri, Francesco",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "International Committee on Computational Linguistics"
}