CamemBERT-base for sentiment analysis on tweets

This is a CamemBERT-base model trained on a corpus of 50K French tweets.

  • Git repo containing the dataset and the code (scraping & training): Git

Given a sentence or tweet, the model predicts which of the 25 emojis it was trained on suits it best. These 25 emojis are the most frequent ones in the dataset.
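As a rough illustration of how a label set like this can be built, the sketch below keeps the `n` most frequent emojis from a list and assigns each one a class id. The function name `most_frequent_labels` and the sample data are hypothetical; this is not the authors' preprocessing code, which lives in the Git repo.

```python
from collections import Counter


def most_frequent_labels(emoji_per_tweet, n=25):
    """Return the n most frequent emojis and a label -> class id mapping.

    `emoji_per_tweet` is assumed to hold one demojized emoji name per tweet.
    """
    counts = Counter(emoji_per_tweet)
    # most_common() sorts by decreasing count; ties keep first-seen order
    labels = [emoji for emoji, _ in counts.most_common(n)]
    label2id = {emoji: i for i, emoji in enumerate(labels)}
    return labels, label2id


# Made-up sample data, for illustration only
sample = [":red_heart:"] * 3 + [":fire:"] * 2 + [":sob:"]
labels, label2id = most_frequent_labels(sample, n=2)
print(labels)
```

Tweets whose emoji falls outside the kept set would then have to be dropped or relabeled before training; which of the two was done here is not stated in this card.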

We obtained 32% accuracy on a small set of tweets.

Note: We also decided to keep the emojis in their demojized (text name) form, because some emojis are made of several code points and could be read as two emojis (e.g. 👍🏿).
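To see why an emoji such as 👍🏿 can be counted as two, the short sketch below lists its Unicode code points using only the Python standard library. It is an illustration of the issue, not part of the training code.

```python
import unicodedata

# Thumbs up followed by a dark skin tone modifier, written as escapes
thumbs_up_dark = "\U0001F44D\U0001F3FF"

# One displayed emoji, but two code points in the string
code_points = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in thumbs_up_dark]

print(len(thumbs_up_dark))
for code_point, name in code_points:
    print(code_point, name)
```

Replacing each emoji sequence with a single text name before counting avoids splitting the base emoji from its modifier.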

Loading the model

from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "Jessy3ric/camembert-twitter-emoji"

# Download the tokenizer and the model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

# Optional: save a local copy of the model in a directory named after MODEL
model.save_pretrained(MODEL)
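
Once the model is loaded, a prediction can be obtained roughly as follows. This is a minimal sketch, assuming PyTorch is installed and that the model config's `id2label` maps class ids to emoji names; if it does not, generic names such as `LABEL_0` are returned instead. The helpers `top_k_labels` and `predict_emojis` are hypothetical names introduced for this example.

```python
import math

MODEL = "Jessy3ric/camembert-twitter-emoji"


def top_k_labels(logits, id2label, k=3):
    """Apply a softmax to raw logits and return the k most likely (label, probability) pairs."""
    highest = max(logits)
    # Subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


def predict_emojis(text, k=3):
    """Load the model from the Hub and return the k most likely emoji labels for `text`."""
    # Imported here so the helper above stays usable without these packages
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # Assumption: config.id2label holds the emoji names used during training
    return top_k_labels(logits, model.config.id2label, k)
```

Calling `predict_emojis("Trop hâte d'être en vacances !")` would return a list of three (label, probability) pairs, most likely first. Given the reported accuracy, looking at the top few predictions rather than only the first one may be more informative.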