GuiltRoBERTa-en: A Two-Stage Classifier for Guilt-Assignment Rhetoric in English Political Texts

GuiltRoBERTa-en is a two-stage AI pipeline for detecting guilt-assignment rhetoric in English political discourse. It combines:

  1. Stage 1 – Emotion Pre-Filtering: emotion labels from the Emotions 6 Babel Machine
  2. Stage 2 – Guilt Classification: a fine-tuned binary XLM-RoBERTa model trained on manually annotated English texts (guilt vs no_guilt)

The approach is grounded in political communication theory, which suggests that guilt attribution often emerges in anger-laden contexts. Thus, only texts labeled as "Anger" in Stage 1 are passed to the guilt classifier.


🧩 Model Architecture

Stage 1: Emotion Pre-Filtering (Babel Emotions Tool)

  • Tool: Emotions 6 Babel Machine
  • Task: 6-class emotion classification (Anger, Fear, Disgust, Sadness, Joy, None of them)
  • Input: CSV file with one text per row
  • Output: CSV file with predicted labels and probabilities
  • Usage: retain only rows with emotion_predicted == "Anger" for Stage 2

The Babel Emotions Tool is not an API but a web-based interface. Upload a CSV file, download the labeled results, and use them as input to the guilt classifier.
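
A minimal sketch of this hand-off, assuming the downloaded file is saved as a CSV with text and emotion_predicted columns (the file name below is a placeholder):

import pandas as pd

# Load the predictions downloaded from the Babel Machine (placeholder file name)
babel_df = pd.read_csv("babel_emotion_predictions.csv")

# Keep only the rows the emotion model labeled as "Anger"
anger_df = babel_df[babel_df["emotion_predicted"] == "Anger"].copy()

print(f"Rows passed to Stage 2: {len(anger_df)} of {len(babel_df)}")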

Stage 2: Guilt Classification

  • Base model: xlm-roberta-base
  • Task: Binary classification (guilt, no_guilt)
  • Training data: Sentence-level annotated English corpus
  • Optimization: Class-weighted loss function to handle label imbalance (see the sketch after this list)
  • Recommended threshold: τ = 0.15
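
The class weights themselves are not part of the released model. As an illustration of the idea only, the sketch below derives inverse-frequency weights from made-up label counts and builds a weighted cross-entropy criterion in PyTorch:

import torch
import torch.nn as nn

# Hypothetical label counts for an imbalanced training set (not the real corpus statistics)
counts = {"no_guilt": 1800, "guilt": 200}

# Inverse-frequency weights: the rarer "guilt" class receives the larger weight
total = sum(counts.values())
weights = torch.tensor(
    [total / counts["no_guilt"], total / counts["guilt"]], dtype=torch.float
)

# Weighted cross-entropy over the two classes (0 = no_guilt, 1 = guilt)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4, 2)           # dummy model outputs
labels = torch.tensor([0, 1, 0, 0])  # dummy gold labels
print(criterion(logits, labels).item())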

Motivation

Guilt assignment — attributing moral responsibility or blame — is a key rhetorical strategy in political communication. Since guilt often appears alongside anger, direct one-stage classification risks conflating emotional tones.

This two-stage pipeline improves precision by:

  • Filtering anger-related contexts first
  • Then applying a dedicated guilt detector only where relevant

Evaluation

The model was evaluated on a held-out validation set (20% stratified split) with the following setup:

Stage 1 Filter   Threshold (τ)   Precision   Recall      F1          Accuracy
Anger-only       0.15            optimized   optimized   optimized   optimized
  • Best configuration: Anger-only, τ = 0.15
  • Metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC, PR-AUC (a computation sketch follows this list)
  • The two-stage model shows improved performance compared to single-stage baselines
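
The evaluation scripts are not distributed with the model; the snippet below is only a sketch of how such metrics could be computed with scikit-learn at τ = 0.15, using dummy gold labels and guilt probabilities:

import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             precision_recall_fscore_support, roc_auc_score)

THRESHOLD = 0.15

# Dummy validation data: gold labels (1 = guilt) and predicted guilt probabilities
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])
y_score = np.array([0.62, 0.04, 0.21, 0.80, 0.10, 0.33, 0.02, 0.07])
y_pred = (y_score > THRESHOLD).astype(int)

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"Accuracy : {accuracy_score(y_true, y_pred):.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall   : {recall:.3f}")
print(f"F1       : {f1:.3f}")
print(f"ROC-AUC  : {roc_auc_score(y_true, y_score):.3f}")
print(f"PR-AUC   : {average_precision_score(y_true, y_score):.3f}")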

Usage Example

Step 1: Get Emotion Predictions from Babel

  1. Visit https://emotionsbabel.poltextlab.com/
  2. Upload your CSV file (one text per row)
  3. Download the predictions (includes emotion_predicted column)

Step 2: Apply Guilt Classifier

import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

# Load Babel emotion predictions
df = pd.read_excel("your_data_with_emotion_predictions.xlsx")

# Filter for 'Anger' only
anger_df = df[df["emotion_predicted"] == "Anger"].copy()

# Load the guilt classifier
repo_id = "your-org/guiltroberta-en"  # Update with actual path
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)

# Apply guilt predictions with threshold
THRESHOLD = 0.15

anger_df["guilt_score"] = anger_df["text"].apply(
    lambda t: pipe(t)[0][1]["score"]  # score for 'guilt' label
)

anger_df["guilt_predicted"] = anger_df["guilt_score"] > THRESHOLD

# Save results
anger_df.to_excel("anger_with_guilt_predictions.xlsx", index=False)

# Statistics
print(f"Total anger sentences: {len(anger_df)}")
print(f"Predicted guilt: {anger_df['guilt_predicted'].sum()}")
print(f"Guilt ratio: {anger_df['guilt_predicted'].mean():.2%}")

Alternative: Direct Inference

import torch
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

# Load model
model_path = "your-org/guiltroberta-en"
tokenizer = XLMRobertaTokenizer.from_pretrained(model_path)
model = XLMRobertaForSequenceClassification.from_pretrained(model_path)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Example: anger-labeled sentence
text = "I'm furious at myself for letting this happen again."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    prob_guilt = torch.softmax(logits, dim=-1)[0][1].item()

# Apply threshold
THRESHOLD = 0.15
prediction = "guilt" if prob_guilt > THRESHOLD else "no_guilt"

print(f"Guilt probability: {prob_guilt:.4f}")
print(f"Prediction: {prediction}")

Training Configuration

  • Epochs: 4
  • Learning Rate: 2e-5
  • Batch Size: 8
  • Max Sequence Length: 512 tokens
  • Optimizer: AdamW
  • Scheduler: Linear warmup
  • Train/Validation Split: 80/20 (stratified)
  • Class Weighting: Applied to handle label imbalance
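
The original training script is not included here. The sketch below shows, under the assumption of a standard transformers Trainer setup, how these hyperparameters and the class-weighted loss might be wired together; train_dataset, eval_dataset, and class_weights are placeholders, and warmup_ratio=0.1 is an assumption for the linear warmup:

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=2)

# Hyperparameters as listed above; AdamW is the Trainer's default optimizer,
# and max_length=512 would be applied when tokenizing the datasets (not shown)
args = TrainingArguments(
    output_dir="guiltroberta-en",
    num_train_epochs=4,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
)

class WeightedTrainer(Trainer):
    """Trainer variant that applies class weights in the cross-entropy loss."""

    def __init__(self, class_weights, **kwargs):
        super().__init__(**kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=self.class_weights.to(outputs.logits.device)
        )
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

# trainer = WeightedTrainer(class_weights=class_weights, model=model, args=args,
#                           train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()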