BERT NMB+ (Disinformation Sequence Classification):

Classifies 512-token chunks of a news article as "Likely" or "Unlikely" to contain bias/disinformation.

Fine-tuned from BERT (bert-base-uncased) on the headline, article_text and text_label fields of the News Media Bias Plus dataset.

This model was trained with weighted sampling so that each batch contains 50% 'Likely' examples and 50% 'Unlikely' examples. The same model trained without weighted sampling is here, and it scored slightly better on the training-time evaluation metrics. However, this model performed better when its predictions were evaluated with gpt-4o as a judge.
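As a rough illustration of weighted sampling, the sketch below gives every example a weight inversely proportional to its class frequency, so that both labels are drawn about equally often. It uses only the Python standard library; the toy label counts and the helper names (`balanced_sample_weights`, `draw_batch`) are assumptions made for this example and are not the code used to train this model. Note that sampling this way balances batches on average and does not guarantee an exact 50/50 split in every batch.

```python
import random
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each example by 1 / (size of its class), so classes are drawn equally often."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def draw_batch(examples, weights, batch_size, rng):
    """Draw one batch with replacement, using the per-example weights."""
    return rng.choices(examples, weights=weights, k=batch_size)


# Toy, made-up label distribution (hypothetical, not the real NMB+ class balance).
labels = ["Likely"] * 900 + ["Unlikely"] * 100
weights = balanced_sample_weights(labels)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch = draw_batch(labels, weights, batch_size=32, rng=rng)
print(Counter(batch))  # counts per label in one sampled batch
```

In a PyTorch training loop, the same per-example weights could be passed to `torch.utils.data.WeightedRandomSampler`, which is one common way to get this behaviour.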

Metrics

Evaluated on a random 10% sample of the NMB+ dataset that was unseen during training:

  • Accuracy: 0.7597
  • Precision: 0.9223
  • Recall: 0.7407
  • F1 Score: 0.8216
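As a quick consistency check, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two figures listed above. The helper name `f1_from_precision_recall` is made up for this sketch.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision = 0.9223  # reported above
recall = 0.7407     # reported above

f1 = f1_from_precision_recall(precision, recall)
print(round(f1, 4))  # 0.8216, matching the reported F1 score
```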

How to Use:

Keep in mind that this model was trained on full 512-token chunks, so it tends to over-predict "Unlikely" for standalone sentences. If you plan to process standalone sentences, you may get better results with this NMB+ model, which was trained on biased headlines.

from transformers import pipeline

classifier = pipeline("text-classification", model="maximuspowers/nmbp-bert-full-articles-balanced")
result = classifier("He was a terrible politician.", top_k=2)

Example Response:

[
  {
    'label': 'Likely',
    'score': 0.9967995882034302
  },
  {
    'label': 'Unlikely',
    'score': 0.003200419945642352
  }
]
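
Because the model was trained on 512-token chunks, a full article usually needs to be split before it is classified. The sketch below shows one possible way to do that; it is not the chunking used during training, which this card does not describe. The helper names `chunk_token_ids` and `classify_article` are hypothetical, the chunk size of 510 is an assumption that leaves room for BERT's [CLS] and [SEP] tokens, and the sketch assumes the model repository ships its tokenizer.

```python
def chunk_token_ids(token_ids, chunk_size=510):
    """Split a flat list of token ids into consecutive chunks of at most chunk_size ids."""
    return [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), chunk_size)]


def classify_article(article_text,
                     model_name="maximuspowers/nmbp-bert-full-articles-balanced"):
    """Classify each chunk of an article; returns one prediction dict per chunk."""
    # Imported lazily so the chunking helper above can be used without transformers installed.
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    classifier = pipeline("text-classification", model=model_name, tokenizer=tokenizer)

    # Tokenize without special tokens, split the ids, then turn each chunk back into text.
    token_ids = tokenizer(article_text, add_special_tokens=False)["input_ids"]
    chunk_texts = [tokenizer.decode(ids) for ids in chunk_token_ids(token_ids)]

    # truncation is a safety net in case re-tokenizing a chunk yields more than 512 tokens.
    return classifier(chunk_texts, truncation=True, max_length=512)


# The chunking helper on its own, with dummy ids standing in for real token ids:
print([len(chunk) for chunk in chunk_token_ids(list(range(1200)))])  # [510, 510, 180]
```

How the per-chunk predictions are combined into an article-level decision (for example, flagging the article if any chunk is "Likely") is left to the caller.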