Model Card

This model is a fine-tuned version of intfloat/multilingual-e5-small. It was fine-tuned on FactRank data, supplemented with machine-annotated data from Dutch and Belgian parliamentary proceedings.

The primary goal of this model is to determine whether a given statement warrants fact-checking. It does not determine whether the statement is factually correct.

Each statement is assigned one of three labels: FR, FNR, or NF.

  • FR: Factual, Relevant (the statement is fact-checkable and worth verifying)
  • FNR: Factual, Not Relevant (the statement can be fact-checked, but its wider relevance is lower)
  • NF: Not Factual (the statement does not contain fact-checkable information)

Examples:

  • FR: Toch blijkt uit cijfers van Flanders Investment & Trade dat ons handel met het Verenigd Koninkrijk opnieuw op het niveau ligt van voor de brexit. ("Yet figures from Flanders Investment & Trade show that our trade with the United Kingdom is back at its pre-Brexit level.")
  • FNR: Ayleen werd opgelicht via dating fraude door de Tinder Swindler: "Het zijn net vampiers." ("Ayleen was scammed through dating fraud by the Tinder Swindler: 'They are just like vampires.'")
  • NF: Het heeft weinig zin om zomaar een aantal maatregelen te tonen. ("There is little point in simply showing a number of measures.")

Supported language: Dutch

Usage

# Load the fine-tuned FactRank classifier and wrap it in a pipeline
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer, pipeline

config = AutoConfig.from_pretrained("textgain/FactRank_e5_small")
tokenizer = AutoTokenizer.from_pretrained("textgain/FactRank_e5_small")
model = AutoModelForSequenceClassification.from_pretrained("textgain/FactRank_e5_small", config=config)
pipe = pipeline(task="text-classification", model=model, tokenizer=tokenizer)


# Dutch example statements: a mix of checkworthy claims,
# opinions, and procedural remarks
sample_texts = [
    "In een wereld die steeds digitaler wordt, moeten we het ook makkelijker maken om de controle over je financiën te hebben.",
    "Ik wil helemaal geen haren tussen u en de heer De Cock leggen.",
    "Je kunt van mening verschillen over welk gevolg je daaraan moet verbinden.",
    "We hebben 4.500 nieuwe kankergevallen in Nederland per jaar als gevolg van alcoholgebruik.",
    "Alcoholgebruik kost de samenleving 2 tot 4 miljard euro.",
    "Dus kan de minister daar vandaag wat meer over zeggen?",
]

# Each result is a dict with the predicted label and its confidence score
results = pipe(sample_texts)
predicted_labels = [res["label"] for res in results]
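
By default the pipeline only returns the top label. If you also want a score for each of the three labels, recent versions of transformers accept top_k=None on the text-classification pipeline (older releases used return_all_scores=True instead); a minimal sketch:

# Ask the pipeline for the scores of all labels, not just the best one
all_scores = pipe(sample_texts, top_k=None)
for text, scores in zip(sample_texts, all_scores):
    # scores is a list of {"label": ..., "score": ...} dicts, sorted by score
    best = max(scores, key=lambda s: s["score"])
    print(f"{best['label']} ({best['score']:.2f})\t{text}")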

Interpretation of Results

Factors Influencing the Label:

  • Subjective evaluation: evaluative expressions such as "interesting", "surprising", or "incredible" may push the model towards predicting NF.
  • Research: mentions of research or studies push the model towards treating the statement as a verifiable fact.
  • Context: statements made in certain contexts are more likely to receive an FR label, e.g. statements about health and medicine (see the probe sketched after this list).
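
These tendencies can be probed with minimal pairs. The sentences below are illustrative inventions, not items from the training data, so treat the output as a qualitative check rather than an evaluation:

# Probe the model with minimal pairs (illustrative Dutch sentences)
probe_texts = [
    # same claim, with and without an explicit mention of research
    "De werkloosheid is vorig jaar gedaald.",
    "Uit onderzoek blijkt dat de werkloosheid vorig jaar is gedaald.",
    # neutral statistic vs. subjective evaluation
    "Het aantal studenten steeg met tien procent.",
    "Het is verbazingwekkend hoe interessant dit debat is.",
]
for text, res in zip(probe_texts, pipe(probe_texts)):
    print(f"{res['label']} ({res['score']:.2f})\t{text}")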

Training Details

The model was trained on a total of 13,786 data samples.

Parameters:

num_epochs = 5
batch_size = 32
learning_rate = 1e-5
dropout = 0.5
gradient_accumulation_steps = 4
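
The training script is not published with this card; the sketch below only shows how the listed hyperparameters would plug into a standard Hugging Face Trainer run. The label mapping, the classifier_dropout wiring, and the dataset loading are assumptions for illustration:

from transformers import (AutoConfig, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

base = "intfloat/multilingual-e5-small"
config = AutoConfig.from_pretrained(
    base,
    num_labels=3,
    id2label={0: "FR", 1: "FNR", 2: "NF"},  # assumed label order
    classifier_dropout=0.5,                 # dropout = 0.5 from the list above
)
model = AutoModelForSequenceClassification.from_pretrained(base, config=config)

args = TrainingArguments(
    output_dir="factrank_e5_small",
    num_train_epochs=5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,  # effective batch size of 128
    learning_rate=1e-5,
)
# trainer = Trainer(model=model, args=args, train_dataset=...)  # data not published
# trainer.train()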

Acknowledgment

This model was developed in the context of the BENEDMO project. BENEDMO brings together a network of expertise on disinformation and fact-checking. Through a Flemish-Dutch collaboration, BENEDMO aims to address the impact and challenges of disinformation.

BENEDMO has received funding from the European Union under Grant Agreement number 101158277-BENEDMO-2023-DEPLOY-04.
