Model Card
This model is a fine-tuned version of intfloat/multilingual-e5-small. It was fine-tuned on FactRank data, supplemented with machine-annotated data from Dutch and Belgian parliamentary proceedings.
The primary goal of this model is to determine whether a given statement warrants fact-checking. It does not determine whether the statement is factually correct.
Each statement receives one of three labels: FR, FNR, or NF.
- FR: Factual, Relevant (the statement is fact-checkable and warrants verification)
- FNR: Factual, Not Relevant (the statement can be fact-checked, but its wider relevance is low)
- NF: Not Factual (the statement contains no fact-checkable claim)
Examples:
- FR: Toch blijkt uit cijfers van Flanders Investment & Trade dat ons handel met het Verenigd Koninkrijk opnieuw op het niveau ligt van voor de brexit. ("Yet figures from Flanders Investment & Trade show that our trade with the United Kingdom is back at its pre-Brexit level.")
- FNR: Ayleen werd opgelicht via dating fraude door de Tinder Swindler: "Het zijn net vampiers." ("Ayleen was scammed through dating fraud by the Tinder Swindler: 'They are just like vampires.'")
- NF: Het heeft weinig zin om zomaar een aantal maatregelen te tonen. ("There is little point in simply showing a number of measures.")
Supported language: Dutch
Usage
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
# Load the fine-tuned classifier from the Hugging Face Hub
config = AutoConfig.from_pretrained("textgain/FactRank_e5_small")
tokenizer = AutoTokenizer.from_pretrained("textgain/FactRank_e5_small")
model = AutoModelForSequenceClassification.from_pretrained("textgain/FactRank_e5_small", config=config)
pipe = pipeline(model=model, tokenizer=tokenizer, task="text-classification")
sample_texts = [
    # "In a world that is becoming ever more digital, we must also make it easier to stay in control of your finances."
    "In een wereld die steeds digitaler wordt, moeten we het ook makkelijker maken om de controle over je financiën te hebben.",
    # "I really don't want to sow discord between you and Mr. De Cock."
    "Ik wil helemaal geen haren tussen u en de heer De Cock leggen.",
    # "You can disagree about what consequence should be attached to that."
    "Je kunt van mening verschillen over welk gevolg je daaraan moet verbinden.",
    # "We have 4,500 new cancer cases in the Netherlands per year as a result of alcohol use."
    "We hebben 4.500 nieuwe kankergevallen in Nederland per jaar als gevolg van alcoholgebruik.",
    # "Alcohol use costs society 2 to 4 billion euros."
    "Alcoholgebruik kost de samenleving 2 tot 4 miljard euro.",
    # "So can the minister say a bit more about that today?"
    "Dus kan de minister daar vandaag wat meer over zeggen?",
]
results = pipe(sample_texts)
predicted_labels = [res["label"] for res in results]
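Each result is a dict with a "label" and a "score" field. To inspect the full score distribution over all three labels, the pipeline can be asked to return every class. A minimal sketch; note that top_k=None works on recent transformers versions, while older versions use return_all_scores=True instead:
results_all = pipe(sample_texts, top_k=None)  # one list of {label, score} dicts per input
for text, scores in zip(sample_texts, results_all):
    print(text)
    for entry in scores:
        print(f"  {entry['label']}: {entry['score']:.3f}")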
Interpretation of Results
Factors Influencing the Label:
- Subjective Evaluation: evaluative words such as "interesting", "surprising", or "incredible" tend to push the model towards NF.
- Research: mentioning research or studies pushes the model to treat the statement as a verifiable fact, favouring FR.
- Context: statements in certain domains, e.g. health and medicine, are more likely to receive an FR label. The sketch below shows how such cues can be probed.
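These effects can be checked empirically by classifying minimal pairs that differ only in such a cue. A small sketch, reusing the pipe object from the Usage section; the Dutch sentence pairs below are invented for illustration and are not taken from the FactRank data:
pairs = [
    # Subjective evaluation vs. bare claim:
    # "It is surprising that unemployment is falling." / "Unemployment is falling by 2 percent."
    ("Het is verrassend dat de werkloosheid daalt.",
     "De werkloosheid daalt met 2 procent."),
    # With vs. without an appeal to research:
    # "Research shows that cycling is healthy." / "Cycling is fun."
    ("Uit onderzoek blijkt dat fietsen gezond is.",
     "Fietsen is leuk."),
]
for with_cue, without_cue in pairs:
    # Compare the predicted label with and without the lexical cue
    print(pipe(with_cue)[0]["label"], "|", pipe(without_cue)[0]["label"])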
Training Details
The model was trained on a total of 13,786 data samples.
Parameters:
num_epochs = 5
batch_size = 32
learning_rate = 1e-5
dropout = 0.5
gradient_accumulation_steps = 4
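A minimal fine-tuning sketch with these hyperparameters, assuming the transformers Trainer API and a pre-tokenized, padded train_dataset with a "label" column; the label ordering and the use of classifier_dropout to apply the dropout value are assumptions, not confirmed details of the original training setup:
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-small")
model = AutoModelForSequenceClassification.from_pretrained(
    "intfloat/multilingual-e5-small",
    num_labels=3,
    id2label={0: "FR", 1: "FNR", 2: "NF"},  # assumed label order
    label2id={"FR": 0, "FNR": 1, "NF": 2},
    classifier_dropout=0.5,  # assumed mapping of the dropout = 0.5 setting
)

args = TrainingArguments(
    output_dir="factrank_e5_small",
    num_train_epochs=5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,  # effective batch size 32 * 4 = 128 under this reading
    learning_rate=1e-5,
)

# train_dataset: assumed to be a tokenized datasets.Dataset with
# input_ids, attention_mask and label columns
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()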
Acknowledgment
This model was developed in the context of the BENEDMO project. BENEDMO brings together a network of expertise on disinformation and fact-checking. Through a Flemish-Dutch collaboration, BENEDMO aims to address the impact and challenges of disinformation.
BENEDMO has received funding from the European Union under Grant Agreement number 101158277-BENEDMO-2023-DEPLOY-04.