dataset
Could you please upload the dataset that was used (euvsdisinfo)?
Hello, yes, I have uploaded the dataset that was used to train the model: https://huggingface.co/datasets/winterForestStump/fake-news-detector-euvsdisinfo
Have you tried this model on real examples, such as "There is war between Ukraine and Russia"? When testing it with real-world news statements, I found that it makes wrong predictions. Please correct me if I am wrong. It classified almost all of the sentences I gave it as FAKE.
Hello. One short sentence is not enough. You have to provide more information, usually several paragraphs of news text.
Moreover, the model has no knowledge of recent events and cannot be used as a fact checker.
Consider this news article, for example: https://www.bbc.com/news/articles/cz0rn85v5kjo. If you feed at least the first three paragraphs into the model, the result will be TRUE.
from transformers import pipeline
# Load the fine-tuned RoBERTa classifier from the Hugging Face Hub
pipe = pipeline("text-classification", model="winterForestStump/Roberta-fake-news-detector")
text = '''
Slovak PM meets Putin in surprise Moscow visit. Slovakia's Prime Minister Robert Fico has made a surprise visit to Moscow for talks with Vladimir Putin - becoming only the third Western leader to meet the Russian leader since the full-scale invasion of Ukraine three years ago.
Fico - a vocal critic of the European Union's support for Kyiv in the war - said they discussed supplies of Russian gas to Slovakia - which his country relies on.
A deal with Russian gas giant Gazprom to transit energy through Ukraine to Slovakia is due to expire at the end of this year.
'''
print(pipe(text))
# [{'label': 'TRUE', 'score': 0.9999926090240479}]
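The advice above (give the model several paragraphs of news text, not a single sentence) can be wrapped in a small preprocessing step before calling `pipe`. This is a minimal sketch, assuming one paragraph per line as in the example text; `first_paragraphs`, `is_long_enough` and the 60-word threshold are hypothetical choices for illustration, not part of the model or of the `transformers` API.

```python
def first_paragraphs(article: str, n: int = 3) -> str:
    """Return the first `n` non-empty paragraphs of `article`, joined by newlines.

    Assumes one paragraph per line, as in the BBC example text.
    """
    paragraphs = [line.strip() for line in article.splitlines() if line.strip()]
    return "\n".join(paragraphs[:n])


def is_long_enough(text: str, min_words: int = 60) -> bool:
    """Heuristic guard: True if `text` contains at least `min_words` words.

    The default of 60 is an arbitrary illustrative value, not a threshold
    recommended for this model.
    """
    return len(text.split()) >= min_words


if __name__ == "__main__":
    article = """
Headline sentence. First paragraph text.

Second paragraph text.
Third paragraph text.
Fourth paragraph text.
"""
    # Keeps the first three non-empty lines and drops the fourth
    print(first_paragraphs(article))
    # A one-sentence claim is far below the word threshold
    print(is_long_enough("There is war between Ukraine and Russia"))
```

With the pipeline loaded as in the example above, you would call `pipe(first_paragraphs(article))` only when `is_long_enough(...)` returns True; a one-line claim such as "There is war between Ukraine and Russia" (7 words) fails the check.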
In any case, you should use this model very carefully, because the fine-tuning dataset is heavily biased: it contains only news about the war in Ukraine, plus disinformation spread since 2014, mostly in Europe.
The results are also unstable and will include many false positives (real news classified as fake), because the "real" articles in the dataset are texts written by EUVSDISINFO.eu as rebuttals to the fakes, not real articles from the media.
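To check the false-positive problem described above on your own data, you could run the model over a set of genuine media articles and count how many come back as fake. This is a minimal sketch, assuming the label strings are "FAKE" and "TRUE" (the user above reports FAKE predictions, and the example output shows TRUE); the function name and the toy labels are made up for illustration. In practice `predictions` would be something like `[r["label"] for r in pipe(texts)]`.

```python
from typing import Sequence


def false_positive_rate(
    predictions: Sequence[str],
    gold: Sequence[str],
    fake_label: str = "FAKE",  # assumed label string for fake news
    real_label: str = "TRUE",  # assumed label string for real news
) -> float:
    """Share of genuinely real articles that the model labels as fake.

    "Positive" here means a FAKE prediction, so a false positive is a
    real article (gold == real_label) predicted as fake_label.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    # Keep only predictions for articles that are actually real
    real_preds = [p for p, g in zip(predictions, gold) if g == real_label]
    if not real_preds:
        raise ValueError("gold labels contain no real articles")
    return sum(p == fake_label for p in real_preds) / len(real_preds)


if __name__ == "__main__":
    # Toy labels, not real model output
    preds = ["TRUE", "FAKE", "FAKE", "TRUE", "FAKE"]
    gold = ["TRUE", "TRUE", "FAKE", "TRUE", "TRUE"]
    print(false_positive_rate(preds, gold))  # 2 of 4 real articles flagged -> 0.5
```

A high value on a collection of ordinary media articles would confirm the bias described above: the model learned to separate disinformation from EUVSDISINFO.eu rebuttals, not from typical news writing.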