Fine-tuning dataset

by mnc5 - opened Dec 5, 2024

mnc5

Dec 5, 2024 (edited Dec 11, 2024)

What is the fine-tuning dataset used for this model? Thanks.

mnc5 changed discussion title from "Data source for fake.csv" to "Fine-tuning data sources ISOT and LIAR or only ISOT?" Dec 5, 2024

luckyshotjpg

Dec 11, 2024

I would caution against using models that are trained on these datasets as they are inherently biased. While they perform well under holdout or cross-validation conditions, they struggle to generalise effectively outside of their training datasets (even when testing within the same domain).

Hoy, N. and Koulouri, T., 2022, December. Exploring the generalisability of fake news detection models. In 2022 IEEE International Conference on Big Data (Big Data) (pp. 5731-5740). IEEE.
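The gap between holdout accuracy and accuracy on an unseen dataset can be measured directly. Below is a minimal sketch, assuming the model is wrapped in a `predict(text)` callable that returns `"FAKE"` or `"REAL"`. The function names, the label strings, the toy rows, and the shortcut "model" are all hypothetical illustrations, not the actual model or real ISOT/LIAR data.

```python
from typing import Callable, Iterable, Tuple

Example = Tuple[str, str]  # (text, gold label); labels assumed to be "FAKE" or "REAL"


def accuracy(predict: Callable[[str], str], data: Iterable[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    rows = list(data)
    if not rows:
        raise ValueError("empty evaluation set")
    correct = sum(1 for text, label in rows if predict(text) == label)
    return correct / len(rows)


def generalisation_gap(
    predict: Callable[[str], str],
    in_dataset_holdout: Iterable[Example],
    external_dataset: Iterable[Example],
) -> float:
    """Holdout accuracy minus accuracy on a dataset the model never saw."""
    return accuracy(predict, in_dataset_holdout) - accuracy(predict, external_dataset)


# Hypothetical stand-in for a biased model: it keys on a source artefact
# (a wire-service tag) that marks REAL rows in the training source only.
def shortcut_model(text: str) -> str:
    return "REAL" if "(newswire)" in text.lower() else "FAKE"


# Synthetic toy rows, invented for illustration only.
holdout = [
    ("CITY (Newswire) - Council approves budget.", "REAL"),
    ("CITY (Newswire) - Rain expected on Friday.", "REAL"),
    ("You won't believe what the council is hiding!", "FAKE"),
    ("Shocking secret about Friday's weather!", "FAKE"),
]
external = [
    ("Council approves budget after long debate.", "REAL"),
    ("Rain is expected on Friday, forecasters say.", "REAL"),
    ("Council secretly replaced by robots, insiders claim.", "FAKE"),
    ("Friday cancelled worldwide, sources say.", "FAKE"),
]

print("holdout accuracy :", accuracy(shortcut_model, holdout))
print("external accuracy:", accuracy(shortcut_model, external))
print("gap              :", generalisation_gap(shortcut_model, holdout, external))
```

A large positive gap suggests the model has learned features of the training source rather than features of fake news in general.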

mnc5 changed discussion title from "Fine-tuning data sources ISOT and LIAR or only ISOT?" to "Data source" Dec 11, 2024

mnc5 changed discussion title from "Data source" to "Which dataset was used to fine-tune this model?" Dec 11, 2024

mnc5 changed discussion title from "Which dataset was used to fine-tune this model?" to "Fine-tuning dataset" Dec 11, 2024

Hansa23

Dec 22, 2024

Have you tried this model with real examples such as "There is war between Ukraine and Russia"? When I tested it with real-world news statements, it made wrong predictions. Please correct me if I am wrong. It classified almost all of the given sentences as FAKE.
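One way to make this kind of spot check reproducible is to tally the predicted labels over a fixed list of statements. This is only a sketch: `label_distribution` and `always_fake` are hypothetical names, and `always_fake` is a stub standing in for the real model. In practice `predict` could wrap a Hugging Face `transformers` text-classification pipeline, with the model id left for you to fill in.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def label_distribution(
    predict: Callable[[str], str], statements: Iterable[str]
) -> Dict[str, float]:
    """Share of statements assigned to each predicted label."""
    counts = Counter(predict(s) for s in statements)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no statements given")
    return {label: n / total for label, n in counts.items()}


# Stub standing in for the real model (assumption: labels are strings
# such as "FAKE" or "REAL"). Replace with a call into the actual model.
def always_fake(text: str) -> str:
    return "FAKE"


# Illustrative inputs; the first one is the statement from this thread.
statements = [
    "There is war between Ukraine and Russia.",
    "Water boils at 100 degrees Celsius at sea level.",
    "Paris is the capital of France.",
]

print(label_distribution(always_fake, statements))
```

If nearly every statement that is known to be true comes back as FAKE, that points to a systematic bias rather than a few unlucky inputs.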

Hansa23

Dec 22, 2024

I would caution against using models that are trained on these datasets as they are inherently biased. While they perform well under holdout or cross-validation conditions, they struggle to generalise effectively outside of their training datasets (even when testing within the same domain).

Hoy, N. and Koulouri, T., 2022, December. Exploring the generalisability of fake news detection models. In 2022 IEEE International Conference on Big Data (Big Data) (pp. 5731-5740). IEEE.

Yeah, I agree with this 100%, and I am also searching for a solution. I found the same issue in Kaggle notebooks that claimed more than 97% accuracy; they used RoBERTa, BERT, and other models fine-tuned on LIAR, fake-real, and similar datasets.
