distilroberta-mbfc-bias
This model is a fine-tuned version of distilroberta-base on the Proppy dataset, using political bias from mediabiasfactcheck.com as labels.
It achieves the following results on the evaluation set (these match the last logged epoch, epoch 15, in the training results):
- Loss: 1.4130
- Acc: 0.6348
Training and evaluation data
The training data is the Proppy corpus. Each article is labeled with the political bias of its source publication, as rated by mediabiasfactcheck.com. See "Proppy: Organizing the News Based on Their Propagandistic Content" for details.
To create a more balanced training set, any label with more than 2000 articles is downsampled to 2000 articles. The resulting label distribution in the training data is as follows:
Label | Articles |
---|---|
extremeright | 689 |
leastbiased | 2000 |
left | 783 |
leftcenter | 2000 |
right | 1260 |
rightcenter | 1418 |
unknown | 2000 |
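The downsampling step described above can be sketched as follows. This is a minimal illustration, not the author's actual preprocessing code: the function name `downsample_by_label`, the `(text, label)` tuple layout, the sampling seed, and the toy input counts are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def downsample_by_label(examples, max_per_label=2000, seed=0):
    """Cap each label at max_per_label examples, sampling without replacement.

    `examples` is a list of (text, label) tuples (a hypothetical layout).
    Labels at or below the cap are kept in full.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    balanced = []
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) > max_per_label:
            items = rng.sample(items, max_per_label)
        balanced.extend(items)

    rng.shuffle(balanced)  # avoid label-sorted order in the training set
    return balanced


# Toy data: the raw count of 3500 is made up for illustration.
toy = [(f"article {i}", "leastbiased") for i in range(3500)]
toy += [(f"article {i}", "left") for i in range(783)]
counts = Counter(label for _, label in downsample_by_label(toy))
print(counts["leastbiased"], counts["left"])  # 2000 783
```

Only labels above the cap lose articles, which is why the smaller classes in the distribution above keep their original counts.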
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 12345
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 16
- num_epochs: 20
- mixed_precision_training: Native AMP
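To make the `linear` scheduler with 16 warmup steps concrete, here is a small sketch of the learning rate such a schedule would produce at a given step. It assumes the usual linear warmup followed by linear decay to zero, and it assumes a total of 10280 optimizer steps (514 steps per epoch, as in the training results, times the configured 20 epochs). The function name `linear_lr` is hypothetical.

```python
def linear_lr(step, base_lr=3e-05, warmup_steps=16, total_steps=10280):
    """Learning rate at `step` for linear warmup then linear decay to zero.

    total_steps is an assumption: 514 steps/epoch * 20 configured epochs.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 8, 16, 5140, 10280):
    print(step, linear_lr(step))
```

With only 16 warmup steps, the peak rate of 3e-05 is reached almost immediately, so nearly all of training happens on the decaying part of the schedule.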
Training results
Training Loss | Epoch | Step | Validation Loss | Acc |
---|---|---|---|---|
0.9493 | 1.0 | 514 | 1.2765 | 0.4730 |
0.7376 | 2.0 | 1028 | 1.0003 | 0.5812 |
0.6702 | 3.0 | 1542 | 1.1294 | 0.5631 |
0.6161 | 4.0 | 2056 | 1.0439 | 0.6058 |
0.4934 | 5.0 | 2570 | 1.1196 | 0.6028 |
0.4558 | 6.0 | 3084 | 1.0993 | 0.5977 |
0.4717 | 7.0 | 3598 | 1.0308 | 0.6373 |
0.3961 | 8.0 | 4112 | 1.1291 | 0.6234 |
0.3829 | 9.0 | 4626 | 1.1554 | 0.6316 |
0.3442 | 10.0 | 5140 | 1.1548 | 0.6465 |
0.2505 | 11.0 | 5654 | 1.3605 | 0.6169 |
0.2105 | 12.0 | 6168 | 1.3310 | 0.6297 |
0.2620 | 13.0 | 6682 | 1.2706 | 0.6383 |
0.2031 | 14.0 | 7196 | 1.3658 | 0.6378 |
0.2021 | 15.0 | 7710 | 1.4130 | 0.6348 |
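Validation loss and accuracy do not peak at the same epoch in the table above, so it can help to check which epoch is best under each criterion. The snippet below is just a reading aid over the logged values; the `rows` variable and its tuple layout are illustrative.

```python
# (epoch, validation_loss, accuracy), copied from the training results table.
rows = [
    (1, 1.2765, 0.4730), (2, 1.0003, 0.5812), (3, 1.1294, 0.5631),
    (4, 1.0439, 0.6058), (5, 1.1196, 0.6028), (6, 1.0993, 0.5977),
    (7, 1.0308, 0.6373), (8, 1.1291, 0.6234), (9, 1.1554, 0.6316),
    (10, 1.1548, 0.6465), (11, 1.3605, 0.6169), (12, 1.3310, 0.6297),
    (13, 1.2706, 0.6383), (14, 1.3658, 0.6378), (15, 1.4130, 0.6348),
]

best_by_acc = max(rows, key=lambda r: r[2])
best_by_loss = min(rows, key=lambda r: r[1])

print("best accuracy:", best_by_acc)          # (10, 1.1548, 0.6465)
print("lowest validation loss:", best_by_loss)  # (2, 1.0003, 0.5812)
```

Training loss keeps falling while validation loss rises after the early epochs, a pattern that typically indicates overfitting, although accuracy stays roughly flat from epoch 7 onward.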
Framework versions
- Transformers 4.11.2
- Pytorch 1.7.1
- Datasets 1.11.0
- Tokenizers 0.10.3