File size: 4,201 Bytes
a37a8cd 3102e25 110ad28 3102e25 110ad28 eb92bf4 3102e25 110ad28 eb92bf4 110ad28 eb92bf4 110ad28 a37a8cd 3102e25 8445047 eb92bf4 3102e25 110ad28 3102e25 110ad28 3102e25 eb92bf4 110ad28 3102e25 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 |
---
license: mit
base_model: roberta-base
tags:
- topic
- classification
- news
- roberta
metrics:
- accuracy
- f1
- precision
- recall
datasets:
- dstefa/New_York_Times_Topics
widget:
- text: >-
Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his innocence and vowing.
example_title: Sports
- text: >-
Although many individuals are doing fever checks to screen for Covid-19, many Covid-19 patients never have a fever.
example_title: Health and Wellness
- text: >-
Twelve myths about Russia's War in Ukraine exposed
example_title: Crime
model-index:
- name: roberta-base_topic_classification_nyt_news
results:
- task:
name: Text Classification
type: text-classification
dataset:
name: New_York_Times_Topics
type: News
metrics:
- type: F1
name: F1
value: 0.91
- type: accuracy
name: accuracy
value: 0.91
- type: precision
name: precision
value: 0.91
- type: recall
name: recall
value: 0.91
pipeline_tag: text-classification
---
# roberta-base_topic_classification_nyt_news
This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the NYT News dataset, which contains 256,000 news titles from articles published from 2000 to the present (https://www.kaggle.com/datasets/aryansingh0909/nyt-articles-21m-2000-present).
It achieves the following results on the test set of 51200 cases:
- Accuracy: 0.91
- F1: 0.91
- Precision: 0.91
- Recall: 0.91
## Training data
Training data was classified as follow:
class |Description
-|-
0 |Sports
1 |Arts, Culture, and Entertainment
2 |Business and Finance
3 |Health and Wellness
4 |Lifestyle and Fashion
5 |Science and Technology
6 |Politics
7 |Crime
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 5
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|:-------------:|:-----:|:------:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.3192 | 1.0 | 20480 | 0.4078 | 0.8865 | 0.8859 | 0.8892 | 0.8865 |
| 0.2863 | 2.0 | 40960 | 0.4271 | 0.8972 | 0.8970 | 0.8982 | 0.8972 |
| 0.1979 | 3.0 | 61440 | 0.3797 | 0.9094 | 0.9092 | 0.9098 | 0.9094 |
| 0.1239 | 4.0 | 81920 | 0.3981 | 0.9117 | 0.9113 | 0.9114 | 0.9117 |
| 0.1472 | 5.0 | 102400 | 0.4033 | 0.9137 | 0.9135 | 0.9134 | 0.9137 |
### Model performance
-|precision|recall|f1|support
-|-|-|-|-
Sports|0.97|0.98|0.97|6400
Arts, Culture, and Entertainment|0.94|0.95|0.94|6400
Business and Finance|0.85|0.84|0.84|6400
Health and Wellness|0.90|0.93|0.91|6400
Lifestyle and Fashion|0.95|0.95|0.95|6400
Science and Technology|0.89|0.83|0.86|6400
Politics|0.93|0.88|0.90|6400
Crime|0.85|0.93|0.89|6400
| | | |
accuracy|||0.91|51200
macro avg|0.91|0.91|0.91|51200
weighted avg|0.91|0.91|0.91|51200
### How to use roberta-base_topic_classification_nyt_news with HuggingFace
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("dstefa/roberta-base_topic_classification_nyt_news")
model = AutoModelForSequenceClassification.from_pretrained("dstefa/roberta-base_topic_classification_nyt_news")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
text = "Kederis proclaims innocence Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his innocence and vowing."
pipe(text)
[{'label': 'Sports', 'score': 0.9989326596260071}]
```
### Framework versions
- Transformers 4.32.1
- Pytorch 2.1.0+cu121
- Datasets 2.12.0
- Tokenizers 0.13.2
|