metadata

license: cc-by-4.0
language:
  - en
pipeline_tag: text-classification
tags:
  - RoBERTa-large
  - topic
  - news

Fine-tuned RoBERTa-large for detecting news on government regulation

Model Description

This model is a fine-tuned RoBERTa-large that classifies whether a news article is about government regulation.

How to Use

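from transformers import pipeline
# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="dell-research-harvard/topic-govt_regulation")
# Returns a list with one dict per input, holding the predicted label and its score
classifier("Senate passes gun control bill")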
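
For more than a handful of articles, it can help to wrap the call in a small helper. The sketch below is one possible way to do this, not part of the released code: the function names `load_classifier` and `classify_articles`, the batch size of 16, and the 512-token truncation limit are all assumptions chosen for illustration. The label strings returned depend on the model's configuration, which this card does not list, so the helper passes them through unchanged.

```python
def load_classifier(model_name="dell-research-harvard/topic-govt_regulation"):
    """Build the text-classification pipeline (downloads the model on first use)."""
    # Imported here so the helper below can be used or tested without transformers.
    from transformers import pipeline
    return pipeline("text-classification", model=model_name)


def classify_articles(classifier, articles, batch_size=16, max_length=512):
    """Classify a list of article texts.

    `classifier` is any callable with the pipeline's calling convention.
    Returns one (text, label, score) tuple per input article, in input order.
    batch_size and max_length are illustrative defaults, not values from the card.
    """
    if not articles:
        return []
    # truncation/max_length are forwarded to the tokenizer so that long
    # articles are cut to the model's input limit instead of raising an error.
    results = classifier(
        articles,
        truncation=True,
        max_length=max_length,
        batch_size=batch_size,
    )
    return [(text, r["label"], r["score"]) for text, r in zip(articles, results)]
```

A typical call would be `classify_articles(load_classifier(), ["Senate passes gun control bill", "Local team wins championship"])`, which returns the label and score for each headline.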

Training data

The model was trained on a hand-labelled sample of data from the NEWSWIRE dataset.

Split	Size
Train	612
Dev	131
Test	131

Test set results

Metric	Result
F1	0.8750
Accuracy	0.9237
Precision	0.7955
Recall	0.9722
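
As a consistency check on the table above, the four metrics can be related back to confusion-matrix counts on the 131-article test set. The sketch below searches for counts that reproduce every reported value to four decimals. The counts it finds are inferred from the reported metrics, not taken from the source, and the function name `consistent_confusion_matrices` is hypothetical.

```python
def consistent_confusion_matrices(n, precision, recall, f1, accuracy, tol=5e-5):
    """Return all (tp, fp, fn, tn) with tp+fp+fn+tn == n whose metrics
    match the reported values to within `tol` (four-decimal agreement)."""
    matches = []
    for tp in range(1, n + 1):          # tp >= 1 so precision/recall are defined
        for fp in range(0, n - tp + 1):
            for fn in range(0, n - tp - fp + 1):
                tn = n - tp - fp - fn
                p = tp / (tp + fp)
                r = tp / (tp + fn)
                f = 2 * tp / (2 * tp + fp + fn)   # equivalent to 2pr / (p + r)
                a = (tp + tn) / n
                if (abs(p - precision) < tol and abs(r - recall) < tol
                        and abs(f - f1) < tol and abs(a - accuracy) < tol):
                    matches.append((tp, fp, fn, tn))
    return matches


# Reported test-set values from the table above.
candidates = consistent_confusion_matrices(
    n=131, precision=0.7955, recall=0.9722, f1=0.8750, accuracy=0.9237
)
print(candidates)
```

The high recall and lower precision mean the model misses few regulation articles but flags some unrelated ones, which is worth keeping in mind when choosing a score threshold for downstream use.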

Citation Information

You can cite the NEWSWIRE dataset, on which this model was trained, using

@misc{silcock2024newswirelargescalestructureddatabase,
      title={Newswire: A Large-Scale Structured Database of a Century of Historical News}, 
      author={Emily Silcock and Abhishek Arora and Luca D'Amico-Wong and Melissa Dell},
      year={2024},
      eprint={2406.09490},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.09490}, 
}

Applications

We applied this model to a century of historical news articles. You can see all the classifications in the NEWSWIRE dataset.