# Newswire Classifier (AP, UPI, NEA) - BERT Transformers
## Overview
This repository contains three separately trained BERT models for identifying whether a newspaper article was produced by one of three major newswire services:
- AP (Associated Press)
- UPI (United Press International)
- NEA (Newspaper Enterprise Association)
The models are designed for historical news classification from public-domain newswire articles (1960–1975).
## Model Architecture
- Base Model: `bert-base-uncased`
- Task: Binary classification (`1` if from the specific newswire, `0` otherwise)
- Optimizer: AdamW
- Loss Function: Binary Cross-Entropy with Logits
- Batch Size: 16
- Epochs: 4
- Learning Rate: 2e-5
- Device: TPU (v2-8) in Google Colab
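As a rough illustration of how these hyperparameters fit together, the sketch below mirrors them in a plain PyTorch fine-tuning loop. It is not the actual training script: the toy texts and labels stand in for the real dataset, and a single-logit head is assumed to match the BCE-with-logits loss.

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Toy stand-ins for the real training set (4,000 articles per round)
texts = ["(AP) President speaks at conference...", "Local council approves budget..."]
labels = torch.tensor([1.0, 0.0])  # 1 = target newswire, 0 = other

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

optimizer = AdamW(model.parameters(), lr=2e-5)   # AdamW, learning rate 2e-5
loss_fn = torch.nn.BCEWithLogitsLoss()           # binary cross-entropy with logits

enc = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=128)

model.train()
for epoch in range(4):                           # 4 epochs (batch size 16 in the real run)
    logits = model(**enc).logits.squeeze(-1)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```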
## Training Data
- Source: Historical newspapers (1960–1975, public domain)
- Articles: 4,000 per training round (1,000 from the target newswire, 3,000 from other sources)
- Features Used: Headline, author, and the first 100 characters of the article
- Labeling: `1` for articles from the target newswire, `0` for all others
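For illustration, the model input for each article might be assembled along these lines (a sketch with hypothetical field names; the actual preprocessing script is not part of this repository):

```python
def build_input(headline: str, author: str, body: str) -> str:
    """Concatenate the fields used for training: headline, author,
    and the first 100 characters of the article body."""
    return f"{headline} {author} {body[:100]}"

def label_for(source: str, target: str = "AP") -> int:
    """1 if the article came from the target newswire, 0 otherwise."""
    return 1 if source == target else 0

example = build_input(
    "President speaks at conference",
    "By JOHN SMITH",
    "(AP) WASHINGTON - The president said Tuesday that ...",
)
print(example, label_for("AP"))
```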
## Model Performance
| Model | Accuracy | Precision | Recall | F1 Score |
|-------|----------|-----------|--------|----------|
| AP    | 0.9925   | 0.9926    | 0.9925 | 0.9925   |
| UPI   | 0.9999   | 0.9999    | 0.9999 | 0.9999   |
| NEA   | 0.9875   | 0.9880    | 0.9875 | 0.9876   |
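The card does not state which averaging was used for precision, recall, and F1; the near-identical values suggest weighted averages over both classes. On a held-out set they could be computed like so:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true / y_pred: gold and predicted 0/1 labels for a held-out set (toy values here)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
print(f"accuracy={acc:.4f}  precision={prec:.4f}  recall={rec:.4f}  f1={f1:.4f}")
```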
## Usage

### Installation

```bash
pip install transformers torch
```
### Example Inference (AP Classifier)

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the AP classifier and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained("mike-mcrae/newswire_classifier/AP")
tokenizer = AutoTokenizer.from_pretrained("mike-mcrae/newswire_classifier/AP")
model.eval()

text = "(AP) President speaks at conference..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

prediction = outputs.logits.argmax().item()
print("AP Article" if prediction == 1 else "Not AP Article")
```
## Recommended Usage Notes

- The models were trained on a concatenation of the headline, author, and the first 100 characters of the article, since the newswire credit often appears in these sections. Formatting inference inputs the same way (as in the examples above) may improve accuracy.
## Licensing & Data Source

- Training Data: Historical newspaper articles (1960–1975) from public-domain sources.
- License: Public domain (for data) and MIT License (for model and code).
## Citation
If you use these models, please cite:
```bibtex
@misc{newswire_classifier,
  author    = {McRae, Michael},
  title     = {Newswire Classifier (AP, UPI, NEA) - BERT Transformers},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/username/newswire_classifier}
}
```