Update README.md
README.md CHANGED
---
license: mit
language:
- en
metrics:
- accuracy
- bertscore
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
---

# Newswire Classifier (AP, UPI, NEA) - BERT Transformers

## Overview

This repository contains three separately trained BERT models for identifying whether a newspaper article was produced by one of three major newswire services:

- **AP (Associated Press)**
- **UPI (United Press International)**
- **NEA (Newspaper Enterprise Association)**

The models are designed for classifying historical, public-domain newswire articles (1960–1975).

## Model Architecture

- **Base Model:** `bert-base-uncased`
- **Task:** Binary classification (`1` if the article is from the target newswire, `0` otherwise)
- **Optimizer:** AdamW
- **Loss Function:** Binary Cross-Entropy with Logits
- **Batch Size:** 16
- **Epochs:** 4
- **Learning Rate:** 2e-5
- **Device:** TPU (v2-8) in Google Colab

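The hyperparameters above map onto a standard PyTorch fine-tuning loop. The sketch below is illustrative only: the single-logit head, the dataloader format, and the GPU/CPU device handling are assumptions, not the exact Colab TPU script used to train these models.

```python
# Minimal fine-tuning sketch matching the hyperparameters above (AdamW,
# BCE-with-logits, lr 2e-5, 4 epochs). The single-logit head and the dataloader
# format are assumptions; the original training ran on a Colab TPU with batch size 16.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1).to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = torch.nn.BCEWithLogitsLoss()

def train(dataloader, epochs=4):
    """dataloader yields (texts, labels): a list of strings and a tensor of 0/1 labels."""
    model.train()
    for _ in range(epochs):
        for texts, labels in dataloader:
            enc = tokenizer(list(texts), return_tensors="pt", truncation=True,
                            padding=True, max_length=128).to(device)
            logits = model(**enc).logits.squeeze(-1)
            loss = loss_fn(logits, labels.float().to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```
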
## Training Data

- **Source:** Historical newspapers (1960–1975, public domain)
- **Articles:** 4000 per training round (1000 from the target newswire, 3000 from other sources)
- **Features Used:** Headline, author, and the first 100 characters of the article
- **Labeling:** `1` for articles from the target newswire, `0` for all others

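As a concrete illustration of this labeling scheme, the snippet below assembles one training example from the fields described above. The field names (`headline`, `author`, `body`, `wire`) are hypothetical placeholders, not the original dataset schema.

```python
# Hypothetical example-construction helper; field names are illustrative only.
def make_example(article: dict, target_wire: str = "AP"):
    # Concatenate headline, author, and the first 100 characters of the body,
    # mirroring the feature description above.
    text = f"{article['headline']} {article['author']} {article['body'][:100]}"
    # Label is 1 for articles from the target newswire, 0 for all others.
    label = 1 if article["wire"] == target_wire else 0
    return text, label

text, label = make_example({
    "headline": "President speaks at conference",
    "author": "By JANE SMITH",
    "body": "(AP) WASHINGTON - The president addressed reporters on Tuesday ...",
    "wire": "AP",
})  # -> text for the tokenizer, label = 1
```
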
## Model Performance

| Model   | Accuracy | Precision | Recall | F1 Score |
|---------|----------|-----------|--------|----------|
| **AP**  | 0.9925   | 0.9926    | 0.9925 | 0.9925   |
| **UPI** | 0.9999   | 0.9999    | 0.9999 | 0.9999   |
| **NEA** | 0.9875   | 0.9880    | 0.9875 | 0.9876   |

## Usage

### Installation

```bash
pip install transformers torch
```

### Example Inference (AP Classifier)

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the AP classifier and its tokenizer ("username/newswire_classifier/AP" is a placeholder path)
model = AutoModelForSequenceClassification.from_pretrained("username/newswire_classifier/AP")
tokenizer = AutoTokenizer.from_pretrained("username/newswire_classifier/AP")

# A snippet formatted like the training data, with the wire credit near the start
text = "(AP) President speaks at conference..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
outputs = model(**inputs)

# Index of the highest-scoring class: 1 = AP, 0 = not AP
prediction = outputs.logits.argmax().item()
print("AP Article" if prediction == 1 else "Not AP Article")
```

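If a confidence score is more useful than a hard label, the logits from the example above can be converted to probabilities. This is a small assumed extension of the card's example, assuming the head produces two class logits as the `argmax` call implies.

```python
import torch

# Softmax over the two class logits gives P(not AP) and P(AP).
probs = torch.softmax(outputs.logits, dim=-1).squeeze()
print(f"P(AP Article) = {probs[1].item():.3f}")
```
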
## Recommended Usage Notes

- The models were trained on headlines, authors, and the first 100 characters of each article, since the newswire credit usually appears in these sections. Formatting inference inputs the same way may improve accuracy (see the sketch below).

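One way to follow this recommendation is to assemble the inference input with the same headline + author + first-100-characters layout used in training. The sketch below reuses the model and tokenizer from the inference example; the field values are hypothetical.

```python
# Build an inference input in the same layout as the training features.
headline = "President speaks at conference"
author = "By JANE SMITH"
body = "(AP) WASHINGTON - The president addressed reporters on Tuesday ..."

text = f"{headline} {author} {body[:100]}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
prediction = model(**inputs).logits.argmax().item()
print("AP Article" if prediction == 1 else "Not AP Article")
```
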
## Licensing & Data Source

- **Training Data:** Historical newspaper articles (1960–1975) from public-domain sources.
- **License:** Public domain (for the data) and MIT License (for the models and code).

## Citation

If you use these models, please cite:

```bibtex
@misc{newswire_classifier,
  author    = {Your Name},
  title     = {Newswire Classifier (AP, UPI, NEA) - BERT Transformers},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/username/newswire_classifier}
}
```