---
license: mit
language:
- en
base_model:
- distilbert/distilbert-base-uncased
tags:
- finance
- document-classification
datasets:
- gretelai/synthetic_pii_finance_multilingual
metrics:
- accuracy
pipeline_tag: text-classification
---

# Finance Document Classification

A DistilBERT model fine-tuned to classify finance-related documents. It starts from `distilbert-base-uncased` and is fine-tuned on the English subset of the Synthetic PII Finance Multilingual dataset (`gretelai/synthetic_pii_finance_multilingual`) for multi-class document classification in the finance domain.

## Model Details

- **Base Model:** distilbert-base-uncased
- **Task:** Multi-class finance document classification
- **Language:** English
- **Dataset:** Synthetic PII Finance Multilingual (English subset)
- **Framework:** Hugging Face Transformers

## Metrics

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 98.65% |
| Precision | 98.70% |
| Recall    | 98.65% |
| F1        | 98.65% |

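The card does not state how precision, recall, and F1 are averaged across classes. One common choice for multi-class problems is support-weighted averaging, under which recall always equals accuracy, as the two do in the table above. The sketch below assumes weighted averaging and uses toy labels, not the model's evaluation data, to show how the four numbers could be computed. The function name `weighted_metrics` is hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    total = len(y_true)
    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = true_pos[label] / predicted[label] if predicted[label] else 0.0
        r = true_pos[label] / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total         # each class counts in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(true_pos.values()) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only; these are not the model's evaluation data.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = weighted_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

Weighted recall reduces to the total number of correct predictions divided by the number of examples, which is why it matches accuracy for any input.
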
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Ar86Bat/Finance-Document-Text-Classification"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Client requested details about investment restrictions."
# Truncate inputs that exceed the model's maximum sequence length.
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
pred_id = torch.argmax(probs, dim=-1).item()

print("Predicted class ID:", pred_id)
# id2label comes from the model config; it holds generic names such as
# "LABEL_0" if no label names were saved with the model.
print("Predicted label:", model.config.id2label[pred_id])
print("Confidence:", round(probs[0, pred_id].item(), 3))
```

## Intended Uses & Limitations

- **Intended use:** Automated classification of finance-related documents for compliance, organization, or workflow automation.
- **Not suitable for:** Non-financial or out-of-domain documents without further fine-tuning.

## Example API Usage

The model can be served behind a REST endpoint, for example with FastAPI. An illustrative request and response:

**Request:**

```json
{
  "text": "Client requested details about investment restrictions."
}
```

**Response:**

```json
{
  "label": "Investment Restrictions",
  "confidence": 0.987
}
```

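The response shape above can be derived from the model's raw logits. The following is a minimal sketch of that step, written with the standard library only so it runs without the model. The helper name `build_response`, the label map, and the logits are all hypothetical; in a real handler the logits would plausibly come from `model(**inputs).logits[0].tolist()` and the label map from `model.config.id2label`.

```python
import math


def build_response(logits, id2label):
    """Turn raw class logits into a {"label", "confidence"} response."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Pick the index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "confidence": round(probs[best], 3)}


# Hypothetical label map and logits, for illustration only.
example_id2label = {0: "Label A", 1: "Investment Restrictions", 2: "Label B"}
response = build_response([0.2, 5.1, 0.4], example_id2label)
print(response)
```

Keeping this step as a plain function makes it easy to unit-test separately from the web framework and from model loading.
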
## Citation

If you use this model, please cite the repository:

```bibtex
@misc{ar86bat_finance_doc_classification_2025,
  author = {Arif Hizlan},
  title = {Finance Document Text Classification},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Ar86Bat/Finance-Document-Text-Classification}}
}
```

## License

MIT License