RigelClauseNet: BERT-Based Fraud Clause Detector
RigelClauseNet is a fine-tuned BERT-based binary classifier that detects fraudulent, high-risk, or suspicious clauses in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.
It is designed to help:
- Legal analysts
- Fintech systems
- Regulatory auditors
- End users seeking clarity in digital contracts
🔍 Use Case
Given a clause or paragraph from a document, the model outputs:
- A binary risk label (`SAFE` or `RISKY`)
- A probability confidence score
- A breakdown of class probabilities
This enables organizations to flag suspicious clauses early, audit contracts, and build smarter compliance pipelines.
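For illustration, a scored clause comes back in a shape like the following (the field names match the prediction helper shown in the Examples section; the numbers here are made up):

```python
# Illustrative output for a risky clause — values are made up.
{
    "label": "RISKY",                     # SAFE or RISKY
    "confidence": 0.9873,                 # probability of the predicted class
    "probabilities": [[0.0127, 0.9873]],  # [P(SAFE), P(RISKY)]
}
```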
🧠 Model Details
- Base Model: `google-bert/bert-base-uncased`
- Architecture: BERT + Sequence Classification Head
- Training Data: 5,000 semi-synthetic and curated clauses (labeled as `SAFE` or `RISKY`)
- Classes: `0` → Safe clause, `1` → Fraudulent/risky clause (see the config check after this list)
- Trained On: Google Colab with Hugging Face Transformers
- Performance (on validation set):
  - Accuracy: 98.47%
  - Precision: 99.19%
  - Recall: 99.19%
  - F1 Score: 99.19%
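The `0`/`1` mapping above can be confirmed by inspecting the published configuration. A minimal sketch, assuming the repo's `config.json` loads with `AutoConfig` (note that `id2label` may still hold the generic `LABEL_0`/`LABEL_1` names if custom label names were not saved during fine-tuning):

```python
from transformers import AutoConfig

# Inspect the classification head configuration published with the model.
config = AutoConfig.from_pretrained("nitinsri/RigelClauseNet")

print(config.num_labels)  # expected: 2
# id2label may be the default {0: "LABEL_0", 1: "LABEL_1"} unless custom
# names were set at training time; index 0 corresponds to SAFE, 1 to RISKY.
print(config.id2label)
```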
📌 Examples
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# Load the fine-tuned classifier and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")

def predict_clause(text):
    # Tokenize the clause and run a forward pass without tracking gradients
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=1)
    label = torch.argmax(probs).item()
    return {
        "label": "RISKY" if label == 1 else "SAFE",
        "confidence": round(probs[0][label].item(), 4),
        "probabilities": probs.tolist(),
    }

# Example
print(predict_clause("Late payments will incur a 25% monthly penalty."))
```
🧠 Intended Usage
You can use this model for:
- Scanning uploaded PDFs, contracts, or policies (sketched below)
- Highlighting or flagging suspicious legal language
- Powering backend systems in legal-tech and compliance
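As a rough sketch of that kind of pipeline (assumptions: the document has already been extracted to plain text, clauses can be approximated by splitting on blank lines, and the `predict_clause` helper from the Examples section is in scope; real contracts usually need a proper clause segmenter):

```python
def flag_risky_clauses(document_text, threshold=0.80):
    """Return clauses that predict_clause marks RISKY above a confidence threshold."""
    # Naive clause segmentation: split on blank lines. This is an assumption —
    # production pipelines should use a proper legal clause segmenter.
    clauses = [c.strip() for c in document_text.split("\n\n") if c.strip()]
    flagged = []
    for clause in clauses:
        result = predict_clause(clause)
        if result["label"] == "RISKY" and result["confidence"] >= threshold:
            flagged.append({"clause": clause, **result})
    return flagged

# Example: scan a small two-clause snippet
sample = (
    "Late payments will incur a 25% monthly penalty.\n\n"
    "You may cancel your subscription at any time with 30 days notice."
)
for hit in flag_risky_clauses(sample):
    print(hit["confidence"], hit["clause"])
```

The threshold is a tunable trade-off: lowering it surfaces more borderline clauses for human review, raising it keeps only high-confidence flags.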
🚫 Limitations
- Trained on semi-synthetic clauses, not real-world legal corpora.
- Binary classifier only — it does not explain why a clause is risky.
- Contextual or nested document logic is not supported (yet).
📂 Files
| File | Description |
|---|---|
| model.safetensors | Fine-tuned model weights |
| config.json | BERT classification head config |
| tokenizer.json | Tokenizer for preprocessing |
| vocab.txt | BERT vocabulary |
💡 Future Plans
- Multi-class classification (`safe`, `risky`, `ambiguous`)
- Explanation layer (highlight key tokens that trigger risk)
- Full document-level context scanning
- Integration with Hugging Face Spaces (with UI)
👨💻 Author
Built by Nithin Sri
🚀 Hugging Face: https://huggingface.co/nitinsri
📧 Email: [email protected]
📜 License
MIT License
“Clarity and transparency in digital contracts are not luxuries — they are rights. RigelGuard helps enforce that.”