RigelClauseNet: BERT-Based Fraud Clause Detector

RigelClauseNet is a fine-tuned BERT-based binary classifier that detects fraudulent, high-risk, or suspicious clauses in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.

It is designed to help:

  • Legal analysts
  • Fintech systems
  • Regulatory auditors
  • End users seeking clarity in digital contracts

🔍 Use Case

Given a clause or paragraph from a document, the model outputs:

  • A binary risk label (SAFE or RISKY)
  • A confidence score for the predicted label
  • A breakdown of the per-class probabilities

This enables organizations to flag suspicious clauses early, audit contracts, and build smarter compliance pipelines.
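
For a quick end-to-end check, the classifier can also be driven through the Transformers pipeline API. The sketch below assumes nothing beyond the published checkpoint; note that the label strings the pipeline returns come from the id2label mapping in config.json and may appear as LABEL_0/LABEL_1 rather than SAFE/RISKY.

from transformers import pipeline

# text-classification pipeline over the published checkpoint;
# top_k=None returns a score for every class, i.e. the full probability breakdown.
clf = pipeline("text-classification", model="nitinsri/RigelClauseNet", top_k=None)

print(clf("We may share your personal data with unnamed third parties at any time."))
# Label names depend on the model's id2label config (possibly LABEL_0 / LABEL_1).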


🧠 Model Details

  • Base Model: google-bert/bert-base-uncased
  • Architecture: BERT + Sequence Classification Head
  • Training Data: 5,000 semi-synthetic and curated clauses (labeled as SAFE or RISKY)
  • Classes (see the config check after this list):
    • 0 → Safe clause
    • 1 → Fraudulent/risky clause
  • Trained On: Google Colab with Hugging Face Transformers
  • Parameters: ~109M (float32)
  • Performance (validation set):
    • Accuracy: 98.47%
    • Precision: 99.19%
    • Recall: 99.19%
    • F1 Score: 99.19%
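
The class count and label mapping can be sanity-checked straight from the published config. A minimal sketch follows; the exact id2label strings depend on how the config was saved and may read LABEL_0/LABEL_1 rather than SAFE/RISKY.

from transformers import AutoConfig

# Load only the config to inspect the classification-head setup.
config = AutoConfig.from_pretrained("nitinsri/RigelClauseNet")
print(config.num_labels)   # expected: 2
print(config.id2label)     # id -> label-name mapping stored in config.json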

📌 Examples

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
import torch

model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")
model.eval()  # inference mode (disables dropout)

def predict_clause(text):
    # Tokenize a single clause and run it through the classifier.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1)
    label = torch.argmax(probs, dim=-1).item()
    return {
        "label": "RISKY" if label == 1 else "SAFE",
        "confidence": round(probs[0][label].item(), 4),
        "probabilities": probs[0].tolist(),  # [P(SAFE), P(RISKY)]
    }

# Example
print(predict_clause("Late payments will incur a 25% monthly penalty."))
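
To score many clauses in one pass, the tokenizer accepts a list of strings and pads them to a common length. The batched helper below is a sketch that reuses the model, tokenizer, and imports loaded above; the name predict_clauses and the batch_size value are illustrative, not part of the released model.

def predict_clauses(texts, batch_size=32):
    """Score a list of clauses and return one result dict per clause."""
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           padding=True, max_length=128)
        with torch.no_grad():
            probs = F.softmax(model(**inputs).logits, dim=-1)
        for row in probs:
            label = int(torch.argmax(row))
            results.append({
                "label": "RISKY" if label == 1 else "SAFE",
                "confidence": round(row[label].item(), 4),
                "probabilities": row.tolist(),  # [P(SAFE), P(RISKY)]
            })
    return results

print(predict_clauses([
    "Late payments will incur a 25% monthly penalty.",
    "You may cancel your subscription at any time without charge.",
]))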

🧠 Intended Usage

You can use this model for:

  • Scanning uploaded PDFs, contracts, or policies (see the clause-scanning sketch after this list)
  • Highlighting or flagging suspicious legal language
  • Powering backend systems in legal-tech and compliance
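
As a sketch of how these use cases fit together, the snippet below splits extracted document text into paragraph-sized clauses and collects the ones the model flags as RISKY. The blank-line split and the confidence threshold are simplifying assumptions, and the helper name flag_risky_clauses is illustrative; a production pipeline would use a real PDF parser and clause segmenter.

def flag_risky_clauses(document_text, threshold=0.5):
    """Return the clauses the classifier flags as RISKY above a confidence threshold."""
    # Naive clause segmentation: split on blank lines. Text extraction from
    # PDFs/contracts is assumed to have happened upstream.
    clauses = [c.strip() for c in document_text.split("\n\n") if c.strip()]
    flagged = []
    for clause in clauses:
        result = predict_clause(clause)  # helper defined in the Examples section
        if result["label"] == "RISKY" and result["confidence"] >= threshold:
            flagged.append({"clause": clause, **result})
    return flagged

policy_text = """The provider may modify these terms at any time without notice.

Refunds are available within 30 days of purchase."""

for hit in flag_risky_clauses(policy_text):
    print(hit["confidence"], hit["clause"])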

🚫 Limitations

  • Trained on semi-synthetic and curated clauses, not real-world legal corpora.
  • Binary classifier only — it does not explain why a clause is risky.
  • Contextual or nested document logic is not supported (yet).

📂 Files

  • model.safetensors: Fine-tuned model weights
  • config.json: BERT classification-head configuration
  • tokenizer.json: Tokenizer for preprocessing
  • vocab.txt: BERT vocabulary

💡 Future Plans

  • Multi-class classification (safe, risky, ambiguous)
  • Explanation layer (highlight key tokens that trigger risk)
  • Full document-level context scanning
  • Integration with Hugging Face Spaces (with UI)

👨‍💻 Author

Built by Nithin Sri
🚀 Hugging Face: https://huggingface.co/nitinsri
📧 Email: [email protected]


📜 License

MIT License


“Clarity and transparency in digital contracts are not luxuries — they are rights. RigelGuard helps enforce that.”

