RigelClauseNet: BERT-Based Fraud Clause Detector

RigelClauseNet is a fine-tuned BERT-based binary classifier that detects fraudulent, high-risk, or suspicious clauses in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.

It is designed to help:

  • Legal analysts
  • Fintech systems
  • Regulatory auditors
  • End users seeking clarity in digital contracts

🔍 Use Case

Given a clause or paragraph from a document, the model outputs:

  • A binary risk label (SAFE or RISKY)
  • A confidence score for the predicted label
  • A breakdown of the per-class probabilities

This enables organizations to flag suspicious clauses early, audit contracts, and build smarter compliance pipelines.
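
For a quick end-to-end check, the classifier can also be driven through the Transformers pipeline API. The sketch below assumes nothing beyond the published checkpoint; note that the label strings the pipeline returns come from the id2label mapping in config.json and may appear as LABEL_0/LABEL_1 rather than SAFE/RISKY.

from transformers import pipeline

# text-classification pipeline over the published checkpoint;
# top_k=None returns a score for every class, i.e. the full probability breakdown.
clf = pipeline("text-classification", model="nitinsri/RigelClauseNet", top_k=None)

print(clf("We may share your personal data with unnamed third parties at any time."))
# Label names depend on the model's id2label config (possibly LABEL_0 / LABEL_1).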


🧠 Model Details

  • Base Model: google-bert/bert-base-uncased
  • Architecture: BERT + Sequence Classification Head
  • Training Data: 5,000 semi-synthetic and curated clauses (labeled as SAFE or RISKY)
  • Classes (see the config check after this list):
    • 0 → Safe clause
    • 1 → Fraudulent/risky clause
  • Trained On: Google Colab with Hugging Face Transformers
  • Parameters: ~109M (float32)
  • Performance (validation set):
    • Accuracy: 98.47%
    • Precision: 99.19%
    • Recall: 99.19%
    • F1 Score: 99.19%
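
The class count and label mapping can be sanity-checked straight from the published config. A minimal sketch follows; the exact id2label strings depend on how the config was saved and may read LABEL_0/LABEL_1 rather than SAFE/RISKY.

from transformers import AutoConfig

# Load only the config to inspect the classification-head setup.
config = AutoConfig.from_pretrained("nitinsri/RigelClauseNet")
print(config.num_labels)   # expected: 2
print(config.id2label)     # id -> label-name mapping stored in config.json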

📌 Examples

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
import torch

model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")
model.eval()  # inference mode (disables dropout)

def predict_clause(text):
    # Tokenize a single clause and run it through the classifier.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1)
    label = torch.argmax(probs, dim=-1).item()
    return {
        "label": "RISKY" if label == 1 else "SAFE",
        "confidence": round(probs[0][label].item(), 4),
        "probabilities": probs[0].tolist(),  # [P(SAFE), P(RISKY)]
    }

# Example
print(predict_clause("Late payments will incur a 25% monthly penalty."))
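
To score many clauses in one pass, the tokenizer accepts a list of strings and pads them to a common length. The batched helper below is a sketch that reuses the model, tokenizer, and imports loaded above; the name predict_clauses and the batch_size value are illustrative, not part of the released model.

def predict_clauses(texts, batch_size=32):
    """Score a list of clauses and return one result dict per clause."""
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           padding=True, max_length=128)
        with torch.no_grad():
            probs = F.softmax(model(**inputs).logits, dim=-1)
        for row in probs:
            label = int(torch.argmax(row))
            results.append({
                "label": "RISKY" if label == 1 else "SAFE",
                "confidence": round(row[label].item(), 4),
                "probabilities": row.tolist(),  # [P(SAFE), P(RISKY)]
            })
    return results

print(predict_clauses([
    "Late payments will incur a 25% monthly penalty.",
    "You may cancel your subscription at any time without charge.",
]))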

🧠 Intended Usage

You can use this model for:

  • Scanning uploaded PDFs, contracts, or policies (see the clause-scanning sketch after this list)
  • Highlighting or flagging suspicious legal language
  • Powering backend systems in legal-tech and compliance
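
As a sketch of how these use cases fit together, the snippet below splits extracted document text into paragraph-sized clauses and collects the ones the model flags as RISKY. The blank-line split and the confidence threshold are simplifying assumptions, and the helper name flag_risky_clauses is illustrative; a production pipeline would use a real PDF parser and clause segmenter.

def flag_risky_clauses(document_text, threshold=0.5):
    """Return the clauses the classifier flags as RISKY above a confidence threshold."""
    # Naive clause segmentation: split on blank lines. Text extraction from
    # PDFs/contracts is assumed to have happened upstream.
    clauses = [c.strip() for c in document_text.split("\n\n") if c.strip()]
    flagged = []
    for clause in clauses:
        result = predict_clause(clause)  # helper defined in the Examples section
        if result["label"] == "RISKY" and result["confidence"] >= threshold:
            flagged.append({"clause": clause, **result})
    return flagged

policy_text = """The provider may modify these terms at any time without notice.

Refunds are available within 30 days of purchase."""

for hit in flag_risky_clauses(policy_text):
    print(hit["confidence"], hit["clause"])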

🚫 Limitations

  • Trained on semi-synthetic and curated clauses, not real-world legal corpora.
  • Binary classifier only — it does not explain why a clause is risky.
  • Contextual or nested document logic is not supported (yet).

📂 Files

  • model.safetensors: Fine-tuned model weights
  • config.json: BERT classification-head configuration
  • tokenizer.json: Tokenizer for preprocessing
  • vocab.txt: BERT vocabulary

💡 Future Plans

  • Multi-class classification (safe, risky, ambiguous)
  • Explanation layer (highlight key tokens that trigger risk)
  • Full document-level context scanning
  • Integration with Hugging Face Spaces (with UI)

👨‍💻 Author

Built by Nithin Sri
🚀 Hugging Face: https://huggingface.co/nitinsri
📧 Email: [email protected]


📜 License

MIT License


“Clarity and transparency in digital contracts are not luxuries — they are rights. RigelGuard helps enforce that.”

