DistilBERT Ticket Classifier (Distil_Bert_V3)

Model Overview

This is a fine-tuned DistilBERT model (distilbert-base-cased) that classifies defect tickets and assigns them to the appropriate team based on their text content. It was trained on ticket data from Defect_ticket_V2.csv, cleaned by filling in missing values in the Description, Comment, and Summary fields, and predicts one of 5 team labels, each linked to a team email for automated routing.

  • Model Type: DistilBERT for Sequence Classification
  • Framework: PyTorch
  • Repository: ZAM-ITI-110/Distil_Bert_V3
  • License: MIT (see YAML metadata above)
  • Created: February 2025
  • Creator: AUNGHLAINGTUN/Student ID6319250G NYP

Intended Use

This model is intended for:

  • Automating ticket assignment in IT support or defect tracking systems.
  • Reducing manual triage time by predicting the responsible team based on ticket text.

Use Case

  • Input: A ticket with fields Description, Comment, and Summary (e.g., "Urgent server crash reported in production").
  • Output: A team label (0-4) mapped to a team email (e.g., [email protected]).
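To make the output concrete, the sketch below shows the kind of label-to-mailbox routing table the model's predictions feed into. The email addresses here are placeholders; the actual mapping is not published with the model.

```python
# Hypothetical label-to-email routing table; the real addresses are not
# published with this model card, so these are placeholders.
TEAM_EMAILS = {
    0: "team0@example.com",
    1: "team1@example.com",
    2: "team2@example.com",
    3: "team3@example.com",
    4: "team4@example.com",
}

def route_ticket(predicted_label: int) -> str:
    """Return the mailbox the ticket should be forwarded to."""
    return TEAM_EMAILS[predicted_label]
```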

Out of Scope

  • Not designed for multi-label classification or sentiment analysis.
  • May not generalize well to tickets outside the training domain (e.g., non-technical support tickets).

Training Data

  • Dataset: Defect_ticket_v2.csv (private dataset)
  • Size: Approximately 5,000 samples (70% train: ~3,504, 15% validation: ~750, 15% test: ~750).
  • Features: Combined text from Description, Comment, and Summary columns.
  • Labels: 5 unique team labels (encoded as 0-4), derived from the Assigned Team column.
  • Preprocessing: Missing values filled with empty strings; text truncated/padded to 512 tokens (see the sketch below).

Note: The dataset is not publicly available due to privacy constraints.
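A minimal sketch of the preprocessing described above, assuming the private CSV exposes Description, Comment, Summary, and Assigned Team columns (column names inferred from this card, not verified against the file):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import DistilBertTokenizerFast

df = pd.read_csv("Defect_ticket_V2.csv")  # private dataset, not distributed

# Fill missing values with empty strings and combine the three text fields.
text_cols = ["Description", "Comment", "Summary"]
df[text_cols] = df[text_cols].fillna("")
df["text"] = df[text_cols].agg(" ".join, axis=1)

# Encode the 5 team labels as integers 0-4.
df["label"] = df["Assigned Team"].astype("category").cat.codes

# 70% train / 15% validation / 15% test split.
train_df, temp_df = train_test_split(df, test_size=0.30, stratify=df["label"], random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.50, stratify=temp_df["label"], random_state=42)

# Truncate/pad every ticket to 512 tokens.
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-cased")
train_enc = tokenizer(train_df["text"].tolist(), truncation=True, padding="max_length", max_length=512)
```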

Training Procedure

  • Base Model: distilbert-base-cased
  • Fine-Tuning:
    • Epochs: 5
    • Batch Size: 8
    • Optimizer: AdamW (learning rate: 3e-5, weight decay: 0.01)
    • Scheduler: Linear with 10% warmup steps
  • Hardware: Trained on Google Colab with a T4 GPU (~31 seconds/epoch).
  • Mixed Precision: Enabled via PyTorch AMP for efficiency.
  • Loss Function: CrossEntropyLoss
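A condensed training-loop sketch matching the hyperparameters above. It assumes a `train_dataset` of PyTorch-style items built from the tokenized encodings (e.g., the `train_enc` from the preprocessing sketch plus labels); the original training script is not published.

```python
import torch
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification, get_linear_schedule_with_warmup

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-cased", num_labels=5
).to(device)

train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)  # train_dataset assumed
epochs = 5
total_steps = epochs * len(train_loader)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=int(0.1 * total_steps), num_training_steps=total_steps
)
scaler = torch.cuda.amp.GradScaler()  # mixed precision (PyTorch AMP)

model.train()
for epoch in range(epochs):
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            # Passing `labels` makes the model compute CrossEntropyLoss internally.
            loss = model(**batch).loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()
```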

Training Metrics

| Epoch | Train Loss | Validation Loss | Validation Accuracy |
|-------|------------|-----------------|---------------------|
| 1     | 0.4021     | 0.0038          | 100%                |
| 2     | 0.0031     | 0.0011          | 100%                |
| 3     | 0.0013     | 0.0006          | 100%                |
| 4     | 0.0008     | 0.0004          | 100%                |
| 5     | 0.0007     | 0.0004          | 100%                |
  • Test Accuracy: 100% (on ~750 test samples).

Evaluation

  • Performance: Achieved 100% accuracy on both validation and test sets, indicating excellent fit to the provided data.
  • Caveats:
    • Perfect accuracy may suggest an easy classification task, limited dataset diversity, or potential data leakage (e.g., duplicates across splits).
    • Real-world performance on new, unseen tickets should be validated.
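Given the perfect scores, a quick sanity check worth running is a duplicate scan across splits. A minimal sketch, reusing the split DataFrames from the preprocessing sketch above:

```python
# Count exact-duplicate ticket texts shared between the training split
# and the validation/test splits (a common source of inflated accuracy).
train_texts = set(train_df["text"])
val_overlap = sum(text in train_texts for text in val_df["text"])
test_overlap = sum(text in train_texts for text in test_df["text"])
print(f"validation overlap: {val_overlap}, test overlap: {test_overlap}")
```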

How to Use

  • Predicts the appropriate team and email for up to 6 ticket descriptions (via the demo Space).
  • Click 'Predict' to classify an individual ticket, or 'Send Tickets' to process all of them at once.

Installation

pip install transformers torch 
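A minimal inference sketch, assuming the repository ships a standard tokenizer and sequence-classification head (the mapping from label ID to mailbox follows the placeholder routing table shown earlier):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "ZAM-ITI-110/Distil_Bert_V3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

ticket = "Urgent server crash reported in production"
inputs = tokenizer(ticket, truncation=True, padding="max_length", max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_label = int(logits.argmax(dim=-1))
print(predicted_label)  # 0-4; map to a mailbox, e.g. route_ticket(predicted_label)
```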