roberta-student-fine-tuned

This model is a fine-tuned version of roberta-base on a dataset provided by Kim Taeuk (๊น€ํƒœ์šฑ), NLP teacher at Hanyang University.

The model was trained for multi-intent detection using the BlendX dataset, focusing on complex utterances containing multiple intents.

It achieves the following results on the evaluation set:

  • Loss: 0.0053
  • Exact Match Accuracy: 0.9075

Model description

The model is based on roberta-base, a robust transformer model pretrained on a large corpus of English text.

Fine-tuning was conducted on a specialized dataset focusing on multi-intent detection in utterances with complex intent structures.

Model Architecture

  • Base Model: roberta-base
  • Task: Multi-Intent Detection
  • Languages: English
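
A minimal usage sketch is shown below. It assumes the fine-tuned head is a multi-label (sigmoid) classifier whose intent names are stored in the model config (id2label); the example utterance and the 0.5 decision threshold are illustrative choices, not settings documented for this model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the checkpoint name matches this repository; the label set
# comes from the fine-tuning config (id2label), not from this card.
model_name = "Meruem/roberta-student-fine-tuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

utterance = "Play some jazz and remind me to call mom at 6 pm."
inputs = tokenizer(utterance, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label decoding: sigmoid + threshold (0.5 is an illustrative value).
probs = torch.sigmoid(logits)[0]
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)
```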

Strengths

High accuracy on evaluation data.

Capable of detecting multiple intents within a single utterance.

Limitations

Fine-tuned on a specific dataset; performance may vary on other tasks.

Limited to English text.

Intended uses & limitations

Use Cases

Multi-intent detection tasks such as customer service queries, virtual assistants, and dialogue systems.

Academic research and educational projects.

Limitations

May require additional fine-tuning for domain-specific applications.

Not designed for multilingual tasks.

Training and evaluation data

The model was trained on the BlendX dataset, a multi-intent detection benchmark focusing on realistic combinations of user intents in task-oriented dialogues.

Data Details:

The dataset used for training this model is based on the BlendX dataset, which targets multi-intent detection in task-oriented dialogues. While the full BlendX dataset contains instances with anywhere from 1 to 3 intents, the dataset for this assignment only includes instances with exactly 2 intents, for simplicity.

Dataset License and Source

The dataset used for training this model is licensed under the GNU General Public License v2.

Important Notes:

  • Any use, distribution, or modification of this dataset must comply with the terms of the GPL v2 license.
  • The dataset source and its original license can be found in its official GitHub repository.
  • Dataset File: Download Here

Dataset Format:

  • File Type: JSON
  • Size: 28,815 training samples, 1,513 validation samples
  • Data Fields:
    • split (string): Indicates if the sample belongs to the training or validation set.
    • utterance (string): The text input containing multiple intents.
    • intent (list of strings): The associated intents.
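
For illustration, a record in this format might look as follows; the utterance text and intent labels below are invented examples, not actual entries from the dataset.

```json
{
  "split": "train",
  "utterance": "Book a table for two tonight and tell me tomorrow's weather.",
  "intent": ["BookRestaurant", "GetWeather"]
}
```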

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • warmup_steps: 200
  • num_epochs: 20
  • save_total_limit: 3
  • weight_decay: 0.01
  • eval_strategy: epoch
  • save_strategy: epoch
  • metric_for_best_model: eval_exact_match_accuracy
  • load_best_model_at_end: True
  • dataloader_pin_memory: True
  • fp16: False
  • greater_is_better: True
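
As a rough reconstruction (not the original training script), these hyperparameters correspond to a transformers TrainingArguments configuration along the following lines; output_dir is a placeholder, and dataset loading, the model head, and the metric function are omitted.

```python
from transformers import TrainingArguments

# Sketch of the configuration listed above; values are taken from this card,
# and output_dir is a placeholder path.
training_args = TrainingArguments(
    output_dir="roberta-student-fine-tuned",
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",                 # AdamW with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=200,
    num_train_epochs=20,
    save_total_limit=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="eval_exact_match_accuracy",
    load_best_model_at_end=True,
    dataloader_pin_memory=True,
    fp16=False,
    greater_is_better=True,
)
```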

Training results

| Training Loss | Epoch | Step  | Validation Loss | Exact Match Accuracy |
|---------------|-------|-------|-----------------|----------------------|
| 0.0723        | 1.0   | 2297  | 0.0720          | 0.0                  |
| 0.0576        | 2.0   | 4594  | 0.0516          | 0.0                  |
| 0.0328        | 3.0   | 6891  | 0.0264          | 0.0839               |
| 0.0150        | 4.0   | 9188  | 0.0141          | 0.6907               |
| 0.0086        | 5.0   | 11485 | 0.0092          | 0.8771               |
| 0.0046        | 6.0   | 13782 | 0.0069          | 0.8929               |
| 0.0027        | 7.0   | 16079 | 0.0061          | 0.9002               |
| 0.0018        | 8.0   | 18376 | 0.0059          | 0.8936               |
| 0.0012        | 9.0   | 20673 | 0.0056          | 0.8995               |
| 0.0009        | 10.0  | 22970 | 0.0053          | 0.9075               |
| 0.0007        | 11.0  | 25267 | 0.0055          | 0.9055               |
| 0.0005        | 12.0  | 27564 | 0.0061          | 0.8976               |
| 0.0004        | 13.0  | 29861 | 0.0057          | 0.9061               |
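
Exact match accuracy counts an utterance as correct only when the entire predicted intent set matches the gold labels. A minimal sketch of such a metric for multi-hot labels is shown below; the function name and the 0.5 threshold are illustrative, not taken from the original training code.

```python
import numpy as np

def exact_match_accuracy(eval_pred, threshold=0.5):
    """Fraction of utterances whose full predicted intent set matches the labels."""
    logits, labels = eval_pred
    probs = 1.0 / (1.0 + np.exp(-logits))       # sigmoid over intent logits
    preds = (probs > threshold).astype(int)     # multi-hot predictions per utterance
    return {"exact_match_accuracy": float((preds == labels).all(axis=1).mean())}
```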

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Improvement Perspectives

To achieve better results, several improvement strategies could be explored:

  • Model Capacity Expansion: Test larger models such as roberta-large.
  • Batch Size Increase: Use larger batches for more stable updates.
  • Gradient Accumulation: Tune the number of steps over which gradients are accumulated before each optimizer update, which increases the effective batch size.
  • Learning Rate Management:
    • Experiment with other schedules, such as polynomial decay, or with dynamic learning-rate adjustment.
    • Further reduce the learning rate
  • Enhanced Preprocessing:
    • Test data augmentation techniques such as random masking or synonym replacement.
    • Reduce the imbalance between the different intent categories.
    • Weight the loss according to how well each intent category is represented (see the sketch at the end of this section).
    • Use another dataset.
  • Longer Training Duration: Increase the number of epochs and refine stopping criteria for more precise convergence.
  • Model Ensembling: Use multiple models to improve prediction robustness.
  • Advanced Attention Mechanisms: Test models using hierarchical attention or enhanced multi-head architectures.
  • Metric Selection: Choose the evaluation metric best suited to the problem.

These strategies require significant computational resources and extended training time but offer substantial potential for performance improvement.
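
To illustrate the class-weighting idea above, one possibility is a weighted binary cross-entropy loss inside a custom Trainer; the subclass below is a hypothetical sketch, not part of how this model was trained, and the per-intent weights would need to be estimated from the label distribution.

```python
from torch import nn
from transformers import Trainer

class WeightedBCETrainer(Trainer):
    """Hypothetical Trainer that upweights underrepresented intent categories."""

    def __init__(self, *args, pos_weight=None, **kwargs):
        super().__init__(*args, **kwargs)
        # pos_weight: one value per intent, larger for rarer intents (assumption).
        self.pos_weight = pos_weight

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = nn.BCEWithLogitsLoss(pos_weight=self.pos_weight)
        loss = loss_fct(outputs.logits, labels.float())
        return (loss, outputs) if return_outputs else loss
```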
