roberta-student-fine-tuned
This model is a fine-tuned version of roberta-base on a dataset provided by Kim Taeuk, NLP teacher at Hanyang University.
The model was trained for multi-intent detection using the BlendX dataset, focusing on complex utterances containing multiple intents.
It achieves the following results on the evaluation set:
- Loss: 0.0053
- Exact Match Accuracy: 0.9075
Model description
The model is based on roberta-base, a robust transformer model pretrained on a large corpus of English text.
Fine-tuning was conducted on a specialized dataset focusing on multi-intent detection in utterances with complex intent structures.
Model Architecture
- Base Model: roberta-base
- Task: Multi-Intent Detection
- Languages: English
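The card does not include a usage example, so here is a minimal inference sketch. It assumes the checkpoint is a multi-label sequence-classification head whose label names are stored in the model config; the 0.5 decision threshold and the example utterance are illustrative choices, not part of the original card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Meruem/roberta-student-fine-tuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Illustrative two-intent utterance.
utterance = "Play some jazz and set an alarm for 7 am."
inputs = tokenizer(utterance, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label decoding: sigmoid per label, keep everything above an
# (assumed) 0.5 threshold.
probs = torch.sigmoid(logits)[0]
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)
```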
Strengths
High exact match accuracy (0.9075) on the evaluation set.
Capable of detecting multiple intents within a single utterance.
Limitations
Fine-tuned on a specific dataset; performance may vary on other tasks.
Limited to English text.
Intended uses & limitations
Use Cases
Multi-intent detection tasks such as customer service queries, virtual assistants, and dialogue systems.
Academic research and educational projects.
Limitations
May require additional fine-tuning for domain-specific applications.
Not designed for multilingual tasks.
Training and evaluation data
The model was trained on the BlendX dataset, a multi-intent detection benchmark focusing on realistic combinations of user intents in task-oriented dialogues.
Data Details:
The dataset used for training this model is based on the BlendX dataset, which focuses on multi-intent detection in task-oriented dialogues. While the full BlendX dataset contains instances with anywhere from 1 to 3 intents, the dataset for this assignment only includes instances with exactly 2 intents, for simplicity.
Dataset License and Source
The dataset used for training this model is licensed under the GNU General Public License v2.
Important Notes:
- Any use, distribution, or modification of this dataset must comply with the terms of the GPL v2 license.
- The dataset source and its original license can be found in its official GitHub repository.
- Dataset File: Download Here
Dataset Format:
- File Type: JSON
- Size: 28,815 training samples, 1,513 validation samples
- Data Fields:
  - split (string): Indicates whether the sample belongs to the training or validation set.
  - utterance (string): The text input containing multiple intents.
  - intent (list of strings): The intents associated with the utterance.
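As a sketch of how a record might be read, assuming the JSON file is a flat list of objects with the three fields above (the file name and the example values in the comments are hypothetical):

```python
import json

# Hypothetical file name; use the path of the downloaded dataset file.
with open("blendx_two_intent.json", "r", encoding="utf-8") as f:
    records = json.load(f)

# Each record is expected to carry the three fields described above, e.g.:
# {"split": "train",
#  "utterance": "Book a table for two and play my workout playlist.",
#  "intent": ["BookRestaurant", "PlayMusic"]}
sample = records[0]
print(sample["split"], sample["utterance"], sample["intent"])
```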
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- warmup_steps: 200
- num_epochs: 20
- save_total_limit: 3
- weight_decay: 0.01
- eval_strategy: epoch
- save_strategy: epoch
- metric_for_best_model: eval_exact_match_accuracy
- load_best_model_at_end: True
- dataloader_pin_memory: True
- fp16: False
- greater_is_better: True
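As a rough sketch, the listed hyperparameters map onto a Transformers TrainingArguments/Trainer setup as follows. The output directory, num_labels, datasets, and compute_metrics function are assumptions and are not defined here; this is not the exact training script used for this model.

```python
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    problem_type="multi_label_classification",  # assumption: multi-label head
    num_labels=num_intents,                      # assumed number of intent labels
)

training_args = TrainingArguments(
    output_dir="roberta-student-fine-tuned",     # illustrative output directory
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=200,
    num_train_epochs=20,
    save_total_limit=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="eval_exact_match_accuracy",
    load_best_model_at_end=True,
    dataloader_pin_memory=True,
    fp16=False,
    greater_is_better=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,       # assumed pre-tokenized datasets
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,   # assumed to return {"exact_match_accuracy": ...}
)
trainer.train()
```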
Training results
| Training Loss | Epoch | Step  | Validation Loss | Exact Match Accuracy |
|---------------|-------|-------|-----------------|----------------------|
| 0.0723        | 1.0   | 2297  | 0.0720          | 0.0                  |
| 0.0576        | 2.0   | 4594  | 0.0516          | 0.0                  |
| 0.0328        | 3.0   | 6891  | 0.0264          | 0.0839               |
| 0.015         | 4.0   | 9188  | 0.0141          | 0.6907               |
| 0.0086        | 5.0   | 11485 | 0.0092          | 0.8771               |
| 0.0046        | 6.0   | 13782 | 0.0069          | 0.8929               |
| 0.0027        | 7.0   | 16079 | 0.0061          | 0.9002               |
| 0.0018        | 8.0   | 18376 | 0.0059          | 0.8936               |
| 0.0012        | 9.0   | 20673 | 0.0056          | 0.8995               |
| 0.0009        | 10.0  | 22970 | 0.0053          | 0.9075               |
| 0.0007        | 11.0  | 25267 | 0.0055          | 0.9055               |
| 0.0005        | 12.0  | 27564 | 0.0061          | 0.8976               |
| 0.0004        | 13.0  | 29861 | 0.0057          | 0.9061               |
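Exact match accuracy counts a prediction as correct only when the predicted intent set matches the reference set exactly. A hedged sketch of a compute_metrics function for this, assuming multi-label logits and an illustrative 0.5 sigmoid threshold:

```python
import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid over multi-label logits
    preds = (probs > 0.5).astype(int)        # assumed 0.5 decision threshold
    # A sample counts only if every label decision matches the reference.
    exact_match = (preds == labels).all(axis=1).mean()
    return {"exact_match_accuracy": float(exact_match)}
```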
Framework versions
- Transformers 4.47.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
Potential Improvements
To achieve better results, several improvement strategies could be explored:
- Model Capacity Expansion: Test larger models such as roberta-large.
- Batch Size Increase: Use larger batches for more stable updates.
- Gradient Accumulation: Tune gradient_accumulation_steps (the number of steps over which gradients are accumulated before each optimizer update) to simulate a larger effective batch size.
- Learning Rate Management:
  - Experiment with other schedulers, such as polynomial decay or schedules with dynamic adjustment.
  - Further reduce the learning rate.
- Enhanced Preprocessing:
  - Test data augmentation techniques such as random masking or synonym replacement.
  - Reduce the imbalance between the different intent categories.
  - Weight the loss according to how well each category is represented (see the sketch at the end of this section).
  - Use another dataset.
- Longer Training Duration: Increase the number of epochs and refine stopping criteria for more precise convergence.
- Model Ensembling: Use multiple models to improve prediction robustness.
- Advanced Attention Mechanisms: Test models using hierarchical attention or enhanced multi-head architectures.
- Metric Selection: Choose the evaluation metric best suited to the problem.
These strategies require significant computational resources and extended training time but offer substantial potential for performance improvement.
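One way to implement the category-weighting idea above is a Trainer subclass that applies a per-intent pos_weight in the BCE loss. This is a hypothetical sketch, not the training code used for this model; the pos_weight values would be derived from the label frequencies in the training split.

```python
import torch
from torch import nn
from transformers import Trainer

class WeightedBCETrainer(Trainer):
    """Hypothetical Trainer variant that up-weights under-represented intents
    via BCEWithLogitsLoss(pos_weight=...)."""

    def __init__(self, *args, pos_weight=None, **kwargs):
        super().__init__(*args, **kwargs)
        # One weight per intent label; values > 1.0 boost rare intents.
        self.pos_weight = pos_weight

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        pos_weight = (self.pos_weight.to(outputs.logits.device)
                      if self.pos_weight is not None else None)
        loss_fct = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
        loss = loss_fct(outputs.logits, labels.float())
        return (loss, outputs) if return_outputs else loss
```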