roberta-student-fine-tuned

This model is a fine-tuned version of roberta-base on a dataset provided by Kim Taeuk (๊น€ํƒœ์šฑ), NLP teacher at Hanyang University.

The model was trained for multi-intent detection using the BlendX dataset, focusing on complex utterances containing multiple intents.

It achieves the following results on the evaluation set:

  • Loss: 0.0053
  • Exact Match Accuracy: 0.9075

Model description

The model is based on roberta-base, a robust transformer model pretrained on a large corpus of English text.

Fine-tuning was conducted on a specialized dataset focusing on multi-intent detection in utterances with complex intent structures.

Model Architecture

  • Base Model: roberta-base
  • Task: Multi-Intent Detection
  • Languages: English
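
A minimal usage sketch is shown below. It assumes the fine-tuned head is a multi-label (sigmoid) classifier whose intent names are stored in the model config (id2label); the example utterance and the 0.5 decision threshold are illustrative choices, not settings documented for this model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the checkpoint name matches this repository; the label set
# comes from the fine-tuning config (id2label), not from this card.
model_name = "Meruem/roberta-student-fine-tuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

utterance = "Play some jazz and remind me to call mom at 6 pm."
inputs = tokenizer(utterance, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label decoding: sigmoid + threshold (0.5 is an illustrative value).
probs = torch.sigmoid(logits)[0]
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)
```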

Strengths

High accuracy on evaluation data.

Capable of detecting multiple intents within a single utterance.

Limitations

Fine-tuned on a specific dataset; performance may vary on other tasks.

Limited to English text.

Intended uses & limitations

Use Cases

Multi-intent detection tasks such as customer service queries, virtual assistants, and dialogue systems.

Academic research and educational projects.

Limitations

May require additional fine-tuning for domain-specific applications.

Not designed for multilingual tasks.

Training and evaluation data

The model was trained on the BlendX dataset, a multi-intent detection benchmark focusing on realistic combinations of user intents in task-oriented dialogues.

Data Details:

The dataset used for training this model is based on the BlendX dataset, which targets multi-intent detection in task-oriented dialogues. While the full BlendX dataset contains instances with anywhere from 1 to 3 intents, the dataset for this assignment only includes instances with exactly 2 intents, for simplicity.

Dataset License and Source

The dataset used for training this model is licensed under the GNU General Public License v2.

Important Notes:

  • Any use, distribution, or modification of this dataset must comply with the terms of the GPL v2 license.
  • The dataset source and its original license can be found in its official GitHub repository.
  • Dataset File: Download Here

Dataset Format:

  • File Type: JSON
  • Size: 28,815 training samples, 1,513 validation samples
  • Data Fields:
    • split (string): Indicates if the sample belongs to the training or validation set.
    • utterance (string): The text input containing multiple intents.
    • intent (list of strings): The associated intents.
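
For illustration, a record in this format might look as follows; the utterance text and intent labels below are invented examples, not actual entries from the dataset.

```json
{
  "split": "train",
  "utterance": "Book a table for two tonight and tell me tomorrow's weather.",
  "intent": ["BookRestaurant", "GetWeather"]
}
```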

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • warmup_steps: 200
  • num_epochs: 20
  • save_total_limit: 3
  • weight_decay: 0.01
  • eval_strategy: epoch
  • save_strategy: epoch
  • metric_for_best_model: eval_exact_match_accuracy
  • load_best_model_at_end: True
  • dataloader_pin_memory: True
  • fp16: False
  • greater_is_better: True
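
As a rough reconstruction (not the original training script), these hyperparameters correspond to a transformers TrainingArguments configuration along the following lines; output_dir is a placeholder, and dataset loading, the model head, and the metric function are omitted.

```python
from transformers import TrainingArguments

# Sketch of the configuration listed above; values are taken from this card,
# and output_dir is a placeholder path.
training_args = TrainingArguments(
    output_dir="roberta-student-fine-tuned",
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",                 # AdamW with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=200,
    num_train_epochs=20,
    save_total_limit=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="eval_exact_match_accuracy",
    load_best_model_at_end=True,
    dataloader_pin_memory=True,
    fp16=False,
    greater_is_better=True,
)
```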

Training results

| Training Loss | Epoch | Step  | Validation Loss | Exact Match Accuracy |
|---------------|-------|-------|-----------------|----------------------|
| 0.0723        | 1.0   | 2297  | 0.0720          | 0.0                  |
| 0.0576        | 2.0   | 4594  | 0.0516          | 0.0                  |
| 0.0328        | 3.0   | 6891  | 0.0264          | 0.0839               |
| 0.0150        | 4.0   | 9188  | 0.0141          | 0.6907               |
| 0.0086        | 5.0   | 11485 | 0.0092          | 0.8771               |
| 0.0046        | 6.0   | 13782 | 0.0069          | 0.8929               |
| 0.0027        | 7.0   | 16079 | 0.0061          | 0.9002               |
| 0.0018        | 8.0   | 18376 | 0.0059          | 0.8936               |
| 0.0012        | 9.0   | 20673 | 0.0056          | 0.8995               |
| 0.0009        | 10.0  | 22970 | 0.0053          | 0.9075               |
| 0.0007        | 11.0  | 25267 | 0.0055          | 0.9055               |
| 0.0005        | 12.0  | 27564 | 0.0061          | 0.8976               |
| 0.0004        | 13.0  | 29861 | 0.0057          | 0.9061               |
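
Exact match accuracy counts an utterance as correct only when the entire predicted intent set matches the gold labels. A minimal sketch of such a metric for multi-hot labels is shown below; the function name and the 0.5 threshold are illustrative, not taken from the original training code.

```python
import numpy as np

def exact_match_accuracy(eval_pred, threshold=0.5):
    """Fraction of utterances whose full predicted intent set matches the labels."""
    logits, labels = eval_pred
    probs = 1.0 / (1.0 + np.exp(-logits))       # sigmoid over intent logits
    preds = (probs > threshold).astype(int)     # multi-hot predictions per utterance
    return {"exact_match_accuracy": float((preds == labels).all(axis=1).mean())}
```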

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Improvement Perspectives

To achieve better results, several improvement strategies could be explored:

  • Model Capacity Expansion: Test larger models such as roberta-large.
  • Batch Size Increase: Use larger batches for more stable updates.
  • Gradient Accumulation: Tune the number of steps over which gradients are accumulated before each optimizer update, which increases the effective batch size.
  • Learning Rate Management:
    • Experiment with other schedules, such as polynomial decay, or with dynamic learning-rate adjustment.
    • Further reduce the learning rate
  • Enhanced Preprocessing:
    • Test data augmentation techniques such as random masking or synonym replacement.
    • Reduce the imbalance between the different intent categories.
    • Weight the loss according to how well each intent category is represented (see the sketch at the end of this section).
    • Use another dataset.
  • Longer Training Duration: Increase the number of epochs and refine stopping criteria for more precise convergence.
  • Model Ensembling: Use multiple models to improve prediction robustness.
  • Advanced Attention Mechanisms: Test models using hierarchical attention or enhanced multi-head architectures.
  • Metric Selection: Choose the evaluation metric best suited to the problem.

These strategies require significant computational resources and extended training time but offer substantial potential for performance improvement.
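
To illustrate the class-weighting idea above, one possibility is a weighted binary cross-entropy loss inside a custom Trainer; the subclass below is a hypothetical sketch, not part of how this model was trained, and the per-intent weights would need to be estimated from the label distribution.

```python
from torch import nn
from transformers import Trainer

class WeightedBCETrainer(Trainer):
    """Hypothetical Trainer that upweights underrepresented intent categories."""

    def __init__(self, *args, pos_weight=None, **kwargs):
        super().__init__(*args, **kwargs)
        # pos_weight: one value per intent, larger for rarer intents (assumption).
        self.pos_weight = pos_weight

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = nn.BCEWithLogitsLoss(pos_weight=self.pos_weight)
        loss = loss_fct(outputs.logits, labels.float())
        return (loss, outputs) if return_outputs else loss
```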
