# Configuration

## Introduction

The configuration is divided into fine-grained reusable modules:

- `base`: basic configuration
- `logger`: logger setting
- `model_manager`: loading and saving model parameters
- `accelerator`: whether to enable multi-GPU training
- `dataset`: dataset management
- `evaluator`: evaluation and metrics setting
- `tokenizer`: tokenizer initialization and tokenizing setting
- `optimizer`: optimizer initialization setting
- `scheduler`: scheduler initialization setting
- `model`: model construction setting

The following sections describe each part of the configuration in detail. You can also see [Examples](examples/README.md) for a quick start.

NOTE: `_*_` config items are reserved fields in OpenSLU.
## Configuration Item Script

OpenSLU supports a simple calculation script in each configuration item. For example, `{dataset.dataset_name}` is replaced by the value of `dataset.dataset_name`, and the result is evaluated as the Python expression `'LightChen2333/agif-slu-' + '*'`. (Without the surrounding quotes, the value of `{dataset.dataset_name}` would be treated as a variable instead of a string.)

NOTE: every item containing `{}` is treated as a Python script.
```yaml
tokenizer:
  _from_pretrained_: "'LightChen2333/agif-slu-' + '{dataset.dataset_name}'" # supports simple calculation scripts
```
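
To make the substitution concrete, here is a minimal sketch of how such an item could be resolved: the `{...}` reference is replaced by its value and the remaining string is evaluated as a Python expression. This is illustrative only, not OpenSLU's actual parser.

```python
import re

# Illustrative only: resolve `{a.b}` references against a config dict, then eval the result.
config = {"dataset": {"dataset_name": "atis"}}

def resolve(item: str):
    def lookup(match):
        value = config
        for key in match.group(1).split("."):  # walk the dotted path, e.g. dataset.dataset_name
            value = value[key]
        return str(value)
    expanded = re.sub(r"\{([^}]+)\}", lookup, item)
    return eval(expanded)  # the expanded string is treated as a Python expression

print(resolve("'LightChen2333/agif-slu-' + '{dataset.dataset_name}'"))
# -> LightChen2333/agif-slu-atis
```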
## `base` Config

```yaml
# `start_time` is generated automatically when any config script starts; it does not need to be assigned.
# start_time: xxxxxxxx
base:
  name: "OpenSLU" # project/logger name
  multi_intent: false # whether to enable the multi-intent setting
  train: True # enable training, otherwise run in zero-shot mode
  test: True # enable testing during training
  device: cuda # device: cuda/cpu
  seed: 42 # random seed
  best_key: EMA # metric used to select the saved model [intent_acc/slot_f1/EMA]
  tokenizer_name: word_tokenizer # use `word_tokenizer` when no pretrained model is used, otherwise an [AutoTokenizer] tokenizer name
  add_special_tokens: false # whether to add [CLS], [SEP] special tokens
  epoch_num: 300 # number of training epochs
  # eval_step: 280 # if eval_by_epoch == false and eval_step > 0, the model is evaluated every eval_step steps
  eval_by_epoch: true # evaluate the model every epoch
  batch_size: 16 # batch size
```
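
For intuition, `eval_by_epoch` and `eval_step` simply switch between epoch-wise and step-wise evaluation. A minimal sketch of that logic (illustrative only, not OpenSLU's training loop; `evaluate` and the toy data are placeholders):

```python
# Illustrative sketch only: how `eval_by_epoch` / `eval_step` could drive evaluation.
base = {"epoch_num": 3, "eval_by_epoch": False, "eval_step": 2}
train_batches = range(5)  # stand-in for a real dataloader

def evaluate(step):       # hypothetical placeholder for the real evaluator
    print(f"evaluate at step {step}")

global_step = 0
for epoch in range(base["epoch_num"]):
    for _batch in train_batches:
        global_step += 1  # one optimization step per batch
        if not base["eval_by_epoch"] and base.get("eval_step", 0) > 0 \
                and global_step % base["eval_step"] == 0:
            evaluate(global_step)  # step-wise evaluation
    if base["eval_by_epoch"]:
        evaluate(global_step)      # epoch-wise evaluation
```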
## `logger` Config

```yaml
logger:
  # `wandb` is supported in both single- and multi-GPU settings,
  # `tensorboard` is only supported in the multi-GPU setting,
  # and `fitlog` is only supported in the single-GPU setting
  logger_type: wandb
```
## `model_manager` Config

```yaml
model_manager:
  # if load_dir != `null`, OpenSLU will try to load the checkpoint and continue training;
  # if load_dir == `null`, OpenSLU will start training from scratch.
  load_dir: null
  # directory in which to save the model and training state.
  # if save_dir == `null`, the model is saved to `save/{start_time}`
  save_dir: save/stack
  # save_mode can be either [save-by-step, save-by-eval]
  # `save-by-step` saves the model every {save_step} steps without evaluation.
  # `save-by-eval` saves the model with the best validation performance.
  save_mode: save-by-eval
  # save_step: 100 # only used when save_mode == `save-by-step`
  max_save_num: 1 # number of best models to keep
```
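
As a rough illustration of the `save-by-eval` / `max_save_num` behaviour, the sketch below keeps only the best-scoring checkpoints. It is a minimal sketch, not OpenSLU's actual `ModelManager`; the paths and helper names are made up.

```python
import os
import shutil

# Illustrative only: keep at most `max_save_num` checkpoints, ranked by validation score.
saved = []  # (score, path) pairs

def save_if_best(score, step, save_dir="save/stack", max_save_num=1):
    path = os.path.join(save_dir, f"checkpoint-{step}")
    os.makedirs(path, exist_ok=True)              # real code would also dump model weights here
    saved.append((score, path))
    saved.sort(key=lambda x: x[0], reverse=True)  # higher metric = better
    while len(saved) > max_save_num:              # prune the worst checkpoints
        _, worst_path = saved.pop()
        shutil.rmtree(worst_path, ignore_errors=True)
```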
## `accelerator` Config

```yaml
accelerator:
  use_accelerator: false # `accelerator` is enabled only if use_accelerator is `true`
```
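
When `use_accelerator` is enabled, multi-GPU training typically follows the standard Hugging Face `accelerate` pattern. The sketch below shows that general pattern with a stand-in model; it is illustrative only and not a description of OpenSLU's internals.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Generic Hugging Face Accelerate training pattern (illustrative only).
accelerator = Accelerator()
model = torch.nn.Linear(8, 2)  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=16)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    loss = torch.nn.functional.cross_entropy(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward() so gradients sync across devices
    optimizer.step()
    optimizer.zero_grad()
```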
## `dataset` Config

```yaml
dataset:
  # supports loading datasets from Hugging Face.
  # dataset_name can be one of [atis, snips, mix-atis, mix-snips]
  dataset_name: atis
  # you can also assign a path for any single split; the remaining splits fall back to the splits of `dataset_name`
  # train: atis # load from Hugging Face or from an assigned local data path
  # validation: {root}/ATIS/dev.jsonl
  # test: {root}/ATIS/test.jsonl
```
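
If you point a split at a local `.jsonl` file, it can be loaded with the standard `datasets` JSON loader; a minimal sketch (the file paths are placeholders for your own data):

```python
from datasets import load_dataset

# Illustrative only: loading local .jsonl splits with the standard `datasets` JSON loader.
data = load_dataset(
    "json",
    data_files={
        "train": "ATIS/train.jsonl",
        "validation": "ATIS/dev.jsonl",
        "test": "ATIS/test.jsonl",
    },
)
print(data["train"][0])  # inspect one example
```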
## `evaluator` Config

```yaml
evaluator:
  best_key: EMA # the metric used to judge the best model
  eval_by_epoch: true # evaluate after every epoch if `true`,
  # otherwise evaluate every {eval_step} steps.
  # eval_step: 1800
  # the supported metrics are listed below:
  # - intent_acc
  # - slot_f1
  # - EMA
  # - intent_f1
  # - macro_intent_f1
  # - micro_intent_f1
  # NOTE: [intent_f1, macro_intent_f1, micro_intent_f1] are only supported in the multi-intent setting,
  # and intent_f1 is the same metric as macro_intent_f1.
  metric:
    - intent_acc
    - slot_f1
    - EMA
```
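
For reference, `intent_acc` is utterance-level intent accuracy, and `EMA` is usually read as exact-match accuracy, i.e. an utterance counts as correct only when both the intent and every slot label are right. A minimal sketch under that assumption (not OpenSLU's evaluator code):

```python
# Illustrative sketch, assuming EMA = exact-match accuracy (intent and all slots correct).
def intent_acc(pred_intents, gold_intents):
    correct = sum(p == g for p, g in zip(pred_intents, gold_intents))
    return correct / len(gold_intents)

def exact_match_accuracy(pred_intents, gold_intents, pred_slots, gold_slots):
    correct = sum(
        pi == gi and ps == gs
        for pi, gi, ps, gs in zip(pred_intents, gold_intents, pred_slots, gold_slots)
    )
    return correct / len(gold_intents)

# toy usage
print(intent_acc(["flight", "fare"], ["flight", "flight"]))  # 0.5
print(exact_match_accuracy(["flight"], ["flight"], [["O", "B-toloc"]], [["O", "B-toloc"]]))  # 1.0
```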
## `tokenizer` Config

```yaml
tokenizer:
  # tokenizer initialization. Supports `word_tokenizer` and any tokenizer from Hugging Face.
  _tokenizer_name_: word_tokenizer
  # if `_tokenizer_name_` is not assigned, you can load a pretrained tokenizer from Hugging Face instead:
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis
  _padding_side_: right # padding side of the tokenizer, one of [left/right]
  # alignment mode between text and slots, one of [fast/general];
  # `general` is supported by most tokenizers, `fast` only by a small portion of them.
  _align_mode_: fast
  _to_lower_case_: true
  # any args other than the reserved `_*_` fields are passed to the tokenizer initialization:
  add_special_tokens: false
  max_length: 512
```
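
When a Hugging Face tokenizer is used, the non-reserved fields map onto the usual `AutoTokenizer` arguments. A minimal sketch of that standard API (the checkpoint name is just an example):

```python
from transformers import AutoTokenizer

# Standard Hugging Face tokenizer usage (illustrative only; the checkpoint is an example).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", padding_side="right")
encoded = tokenizer(
    "show me flights from denver to boston",
    add_special_tokens=False,  # mirrors the config item above
    max_length=512,
    truncation=True,
)
print(encoded["input_ids"])
```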
## `optimizer` Config

```yaml
optimizer:
  _model_target_: torch.optim.Adam # optimizer class or function returning an Optimizer object
  _model_partial_: true # partial construction: model.parameters() is added later to complete the optimizer arguments
  lr: 0.001 # learning rate
  weight_decay: 1e-6 # weight decay
```
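
The `_model_partial_: true` flag can be thought of as deferring construction until the missing argument is available; a minimal sketch of that idea with `functools.partial` (illustrative, not OpenSLU's loader):

```python
import functools
import torch

# Illustrative only: `_model_partial_: true` as deferred construction via functools.partial.
optimizer_fn = functools.partial(torch.optim.Adam, lr=0.001, weight_decay=1e-6)

model = torch.nn.Linear(8, 2)  # stand-in model
optimizer = optimizer_fn(model.parameters())  # model.parameters() completes the arguments
print(optimizer)
```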
## `scheduler` Config

```yaml
scheduler:
  _model_target_: transformers.get_scheduler
  _model_partial_: true # partial construction: the optimizer and num_training_steps are added later to complete the scheduler arguments
  name: "linear"
  num_warmup_steps: 0
```
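
The scheduler is completed in the same way, with the optimizer and the number of training steps filled in at run time; a minimal sketch using `transformers.get_scheduler` (the step count is an arbitrary example):

```python
import torch
from transformers import get_scheduler

# Illustrative only: completing the partially-configured scheduler at run time.
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-6)
scheduler = get_scheduler(
    name="linear",
    optimizer=optimizer,      # filled in at run time
    num_warmup_steps=0,
    num_training_steps=1000,  # arbitrary example value
)
```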
## `model` Config

```yaml
model:
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis # load the model from Hugging Face; no parameters below need to be assigned.
  _model_target_: model.OpenSLUModel # the general model class, which builds the model automatically from the configuration.
  encoder:
    _model_target_: model.encoder.AutoEncoder # AutoEncoder automatically loads the selected encoder model
    encoder_name: self-attention-lstm # one of [lstm/ self-attention-lstm] or any pretrained model supported by Hugging Face
    embedding: # word embedding layer
      # load_embedding_name: glove.6B.300d.txt # supports automatically loading GloVe embeddings.
      embedding_dim: 256 # embedding dim
      dropout_rate: 0.5 # dropout ratio after embedding
    lstm:
      layer_num: 1 # lstm configuration
      bidirectional: true
      output_dim: 256 # a module should set output_dim so that input_dim of the next module can be auto-filled. You can also set input_dim manually.
      dropout_rate: 0.5
    attention: # self-attention configuration
      hidden_dim: 1024
      output_dim: 128
      dropout_rate: 0.5
    return_with_input: true # pass input information, like attention_mask, to the decoder module.
    return_sentence_level_hidden: false # whether to pass the sentence-level representation to the decoder module
  decoder:
    _model_target_: model.decoder.StackPropagationDecoder # decoder name
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction # interaction module name
      differentiable: false # interaction module config
    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier # intent classifier module name
      layer_num: 1
      bidirectional: false
      hidden_dim: 64
      force_ratio: 0.9 # teacher-forcing ratio
      embedding_dim: 8 # intent embedding dim
      ignore_index: -100 # index ignored when computing loss and metrics
      dropout_rate: 0.5
      mode: "token-level-intent" # decoding mode, one of [token-level-intent, intent, slot]
      use_multi: "{base.multi_intent}"
      return_sentence_level: true # whether to return the sentence-level prediction as the decoder input
    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      layer_num: 1
      bidirectional: false
      force_ratio: 0.9
      hidden_dim: 64
      embedding_dim: 32
      ignore_index: -100
      dropout_rate: 0.5
      mode: "slot"
      use_multi: false
      return_sentence_level: false
```
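
Conceptually, every block with a `_model_target_` is a dotted class path plus keyword arguments. Below is a minimal sketch of how such a block could be instantiated with `importlib`; it is illustrative only, not OpenSLU's actual builder, and the toy usage swaps in `torch.nn.Linear` just to show the mechanism.

```python
import importlib

# Illustrative only: instantiate a config block whose `_model_target_` is a dotted path.
def build_from_config(config: dict):
    module_path, _, class_name = config["_model_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    # everything except the reserved `_*_` fields is forwarded as keyword arguments
    kwargs = {k: v for k, v in config.items() if not (k.startswith("_") and k.endswith("_"))}
    return cls(**kwargs)

# hypothetical usage with a plain torch module
layer = build_from_config({"_model_target_": "torch.nn.Linear", "in_features": 256, "out_features": 128})
print(layer)
```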
## Implementing a New Model

### 1. Interaction Re-Implement

Here we take `DCA-Net` as an example.

In most cases, you only need to rewrite the `Interaction` module:

```python
from common.utils import HiddenData
from model.decoder.interaction import BaseInteraction


class DCANetInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)
        # I_S_Block: DCA-Net's intent-slot co-interaction block (assumed to be defined elsewhere)
        self.T_block1 = I_S_Block(self.config["output_dim"], self.config["attention_dropout"], self.config["num_attention_heads"])
        ...

    def forward(self, encode_hidden: HiddenData, **kwargs):
        ...
```
Then configure your module:

```yaml
base:
  ...
optimizer:
  ...
scheduler:
  ...
model:
  _model_target_: model.OpenSLUModel
  encoder:
    _model_target_: model.encoder.AutoEncoder
    encoder_name: lstm
    embedding:
      load_embedding_name: glove.6B.300d.txt
      embedding_dim: 300
      dropout_rate: 0.5
    lstm:
      dropout_rate: 0.5
      output_dim: 128
      layer_num: 2
      bidirectional: true
    output_dim: "{model.encoder.lstm.output_dim}"
    return_with_input: true
    return_sentence_level_hidden: false
  decoder:
    _model_target_: model.decoder.DCANetDecoder
    interaction:
      _model_target_: model.decoder.interaction.DCANetInteraction
      output_dim: "{model.encoder.output_dim}"
      attention_dropout: 0.5
      num_attention_heads: 8
    intent_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "intent"
      input_dim: "{model.decoder.interaction.output_dim}"
      ignore_index: -100
    slot_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "slot"
      input_dim: "{model.decoder.interaction.output_dim}"
      ignore_index: -100
```
That's it, the model construction is complete. You can run the following script to train the model:

```shell
python run.py -cp config/dca_net.yaml [-ds atis]
```
### 2. Decoder Re-Implement

Sometimes the `interaction then classification` order cannot meet your needs. In that case, simply rewrite the decoder to get a flexible interaction order.

Here we take `stack-propagation` as an example:

1. Rewrite the interaction module for `stack-propagation`:

```python
from common.utils import ClassifierOutputData, HiddenData
from model.decoder.interaction.base_interaction import BaseInteraction


class StackInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)
        ...

    def forward(self, intent_output: ClassifierOutputData, encode_hidden: HiddenData):
        ...
```
2. Rewrite `StackPropagationDecoder` for the stack-propagation interaction order:

```python
from common.utils import HiddenData, OutputData
from model.decoder.base_decoder import BaseDecoder  # assumed location of BaseDecoder


class StackPropagationDecoder(BaseDecoder):
    def forward(self, hidden: HiddenData):
        pred_intent = self.intent_classifier(hidden)    # predict intents first
        hidden = self.interaction(pred_intent, hidden)  # fuse the intent prediction back into the hidden states
        pred_slot = self.slot_classifier(hidden)        # predict slots on the fused representation
        return OutputData(pred_intent, pred_slot)
```
3. Then we can easily assemble the general model via the `config/stack-propagation.yaml` configuration file:

```yaml
base:
  ...
...
model:
  _model_target_: model.OpenSLUModel
  encoder:
    ...
  decoder:
    _model_target_: model.decoder.StackPropagationDecoder
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction
      differentiable: false
    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # parameters needed by __init__()
      mode: "token-level-intent"
      use_multi: false
      return_sentence_level: true
    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # parameters needed by __init__()
      mode: "slot"
      use_multi: false
      return_sentence_level: false
```
4. Run the following script to train the model:

```shell
python run.py -cp config/stack-propagation.yaml
```