DeBERTa committed on
Commit 8f8b43d · 1 Parent(s): 5385f0f

Update README.md

  1. README.md +17 -17
README.md CHANGED
@@ -12,7 +12,7 @@ widget:
 
  ## DeBERTa: Decoding-enhanced BERT with Disentangled Attention
 
- [DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa on a majority of NLU tasks with 80GB training data.
+ [DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. It outperforms BERT and RoBERTa on majority of NLU tasks with 80GB training data.
 
  Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.
 
@@ -51,20 +51,20 @@ export TASK_NAME=rte
  output_dir="ds_results"
  num_gpus=8
  batch_size=4
- python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
- run_glue.py \
- --model_name_or_path microsoft/deberta-v2-xxlarge-mnli \
- --task_name $TASK_NAME \
- --do_train \
- --do_eval \
- --max_seq_length 256 \
- --per_device_train_batch_size ${batch_size} \
- --learning_rate 3e-6 \
- --num_train_epochs 3 \
- --output_dir $output_dir \
- --overwrite_output_dir \
- --logging_steps 10 \
- --logging_dir $output_dir \
+ python -m torch.distributed.launch --nproc_per_node=${num_gpus} \\
+ run_glue.py \\
+ --model_name_or_path microsoft/deberta-v2-xxlarge-mnli \\
+ --task_name $TASK_NAME \\
+ --do_train \\
+ --do_eval \\
+ --max_seq_length 256 \\
+ --per_device_train_batch_size ${batch_size} \\
+ --learning_rate 3e-6 \\
+ --num_train_epochs 3 \\
+ --output_dir $output_dir \\
+ --overwrite_output_dir \\
+ --logging_steps 10 \\
+ --logging_dir $output_dir \\
  --deepspeed ds_config.json
  ```
 
@@ -72,8 +72,8 @@ You can also run with `--sharded_ddp`
  ```bash
  cd transformers/examples/text-classification/
  export TASK_NAME=rte
- python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge-mnli \
- --task_name $TASK_NAME --do_train --do_eval --max_seq_length 256 --per_device_train_batch_size 4 \
+ python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge-mnli \\
+ --task_name $TASK_NAME --do_train --do_eval --max_seq_length 256 --per_device_train_batch_size 4 \\
  --learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16
  ```
 
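
Most of the lines touched by this change differ only in the trailing backslash (`\` versus `\\`). The commit does not say why the doubled form was chosen; it may be related to how the README is rendered. The sketch below only illustrates how a shell treats the two forms when a command is pasted verbatim. `show_args` is a hypothetical helper introduced here for illustration and is not part of the README or of `run_glue.py`.

```shell
# Hypothetical helper: prints each argument it receives wrapped in brackets.
show_args() { printf '[%s]' "$@"; printf '\n'; }

# Single trailing backslash: the newline is escaped, so both lines form ONE command.
single=$(show_args --do_train \
  --do_eval)

# Doubled backslash: the shell passes a literal "\" as an argument,
# and the command ends at the newline instead of continuing.
double=$(show_args --do_train \\
)

echo "single: $single"   # single: [--do_train][--do_eval]
echo "double: $double"   # double: [--do_train][\]
```

If the commands in the README are meant to be copied into a terminal as shown, the single-backslash form is the one that continues a command across lines.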