jnishi committed
Commit
a7fd192
1 Parent(s): c79dd1a

add README

Files changed (1): README.md (+84, -0)

README.md
---
license: cc-by-sa-4.0
language:
- ja
---
# Model card for Japanese T5 v1.1 (small)

This is a T5 v1.1 model pre-trained on a Japanese corpus.

## Model details

T5 is a Transformer-based encoder-decoder model. Version 1.1 includes the following improvements over the original T5.
- GEGLU activation in the feed-forward hidden layer, rather than ReLU (see https://arxiv.org/abs/2002.05202 and the sketch after this list).
- Dropout was turned off in pre-training (a quality win). Dropout should be re-enabled during fine-tuning.
- No parameter sharing between the embedding and classifier layers.
- "xl" and "xxl" replace "3B" and "11B". The model shapes are a bit different: larger d_model and smaller num_heads and d_ff.
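For reference, the gated-GELU (GEGLU) block replaces T5's original ReLU feed-forward layer. The following is a minimal sketch of that computation, following the formulation in the paper linked above; it is an illustration, not code from this repository.

```python
import torch
import torch.nn.functional as F

def geglu_ffn(x: torch.Tensor, w_in: torch.Tensor, v_in: torch.Tensor, w_out: torch.Tensor) -> torch.Tensor:
    """Gated-GELU feed-forward block: (GELU(x @ w_in) * (x @ v_in)) @ w_out."""
    return (F.gelu(x @ w_in) * (x @ v_in)) @ w_out
```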
This model is based on T5 v1.1 and was pre-trained on a Japanese corpus consisting of Japanese Wikipedia and mC4/ja.

### Model Description

- **Developed by:** Retrieva, Inc.
- **Model type:** T5 v1.1
- **Language(s) (NLP):** Japanese
- **License:** CC-BY-SA 4.0
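A minimal loading sketch with the Hugging Face Transformers library is shown below. The repository ID used here is a placeholder assumption; replace it with this model's actual repository ID. Note that the released checkpoint has only seen the span-corruption pre-training objective, so it is intended to be fine-tuned on a downstream task.

```python
# Minimal loading sketch. "retrieva-jp/t5-small" is a placeholder repository ID,
# not confirmed by this card; substitute the real one.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "retrieva-jp/t5-small"  # hypothetical ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Fill a masked span with the span-corruption sentinel token <extra_id_0>.
inputs = tokenizer("日本の首都は<extra_id_0>です。", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```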
## Training Details

We used T5X (https://github.com/google-research/t5x) to train this model, and the resulting checkpoint was converted to the Hugging Face Transformers format.

### Training Data

The following training data was used:
- The Japanese portion of multilingual C4 (mC4/ja).
- Japanese Wikipedia (20220920 dump).

#### Preprocessing

The following filtering was applied (a sketch of the hiragana filter follows this list):
- Documents that do not contain a single hiragana character were removed. This drops English-only documents and documents written in Chinese.
- Whitelist-style filtering on the top-level domain (TLD) of each document's URL was applied to remove affiliate sites.
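To make the first filter concrete, here is a minimal sketch of a hiragana check. The exact implementation is not published in this card, so the function name and data layout are assumptions for illustration.

```python
import re

# Hiragana occupies the Unicode range U+3041 to U+309F.
HIRAGANA_RE = re.compile(r"[\u3041-\u309f]")

def contains_hiragana(document: str) -> bool:
    """Return True if the document contains at least one hiragana character."""
    return HIRAGANA_RE.search(document) is not None

# Keep only documents that contain hiragana; this drops English-only and
# Chinese-only documents, as described above.
corpus = [
    "これは日本語の文書です。",       # kept (contains hiragana)
    "This is an English-only page.",  # dropped
    "这是一份中文文档。",             # dropped (no hiragana)
]
filtered = [doc for doc in corpus if contains_hiragana(doc)]
```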
#### Training Hyperparameters

- dropout rate: 0.0 (see the fine-tuning note after this list)
- batch size: 256
- precision: fp32
- input length: 512
- output length: 114
- Otherwise, the T5X defaults (https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin) are followed, including:
  - optimizer: Adafactor
  - base_learning_rate: 1.0
  - warmup steps: 10000
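Because dropout was set to 0.0 for pre-training, it should be turned back on when fine-tuning this checkpoint. A minimal sketch with Hugging Face Transformers is shown below; the value 0.1 is the usual T5 default rather than a recommendation from this card, and the repository ID is again a placeholder.

```python
from transformers import T5Config, T5ForConditionalGeneration

model_id = "retrieva-jp/t5-small"  # hypothetical ID; substitute the real one

# The released checkpoint was pre-trained with dropout_rate=0.0.
# Re-enable dropout (e.g. 0.1) before fine-tuning on a downstream task.
config = T5Config.from_pretrained(model_id, dropout_rate=0.1)
model = T5ForConditionalGeneration.from_pretrained(model_id, config=config)
```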
#### Speeds, Sizes, Times

We trained for 2,097,152 steps.

## Technical Specifications

### Model Architecture and Objective

Model architecture:
- T5 v1.1 (https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511)
- Size: Small (about 77 million parameters; see the check below)
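As a quick sanity check of the parameter count, the sketch below sums the parameters of the loaded model; the repository ID is again a placeholder assumption.

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("retrieva-jp/t5-small")  # hypothetical ID
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")  # roughly 77M for the Small configuration
```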
### Compute Infrastructure

Google Cloud TPU v4-8.

#### Software

- T5X (https://github.com/google-research/t5x)

## More Information

https://note.com/retrieva/n/n7b4186dc5ada (in Japanese)

## Model Card Authors

Jiro Nishitoba

## Model Card Contact