snoop2head committed on
Commit 7b949b9 • 1 Parent(s): 429dbb9

Create README.md

Files changed (1)
  1. README.md +14 -0
README.md ADDED
@@ -0,0 +1,14 @@
+ # scibert-wechsel-korean
+
+ SciBERT (🇺🇸) converted into Korean (🇰🇷) using the WECHSEL technique.
+
+ ### Description
+ - SciBERT is trained on papers from the corpus of semanticscholar.org. The corpus contains 1.14M papers (3.1B tokens).
+ - WECHSEL converts the embedding layer's subword token embeddings from the source language to the target language.
+ - SciBERT, which was trained on English text, is converted to Korean using the WECHSEL technique.
+ - The Korean tokenizer is the one used by the KLUE PLMs, chosen for its similar vocabulary size (32000) and its performance.
+
+ ### Reference
+ - [SciBERT](https://github.com/allenai/scibert)
+ - [WECHSEL](https://github.com/CPJKU/wechsel)
+ - [Korean Language Understanding Evaluation](https://github.com/KLUE-benchmark/KLUE)
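
The Description says that WECHSEL converts the embedding layer's subword tokens from the source language to the target language. A minimal sketch of that idea, in plain Python, is given here: each target-language token embedding is initialized as a similarity-weighted average of source-model embeddings, with similarities measured in a shared (aligned) static embedding space. This is not the `wechsel` library's API. The helper names (`cosine`, `init_target_embedding`), the toy vectors, and the values `k=2` and `temperature=0.1` are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def init_target_embedding(target_static, source_static, source_model,
                          k=2, temperature=0.1):
    """Build one target-token embedding from its k most similar source tokens.

    target_static: static vector of the target token (hypothetical toy data).
    source_static: {source token: static vector in the same aligned space}.
    source_model:  {source token: embedding taken from the source model}.
    """
    # Similarity of the target token to every source token.
    sims = {tok: cosine(target_static, vec) for tok, vec in source_static.items()}
    # Keep the k nearest source tokens.
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    # Softmax over the neighbor similarities, sharpened by the temperature.
    exps = [math.exp(sims[tok] / temperature) for tok in neighbors]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the neighbors' source-model embeddings.
    dim = len(source_model[neighbors[0]])
    return [
        sum(w * source_model[tok][i] for w, tok in zip(weights, neighbors))
        for i in range(dim)
    ]


# Toy data (made up): 2-d aligned static vectors, 3-d source-model embeddings.
source_static = {"cell": [1.0, 0.0], "protein": [0.0, 1.0], "the": [-1.0, 0.0]}
source_model = {
    "cell": [0.2, 0.4, 0.6],
    "protein": [0.8, 0.6, 0.4],
    "the": [-0.5, -0.5, -0.5],
}
target_static = {"세포": [0.9, 0.1], "단백질": [0.1, 0.9]}

target_model = {
    tok: init_target_embedding(vec, source_static, source_model)
    for tok, vec in target_static.items()
}
```

In this toy setup the embedding for "세포" lands close to the source embedding of "cell", and the one for "단백질" lands close to "protein", since those are their nearest neighbors in the static space. In a real conversion the resulting matrix would replace the source model's input embeddings before further training on target-language text.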