cls: HF
base_tokenizer_path: microsoft/Phi-3-mini-128k-instruct
dataset:
  path: TODO
  name: ko_wiki
  split: train
  column: text
target_num_hyper_token: 1000
batch_size: 1000
total_training_size: 50000