I'd like to know which Korean dataset was used. I want to test it on Korean documents.
The base pre-training utilized the Korean portion of the cleaned CC100 dataset.