About Korean dataset

#4
by jhkwon - opened

I’d like to know which Korean dataset was used. I want to test it on Korean documents.

The base pre-training used the Korean portion of the cleaned CC100 dataset.
