dutch-tokenizer-arena/utils/compress_rate_util.py
"""
Chinese data: clue, superclue
English data: glue, cnn_dailymail, gigaword
"""