# Model description

- Morphosyntactic analyzer: Trankit
- Tagset: UD
- Embedding vectors: XLM-RoBERTa-Base
- Dataset: NLPrePL-NKJP-fair-by-name (https://huggingface.co/datasets/ipipan/nlprepl)

# How to use

## Clone

```
git clone git@hf.co:ipipan/nlpre_trankit_ud_xlm-roberta-base_nkjp-by-name
```

## Load model

```
import trankit

model_path = './nlpre_trankit_ud_xlm-roberta-base_nkjp-by-name'

trankit.verify_customized_pipeline(
    category='customized-mwt',  # pipeline category
    save_dir=model_path,  # directory containing the cloned model files
    embedding_name='xlm-roberta-base'  # embedding the model was trained with
)

model = trankit.Pipeline(lang='customized-mwt', cache_dir=model_path, embedding='xlm-roberta-base')
```
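
## Read the output

Once the pipeline is loaded, calling it on raw text (for example `doc = model('Zrobiłem to.')`) should return a nested dictionary. The sketch below shows one way to flatten such a result into `(form, upos)` pairs. It assumes the layout described in the Trankit documentation: a `sentences` list, each sentence with a `tokens` list, and multi-word tokens carrying their syntactic words under `expanded`. The helper `iter_words` and the `sample` dictionary are hypothetical and written by hand for illustration; they are not real model output.

```python
def iter_words(doc):
    """Yield (form, upos) for every syntactic word in a Trankit-style output dict."""
    for sentence in doc.get('sentences', []):
        for token in sentence.get('tokens', []):
            # Assumption: a multi-word token lists its syntactic words under
            # 'expanded'; an ordinary token is treated as its own single word.
            for word in token.get('expanded', [token]):
                yield word.get('text'), word.get('upos')


# Hand-written sample that mimics the assumed output layout (not real model output).
sample = {
    'text': 'Zrobiłem to.',
    'sentences': [{
        'id': 1,
        'text': 'Zrobiłem to.',
        'tokens': [
            {'id': (1, 2), 'text': 'Zrobiłem', 'expanded': [
                {'id': 1, 'text': 'Zrobił', 'upos': 'VERB'},
                {'id': 2, 'text': 'em', 'upos': 'AUX'},
            ]},
            {'id': 3, 'text': 'to', 'upos': 'PRON'},
            {'id': 4, 'text': '.', 'upos': 'PUNCT'},
        ],
    }],
}

rows = list(iter_words(sample))
for form, upos in rows:
    print(f'{form}\t{upos}')
```

To use it with the loaded model, pass the pipeline output instead of `sample`, e.g. `list(iter_words(model('Zrobiłem to.')))`. The tags actually returned depend on the model.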