---
language: 
- de
- en
- multilingual
widget:
- text: "In December 1903 in France the Royal Swedish Academy of Sciences awarded Pierre Curie, Marie Curie, and Henri Becquerel the Nobel Prize in Physics."
- text: "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."
- text: "My name is Julian and I live in Constance"
- text: "Terence David John Pratchett est né le 28 avril 1948 à Beaconsfield dans le Buckinghamshire, en Angleterre."
- text: "北京市,通称北京(汉语拼音:Běijīng;邮政式拼音:Peking),简称“京”,是中华人民共和国的首都及直辖市,是该国的政治、文化、科技、教育、军事和国际交往中心,是一座全球城市,是世界人口第三多的城市和人口最多的首都,具有重要的国际影响力,同時也是目前世界唯一的“双奥之城”,即唯一既主办过夏季"
- text: "काठमाडौँ नेपालको सङ्घीय राजधानी र नेपालको सबैभन्दा बढी जनसङ्ख्या भएको सहर हो।"
tags:
- roberta
license: mit
datasets:
- wikiann
---

# RoBERTa for Multilingual Named Entity Recognition

## Model description

#### Limitations and bias
This model is limited by its training dataset of entity-annotated news articles from a specific span of time, so it may not generalize well to other domains or use cases.

## Training data

## Metrics

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# The repository ID must not end with a trailing slash.
tokenizer = AutoTokenizer.from_pretrained("julian-schelb/roberta-ner-multilingual", add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained("julian-schelb/roberta-ner-multilingual")

text = "In December 1903 in France the Royal Swedish Academy of Sciences awarded Pierre Curie, Marie Curie, and Henri Becquerel the Nobel Prize in Physics."

inputs = tokenizer(
    text, 
    add_special_tokens=False, 
    return_tensors="pt"
)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_token_class_ids = logits.argmax(-1)

# Note that tokens are classified rather than input words, so there may be
# more predicted token classes than words: a single word can be split into
# several tokens, each with its own predicted class.
predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
predicted_tokens_classes
```
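
Because the model predicts one class per token rather than per word, you may want to collapse the predictions back to word level. The following is a minimal sketch of one common strategy, keeping the label of the first sub-token of each word. It assumes a fast tokenizer, whose output exposes `inputs.word_ids()` (one word index per token, `None` for special tokens). The helper name `first_subtoken_labels` and the `B-PER`/`I-PER` labels in the example are illustrative assumptions, not part of this model's documented API or label set.

```python
from typing import List, Optional


def first_subtoken_labels(
    word_ids: List[Optional[int]], token_labels: List[str]
) -> List[str]:
    """Collapse token-level labels to word level (hypothetical helper).

    Keeps the label predicted for the first sub-token of every word and
    ignores tokens that do not belong to any word (word id is None).
    """
    word_labels = []
    previous_word_id = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None:
            # Special tokens such as <s> or </s> map to no word.
            continue
        if word_id != previous_word_id:
            # First sub-token of a new word: keep its label.
            word_labels.append(label)
        previous_word_id = word_id
    return word_labels


# Illustrative input only: two words, the second split into two sub-tokens.
# The label names are assumed for the example.
example_word_ids = [0, 1, 1]
example_token_labels = ["B-PER", "I-PER", "I-PER"]
example_word_labels = first_subtoken_labels(example_word_ids, example_token_labels)
```

With the usage snippet above, the call would look like `first_subtoken_labels(inputs.word_ids(), predicted_tokens_classes)`. Other aggregation strategies, such as a majority vote over a word's sub-tokens, are equally valid; the first-sub-token rule is just the simplest to reason about.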