---
license: apache-2.0
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
pipeline_tag: token-classification
widget:
- text: >-
    Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic
    to Paris .
  example_title: Amelia Earhart
model-index:
- name: >-
    SpanMarker w. xlm-roberta-large on CoNLL++ with document-level context by Tom Aarsen
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: conllpp
      name: CoNLL++ w. document context
      split: test
      revision: 3e6012875a688903477cca9bf1ba644e65480bd6
    metrics:
    - type: f1
      value: 0.9554
      name: F1
    - type: precision
      value: 0.9600
      name: Precision
    - type: recall
      value: 0.9509
      name: Recall
datasets:
- conllpp
- tomaarsen/conllpp
language:
- en
metrics:
- f1
- recall
- precision
---

# SpanMarker for Named Entity Recognition

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition. In particular, this SpanMarker model uses [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) as the underlying encoder. See [train.py](train.py) for the training script.
Note that this model was trained with document-level context, so it performs best when it is given enough surrounding context. It is recommended to call `model.predict` with a 🤗 Dataset that has `tokens`, `document_id` and `sentence_id` columns.
See the [documentation](https://tomaarsen.github.io/SpanMarkerNER/api/span_marker.modeling.html#span_marker.modeling.SpanMarkerModel.predict) of the `model.predict` method for more information.

## Usage

To use this model for inference, first install the `span_marker` library:

```bash
pip install span_marker
```

You can then run inference with this model like so:

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-xlm-roberta-large-conllpp-doc-context")
# Run inference
entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
```
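
Because this model was trained with document-level context, it may help to pass whole documents rather than isolated sentences. The following is a minimal sketch of one way to build a 🤗 Dataset with the `tokens`, `document_id` and `sentence_id` columns that `model.predict` looks for. The helper names `build_context_rows` and `predict_with_context`, and the nested-list input layout, are assumptions made for illustration and are not part of the `span_marker` library.

```python
from typing import Dict, List

# Hypothetical input layout: a list of documents, each a list of sentences,
# each sentence a list of pre-tokenized words.
Documents = List[List[List[str]]]


def build_context_rows(documents: Documents) -> Dict[str, list]:
    """Flatten documents into one row per sentence, keeping track of
    which document each sentence came from and its position in it."""
    rows: Dict[str, list] = {"tokens": [], "document_id": [], "sentence_id": []}
    for document_id, sentences in enumerate(documents):
        for sentence_id, tokens in enumerate(sentences):
            rows["tokens"].append(tokens)
            rows["document_id"].append(document_id)
            rows["sentence_id"].append(sentence_id)
    return rows


def predict_with_context(
    documents: Documents,
    model_name: str = "tomaarsen/span-marker-xlm-roberta-large-conllpp-doc-context",
):
    """Run inference on a Dataset carrying document and sentence IDs.

    Imports are local so that `build_context_rows` can be used without
    `datasets` or `span_marker` installed. Calling this downloads the model.
    """
    from datasets import Dataset
    from span_marker import SpanMarkerModel

    model = SpanMarkerModel.from_pretrained(model_name)
    dataset = Dataset.from_dict(build_context_rows(documents))
    return model.predict(dataset)
```

For example, `predict_with_context([[["Amelia", "Earhart", "flew", "to", "Paris", "."], ["She", "landed", "safely", "."]]])` would treat both sentences as part of the same document.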

### Limitations

**Warning**: This model works best when punctuation is separated from the preceding word by a space, for example:
```python
# ✅
model.predict("He plays J. Robert Oppenheimer , an American theoretical physicist .")
# ❌
model.predict("He plays J. Robert Oppenheimer, an American theoretical physicist.")

# You can also supply a list of words directly: ✅
model.predict(["He", "plays", "J.", "Robert", "Oppenheimer", ",", "an", "American", "theoretical", "physicist", "."])
```
Similar pre-splitting may be beneficial for some languages, such as splitting `"l'ocean Atlantique"` into `"l' ocean Atlantique"`.
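
If your input text is not already tokenized this way, a small pre-processing step can separate punctuation before calling `model.predict`. The sketch below is a rough heuristic for English text only, not the tokenization used during training: the function name `split_punctuation` and the regular expression are assumptions made for illustration. It keeps single-letter initials such as `J.` together and splits off all other punctuation.

```python
import re
from typing import List

# Heuristic token pattern (an assumption, not part of span_marker):
#   1. a single capital letter followed by a period, e.g. the initial "J."
#   2. a run of word characters, e.g. "Oppenheimer" or "5B"
#   3. any single character that is neither a word character nor whitespace
_TOKEN_RE = re.compile(r"\b[A-Z]\.|\w+|[^\w\s]")


def split_punctuation(text: str) -> List[str]:
    """Split text into words, with punctuation as separate tokens."""
    return _TOKEN_RE.findall(text)


words = split_punctuation("He plays J. Robert Oppenheimer, an American theoretical physicist.")
# `words` can be passed to model.predict directly as a list of words,
# or joined with spaces to get a punctuation-separated string.
spaced_text = " ".join(words)
```

Note that this heuristic also treats a sentence-final capital letter (as in `Vitamin C.`) as an initial, so check its output on your own data before relying on it.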

See the [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) repository for documentation and additional information on this library.