---
tags:
- spacy
- token-classification
language:
- en
model-index:
- name: en_chemner
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.9906542056
    - name: NER Recall
      type: recall
      value: 0.9636363636
    - name: NER F Score
      type: f_score
      value: 0.9769585253
widget:
- text: >-
    Cinnamaldehyde is a fragrant compound found in cinnamon. Icosanoic acid is
    a saturated fatty acid with a 20-carbon chain. Triptane is commonly used as
    an anti-knock additive in aviation fuels. Benzophenone is a widely used
    building block in organic chemistry, being the parent diarylketone. Geraniol
    is a monoterpenoid and an alcohol. It is the primary component of citronella
    oil and a primary component of rose oil and palmarosa oil.
license: apache-2.0
---
# en_chemner: A spaCy Model for Chemical NER
## Model Description
The `en_chemner` model is a specialized Named Entity Recognition (NER) pipeline for chemistry. Built with the spaCy framework,
it finds chemical compound names in English-language text and classifies them into seven compound classes, such as alcohols, alkenes, and ketones.
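A minimal usage sketch follows. The card does not give an install command, so this assumes the `en_chemner` package is already installed in the current environment and that `spacy.load("en_chemner")` can resolve it. The helper names `load_chemner` and `find_chemicals` are hypothetical; the helper only relies on the standard spaCy `Doc.ents` attribute and each entity span's `text`, `label_`, `start_char`, and `end_char`.

```python
from typing import Any, List, Tuple


def load_chemner(name: str = "en_chemner") -> Any:
    """Load the pipeline by package name (assumes the package is installed)."""
    import spacy  # imported here so find_chemicals can be reused without importing spaCy

    return spacy.load(name)


def find_chemicals(nlp: Any, text: str) -> List[Tuple[str, str, int, int]]:
    """Run the pipeline on `text` and return (text, label, start_char, end_char) per entity."""
    doc = nlp(text)
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
```

Typical use would be `nlp = load_chemner()` followed by `find_chemicals(nlp, "Geraniol is a monoterpenoid and an alcohol.")`. Which spans and labels come back depends on the trained model, so no output is shown here.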
### Key Features
- **High Precision and Recall**: The model reports an entity-level precision of 99.07%, a recall of 96.36%, and an F-score of 97.70% (see Accuracy below), so false positives and missed entities were both rare on the data it was evaluated on.
- **Compound-Class Labels**: The model tags seven classes of compounds: alcohols, aldehydes, alkanes, alkenes, alkynes, carboxylic acids (`C_ACID`), and ketones.
- **Built for spaCy**: Packaged for spaCy `>=3.6.1,<3.7.0`, so it loads and runs like any other spaCy pipeline and can be combined with existing spaCy components and applications.
- **Word Vectors**: Includes 514,157 unique word vectors with 300 dimensions each.
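The vector table is the largest part of the package, so a rough memory estimate helps when planning deployment. This sketch assumes the vectors are held as a dense float32 matrix (4 bytes per value, spaCy's default dtype for vectors) and ignores the key table and other overhead; `vector_table_bytes` is a hypothetical helper name.

```python
# Rough in-memory size of the static vector table described above.
# Assumption: a dense float32 matrix (4 bytes per value); key table and overhead ignored.
N_VECTORS = 514_157  # unique vectors, from the model card
DIMENSIONS = 300  # dimensions per vector, from the model card
BYTES_PER_FLOAT32 = 4


def vector_table_bytes(n_vectors: int, dims: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Size in bytes of an n_vectors x dims matrix of fixed-width values."""
    return n_vectors * dims * bytes_per_value


size = vector_table_bytes(N_VECTORS, DIMENSIONS)
print(f"{size:,} bytes = {size / 1e6:.0f} MB = {size / 2**20:.0f} MiB")
# prints "616,988,400 bytes = 617 MB = 588 MiB"
```

Under this assumption the vectors alone need roughly 0.6 GB of RAM once loaded.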
### Use Cases
The `en_chemner` model is suited to:
- **Chemical Literature Analysis**: Automatically extracting chemical entities from research papers, patents, and textbooks.
- **Data Annotation**: Pre-annotating text for chemical databases, or for building datasets for further machine-learning tasks.
- **Educational Purposes**: Helping chemistry students identify compounds in a text and see which compound class each one belongs to.
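For the data-annotation use case, model predictions are usually exported as character-offset spans that human annotators then review. The sketch below writes one JSON record per document (JSONL). The record layout `{"text": ..., "spans": [...]}` and the helper names `doc_to_record` and `write_jsonl` are hypothetical choices for illustration, not a format defined by `en_chemner` or spaCy; the code reads only the standard `Doc.text`, `Doc.ents`, and span attributes.

```python
import json
from typing import Any, Dict, Iterable, List


def doc_to_record(doc: Any) -> Dict[str, Any]:
    """Convert a processed Doc into a JSON-serialisable record of character-offset spans."""
    spans: List[Dict[str, Any]] = [
        {"start": ent.start_char, "end": ent.end_char, "label": ent.label_, "text": ent.text}
        for ent in doc.ents
    ]
    return {"text": doc.text, "spans": spans}


def write_jsonl(docs: Iterable[Any], path: str) -> None:
    """Write one JSON record per line for a stream of processed Docs."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc_to_record(doc), ensure_ascii=False) + "\n")
```

With a loaded pipeline, `write_jsonl(nlp.pipe(texts), "chemner_preannotations.jsonl")` streams a batch of texts through spaCy's `Language.pipe` and writes the pre-annotations to a (hypothetically named) file for review.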
| Feature | Description |
| --- | --- |
| **Name** | `en_chemner` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.6.1,<3.7.0` |
| **Default Pipeline** | `tok2vec`, `ner` |
| **Components** | `tok2vec`, `ner` |
| **Vectors** | 514,157 keys, 514,157 unique vectors (300 dimensions) |
| **Sources** | n/a |
| **License** | `apache-2.0` |
| **Author** | n/a |
### Label Scheme
<details>
<summary>View label scheme (7 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| **`ner`** | `ALCOHOL`, `ALDEHYDE`, `ALKANE`, `ALKENE`, `ALKYNE`, `C_ACID`, `KETONE` |

</details>
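Downstream code often wants readable class names instead of raw label strings. This sketch maps the seven labels above to compound classes and groups a document's entities by class. Reading `C_ACID` as "carboxylic acid" is an assumption based on the label name; `LABEL_NAMES` and `group_by_class` are hypothetical names, and any label not in the mapping is kept unchanged.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Readable names for the seven NER labels listed above.
# Assumption: C_ACID stands for "carboxylic acid".
LABEL_NAMES: Dict[str, str] = {
    "ALCOHOL": "alcohol",
    "ALDEHYDE": "aldehyde",
    "ALKANE": "alkane",
    "ALKENE": "alkene",
    "ALKYNE": "alkyne",
    "C_ACID": "carboxylic acid",
    "KETONE": "ketone",
}


def group_by_class(doc: Any) -> Dict[str, List[str]]:
    """Group entity strings by readable compound class, preserving document order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ent in doc.ents:
        groups[LABEL_NAMES.get(ent.label_, ent.label_)].append(ent.text)
    return dict(groups)
```

On an installed copy, `nlp.get_pipe("ner").labels` can be used to confirm that the label set matches the table before relying on the mapping.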
### Accuracy
| Type | Score |
| --- | --- |
| `ENTS_F` | 97.70 |
| `ENTS_P` | 99.07 |
| `ENTS_R` | 96.36 |
| `TOK2VEC_LOSS` | 151.95 |
| `NER_LOSS` | 259.22 |
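`ENTS_F` is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below recomputes it from the reported `ENTS_P` and `ENTS_R`. It also shows that the reported decimals are consistent with (though they do not prove) small integer counts: 106 correct entities out of 107 predicted and 110 gold entities reproduce all three metadata values to 10 decimal places, as would any integer multiple of those counts. This suggests the evaluation set may contain only on the order of a hundred entities, so the scores should be read with that in mind. The counts are an inference from the ratios, not figures reported by the model.

```python
from fractions import Fraction


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the card metadata, as fractions of 1.
p, r = 0.9906542056, 0.9636363636
print(round(f_score(p, r) * 100, 2))  # prints 97.7, matching ENTS_F = 97.70

# With P = tp / n_pred and R = tp / n_gold, F simplifies to 2 * tp / (n_pred + n_gold).
tp, n_pred, n_gold = 106, 107, 110  # inferred example counts, not reported values
for name, exact, reported in [
    ("precision", Fraction(tp, n_pred), p),
    ("recall", Fraction(tp, n_gold), r),
    ("f-score", Fraction(2 * tp, n_pred + n_gold), 0.9769585253),
]:
    print(name, abs(float(exact) - reported) < 1e-10)  # prints True for all three
```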