---
tags:
- spacy
- token-classification
language:
- en
model-index:
- name: en_chemner
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.9906542056
    - name: NER Recall
      type: recall
      value: 0.9636363636
    - name: NER F Score
      type: f_score
      value: 0.9769585253
widget:
- text: >-
    Cinnamaldehyde is a fragrant compound found in cinnamon. Icosanoic acid is
    a saturated fatty acid with a 20-carbon chain. Triptane is commonly used as
    an anti-knock additive in aviation fuels. Benzophenone is a widely used
    building block in organic chemistry, being the parent diarylketone. Geraniol
    is a monoterpenoid and an alcohol. It is the primary component of citronella
    oil and a primary component of rose oil and palmarosa oil.
license: apache-2.0
---
# en_chemner: A spaCy Model for Chemical NER

## Model Description

The `en_chemner` model is a Named Entity Recognition (NER) pipeline specialized for chemistry. Built with the spaCy framework, it identifies and classifies chemical entities in English-language text.

### Key Features

- **High Precision and Recall**: The model reports a precision of 99.07% and a recall of 96.36% (F-score 97.70%), so both false positives and false negatives are rare.
- **Rich Label Scheme**: The model assigns one of seven labels (`ALCOHOL`, `ALDEHYDE`, `ALKANE`, `ALKENE`, `ALKYNE`, `C_ACID`, `KETONE`), making it useful across different chemical analysis tasks.
- **Optimized for spaCy**: Requires spaCy `>=3.6.1,<3.7.0` and loads like any other spaCy pipeline, so it can be incorporated into existing spaCy applications.
- **Extensive Vector Library**: Ships with 514,157 unique vectors of 300 dimensions each, giving the pipeline a broad vocabulary of word representations.
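
As a minimal sketch of how the pipeline might be called from Python, the snippet below assumes the package is already installed under the name `en_chemner` and that a compatible spaCy version is available. The helper names `load_pipeline` and `extract_entities` are illustrative and not part of the model or of spaCy.

```python
from typing import Callable, List, Optional, Tuple


def load_pipeline(name: str = "en_chemner") -> Optional[Callable]:
    """Return the loaded spaCy pipeline, or None if spaCy or the model is missing."""
    try:
        import spacy  # imported lazily so the sketch still runs without spaCy

        return spacy.load(name)
    except (ImportError, OSError):
        # spacy.load raises OSError when the named package cannot be found
        return None


def extract_entities(nlp: Callable, text: str) -> List[Tuple[str, str]]:
    """Run the pipeline on text and return (entity text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


nlp = load_pipeline()
if nlp is not None:
    sample = "Geraniol is a monoterpenoid and an alcohol."
    for entity_text, label in extract_entities(nlp, sample):
        print(f"{entity_text}\t{label}")
```

Because `extract_entities` only relies on `doc.ents`, the same function should work if `en_chemner` is one stage of a larger spaCy application.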

### Use Cases

The `en_chemner` model is ideal for:
- **Chemical Literature Analysis**: Automatically extracting chemical entities from research papers, patents, and textbooks.
- **Data Annotation**: Assisting in the annotation of chemical databases or creating datasets for further machine learning tasks.
- **Educational Purposes**: Helping students in chemistry-related fields to identify and understand various chemical compounds and their classifications.

| Feature | Description |
| --- | --- |
| **Name** | `en_chemner` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.6.1,<3.7.0` |
| **Default Pipeline** | `tok2vec`, `ner` |
| **Components** | `tok2vec`, `ner` |
| **Vectors** | 514157 keys, 514157 unique vectors (300 dimensions) |
| **Sources** | n/a |
| **License** | `apache-2.0` |
| **Author** | n/a |

### Label Scheme

<details>

<summary>View label scheme (7 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| **`ner`** | `ALCOHOL`, `ALDEHYDE`, `ALKANE`, `ALKENE`, `ALKYNE`, `C_ACID`, `KETONE` |

</details>
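
For downstream processing, it can help to bucket recognized entities by the seven labels listed above. The following is a rough sketch in plain Python; `group_by_label` and `CHEMNER_LABELS` are hypothetical names, and the example pairs are hand-written for illustration rather than actual model output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# The seven labels from the label scheme of the `ner` component.
CHEMNER_LABELS = {"ALCOHOL", "ALDEHYDE", "ALKANE", "ALKENE", "ALKYNE", "C_ACID", "KETONE"}


def group_by_label(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (entity text, label) pairs by label, rejecting unknown labels."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if label not in CHEMNER_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        grouped[label].append(text)
    return dict(grouped)


# Hand-written illustrative pairs, not actual model output.
example_pairs = [("Geraniol", "ALCOHOL"), ("Benzophenone", "KETONE"), ("Triptane", "ALKANE")]
print(group_by_label(example_pairs))
```

Rejecting unknown labels early is a design choice for this sketch; a more tolerant version could collect them under a separate key instead.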

### Accuracy

| Type | Score |
| --- | --- |
| `ENTS_F` | 97.70 |
| `ENTS_P` | 99.07 |
| `ENTS_R` | 96.36 |
| `TOK2VEC_LOSS` | 151.95 |
| `NER_LOSS` | 259.22 |
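
The F-score in the table is the harmonic mean of precision and recall. As a quick check, the sketch below recomputes it from the precision and recall values given in the model metadata; the function name `f_score` is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.9906542056  # NER Precision from the model metadata
recall = 0.9636363636     # NER Recall from the model metadata

f1 = f_score(precision, recall)
print(f"{f1 * 100:.2f}")  # prints 97.70
```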