en_chemner: A spaCy Model for Chemical NER

Model Description

The en_chemner model is a specialized Named Entity Recognition (NER) tool designed for the field of chemistry. Built using the spaCy framework, it identifies and classifies chemical entities within English-language texts.

Key Features

  • High Precision and Recall: The model reports a precision of 99.07% and a recall of 96.36% for entity recognition, so it produces few false positives and misses few entities.
  • Rich Label Scheme: The model distinguishes seven classes of chemical entities, including alcohols, aldehydes, and alkanes, which makes it usable across different chemical analysis tasks.
  • Optimized for spaCy: The model is packaged for spaCy (>=3.6.1,<3.7.0), so it can be added to existing spaCy pipelines and applications.
  • Extensive Vector Library: The model ships with 514,157 unique vectors of 300 dimensions each, providing a foundation for recognizing and classifying chemical entities.
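As a minimal sketch of how the model might be used, the helpers below load a pipeline by name and return its predicted entities. This assumes the model is installed as a package named en_chemner in an environment with a compatible spaCy version; the helper names load_model and extract_entities are illustrative and not part of the model.

```python
def load_model(name="en_chemner"):
    """Load an installed spaCy pipeline by package name.

    Assumption: the model package is installed under the name "en_chemner".
    """
    import spacy  # imported lazily so extract_entities has no hard dependency

    return spacy.load(name)


def extract_entities(nlp, text):
    """Run a loaded pipeline on text and return (entity text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With the model installed, `extract_entities(load_model(), "Ethanol is oxidized to acetaldehyde.")` would return whatever spans the model tags in that sentence; the sentence is only an example input.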

Use Cases

The en_chemner model is ideal for:

  • Chemical Literature Analysis: Automatically extracting chemical entities from research papers, patents, and textbooks.
  • Data Annotation: Assisting in the annotation of chemical databases or creating datasets for further machine learning tasks.
  • Educational Purposes: Helping students in chemistry-related fields to identify and understand various chemical compounds and their classifications.
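For the literature analysis and annotation use cases, one possible approach is to stream many texts through the pipeline and store the predicted spans with character offsets. The sketch below assumes `nlp` is an already loaded spaCy pipeline; the function names and the record layout are hypothetical choices for illustration, not a format the model prescribes.

```python
import json


def annotate(nlp, texts):
    """Yield one JSON-serializable record per text with character-offset spans."""
    # nlp.pipe streams the texts through the pipeline in batches.
    for doc in nlp.pipe(texts):
        yield {
            "text": doc.text,
            "spans": [
                {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                for ent in doc.ents
            ],
        }


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(record) for record in records)
```

Character offsets are kept rather than token indices so that the records stay usable by tools that do not share spaCy's tokenization.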

Feature            Description
Name               en_chemner
Version            1.0.0
spaCy              >=3.6.1,<3.7.0
Default Pipeline   tok2vec, ner
Components         tok2vec, ner
Vectors            514157 keys, 514157 unique vectors (300 dimensions)
Sources            n/a
License            n/a
Author             n/a
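Because the model declares a narrow spaCy range, it can help to check the installed version before loading. The sketch below is a simplified check written for illustration: it compares only the leading major.minor.patch numbers and ignores pre-release suffixes, and the function names are hypothetical.

```python
import re

# Bounds taken from the model metadata: spaCy >=3.6.1,<3.7.0
LOWER_BOUND = (3, 6, 1)  # inclusive
UPPER_BOUND = (3, 7, 0)  # exclusive


def parse_version(version):
    """Return the leading major.minor.patch numbers of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(version):
    """Check whether a spaCy version falls inside the range the model declares."""
    return LOWER_BOUND <= parse_version(version) < UPPER_BOUND
```

In practice the argument would be `spacy.__version__`, for example `is_compatible(spacy.__version__)`.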

Label Scheme

7 labels for 1 component:

Component   Labels
ner         ALCOHOL, ALDEHYDE, ALKANE, ALKENE, ALKYNE, C_ACID, KETONE
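As an illustration of working with this label scheme, the sketch below tallies predicted entities per label and rejects any label outside the seven listed above. The input is assumed to be a list of (entity text, label) pairs; the function name count_labels is hypothetical.

```python
from collections import Counter

# The seven labels listed in the model's label scheme.
LABELS = ("ALCOHOL", "ALDEHYDE", "ALKANE", "ALKENE", "ALKYNE", "C_ACID", "KETONE")


def count_labels(pairs):
    """Count (text, label) pairs per label, with a zero for labels that never occur."""
    counts = Counter(label for _, label in pairs)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"labels outside the model's scheme: {sorted(unknown)}")
    return {label: counts[label] for label in LABELS}
```

On a loaded pipeline, the same label names should also be available through `nlp.get_pipe("ner").labels`, which avoids hard-coding them.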

Accuracy

Type           Score
ENTS_F         97.70
ENTS_P         99.07
ENTS_R         96.36
TOK2VEC_LOSS   151.95
NER_LOSS       259.22
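The three entity scores are related: assuming ENTS_F is the usual F1 score, it is the harmonic mean of ENTS_P and ENTS_R. The short check below recomputes it from the reported precision and recall; the function name f_score is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values from the accuracy table, in percent.
ents_p = 99.07
ents_r = 96.36
print(round(f_score(ents_p, ents_r), 2))  # 97.7, matching the reported ENTS_F of 97.70
```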