MIReAD Neuro

This model is a version of arazd/MIReAD fine-tuned for journal classification on a dataset of neuroscience papers from 200 journals, collected from various sources. It achieves the following results on the evaluation set:

Loss: 2.7117
Accuracy: 0.4011
F1: 0.3962
Precision: 0.4066
Recall: 0.3999
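
As a rough reference point for reading these numbers, the sketch below computes what a classifier guessing uniformly at random over the 200 journals would score. It assumes the reported loss is the standard natural-log cross-entropy, which is not stated explicitly here. A majority-class baseline could score higher than this, depending on how the journals are distributed in the data.

```python
import math

num_classes = 200  # number of journals, taken from the description above

# Uniform guessing over K classes gives cross-entropy ln(K)
# and an expected accuracy of 1/K.
uniform_loss = math.log(num_classes)
uniform_accuracy = 1 / num_classes

print(f"uniform-guess loss: {uniform_loss:.4f}")          # 5.2983
print(f"uniform-guess accuracy: {uniform_accuracy:.4f}")  # 0.0050
```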

Model description

This model was trained on a journal classification task.

Intended uses & limitations

This model is intended to produce abstract embeddings for semantic similarity search over neuroscience-related articles.

Model Usage

To load the model:

import torch
from transformers import BertForSequenceClassification, AutoTokenizer
model_path = "biodatlab/MIReAD-Neuro"
model = BertForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

To create an embedding or classify an article from its title and abstract:

# sample abstract & title text
title = "Why Brain Criticality Is Clinically Relevant: A Scoping Review."
abstract = "The past 25 years have seen a strong increase in the number of publications related to criticality in different areas of neuroscience..."
text = title + tokenizer.sep_token + abstract
tokens = tokenizer(
    text,
    max_length=512,
    padding=True,
    truncation=True,
    return_tensors="pt"
)

# to generate an embedding from a given title and abstract
with torch.no_grad():
  output = model.bert(**tokens)
  embedding = output.last_hidden_state[:, 0, :]

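# to classify a given title and abstract into one of the 200 journals
with torch.no_grad():
  output = model(**tokens)
logits = output.logits  # shape: (batch_size, 200)
predicted_class_id = logits.argmax(dim=-1).item()  # index of the highest-scoring journal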
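
Embeddings produced this way can be compared with cosine similarity for the semantic search use case. The following is a minimal, dependency-free sketch, not part of the model's own API: the function names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the short toy vectors stand in for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, corpus):
    """Return (index, score) pairs for the corpus vectors, most similar first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
query_vec = [1.0, 0.0, 1.0]
corpus_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(query_vec, corpus_vecs)
print([index for index, _ in ranking])  # [1, 2, 0]
```

With real data, each vector could be obtained from the embedding tensor above, for example with `embedding[0].tolist()`. For larger collections, a vectorized or indexed search would likely be a better fit than this pure-Python loop.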

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 16
eval_batch_size: 16
num_epochs: 6