This model is a fine-tuned version of arazd/MIReAD, trained on a dataset of neuroscience papers from 200 journals collected from various sources. It is a mirror of the model available at biodatlab/MIReAD-Neuro; prefer the model from that page for stability. It achieves the following results on the evaluation set:
- Loss: 2.7117
- Accuracy: 0.4011
- F1: 0.3962
- Precision: 0.4066
- Recall: 0.3999
Model description
This model was trained on a journal classification task: given a paper's title and abstract, it predicts one of the 200 journals in the dataset.
Intended uses & limitations
The intended use of this model is to create abstract embeddings for semantic similarity search.
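As a rough illustration of what semantic similarity search over abstract embeddings could look like, the sketch below ranks a small corpus by cosine similarity to a query vector. It uses plain Python lists and toy 3-dimensional vectors so that it runs without the model; the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical and not part of this model or of `transformers`. With real embeddings produced as in the Model Usage section, a tensor can be converted to a list with `embedding[0].tolist()`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus, top_k=3):
    """Return (index, score) pairs for the top_k corpus vectors closest to query."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy vectors standing in for real abstract embeddings (illustrative values only).
corpus = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query = [1.0, 0.0, 0.0]

ranked = rank_by_similarity(query, corpus, top_k=2)
for index, score in ranked:
    print(f"corpus[{index}] similarity={score:.4f}")
```

For larger collections, a vectorized implementation or an approximate nearest-neighbour index would likely be a better fit than this pure-Python loop.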
Model Usage
To load the model:
import torch
from transformers import BertForSequenceClassification, AutoTokenizer
model_path = "atrytone/MIReAD-Neuro"
model = BertForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
To create an embedding for, or classify, a given title and abstract:
# sample abstract & title text
title = "Why Brain Criticality Is Clinically Relevant: A Scoping Review."
abstract = "The past 25 years have seen a strong increase in the number of publications related to criticality in different areas of neuroscience..."
text = title + tokenizer.sep_token + abstract
tokens = tokenizer(
    text,
    max_length=512,
    padding=True,
    truncation=True,
    return_tensors="pt"
)
# to generate an embedding from a given title and abstract
with torch.no_grad():
    output = model.bert(**tokens)
    # hidden state of the first ([CLS]) token serves as the embedding
    embedding = output.last_hidden_state[:, 0, :]
# to classify a given title and abstract (200 journals)
output = model(**tokens)
# `class` is a reserved word in Python, so store the scores under another name
logits = output.logits
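The classification step yields raw logits, one score per journal. A minimal sketch of turning such scores into ranked journal predictions is shown below, using toy values so that it runs without the model. The helpers `softmax` and `top_k_labels`, the toy logits, and the "Journal A" to "Journal D" names are all hypothetical. With the real model, the inputs would presumably be `output.logits[0].tolist()` and `model.config.id2label`; whether this checkpoint's `id2label` holds readable journal names is an assumption not confirmed by this card.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in order[:k]]


# Toy stand-ins: four classes instead of 200, with made-up scores and names.
toy_logits = [0.2, 2.5, -1.0, 1.1]
toy_id2label = {0: "Journal A", 1: "Journal B", 2: "Journal C", 3: "Journal D"}

predictions = top_k_labels(toy_logits, toy_id2label, k=2)
for label, prob in predictions:
    print(f"{label}: {prob:.3f}")
```

Given the reported accuracy of about 0.40, inspecting several top candidates rather than only the single best class may be more informative.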
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 16
- num_epochs: 6
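For readers who want to reproduce a similar setup, the listed hyperparameters can be collected as keyword arguments. The sketch below assumes training was done with the `transformers` `Trainer` API and that the batch sizes are per-device values; neither is stated in this card. The output directory name in the comment is hypothetical.

```python
# Hyperparameters from the list above, keyed by the keyword names that
# transformers.TrainingArguments accepts. The mapping is an assumption:
# this card does not say which training API was used, nor whether the
# batch sizes are per device or global.
training_kwargs = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 6,
}

# With transformers installed, these could be passed along as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="miread-neuro-out", **training_kwargs)
# where "miread-neuro-out" is a hypothetical output directory.
for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```

Settings not listed in this card (optimizer, scheduler, seed, and so on) would fall back to library defaults under this sketch, which may differ from what was actually used.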