---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- medical
- pharmacovigilance
- vaccines
datasets:
- chrisvoncsefalvay/vaers-outcomes
metrics:
- accuracy
- f1
- precision
- recall
dataset: chrisvoncsefalvay/vaers-outcomes
pipeline_tag: text-classification
widget:
- text: Patient is a 90 y.o. male with a PMH of IPF, HFpEF, AFib (Eliquis), Metastatic
Prostate Cancer who presented to Hospital 10/28/2023 following an unwitnessed
fall at his assisted living. He was found to have an AKI, pericardial effusion,
hypoxia, AMS, and COVID-19. His hospital course was complicated by delirium and
aspiration, leading to acute hypoxic respiratory failure requiring BiPAP and transfer
to the ICU. Palliative Care had been following, and after goals of care conversations
on 11/10/2023 the patient was transitioned to DNR-CC. Patient expired at 0107
11/12/23.
example_title: VAERS 2727645 (hospitalisation, death)
- text: 'hospitalized for paralytic ileus a week after the vaccination; This serious
case was reported by a physician via call center representative and described
the occurrence of ileus paralytic in a patient who received Rota (Rotarix liquid
formulation) for prophylaxis. On an unknown date, the patient received the 1st
dose of Rotarix liquid formulation. On an unknown date, less than 2 weeks after
receiving Rotarix liquid formulation, the patient experienced ileus paralytic
(Verbatim: hospitalized for paralytic ileus a week after the vaccination) (serious
criteria hospitalization and GSK medically significant). The outcome of the ileus
paralytic was not reported. It was unknown if the reporter considered the ileus
paralytic to be related to Rotarix liquid formulation. It was unknown if the company
considered the ileus paralytic to be related to Rotarix liquid formulation. Additional
Information: GSK Receipt Date: 27-DEC-2023 Age at vaccination and lot number were
not reported. The patient of unknown age and gender was hospitalized for paralytic
ileus a week after the vaccination. The reporting physician was in charge of the
patient.'
example_title: VAERS 2728408 (hospitalisation)
- text: Patient received Pfizer vaccine 7 days beyond BUD. According to Pfizer manufacturer
research data, vaccine is stable and effective up to 2 days after BUD. Waiting
for more stability data from PFIZER to determine if revaccination is necessary.
example_title: VAERS 2728394 (no event)
- text: Fever of 106F rectally beginning 1 hr after immunizations and lasting <24
hrs. Seen at ER treated w/tylenol & cool baths.
example_title: VAERS 25042 (ER attendance)
- text: I had the MMR shot last week, and I felt a little dizzy afterwards, but it
passed after a few minutes and I'm doing fine now.
example_title: 'Non-sample example: simulated informal patient narrative (no event)'
- text: My niece had the COVID vaccine. A few weeks later, she was T-boned by a drunk
driver. She called me from the ER. She's fully recovered now, though.
example_title: 'Non-sample example: simulated informal patient narrative (ER attendance,
albeit unconnected)'
model-index:
- name: daedra
results:
- task:
type: text-classification
dataset:
name: vaers-outcomes
type: vaers-outcomes
metrics:
- type: accuracy_microaverage
value: 0.885
name: Accuracy, microaveraged
verified: false
- type: f1_microaverage
value: 0.885
name: F1 score, microaveraged
verified: false
- type: precision_macroaverage
value: 0.769
name: Precision, macroaveraged
verified: false
- type: recall_macroaverage
value: 0.688
name: Recall, macroaveraged
verified: false
---
# DAEDRA: Determining Adverse Event Disposition for Regulatory Affairs
This model is a fine-tuned version of [dmis-lab/biobert-base-cased-v1.2](https://huggingface.co/dmis-lab/biobert-base-cased-v1.2) trained on the [VAERS adversome outcomes data set](https://huggingface.co/datasets/chrisvoncsefalvay/vaers-outcomes).
# Table of Contents
- [Model Details](#model-details)
- [Uses](#uses)
- [Bias, Risks, and Limitations](#bias-risks-and-limitations)
- [Training Details](#training-details)
- [Evaluation](#evaluation)
- [Environmental Impact](#environmental-impact)
- [Technical Specifications](#technical-specifications)
- [Citation](#citation)
# Model Details
## Model Description
DAEDRA is a model for the identification of adverse event dispositions (outcomes) from passive pharmacovigilance data.
The model is trained on a real-world adversomics data set spanning over three decades (1990-2023) and comprising over 1.8m records for a total corpus of 173,093,850 words constructed from a subset of reports submitted to VAERS.
It is intended to identify, based on the narrative, whether any, or any combination, of three serious outcomes -- death, hospitalisation and ER attendance -- have occurred.
- **Developed by:** Chris von Csefalvay
- **Model type:** Language model
- **Language(s) (NLP):** en
- **License:** apache-2.0
- **Parent Model:** [dmis-lab/biobert-base-cased-v1.2](https://huggingface.co/dmis-lab/biobert-base-cased-v1.2)
- **Resources for more information:**
- [GitHub Repo](https://github.com/chrisvoncsefalvay/daedra)
# Uses
This model was designed to facilitate the coding of passive adverse event reports into severity outcome categories.
## Direct Use
Load the model via the `transformers` library:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("chrisvoncsefalvay/daedra")
model = AutoModelForSequenceClassification.from_pretrained("chrisvoncsefalvay/daedra")
```
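Since the model scores each of the three serious outcomes independently, a report can carry any combination of labels. Below is a minimal sketch of turning raw per-label logits into a label set; the label names, their order, and the 0.5 threshold are illustrative assumptions (check the model's `config.id2label` for the real mapping), not values taken from the model's configuration:

```python
import math

# Hypothetical label order -- consult config.id2label for the actual mapping.
LABELS = ["death", "hospitalisation", "er_attendance"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logits_to_labels(logits, threshold=0.5):
    """Apply an independent sigmoid per label; keep labels above threshold."""
    return [label for label, z in zip(LABELS, logits)
            if sigmoid(z) >= threshold]

# e.g. strongly positive 'hospitalisation' logit, negative elsewhere:
print(logits_to_labels([-3.2, 4.1, -1.0]))  # → ['hospitalisation']
```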
## Out-of-Scope Use
This model is not intended for the diagnosis or treatment of any disease.
# Bias, Risks, and Limitations
Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
# Training Details
## Training Data
The model was trained on the [VAERS adversome outcomes data set](https://huggingface.co/datasets/chrisvoncsefalvay/vaers-outcomes), which comprises 1,814,920 reports from the FDA's Vaccine Adverse Event Reporting System (VAERS). After age and gender matching, reports were split into a 70% training set, a 15% test set and a 15% validation set.
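The 70/15/15 partitioning can be sketched as a plain random split over illustrative record IDs. Note this is only a sketch: the actual data set was split after age and gender matching, which this simple shuffle does not reproduce.

```python
import random

records = list(range(1_000))  # stand-in IDs for the 1.8m VAERS reports
rng = random.Random(42)
rng.shuffle(records)

# 70% train / 15% test / 15% validation, by position after shuffling.
n = len(records)
n_train, n_test = int(0.70 * n), int(0.15 * n)
train = records[:n_train]
test = records[n_train:n_train + n_test]
validation = records[n_train + n_test:]
```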
## Training Procedure
Training was conducted on an Azure `Standard_NC24s_v3` instance in `us-east`, with 4x Tesla V100-PCIE-16GB GPUs and 24 Intel Xeon E5-2690 v4 vCPUs at 2.60 GHz.
### Speeds, Sizes, Times
Training took 15 hours and 10 minutes.
# Evaluation
## Testing Data, Factors & Metrics
### Testing Data
The model was tested on the `test` partition of the [VAERS adversome outcomes data set](https://huggingface.co/datasets/chrisvoncsefalvay/vaers-outcomes).
## Results
On the test set, the model achieved the following results:
* Accuracy, microaveraged: 0.885
* F1 score, microaveraged: 0.885
* Precision, macroaveraged: 0.769
* Recall, macroaveraged: 0.688
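The distinction between the micro- and macroaveraged figures above matters for imbalanced outcome classes: micro-averaging pools true/false positive counts across classes before dividing, while macro-averaging takes the unweighted mean of per-class metrics. A toy illustration with made-up per-class counts (not the model's actual confusion data):

```python
# Hypothetical per-class counts for a three-outcome classifier.
counts = {
    "death":           {"tp": 90, "fp": 10, "fn": 10},
    "hospitalisation": {"tp": 8,  "fp": 2,  "fn": 4},
    "er_attendance":   {"tp": 5,  "fp": 5,  "fn": 5},
}

def precision(tp, fp): return tp / (tp + fp)
def recall(tp, fn): return tp / (tp + fn)

# Micro: pool counts across classes, then compute once.
tp = sum(c["tp"] for c in counts.values())
fp = sum(c["fp"] for c in counts.values())
fn = sum(c["fn"] for c in counts.values())
micro_p, micro_r = precision(tp, fp), recall(tp, fn)

# Macro: compute per class, then take the unweighted mean.
macro_p = sum(precision(c["tp"], c["fp"]) for c in counts.values()) / len(counts)
macro_r = sum(recall(c["tp"], c["fn"]) for c in counts.values()) / len(counts)
```

With these toy counts the micro figures are dominated by the large majority class, while the macro figures are pulled down by the rare classes, which is why both are reported.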
# Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** 4 x Tesla V100-PCIE-16GB
- **Hours used:** 15.166
- **Cloud Provider:** Azure
- **Compute Region:** us-east
- **Carbon Emitted:** 6.72 kg CO2eq (offset by provider)
# Citation
**BibTeX:**
Forthcoming -- watch this space.
# Model Card Authors
Chris von Csefalvay
# Technical Specifications
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
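The hyperparameters above map onto a `transformers` `TrainingArguments` configuration roughly as follows. This is a sketch, not the actual training script: in particular, whether the batch size of 64 was per device or the total across the 4 GPUs is not stated, so the per-device values here are an assumption.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="daedra",              # illustrative output path
    learning_rate=2e-5,
    per_device_train_batch_size=64,   # assumption: 64 per device
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```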
### Framework versions
- Transformers 4.37.2
- Pytorch 2.1.2+cu121
- Datasets 2.3.2
- Tokenizers 0.15.1