---
license: cc-by-nc-4.0
library_name: femr
tags:
- healthcare
- femr
- medical
extra_gated_prompt: "You agree to all terms outlined in 'The EHRSHOT Credentialed Health Data License' (see https://shahlab.stanford.edu/ehrshot_license). Access requires a verified CITI training certificate using the same process outlined by PhysioNet (see https://physionet.org/about/citi-course/) Please provide proof via the verification URL, which takes the form https://www.citiprogram.org/verify/?XXXXXX. You agree to not use the model to conduct experiments that cause harm to human subjects."
extra_gated_fields:
  Full Name: text
  Email: text
  Affiliation: text
  CITI Certification Verification URL: text
  I agree to all terms outlined in 'The EHRSHOT Credentialed Health Data License': checkbox
  I agree to use this model for non-commercial use ONLY: checkbox
---

# MOTOR-T-Base

This is a 143-million-parameter autoregressive foundation model pretrained on 2.57 million deidentified EHRs from Stanford Medicine.

It is the model described in [(Steinberg et al. 2023)](https://arxiv.org/abs/2301.03150).

As input, this model expects a sequence of coded medical events that have been mapped to Standard Concepts within the [OMOP-CDM vocabulary](https://ohdsi.github.io/CommonDataModel/index.html). The model generates patient representations that can then be used for downstream prediction tasks.
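
To make the expected input concrete, the following minimal sketch shows one way a patient timeline of coded events could be held in memory before it is handed to a preprocessing library. The `Event` class, the `sort_timeline` helper, the `VOCABULARY/code` string convention, and the example codes are all illustrative assumptions; they are not the FEMR schema or this model's actual input API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """One coded medical event (hypothetical structure, not the FEMR schema)."""

    time: datetime
    code: str                      # assumed "VOCABULARY/code" convention
    value: Optional[float] = None  # numeric result for lab events, if any


def sort_timeline(events: List[Event]) -> List[Event]:
    """Return events in chronological order; ties keep their input order."""
    return sorted(events, key=lambda e: e.time)


# Illustrative codes only; real inputs must be OMOP Standard Concepts.
patient = sort_timeline([
    Event(datetime(2021, 3, 2), "LOINC/4548-4", 7.1),
    Event(datetime(2020, 1, 15), "SNOMED/38341003"),
    Event(datetime(2021, 3, 2), "RxNorm/860975"),
])
codes = [e.code for e in patient]
print(codes)
```

Because Python's `sorted` is stable, events that share a timestamp stay in the order in which they were supplied.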

## Model Details

### Model Description

- **Developed by:** Shah lab @ Stanford University
- **Funded by:** Stanford Healthcare
- **Shared by:** Shah lab @ Stanford University
- **Model type:** MOTOR [(Steinberg et al. 2023)](https://arxiv.org/abs/2301.03150)
- **Language(s) (NLP):** Electronic health record codes
- **License:** CC-BY-NC-4.0
- **Finetuned from model:** N/A -- trained from scratch

### Model Sources

- **Paper:** [MOTOR: A Time-To-Event Foundation Model For Structured Medical Records](https://arxiv.org/abs/2301.03150)

## Uses

This model is intended to generate representations for patients based on the structured data within their electronic health record.
These representations are best suited to time-to-event modeling, but can also be used for other downstream tasks such as predicting diagnoses, detecting anomalies, or doing propensity score matching for causal inference.

### Direct Use

You will likely want to fine-tune the model for your downstream use case.

### Out-of-Scope Use

This model is for research purposes only. It is not for use in any real-world decision making that impacts patients, providers, or hospital operations.

## Bias, Risks, and Limitations

This model was trained on a corpus of 2.57 million patients from Stanford Medicine.
The model will thus reflect the patterns of how care is delivered at Stanford Medicine, in addition to the racial and socioeconomic makeup of Stanford Medicine's patient base.
This model may not generalize well to other hospitals and demographic mixes.

## How to Get Started with the Model

We recommend getting started by looking at our tutorial repository: https://github.com/som-shahlab/motor_tutorial

## Training Details

Full training details are provided in our accompanying paper, [MOTOR: A Time-To-Event Foundation Model For Structured Medical Records](https://arxiv.org/abs/2301.03150).

### Training Data

The model is trained on 2.57 million patients from the [Stanford Medicine Research Data Repository (STARR)](https://academic.oup.com/jamiaopen/article/6/3/ooad054/7236015), which contains EHR data from both Stanford Health Care (primarily adult care) and Lucile Packard Children’s Hospital (primarily pediatric care).
The dataset contains only structured data (i.e., no clinical text or images) and covers demographics (e.g., age, sex, race), diagnoses, procedures, laboratory results, medication prescriptions, and other coded clinical observations.
The data is formatted according to the [Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM)](https://ohdsi.github.io/CommonDataModel/cdm53.html).
All data that we work with is deidentified.

### Training Procedure

We train our model using a time-to-event pretraining objective, i.e., predicting the time until a particular code appears in a patient's timeline.
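
As a rough illustration of what "time until a particular code appears" means as a prediction target, the sketch below derives a single time-to-event label from a toy timeline. It covers only label construction, including the censored case where the code never appears before the record ends. It is not the pretraining loss from the paper, and the function name `time_to_event`, the tuple layout, and the `end_of_record` parameter are assumptions made for this example.

```python
from datetime import datetime
from typing import List, Tuple

Timeline = List[Tuple[datetime, str]]  # (event time, code), chronological

SECONDS_PER_DAY = 86400.0


def time_to_event(
    timeline: Timeline,
    index: int,
    target_code: str,
    end_of_record: datetime,
) -> Tuple[float, bool]:
    """Return (days, observed), measured from the event at `index`.

    observed=True  -> target_code occurs later; days is the time until it.
    observed=False -> censored; days is the time until the record ends.
    """
    start = timeline[index][0]
    for time, code in timeline[index + 1:]:
        if code == target_code:
            return (time - start).total_seconds() / SECONDS_PER_DAY, True
    return (end_of_record - start).total_seconds() / SECONDS_PER_DAY, False


# Toy timeline with placeholder codes "A" and "B".
timeline = [
    (datetime(2020, 1, 1), "A"),
    (datetime(2020, 1, 11), "B"),
    (datetime(2020, 2, 1), "A"),
]
end = datetime(2020, 3, 1)

print(time_to_event(timeline, 0, "B", end))  # (10.0, True)
print(time_to_event(timeline, 1, "B", end))  # (50.0, False), censored
```

Keeping the `observed` flag alongside the duration matters because a censored duration is only a lower bound on the true time to the event.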

#### Preprocessing

We use the [FEMR](https://github.com/som-shahlab/femr/tree/main) Python library for data preprocessing.

#### Training Hyperparameters

* Learning rate: 1e-5
* Context window size: 496
* Internal dropout: 0
* Layers: 12
* Hidden dimension: 768
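
For readers who want these values in code, here is a minimal sketch that records the hyperparameters above in a dataclass and shows one consequence of the context window: timelines longer than 496 events have to be shortened somehow. The class name, the field names, and the keep-the-most-recent-events policy in `clip_to_context` are assumptions for illustration; the model card does not state how longer timelines are handled.

```python
from dataclasses import dataclass
from typing import List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class MotorTBaseHyperparameters:
    """Values copied from the list above; field names are hypothetical."""

    learning_rate: float = 1e-5
    context_window: int = 496
    internal_dropout: float = 0.0
    n_layers: int = 12
    hidden_size: int = 768


def clip_to_context(events: Sequence[T], context_window: int) -> List[T]:
    """Keep at most the last `context_window` events (assumed policy)."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return list(events[-context_window:])


hp = MotorTBaseHyperparameters()
long_timeline = list(range(1000))  # stand-in for 1000 chronological events
clipped = clip_to_context(long_timeline, hp.context_window)
print(len(clipped), clipped[0], clipped[-1])  # 496 504 999
```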

## Evaluation

We evaluate this model on Stanford data; see [MOTOR: A Time-To-Event Foundation Model For Structured Medical Records](https://arxiv.org/abs/2301.03150) for the results.

## Technical Specifications

This model uses the MOTOR architecture from [MOTOR: A Time-To-Event Foundation Model For Structured Medical Records](https://arxiv.org/abs/2301.03150).

## Citation

**BibTeX:**

```
@misc{steinberg2023motor,
      title={MOTOR: A Time-To-Event Foundation Model For Structured Medical Records},
      author={Ethan Steinberg and Jason Fries and Yizhe Xu and Nigam Shah},
      year={2023},
      eprint={2301.03150},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

## Model Card Authors

Ethan Steinberg, Michael Wornow

## Model Card Contact

Ethan Steinberg ([email protected])