---
license: apache-2.0
library_name: femr
tags:
- healthcare
---

# CLMBR-T-Base-Random

This is a CLMBR model with randomly initialized weights and a dummy vocabulary. Its purpose is to let you test code pipelines and learn how to use CLMBR before applying for access to the official CLMBR release, which was trained on real Stanford Hospital data.

The model architecture is CLMBR-T-Base (144M params), as described in [the EHRSHOT paper (Wornow et al. 2023)](https://arxiv.org/abs/2307.02028), and is based on the architecture originally developed in [the Clinical Language Modeling Based Representations paper (Steinberg et al. 2021)](https://www.sciencedirect.com/science/article/pii/S1532046420302653).

The weights are random, so **this model has no clinical or research use.**

## Model Details

### Model Description

- **Developed by:** Shah lab @ Stanford University
- **Funded by:** This work was supported in part by the Mark and Debra Leslie Endowment for AI in Healthcare, the Clinical Excellence Research Center at Stanford Medicine, and Technology and Digital Solutions at Stanford Healthcare. MW is supported by an NSF Graduate Research Fellowship. JF was supported in part by a Stanford AIMI-HAI Partnership Grant.
- **Shared by:** Shah lab @ Stanford University
- **Model type:** CLMBR [(Steinberg et al. 2021)](https://www.sciencedirect.com/science/article/pii/S1532046420302653)
- **Language(s) (NLP):** Electronic health record codes
- **License:** Apache-2.0
- **Finetuned from model:** N/A (the weights are randomly initialized, not finetuned)

### Model Sources

- **Repository:** [https://github.com/som-shahlab/ehrshot-benchmark/](https://github.com/som-shahlab/ehrshot-benchmark/)
- **Paper:** [EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models](https://arxiv.org/abs/2307.02028)

## Uses

This model generates (random) dense representations for patients based on the structured data within their electronic health record.
These representations can then be used for downstream tasks such as predicting diagnoses, detecting anomalies, or doing propensity score matching for causal inference.

Again, please note that **this version of the model has random weights**, so its outputs are meaningless.

## How to Get Started with the Model

Use the code below to get started with the model.

First, install the necessary libraries.

```bash
# Create Python 3.10 environment
conda create --name ehrshot_env python=3.10 -y
conda activate ehrshot_env

# Install requirements
pip install torch==2.1.1 femr==0.2.0 datasets==2.15.0 flash_attn==2.3.6 transformers==4.35.2
```

Second, run the following Python script to perform inference on a single patient:

```python
import datetime

import torch

import femr.models.dataloader
import femr.models.tokenizer
import femr.models.transformer

model_name = "StanfordShahLab/clmbr-t-base-random"

# Load tokenizer / batch loader
tokenizer = femr.models.tokenizer.FEMRTokenizer.from_pretrained(model_name)
batch_processor = femr.models.dataloader.FEMRBatchProcessor(tokenizer)

# Load model
model = femr.models.transformer.FEMRModel.from_pretrained(model_name)

# Create an example patient to run inference on:
# two visits, each with a timestamp and a list of coded measurements
example_patient = {
    'patient_id': 30,
    'events': [{
        'time': datetime.datetime(2011, 5, 8),
        'measurements': [
            {'code': 'SNOMED/1'},
        ],
    },
    {
        'time': datetime.datetime(2012, 6, 9),
        'measurements': [
            {'code': 'SNOMED/30'},
            {'code': 'SNOMED/103'}
        ],
    }]
}

# Convert the patient into a batch of PyTorch tensors
batch = batch_processor.convert_patient(example_patient, tensor_type="pt")

# Run model (no gradients needed for inference)
with torch.no_grad():
    patient_ids, times, reprs = model(batch)
    print(patient_ids)
    print(times)
    print(reprs)
```

## Training Details

This model is not trained.

## Evaluation

None, as the weights are random.

## Technical Specifications

Please see [the EHRSHOT paper (Wornow et al. 2023)](https://arxiv.org/abs/2307.02028) for details on the model architecture and objective.

### Compute Infrastructure

This model was not trained.

#### Software

For data loading / processing, this model leverages [FEMR](https://github.com/som-shahlab/femr/tree/main), a Python library for doing machine learning on EHR data at scale.

## Citation

**BibTeX:**

```
@article{wornow2023ehrshot,
  title={EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models},
  author={Michael Wornow and Rahul Thapa and Ethan Steinberg and Jason Fries and Nigam Shah},
  year={2023},
  eprint={2307.02028},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## Model Card Authors

Michael Wornow, Ethan Steinberg, Rahul Thapa, Jason Fries, Nigam H. Shah

## Model Card Contact

Michael Wornow (mwornow@stanford.edu)