---
library_name: uce-100m
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- biology
license: mit
language:
- en
---

# Model Card: Universal Cell Embeddings (UCE)

## Model Description

**Universal Cell Embeddings (UCE)** is a foundation model for analyzing single-cell RNA sequencing data. UCE generates a universal representation of cells that captures molecular diversity across cell types, tissues, and species. Trained on extensive single-cell transcriptomic data, the model learns a unified biological latent space that can represent any cell without additional annotation or fine-tuning.

## Colab Notebook Demo (100M)

[Make a Copy of the Notebook](https://colab.research.google.com/drive/1opud0BVWr76IM8UnGgTomVggui_xC4p0?usp=sharing)

## Model Details

- **Model type**: Transformer-based foundation model
- **Authors**: Yanay Rosen, Yusuf Roohani, Ayush Agarwal, Leon Samotorčan, Tabula Sapiens Consortium, Stephen R. Quake, Jure Leskovec
- **Institution**: Stanford University, Chan Zuckerberg Biohub, Chan Zuckerberg Initiative
- **License**: CC-BY-NC-ND 4.0
- **Paper**: [Universal Cell Embeddings: A Foundation Model for Cell Biology](https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1.full.pdf)
- **GitHub Repo**: [Link to GitHub Repo](https://github.com/snap-stanford/UCE)
- **Hugging Face Model Repo**:

## Intended Use

UCE is intended for researchers and practitioners in cell biology and computational biology. It enables the integration and analysis of single-cell RNA sequencing data from diverse experiments and species, facilitating the discovery of new cell types and states.

### Use Cases

- **Cell type annotation**: Automatically annotate new single-cell datasets without retraining.
- **Cross-dataset analysis**: Integrate and compare single-cell data from different studies.
- **Novel cell discovery**: Identify and characterize previously unknown cell types.
- **Biological insights**: Gain insights into cellular organization and developmental lineages.

## Training Data

UCE was trained on a large corpus of single-cell RNA sequencing data spanning multiple species, including human, mouse, and zebrafish. The training data was sourced from publicly available single-cell atlases and processed to ensure consistency and robustness across experiments.

## Evaluation

UCE was evaluated on single-cell datasets not included in the training set. Performance was assessed by the model's ability to accurately embed and classify cell types, integrate new datasets, and identify novel cell types.

## Ethical Considerations

- **Data privacy**: Ensure that all single-cell data used with UCE complies with relevant privacy regulations and ethical guidelines.
- **Research transparency**: When using UCE in published research, clearly describe the methods and data used.

## Citation

If you use the UCE model in your research, please cite the following paper:

@article{rosen2023uce,
  title={Universal Cell Embeddings: A Foundation Model for Cell Biology},
  author={Rosen, Yanay and Roohani, Yusuf and Agarwal, Ayush and Samotorčan, Leon and Quake, Stephen R and Leskovec, Jure},
  journal={bioRxiv},
  year={2023},
  doi={10.1101/2023.11.28.568918}
}

For more detailed instructions and use cases, refer to the [UCE paper](https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1.full.pdf).
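
This card lists cell type annotation among UCE's use cases but does not describe a procedure. As a rough illustration only, here is a minimal sketch that transfers labels from an annotated reference dataset to a new query dataset by majority vote among the k nearest reference cells in embedding space, using cosine similarity. It assumes UCE embeddings have already been computed for both datasets as array-like matrices (one row per cell). The function name `knn_label_transfer`, the choice of `k`, and the toy two-dimensional vectors are hypothetical and are not taken from the UCE paper or repository.

```python
from collections import Counter

import numpy as np


def knn_label_transfer(ref_emb, ref_labels, query_emb, k=5):
    """Hypothetical helper (not part of the UCE codebase).

    Assigns each query cell the most common label among its k most similar
    reference cells. Assumes ref_emb (n_ref x d) and query_emb (n_query x d)
    are precomputed cell embeddings in the same space.
    """
    ref = np.asarray(ref_emb, dtype=float)
    qry = np.asarray(query_emb, dtype=float)
    # L2-normalize rows so that the dot product equals cosine similarity
    ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    qry = qry / np.linalg.norm(qry, axis=1, keepdims=True)
    sim = qry @ ref.T  # shape (n_query, n_ref)
    k = min(k, ref.shape[0])
    # Indices of the k most similar reference cells for each query cell
    top = np.argsort(-sim, axis=1)[:, :k]
    labels = np.asarray(ref_labels)
    # Majority vote over the neighbours' labels
    return [str(Counter(labels[row]).most_common(1)[0][0]) for row in top]


# Toy example with made-up 2-D "embeddings"
reference = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
reference_labels = ["T cell", "T cell", "B cell", "B cell"]
query = [[0.95, 0.05], [0.05, 0.95]]
print(knn_label_transfer(reference, reference_labels, query, k=2))
```

Because reference and query cells live in one shared embedding space, this kind of transfer needs only the embeddings and the reference labels, with no joint retraining. A k-nearest-neighbour vote is just one simple choice; training a lightweight classifier on the reference embeddings is a common alternative.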