---
title: README
emoji: 🐨
colorFrom: purple
colorTo: blue
sdk: static
pinned: true
license: bsd-3-clause
short_description: Ensemble of experts for cell-type annotation
---

# **popV**

Welcome to the **popV** framework. popV provides state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach. Here we provide pre-trained models for transferring cell-type labels to your own query dataset.

Defining cell types is a tedious process, and using reference data can significantly accelerate it. By combining several label-transfer tools, popV provides a well-calibrated certainty score that makes it possible to detect cell types for which automatic annotation is highly uncertain. We recommend manually checking transferred cell-type labels by plotting marker genes or differentially expressed genes before trusting them.

This is an open science initiative. Please contribute your own models so that the single-cell community can leverage your reference datasets: ask in our [GitHub repository](https://github.com/YosefLab/popV) to have your dataset added.

---

## **Model Overview**

popV trains up to 9 different algorithms for automatic label transfer and computes a consensus score. We provide an automatic report.
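
To make the idea of a consensus score concrete, here is a minimal sketch of simple majority voting across methods. It is an illustration only, not popV's implementation: the function name `consensus_vote`, the input layout, and the choice to score a cell by the number of agreeing methods are all assumptions made for this example, and popV's own voting may differ (for instance by taking the cell-type ontology into account).

```python
from collections import Counter


def consensus_vote(predictions):
    """Majority vote over per-method label predictions (hypothetical helper).

    predictions: dict mapping a method name to a list of predicted labels,
                 one label per cell, with all lists in the same cell order.
    Returns (labels, scores): the most common label for each cell and the
    number of methods that voted for it.
    """
    methods = list(predictions)
    if not methods:
        raise ValueError("at least one method is required")
    n_cells = len(predictions[methods[0]])
    if any(len(predictions[m]) != n_cells for m in methods):
        raise ValueError("all methods must predict the same number of cells")

    labels, scores = [], []
    for i in range(n_cells):
        votes = Counter(predictions[m][i] for m in methods)
        # On ties, most_common keeps the label that was counted first.
        label, count = votes.most_common(1)[0]
        labels.append(label)
        scores.append(count)
    return labels, scores


# Toy input: three methods, two cells (labels are made up for illustration).
example = {
    "knn": ["T cell", "B cell"],
    "random_forest": ["T cell", "NK cell"],
    "svm": ["T cell", "B cell"],
}
labels, scores = consensus_vote(example)
print(labels)  # ['T cell', 'B cell']
print(scores)  # [3, 2]
```

In this toy example the second cell has a lower score because one method disagrees, which is the kind of cell worth checking manually against marker genes.
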
To learn how to apply popV to your own dataset, please refer to our [tutorial]().

### Algorithms

Currently implemented algorithms are:

- K-nearest neighbor classification after dataset integration with [BBKNN](https://github.com/Teichlab/bbknn)
- K-nearest neighbor classification after dataset integration with [SCANORAMA](https://github.com/brianhie/scanorama)
- K-nearest neighbor classification after dataset integration with [scVI](https://github.com/scverse/scvi-tools)
- K-nearest neighbor classification after dataset integration with [Harmony](https://github.com/lilab-bcb/harmony-pytorch)
- Random forest classification
- Support vector machine classification
- [OnClass](https://github.com/wangshenguiuc/OnClass) cell type classification
- [scANVI](https://github.com/scverse/scvi-tools) label transfer
- [Celltypist](https://www.celltypist.org) cell type classification

---

## **Key Applications**

The purpose of these models is to perform cell-type label transfer. We provide models with [CUML support](collection) for large-scale reference mapping and [without CUML support](collection) for use when no GPU is available. popV without a GPU scales well to 100k cells.

popV has three prediction modes that differ in complexity:

- **retrain** trains all classifiers from scratch. For 50k cells this takes up to an hour of computing time using a GPU.
- **inference** uses pretrained classifiers to annotate both query and reference cells and constructs a joint embedding using all of the integration methods listed above. For 50k cells this takes, in our experience, up to half an hour of computing time using a GPU.
- **fast** uses only the methods with pretrained classifiers and annotates only query cells. For 50k cells this takes 5 minutes without a GPU (excluding the UMAP embedding).

---

## **Publications**

- **[Original popV paper](https://www.nature.com/articles/s41588-024-01993-3)**:
  - Published in *Nature Genetics*, this paper introduces popV and benchmarks it.

## **Contact**

- GitHub: [https://github.com/YosefLab/popV](https://github.com/YosefLab/popV)
- User questions: [Discourse](https://discourse.scverse.org)