Organization Card

popV

Welcome to the popV framework. popV provides state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach. Here we provide pre-trained models for transferring cell types to your own query dataset.

Defining cell types is a tedious process, and using reference data can accelerate it significantly. By combining several label-transfer tools, popV provides a well-calibrated certainty score that helps identify cell types for which automatic annotation is highly uncertain. We recommend manually checking transferred cell-type labels by plotting marker genes or differentially expressed genes rather than trusting them blindly.

This is an open-science initiative. Please contribute your own models so that the single-cell community can leverage your reference datasets: ask in our GitHub repository to have your dataset added.
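As a rough illustration of how a certainty score can point to cell types that deserve manual review, the sketch below groups cells by predicted label and reports the labels whose mean score is low. The function name, the tuple layout, the normalization of scores to the range 0 to 1, and the 0.6 threshold are all assumptions made for this example; they are not part of the popV API.

```python
from collections import defaultdict
from statistics import mean


def uncertain_cell_types(annotations, min_mean_score=0.6):
    """Return {label: mean score} for labels whose mean score is below the threshold.

    annotations: iterable of (cell_id, predicted_label, score) tuples.
    Scores are assumed (hypothetically) to be normalized to [0, 1].
    """
    scores_by_type = defaultdict(list)
    for _cell_id, label, score in annotations:
        scores_by_type[label].append(score)

    # Keep only the labels whose average certainty is below the threshold.
    return {
        label: mean(scores)
        for label, scores in scores_by_type.items()
        if mean(scores) < min_mean_score
    }


# Toy data, invented for illustration only.
annotations = [
    ("cell_1", "T cell", 1.0),
    ("cell_2", "T cell", 0.875),
    ("cell_3", "basophil", 0.5),
    ("cell_4", "basophil", 0.25),
]
flagged = uncertain_cell_types(annotations)
print(flagged)  # {'basophil': 0.375}
```

Cell types flagged this way are natural candidates for the marker-gene check recommended above.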

Model Overview

popV trains up to 9 different algorithms for automatic label transfer, computes a consensus score, and generates an automatic report. To learn how to apply popV to your own dataset, please refer to our tutorial.

Algorithms

Currently implemented algorithms are:

K-nearest neighbor classification after dataset integration with BBKNN
K-nearest neighbor classification after dataset integration with Scanorama
K-nearest neighbor classification after dataset integration with scVI
K-nearest neighbor classification after dataset integration with Harmony
Random forest classification
Support vector machine classification
OnClass cell type classification
scANVI label transfer
CellTypist cell type classification
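To make the idea of a consensus over these algorithms concrete, here is a minimal sketch of a plain majority vote for a single cell. It assumes each method returns one label per cell; the method keys and the function name are hypothetical, and popV's actual aggregation may differ (for example, by taking the cell-type ontology into account).

```python
from collections import Counter


def consensus_label(predictions):
    """Return (majority label, number of agreeing methods, total methods).

    predictions: dict mapping a method name to its predicted label for one cell.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(predictions.values())
    # Highest vote count first; alphabetical order among equal counts.
    label, votes = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return label, votes, len(predictions)


# Hypothetical per-method predictions for one query cell.
cell_predictions = {
    "knn_bbknn": "T cell",
    "knn_scanorama": "T cell",
    "knn_scvi": "T cell",
    "knn_harmony": "T cell",
    "random_forest": "T cell",
    "svm": "B cell",
    "onclass": "T cell",
    "scanvi": "T cell",
    "celltypist": "B cell",
}

label, votes, n_methods = consensus_label(cell_predictions)
print(f"{label}: {votes}/{n_methods} methods agree")
```

In this toy case seven of the nine methods agree; a cell on which the methods split more evenly would receive a lower agreement count and would be a candidate for manual inspection.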

Key Applications

The purpose of these models is to perform cell-type label transfer. We provide a collection of models with CUML support for large-scale reference mapping and a collection of models without CUML support for use when no GPU is available. Without a GPU, popV scales well to 100k cells. popV has three levels of prediction complexity:

retrain trains all classifiers from scratch. For 50k cells, this takes up to an hour of computing time using a GPU.
inference uses pretrained classifiers to annotate query as well as reference cells and constructs a joint embedding using all integration methods listed above. For 50k cells, this takes, in our hands, up to half an hour of computing time using a GPU.
fast uses only methods with pretrained classifiers and annotates only query cells. For 50k cells, this takes 5 minutes without a GPU (without UMAP embedding).
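The choice among the three modes can be summarized as a small decision helper. This is only a sketch of the trade-offs described above: the function and its two boolean parameters are hypothetical and do not call into popV; only the mode names come from the text.

```python
def choose_prediction_mode(retrain_classifiers: bool, annotate_reference: bool) -> str:
    """Pick one of the three mode names described above.

    retrain_classifiers: True if all classifiers should be trained from scratch.
    annotate_reference: True if reference cells should also be annotated and a
        joint embedding of query and reference is wanted.
    """
    if retrain_classifiers:
        # Most expensive option: every classifier is trained again.
        return "retrain"
    if annotate_reference:
        # Pretrained classifiers, plus a joint query/reference embedding.
        return "inference"
    # Cheapest option: pretrained classifiers applied to query cells only.
    return "fast"


mode = choose_prediction_mode(retrain_classifiers=False, annotate_reference=False)
print(mode)  # fast
```

When no GPU is available and only query annotations are needed, the helper falls through to the fast mode, which matches the guidance above.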

Publications

Original popV paper:
- Published in Nature Genetics, this paper introduces popV and benchmarks it.

Contact

GitHub: https://github.com/YosefLab/popV
User questions: Discourse

datasets 1

popV/ontology

Updated Jan 5, 2025 • 171 • 1

Collections 2

popV/tabula_sapiens_Endothelium

popV/tabula_sapiens_Lung

popV/tabula_sapiens_Germline

popV/tabula_sapiens_Neural

popV/tabula_muris_All

popV/tabula_muris_All_10x

popV/tabula_muris_All_Smart-seq2

popV/tabula_muris_Bone_marrow_10x

models 78

popV/tabula_muris_Aorta

popV/tabula_muris_Kidney_Smart-seq2

popV/tabula_muris_Diaphragm

popV/tabula_muris_Large_intestine_Smart-seq2

popV/tabula_muris_Brown_adipose_tissue

popV/tabula_muris_Bladder_lumen_Smart-seq2

popV/tabula_muris_Liver_Smart-seq2

popV/tabula_muris_Mesenteric_fat_pad

popV/tabula_muris_Trachea_Smart-seq2

popV/tabula_muris_Mammary_gland_Smart-seq2
