---
title: README
emoji: 🐨
colorFrom: purple
colorTo: blue
sdk: static
pinned: true
license: bsd-3-clause
short_description: Ensemble of experts for cell-type annotation
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/63d7697f2e397d9f8e30e677/tvABibiml6K2sccfXLybG.png
---
# **popV**

Welcome to the **popV** framework. popV achieves state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach. Here we provide pre-trained
models to transfer cell-type labels to your own query dataset. Defining cell types is a tedious process, and reference data can significantly accelerate it.
By combining several label-transfer tools, popV provides a well-calibrated certainty score that flags cell types for which automatic annotation is highly
uncertain. We recommend manually checking transferred cell-type labels by plotting marker genes or differentially expressed genes before trusting them.
This is an open-science initiative. Please contribute your own models so that the single-cell community can leverage your reference datasets: ask in our
[GitHub repository](https://github.com/YosefLab/popV) to add your dataset.

---

## **Model Overview**
popV trains up to 9 different algorithms for automatic label transfer and computes a consensus score across their predictions. It also generates an automatic report.
To learn how to apply popV to your own dataset, please refer to our [tutorial]().
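
To make the idea of a consensus score concrete, here is a minimal sketch in Python that combines per-cell predictions from several methods with a plain majority vote and reports how many methods agree with the winning label. It is an illustration under stated assumptions, not popV's implementation: the function `consensus` and the method names in the toy example are hypothetical, and popV's own scoring may differ in detail.

```python
from collections import Counter


def consensus(predictions: dict[str, list[str]]) -> tuple[list[str], list[int]]:
    """Majority vote across label-transfer methods (hypothetical helper, not the popV API).

    `predictions` maps a method name to its per-cell labels; every list must
    cover the same cells in the same order. Returns, for each cell, the
    majority label and the number of methods that voted for it.
    """
    methods = list(predictions)
    if not methods:
        raise ValueError("need predictions from at least one method")
    n_cells = len(predictions[methods[0]])
    if any(len(predictions[m]) != n_cells for m in methods):
        raise ValueError("all methods must label the same number of cells")

    labels: list[str] = []
    scores: list[int] = []
    for i in range(n_cells):
        votes = Counter(predictions[m][i] for m in methods)
        # most_common breaks ties in favor of the label seen first,
        # i.e. the one from the earliest method in `predictions`.
        label, count = votes.most_common(1)[0]
        labels.append(label)
        scores.append(count)
    return labels, scores


# Toy example with three hypothetical methods and three cells.
preds = {
    "knn_scvi": ["T cell", "B cell", "NK cell"],
    "random_forest": ["T cell", "B cell", "T cell"],
    "svm": ["T cell", "monocyte", "NK cell"],
}
labels, scores = consensus(preds)
print(labels)  # ['T cell', 'B cell', 'NK cell']
print(scores)  # [3, 2, 2]

# Flag cells where not all methods agree, as candidates for manual review.
to_review = [i for i, s in enumerate(scores) if s < len(preds)]
print(to_review)  # [1, 2]
```

In this sketch a low agreement count is what marks a cell as uncertain; such cells are natural candidates for the manual marker-gene check recommended above.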

### Algorithms

Currently implemented algorithms are:

-   K-nearest neighbor classification after dataset integration with [BBKNN](https://github.com/Teichlab/bbknn)
-   K-nearest neighbor classification after dataset integration with [SCANORAMA](https://github.com/brianhie/scanorama)
-   K-nearest neighbor classification after dataset integration with [scVI](https://github.com/scverse/scvi-tools)
-   K-nearest neighbor classification after dataset integration with [Harmony](https://github.com/lilab-bcb/harmony-pytorch)
-   Random forest classification
-   Support vector machine classification
-   [OnClass](https://github.com/wangshenguiuc/OnClass) cell type classification
-   [scANVI](https://github.com/scverse/scvi-tools) label transfer
-   [CellTypist](https://www.celltypist.org) cell type classification
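
The first four methods share one idea: once an integration method has placed reference and query cells in a common embedding, each query cell takes the most frequent label among its nearest reference neighbors. The sketch below shows that step in plain Python under simplifying assumptions: the embedding is given as toy 2-D coordinates (standing in for, e.g., an scVI latent space), distances are Euclidean, and the function name `knn_transfer` and the value of `k` are hypothetical choices rather than popV's settings. For real datasets, a library implementation such as scikit-learn's `KNeighborsClassifier` would be the idiomatic choice.

```python
import heapq
import math
from collections import Counter
from typing import Sequence


def knn_transfer(
    ref_emb: Sequence[Sequence[float]],
    ref_labels: Sequence[str],
    query_emb: Sequence[Sequence[float]],
    k: int = 5,
) -> list[str]:
    """Assign each query cell the majority label of its k nearest reference cells.

    Hypothetical helper: assumes reference and query cells already live in a
    shared, integrated embedding (one coordinate vector per cell).
    """
    if not ref_emb:
        raise ValueError("reference is empty")
    if len(ref_emb) != len(ref_labels):
        raise ValueError("need exactly one label per reference cell")
    k = min(k, len(ref_emb))

    predicted = []
    for q in query_emb:
        # Euclidean distance from this query cell to every reference cell.
        dists = [math.dist(q, r) for r in ref_emb]
        # Indices of the k closest reference cells.
        nearest = heapq.nsmallest(k, range(len(ref_emb)), key=dists.__getitem__)
        # Majority vote over the neighbors' labels.
        votes = Counter(ref_labels[j] for j in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted


# Toy shared embedding: two reference clusters and two query cells.
ref = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
ref_labels = ["T cell", "T cell", "B cell", "B cell"]
query = [(0.2, 0.5), (5.1, 5.4)]
print(knn_transfer(ref, ref_labels, query, k=3))  # ['T cell', 'B cell']
```

The brute-force distance loop keeps the sketch dependency-free; it scales linearly with the reference size per query cell, which is why large references call for indexed or approximate neighbor search.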

---

## **Key Applications**
The purpose of these models is to perform cell-type label transfer.
We provide models [with cuML support](collection) for large-scale reference mapping and [without cuML support](collection) for when no GPU is available. Without a GPU, popV scales
well to 100k cells. popV offers three prediction modes of decreasing complexity:

-   `retrain` trains all classifiers from scratch. For 50k cells, this takes up to an hour of compute time on a GPU.
-   `inference` uses pretrained classifiers to annotate both query and reference cells and constructs a joint embedding using all of the integration methods listed above. In our hands, for 50k cells this takes up to half an hour of compute time on a GPU.
-   `fast` uses only the methods with pretrained classifiers and annotates query cells only. For 50k cells, this takes 5 minutes without a GPU (excluding the UMAP embedding).

---

## **Publications**
- **[Original popV paper](https://www.nature.com/articles/s41588-024-01993-3)**:
  - Published in *Nature Genetics*, this paper introduces popV and benchmarks it.

## **Contact**
- GitHub: [https://github.com/YosefLab/popV](https://github.com/YosefLab/popV)
- User questions: [Discourse](https://discourse.scverse.org)

