Spaces:
Sleeping
Sleeping
#!/usr/bin/env python | |
# coding: utf-8 | |
# # Data integration and batch correction with SIMBA | |
# | |
# Here we will use three scRNA-seq human pancreas datasets of different studies as an example to illustrate how SIMBA performs scRNA-seq batch correction for multiple batches | |
# | |
# We follow the corresponding tutorial at [SIMBA](https://simba-bio.readthedocs.io/en/latest/rna_human_pancreas.html). We do not provide much explanation, and instead refer to the original tutorial. | |
# | |
# Paper: [SIMBA: single-cell embedding along with features](https://www.nature.com/articles/s41592-023-01899-8) | |
# | |
# Code: https://github.com/huidongchen/simba | |
# In[1]: | |
import omicverse as ov | |
from omicverse.utils import mde | |
workdir = 'result_human_pancreas' | |
ov.utils.ov_plot_set() | |
# We need to install simba at first | |
# | |
# ``` | |
# conda install -c bioconda simba | |
# ``` | |
# | |
# or | |
# | |
# ``` | |
# pip install git+https://github.com/huidongchen/simba | |
# pip install git+https://github.com/pinellolab/simba_pbg | |
# ``` | |
# ## Read data | |
# | |
# The anndata object was concat from three anndata in simba: `simba.datasets.rna_baron2016()`, `simba.datasets.rna_segerstolpe2016()`, and `simba.datasets.rna_muraro2016()` | |
# | |
# It can be downloaded from figshare: https://figshare.com/ndownloader/files/41418600 | |
# In[2]: | |
adata=ov.utils.read('simba_adata_raw.h5ad') | |
# We need to set workdir to initiate the pySIMBA object | |
# In[3]: | |
simba_object=ov.single.pySIMBA(adata,workdir) | |
# ## Preprocess | |
# | |
# Follow the raw tutorial, we set the paragument as default. | |
# In[4]: | |
simba_object.preprocess(batch_key='batch',min_n_cells=3, | |
method='lib_size',n_top_genes=3000,n_bins=5) | |
# ## Generate a graph for training | |
# | |
# Observations and variables within each Anndata object are both represented as nodes (entities). | |
# | |
# the data store in `simba_object.uns['simba_batch_edge_dict']` | |
# In[5]: | |
simba_object.gen_graph() | |
# ## PBG training | |
# | |
# Before training, let’s take a look at the current parameters: | |
# | |
# - dict_config['workers'] = 12 #The number of CPUs. | |
# In[10]: | |
simba_object.train(num_workers=6) | |
# In[6]: | |
simba_object.load('result_human_pancreas/pbg/graph0') | |
# ## Batch correction | |
# | |
# Here, we use `simba_object.batch_correction()` to perform the batch correction | |
# | |
# <div class="admonition note"> | |
# <p class="admonition-title">Note</p> | |
# <p> | |
# If the batch is greater than 10, then the batch correction is less effective | |
# </p> | |
# </div> | |
# In[7]: | |
adata=simba_object.batch_correction() | |
adata | |
# ## Visualize | |
# | |
# We also use `mde` instead `umap` to visualize the result | |
# In[8]: | |
adata.obsm["X_mde"] = mde(adata.obsm["X_simba"]) | |
# In[11]: | |
sc.pl.embedding(adata,basis='X_mde',color=['cell_type1','batch']) | |
# Certainly, umap can also be used to visualize | |
# In[10]: | |
import scanpy as sc | |
sc.pp.neighbors(adata, use_rep="X_simba") | |
sc.tl.umap(adata) | |
sc.pl.umap(adata,color=['cell_type1','batch']) | |
# In[ ]: | |