Spaces:

KeTuTu
/

OV_Agentic_EXP_SambaNova

Sleeping

App Files Files Community

OV_Agentic_EXP_SambaNova / ovrawm /t_simba.txt

KeTuTu

Upload 46 files

2999286 verified 27 days ago

raw

history blame contribute delete

2.93 kB

	#!/usr/bin/env python
	# coding: utf-8

	# # Data integration and batch correction with SIMBA
	#
	# Here we will use three scRNA-seq human pancreas datasets of different studies as an example to illustrate how SIMBA performs scRNA-seq batch correction for multiple batches
	#
	# We follow the corresponding tutorial at [SIMBA](https://simba-bio.readthedocs.io/en/latest/rna_human_pancreas.html). We do not provide much explanation, and instead refer to the original tutorial.
	#
	# Paper: [SIMBA: single-cell embedding along with features](https://www.nature.com/articles/s41592-023-01899-8)
	#
	# Code: https://github.com/huidongchen/simba

	# In[1]:


	import omicverse as ov
	from omicverse.utils import mde
	workdir = 'result_human_pancreas'
	ov.utils.ov_plot_set()


	# We need to install simba at first
	#
	# ```
	# conda install -c bioconda simba
	# ```
	#
	# or
	#
	# ```
	# pip install git+https://github.com/huidongchen/simba
	# pip install git+https://github.com/pinellolab/simba_pbg
	# ```

	# ## Read data
	#
	# The anndata object was concat from three anndata in simba: `simba.datasets.rna_baron2016()`, `simba.datasets.rna_segerstolpe2016()`, and `simba.datasets.rna_muraro2016()`
	#
	# It can be downloaded from figshare: https://figshare.com/ndownloader/files/41418600

	# In[2]:


	adata=ov.utils.read('simba_adata_raw.h5ad')


	# We need to set workdir to initiate the pySIMBA object

	# In[3]:


	simba_object=ov.single.pySIMBA(adata,workdir)


	# ## Preprocess
	#
	# Follow the raw tutorial, we set the paragument as default.

	# In[4]:


	simba_object.preprocess(batch_key='batch',min_n_cells=3,
	method='lib_size',n_top_genes=3000,n_bins=5)


	# ## Generate a graph for training
	#
	# Observations and variables within each Anndata object are both represented as nodes (entities).
	#
	# the data store in `simba_object.uns['simba_batch_edge_dict']`

	# In[5]:


	simba_object.gen_graph()


	# ## PBG training
	#
	# Before training, let’s take a look at the current parameters:
	#
	# - dict_config['workers'] = 12 #The number of CPUs.

	# In[10]:


	simba_object.train(num_workers=6)


	# In[6]:


	simba_object.load('result_human_pancreas/pbg/graph0')


	# ## Batch correction
	#
	# Here, we use `simba_object.batch_correction()` to perform the batch correction
	#
	# <div class="admonition note">
	# <p class="admonition-title">Note</p>
	# <p>
	# If the batch is greater than 10, then the batch correction is less effective
	# </p>
	# </div>

	# In[7]:


	adata=simba_object.batch_correction()
	adata


	# ## Visualize
	#
	# We also use `mde` instead `umap` to visualize the result

	# In[8]:


	adata.obsm["X_mde"] = mde(adata.obsm["X_simba"])


	# In[11]:


	sc.pl.embedding(adata,basis='X_mde',color=['cell_type1','batch'])


	# Certainly, umap can also be used to visualize

	# In[10]:


	import scanpy as sc
	sc.pp.neighbors(adata, use_rep="X_simba")
	sc.tl.umap(adata)
	sc.pl.umap(adata,color=['cell_type1','batch'])


	# In[ ]: