#!/usr/bin/env python
# coding: utf-8

# # Data integration and batch correction with SIMBA
#
# Here we use three scRNA-seq human pancreas datasets from different studies as an example to illustrate how SIMBA performs scRNA-seq batch correction across multiple batches.
#
# We follow the corresponding tutorial at [SIMBA](https://simba-bio.readthedocs.io/en/latest/rna_human_pancreas.html). We do not provide much explanation here and instead refer to the original tutorial.
#
# Paper: [SIMBA: single-cell embedding along with features](https://www.nature.com/articles/s41592-023-01899-8)
#
# Code: https://github.com/huidongchen/simba

# In[1]:


import omicverse as ov
from omicverse.utils import mde
workdir = 'result_human_pancreas'
ov.utils.ov_plot_set()


# SIMBA must be installed first:
#
# ```
# conda install -c bioconda simba
# ```
#
# or
#
# ```
# pip install git+https://github.com/huidongchen/simba
# pip install git+https://github.com/pinellolab/simba_pbg
# ```

# ## Read data
#
# The AnnData object was concatenated from three AnnData objects provided by simba: `simba.datasets.rna_baron2016()`, `simba.datasets.rna_segerstolpe2016()`, and `simba.datasets.rna_muraro2016()`
#
# It can be downloaded from figshare: https://figshare.com/ndownloader/files/41418600

# In[2]:


adata = ov.utils.read('simba_adata_raw.h5ad')


# We need to set workdir to initialize the pySIMBA object

# In[3]:


simba_object = ov.single.pySIMBA(adata, workdir)


# ## Preprocess
#
# Following the original tutorial, we keep the parameters at their default values.

# In[4]:


simba_object.preprocess(batch_key='batch', min_n_cells=3,
                        method='lib_size', n_top_genes=3000, n_bins=5)


# ## Generate a graph for training
#
# Observations and variables within each AnnData object are both represented as nodes (entities).
#
# The data are stored in `simba_object.uns['simba_batch_edge_dict']`

# In[5]:


simba_object.gen_graph()


# ## PBG training
#
# Before training, let's take a look at the current parameters:
#
# - dict_config['workers'] = 12 #The number of CPUs.
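
# The `workers` value listed above is 12, which may exceed the CPUs available on a given machine.
# The block below is a minimal sketch, assuming you want to cap the worker count at the number of
# CPUs Python can see. `pick_num_workers` is a hypothetical helper written for this tutorial; it is
# not part of omicverse or SIMBA.

```python
import os


def pick_num_workers(requested: int = 12) -> int:
    """Cap the requested worker count at the CPUs visible to Python (hypothetical helper)."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, available))


num_workers = pick_num_workers(12)
print(f"training with {num_workers} worker(s)")
# The result could then be passed on, e.g. simba_object.train(num_workers=num_workers)
```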
# In[10]:


simba_object.train(num_workers=6)


# In[6]:


simba_object.load('result_human_pancreas/pbg/graph0')


# ## Batch correction
#
# Here, we use `simba_object.batch_correction()` to perform the batch correction
#
# <div class="admonition note">
#   <p class="admonition-title">Note</p>
#   <p>
#     If the number of batches is greater than 10, batch correction is less effective
#   </p>
# </div>

# In[7]:


adata = simba_object.batch_correction()
adata


# ## Visualize
#
# We use `mde` instead of `umap` to visualize the result

# In[8]:


adata.obsm["X_mde"] = mde(adata.obsm["X_simba"])


# In[11]:


import scanpy as sc  # scanpy provides the plotting and neighbor/UMAP calls used below
sc.pl.embedding(adata, basis='X_mde', color=['cell_type1', 'batch'])


# UMAP can also be used to visualize the result

# In[10]:


sc.pp.neighbors(adata, use_rep="X_simba")
sc.tl.umap(adata)
sc.pl.umap(adata, color=['cell_type1', 'batch'])


# In[ ]:
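
# The plots above show batch mixing only visually. As a rough numeric complement, the sketch below
# computes, for each cell, the fraction of its k nearest neighbours that belong to a different batch.
# `batch_mixing_score` is a hypothetical helper, not an omicverse or SIMBA function, and the data here
# are synthetic stand-ins for `adata.obsm["X_simba"]` and `adata.obs["batch"]`. It builds a dense
# pairwise distance matrix, so on a real dataset you would probably want to subsample cells first.

```python
import numpy as np


def batch_mixing_score(embedding, batches, k=15):
    """Mean fraction of each cell's k nearest neighbours that come from another batch."""
    embedding = np.asarray(embedding, dtype=float)
    batches = np.asarray(batches)
    # Pairwise squared Euclidean distances (dense, so only suitable for a few thousand cells).
    sq = (embedding ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour
    # Indices of the k smallest distances per row (unordered within the k).
    nn = np.argpartition(dist, k, axis=1)[:, :k]
    other_batch = batches[nn] != batches[:, None]
    return float(other_batch.mean())


# Synthetic example: three batches of 100 cells in an 8-dimensional embedding.
rng = np.random.default_rng(0)
batches = np.repeat(["a", "b", "c"], 100)
mixed = rng.normal(size=(300, 8))                         # batches overlap
offsets = np.repeat([0.0, 20.0, 40.0], 100)[:, None]
separated = mixed + offsets                               # batches far apart

print("mixed:    ", batch_mixing_score(mixed, batches))
print("separated:", batch_mixing_score(separated, batches))
```

# On the tutorial data, the call would look like
# `batch_mixing_score(adata.obsm["X_simba"], adata.obs["batch"].to_numpy())`.
# A score near zero suggests the batches remain separated; higher values suggest more mixing.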