#!/usr/bin/env python
# coding: utf-8
# # Drug response prediction with scDrug
#
# scDrug is a workflow that predicts the drug sensitivity of single cells from an existing database of drug responses. In downstream single-cell analysis, especially of tumours, we are often interested in candidate drugs and combination therapies. To this end, we combine inferCNV, which identifies the tumour cells, with scDrug's IC50 prediction to build a drug screening pipeline.
#
# Paper: [scDrug: From single-cell RNA-seq to drug response prediction](https://www.sciencedirect.com/science/article/pii/S2001037022005505)
#
# Code: https://github.com/ailabstw/scDrug
#
# Colab_Reproducibility: https://colab.research.google.com/drive/1mayoMO7I7qjYIRjrZEi8r5zuERcxAEcF?usp=sharing
# In[1]:
import omicverse as ov
import scanpy as sc
import infercnvpy as cnv
import matplotlib.pyplot as plt
import os
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=80, facecolor='white')
# ## Infer the Tumor from scRNA-seq
#
# Here we use infercnvpy's example data to complete the tumour analysis. You can also refer to the official tutorial for this step: https://infercnvpy.readthedocs.io/en/latest/notebooks/tutorial_3k.html
#
# infercnvpy needs the genomic coordinates of each gene in adata.var, so we provide a utility function ov.utils.get_gene_annotation to add this information from a GTF file. The following usage assumes that adata.var_names correspond to the "gene_name" attribute in the GTF file. For other cases, please check the function documentation.
#
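# GTF files can be downloaded from [GENCODE](http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/). Note that this link points to mouse release M25, whereas the cell below reads the human annotation gencode.v43.basic.annotation.gtf.gz; choose the release that matches the species of your data.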
#
# The T2T-CHM13 GTF file can be downloaded from [figshare](https://figshare.com/ndownloader/files/40628072)
# In[3]:
adata = cnv.datasets.maynard2020_3k()
ov.utils.get_gene_annotation(
    adata, gtf="gencode.v43.basic.annotation.gtf.gz",
    gtf_by="gene_name"
)
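# To make the annotation step above more concrete, here is a minimal sketch of
# how gene coordinates can be looked up from GTF 'gene' records, assuming a
# standard 9-column GTF with a gene_name attribute. `parse_gtf_genes` is a
# hypothetical helper written for illustration and the records are toy data;
# this is not the implementation of ov.utils.get_gene_annotation.
# In[ ]:
import re

def parse_gtf_genes(lines):
    """Map gene_name -> (chrom, start, end, gene_id) using GTF 'gene' records."""
    genes = {}
    for line in lines:
        if line.startswith('#'):
            continue  # skip header/comment lines
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 9 or fields[2] != 'gene':
            continue  # only gene-level records carry the coordinates we want
        # Column 9 holds attributes written as: key "value";
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        name = attrs.get('gene_name')
        if name is None:
            continue
        # Keep the first record seen for each gene name
        genes.setdefault(name, (fields[0], int(fields[3]), int(fields[4]), attrs.get('gene_id')))
    return genes

toy_gtf = [
    '##description: toy example\n',
    'chr1\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id "ENSG_TOY1"; gene_name "TOYGENE1";\n',
    'chr1\ttoy\ttranscript\t100\t500\t.\t+\t.\tgene_id "ENSG_TOY1"; gene_name "TOYGENE1";\n',
]
toy_gene_coords = parse_gtf_genes(toy_gtf)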
# In[ ]:
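# Keep only genes with known genomic coordinates; copy so the subset can be modified safely
adata = adata[:, ~adata.var['chrom'].isnull()].copy()
# Rename the annotation columns to the names infercnvpy expects
adata.var['chromosome'] = adata.var['chrom']
adata.var['start'] = adata.var['chromStart']
adata.var['end'] = adata.var['chromEnd']
adata.var['ensg'] = adata.var['gene_id']
adata.var.loc[:, ["ensg", "chromosome", "start", "end"]].head()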
# Note that infercnvpy expects the expression matrix to be normalized and log-transformed beforehand
# In[4]:
adata
# We use the immune cells as the reference and infer the CNV score of each cell in the scRNA-seq data
# In[5]:
# We provide all immune cell types as "normal cells".
cnv.tl.infercnv(
    adata,
    reference_key="cell_type",
    reference_cat=[
        "B cell",
        "Macrophage",
        "Mast cell",
        "Monocyte",
        "NK cell",
        "Plasma cell",
        "T cell CD4",
        "T cell CD8",
        "T cell regulatory",
        "mDC",
        "pDC",
    ],
    window_size=250,
)
cnv.tl.pca(adata)
cnv.pp.neighbors(adata)
cnv.tl.leiden(adata)
cnv.tl.umap(adata)
cnv.tl.cnv_score(adata)
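# A minimal sketch of a cluster-level CNV score, assuming the score is the mean
# absolute CNV value over all cells and genomic windows in a cluster (which is
# how the infercnvpy documentation describes cnv_score, to our understanding).
# `cluster_mean_abs_score` is a hypothetical helper and the values are toy data.
# In[ ]:
from collections import defaultdict

def cluster_mean_abs_score(cnv_rows, clusters):
    """Return {cluster: mean of |CNV value| over all its cells and windows}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row, cluster in zip(cnv_rows, clusters):
        sums[cluster] += sum(abs(v) for v in row)
        counts[cluster] += len(row)
    return {cluster: sums[cluster] / counts[cluster] for cluster in sums}

# Three toy cells with two genomic windows each; the first two share a cluster
toy_cnv = [[0.0, 0.1], [0.0, -0.1], [0.5, -0.5]]
toy_clusters = ['0', '0', '1']
toy_cluster_scores = cluster_mean_abs_score(toy_cnv, toy_clusters)
# Under this definition every cell in a cluster receives the same score, which
# is why a single cut-off separates whole clusters rather than individual cells.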
# In[6]:
sc.pl.umap(adata, color="cnv_score", show=False)
# We set a threshold on the cnv_score, here 0.03, and label cells with a score greater than 0.03 as tumour cells
# In[7]:
adata.obs["cnv_status"] = "normal"
adata.obs.loc[
    adata.obs["cnv_score"] > 0.03, "cnv_status"
] = "tumor"
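# The cut-off of 0.03 is a judgment call, so it can help to see what fraction of
# cells each candidate threshold would label as tumour. A minimal sketch;
# `tumor_fraction_by_threshold` is a hypothetical helper and the scores below
# are made-up values, not results from this dataset.
# In[ ]:
def tumor_fraction_by_threshold(scores, thresholds):
    """Return {threshold: fraction of scores strictly greater than it}."""
    scores = list(scores)
    if not scores:
        raise ValueError("scores must not be empty")
    return {t: sum(s > t for s in scores) / len(scores) for t in thresholds}

toy_scores = [0.01, 0.02, 0.035, 0.05, 0.08]
toy_fractions = tumor_fraction_by_threshold(toy_scores, [0.02, 0.03, 0.04])
# On real data, pass the score column instead of the toy list:
# tumor_fraction_by_threshold(adata.obs["cnv_score"], [0.02, 0.03, 0.04])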
# In[8]:
sc.pl.umap(adata, color="cnv_status", show=False)
# We extract the tumour cells separately for drug response prediction
# In[11]:
# Copy the subset so that later in-place filtering does not act on a view
tumor = adata[adata.obs['cnv_status'] == 'tumor'].copy()
tumor.X.max()
# ## Tumor preprocessing
#
# We need to extract the highly variable genes of the tumour cells for further analysis and find the sub-clusters within the tumour
# In[12]:
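adata = tumor
print('Preprocessing...')
# Drop low-quality cells and rarely detected genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
# Flag mitochondrial genes and compute per-cell QC metrics
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
# Keep cells below 30% mitochondrial content (skipped when every cell has 0%)
if not (adata.obs.pct_counts_mt == 0).all():
    adata = adata[adata.obs.pct_counts_mt < 30, :]
# Store the full gene set before subsetting to the highly variable genes
adata.raw = adata.copy()
sc.pp.highly_variable_genes(adata)
adata = adata[:, adata.var.highly_variable]
sc.pp.scale(adata)
sc.tl.pca(adata, svd_solver='arpack')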
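# As a reminder of what the mitochondrial filter above measures: pct_counts_mt
# is the percentage of a cell's total counts that comes from genes flagged as
# 'mt'. A minimal sketch with made-up numbers; `pct_counts_mt_for_cell` is a
# hypothetical helper for illustration, not a scanpy function.
# In[ ]:
def pct_counts_mt_for_cell(gene_names, counts):
    """Percentage of one cell's counts assigned to genes starting with 'MT-'."""
    total = sum(counts)
    if total == 0:
        return 0.0  # avoid dividing by zero for an empty cell
    mt_total = sum(c for g, c in zip(gene_names, counts) if g.startswith('MT-'))
    return 100.0 * mt_total / total

toy_pct = pct_counts_mt_for_cell(['MT-CO1', 'ACTB', 'MT-ND1', 'GAPDH'], [10, 50, 15, 25])
# With the 30% cut-off used above, this toy cell would be kept.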
# In[13]:
sc.pp.neighbors(adata, n_pcs=20)
sc.tl.umap(adata)
# Here, we need to download the GDSC data and the CaDRReS-Sc model used by scDrug so that the subsequent predictions can be made properly
# In[27]:
ov.utils.download_GDSC_data()
ov.utils.download_CaDRReS_model()
# Then, we apply the Single-Cell Data Analysis step once again to sub-cluster the tumour cells at an automatically determined resolution.
# In[18]:
adata, res, plot_df = ov.single.autoResolution(adata, cpus=4)
# Don't forget to save your data
# In[20]:
results_file = os.path.join('./', 'scanpyobj.h5ad')
adata.write(results_file)
# In[21]:
results_file = os.path.join('./', 'scanpyobj.h5ad')
adata=sc.read(results_file)
# ## IC50 prediction
#
# Drug Response Prediction takes the scanpyobj.h5ad generated in Single-Cell Data Analysis and reports cluster-wise IC50 values and cell death percentages for drugs in the GDSC database via CaDRReS-Sc (a recommender system framework for in silico drug response prediction), or drug sensitivity AUC values for the PRISM database from [DepMap Portal PRISM-19Q4](https://doi.org/10.1038/s43018-019-0018-6).
#
# Note that we need to download CaDRReS-Sc from GitHub with `git clone https://github.com/CSB5/CaDRReS-Sc`
# In[24]:
get_ipython().system('git clone https://github.com/CSB5/CaDRReS-Sc')
# To run the drug response prediction, we need to set:
#
# - scriptpath: the path to the CaDRReS-Sc repository cloned above
# - modelpath: the path to the models downloaded above
# - output: the path where the drug response prediction results are saved
# In[25]:
import omicverse as ov
job = ov.single.Drug_Response(adata, scriptpath='CaDRReS-Sc',
                              modelpath='models/',
                              output='result')
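# Before launching the prediction it can be worth confirming that the script and
# model directories exist. A minimal sketch; `missing_inputs` is a hypothetical
# helper written for illustration and is not part of omicverse.
# In[ ]:
import os

def missing_inputs(scriptpath, modelpath):
    """Return the input directories that do not exist on disk."""
    return [p for p in (scriptpath, modelpath) if not os.path.isdir(p)]

# With the values used above, an empty list is expected once CaDRReS-Sc has been
# cloned and the models have been downloaded:
# missing_inputs('CaDRReS-Sc', 'models/')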