Question about criteria for selecting genes from perturbation output
Thank you for such a great model.
I'm now trying the perturbation. I noticed that in the manuscript you mentioned finding two genes from the cardiomyocyte dataset. However, when I look at the Excel table, other genes, such as GSN and PLN, show similar patterns. There are also genes such as KLRD1 that lead to a shift toward NF, show no shift toward DCM, and have a low FDR and high detection in the dataset, but do not seem to be related to cardiomyocytes. So I'm curious: what criteria did you use when selecting genes from the perturbation output?
Thanks!
Thank you for your interest in Geneformer! To be clear for others, you are referring to Supplementary Table 12, tab "DCM_del_tx". All genes in that table were expressed in the cardiomyocytes. The model narrowed the candidate therapeutic targets from the >17K detected genes down to a couple hundred genes. From there, candidates can be selected based on your scientific question and experimental setup.

For example, since we were testing the candidates in an iPSC disease model of DCM, we cross-referenced them against genes that were expressed in the model cells in vitro, so that we could meaningfully test inhibiting each candidate in the disease model. Also, because our goal was to discover new biology as a demonstration of Geneformer's potential to accelerate discoveries, we chose not to test genes that may have had clear prior knowledge relevant to cardiomyopathy. We further prioritized based on the biology of each gene, knowledge of the disease pathology, and which genes might be effective candidate therapeutic targets.

One may instead choose to prioritize based on how specifically the gene is expressed in cardiomyocytes, to avoid potential side effects in other tissues where the gene is expressed. One may also choose to prioritize genes with existing small molecules that target their expression. Alternatively, one may choose to test all the candidates predicted by Geneformer, to be as comprehensive as possible and provide further options for viable treatments. Although we tested only a few genes for this experimental validation of Geneformer-predicted targets, our collaborators are currently testing the remaining Geneformer-predicted genes as additional potential therapeutic targets.
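For anyone applying a similar workflow to their own in silico deletion results, the two-stage process above can be sketched in Python. The first stage is a model-based shortlist: a shift toward NF, no shift toward DCM, a low FDR, and enough detections. The second stage applies experiment-specific filters: the gene must be expressed in the in vitro model, and genes with clear prior disease knowledge are excluded.

This is a minimal, hypothetical sketch, not the procedure used for the paper. The record fields, the sign convention for the shifts, and the thresholds (FDR < 0.05, at least 10 detections) are all assumptions. They are not the actual column headers or cutoffs of Supplementary Table 12. The gene names in the example are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PerturbResult:
    # Hypothetical fields; rename them to match the columns of your own results table.
    gene: str
    shift_to_nf: float   # assumed convention: > 0 means deletion shifts cells toward NF
    shift_to_dcm: float  # assumed convention: > 0 means deletion shifts cells toward DCM
    fdr: float           # false discovery rate for the shift
    n_detections: int    # number of cells in which the gene was detected


def shortlist(results: Iterable[PerturbResult], fdr_max: float = 0.05,
              min_detections: int = 10) -> list[PerturbResult]:
    """Stage 1 (model-based): toward NF, not toward DCM, low FDR, well detected.

    Thresholds are illustrative assumptions, not published cutoffs.
    """
    return [
        r for r in results
        if r.shift_to_nf > 0
        and r.shift_to_dcm <= 0
        and r.fdr < fdr_max
        and r.n_detections >= min_detections
    ]


def prioritize(candidates: Iterable[PerturbResult], expressed_in_model: set[str],
               known_disease_genes: set[str]) -> list[str]:
    """Stage 2 (experiment-specific): keep testable, novel genes, ranked by FDR.

    Ranking by FDR (ties broken by more detections) is just one possible ordering.
    """
    kept = [
        c for c in candidates
        if c.gene in expressed_in_model          # can be tested in the in vitro disease model
        and c.gene not in known_disease_genes    # skip genes with clear prior knowledge
    ]
    return [c.gene for c in sorted(kept, key=lambda c: (c.fdr, -c.n_detections))]


if __name__ == "__main__":
    # Made-up placeholder values, for illustration only.
    results = [
        PerturbResult("GENE_A", 0.02, -0.01, 0.001, 500),
        PerturbResult("GENE_B", 0.03, 0.01, 0.002, 300),    # also shifts toward DCM
        PerturbResult("GENE_C", 0.01, -0.02, 0.20, 800),    # FDR too high
        PerturbResult("GENE_D", 0.02, -0.03, 0.004, 50),
        PerturbResult("GENE_E", 0.04, -0.01, 0.0005, 5),    # too few detections
        PerturbResult("GENE_F", 0.05, -0.02, 0.003, 900),
    ]
    stage1 = shortlist(results)
    print([r.gene for r in stage1])  # ['GENE_A', 'GENE_D', 'GENE_F']
    print(prioritize(stage1,
                     expressed_in_model={"GENE_A", "GENE_D", "GENE_F"},
                     known_disease_genes={"GENE_F"}))  # ['GENE_A', 'GENE_D']
```

The other strategies mentioned above can be added as further set-membership filters in the second stage. Examples include restricting to genes with cardiomyocyte-specific expression, or to genes with existing small molecules. Keeping the stages separate makes it clear which cuts come from the model's predictions and which come from the experimental setup.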