Question about rank-based gene ordering in Geneformer
I believe that the gene ranking values are contextually important and should be appropriately reflected in the model.
Since Geneformer is based on a BERT-style Transformer, which processes tokens in parallel without explicit positional encoding,
I'm curious whether the model can truly recognize and utilize this rank-based importance solely from the input order during pretraining.
Specifically, has there been any investigation or ablation study comparing ranked versus non-ranked gene ordering during pretraining?
Thanks for your question. Yes: the in silico overexpression studies work by shifting the overexpressed gene to the front of the rank value encoding, so rank does affect the embeddings produced by the model that was pretrained with ranking.
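To make the mechanism concrete, here is a minimal toy sketch of what "shifting the overexpressed gene to the front of the rank value encoding" means. It is not Geneformer's actual code or API: the function names `rank_value_encode` and `overexpress`, the gene names, and the normalization factors are all hypothetical and chosen only for illustration. It assumes a cell is encoded as its expressed genes sorted by normalized expression, highest first.

```python
def rank_value_encode(expression, norm_factors):
    """Order expressed genes from highest to lowest normalized expression.

    Both arguments are hypothetical dicts keyed by gene name; norm_factors
    stands in for whatever per-gene scaling the real pipeline applies.
    """
    expressed = [gene for gene, value in expression.items() if value > 0]
    return sorted(
        expressed,
        key=lambda gene: expression[gene] / norm_factors[gene],
        reverse=True,
    )


def overexpress(ranked_genes, gene):
    """Simulate overexpression by moving `gene` to the front of the ranking.

    Toy assumption: if the gene was not expressed, it is inserted at the front.
    """
    return [gene] + [g for g in ranked_genes if g != gene]


# Made-up expression values for a single cell.
expression = {"GENE_A": 12.0, "GENE_B": 3.0, "GENE_C": 0.0, "GENE_D": 8.0}
norm_factors = {"GENE_A": 6.0, "GENE_B": 1.0, "GENE_C": 2.0, "GENE_D": 2.0}

baseline = rank_value_encode(expression, norm_factors)
perturbed = overexpress(baseline, "GENE_A")

print(baseline)   # ['GENE_D', 'GENE_B', 'GENE_A']
print(perturbed)  # ['GENE_A', 'GENE_D', 'GENE_B']
```

The two lists contain the same genes and differ only in order, so any difference between the embeddings the model produces for `baseline` and `perturbed` has to come from the ranking itself.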