in_silico_perturbation: too many indices for tensor of dimension 2

#169
by pchiang5 - opened

Thank you for your help with the tool.

However, with the updated version I encountered another error: "too many indices for tensor of dimension 2" (traceback below). Could it be caused by the empty alt_states list [], which is meant to cover all the other 27 classes, in cell_states_to_model={'state_key': 'cell_type', 'start_state': 'A', 'goal_state': 'B', 'alt_states': []}?

.
.
.
.
.
Map (num_proc=20): 100%|██████████| 1682/1682 [00:00<00:00, 3219.35 examples/s]
Map (num_proc=20): 100%|██████████| 1681/1681 [00:01<00:00, 853.88 examples/s]
Map (num_proc=20): 100%|██████████| 1681/1681 [00:00<00:00, 3065.67 examples/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[3], line 18
      2 from geneformer import InSilicoPerturberStats
      3 isp = InSilicoPerturber(perturb_type="overexpress",
      4                         perturb_rank_shift=None,
      5                         genes_to_perturb="all",
   (...)
     16                         forward_batch_size=16,
     17                         nproc=20)
---> 18 isp.perturb_data("/mnt/c/Users/pc/Downloads/GF/230803_geneformer_CellClassifier_bone_marrow_L2048_B8_LR0.00028292192255361916_LSlinear_WU559.5193527108581_E1_Oadamw_F2",
     19                  "/GD/Sequencing_data/benchmark/transformer.dataset",
     20                  "/mnt/c/Users/pc/Downloads/GF",
     21                  "M7")

File /home/pc/miniconda3/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber.py:952, in InSilicoPerturber.perturb_data(self, model_directory, input_data_file, output_directory, output_prefix)
    948         return example[state_name] in [start_state]
    950     filtered_input_data = filtered_input_data.filter(filter_for_origin, num_proc=self.nproc)
--> 952 self.in_silico_perturb(model,
    953                       filtered_input_data,
    954                       layer_to_quant,
    955                       state_embs_dict,
    956                       output_directory,
    957                       output_prefix)

File /home/pc/miniconda3/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber.py:1096, in InSilicoPerturber.in_silico_perturb(self, model, filtered_input_data, layer_to_quant, state_embs_dict, output_directory, output_prefix)
   1089 for combo_lvl in range(self.combos+1):
   1090     perturbation_batch, indices_to_perturb = make_perturbation_batch(example_cell,
   1091                                                                     self.perturb_type,
   1092                                                                     self.tokens_to_perturb,
   1093                                                                     self.anchor_token,
   1094                                                                     combo_lvl,
   1095                                                                     self.nproc)
-> 1096     cos_sims_data = quant_cos_sims(model,
   1097                                    self.perturb_type,
   1098                                    perturbation_batch,
   1099                                    self.forward_batch_size,
   1100                                    layer_to_quant,
   1101                                    original_emb,
   1102                                    self.tokens_to_perturb,
   1103                                    indices_to_perturb,
   1104                                    self.perturb_group,
   1105                                    self.cell_states_to_model,
   1106                                    state_embs_dict,
   1107                                    self.pad_token_id,
   1108                                    model_input_size,
   1109                                    self.nproc)
   1111     if self.cell_states_to_model is None:
   1112         # update cos sims dict
   1113         # key is tuple of (perturbed_gene, affected_gene)
   1114         # or (perturbed_gene, "cell_emb") for avg cell emb change
   1115         cos_sims_data = cos_sims_data.to("cuda")

File /home/pc/miniconda3/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber.py:389, in quant_cos_sims(model, perturb_type, perturbation_batch, forward_batch_size, layer_to_quant, original_emb, tokens_to_perturb, indices_to_perturb, perturb_group, cell_states_to_model, state_embs_dict, pad_token_id, model_input_size, nproc)
    387     if perturb_group == True:
    388         overexpressed_to_remove = len(tokens_to_perturb)
--> 389     minibatch_emb = minibatch_emb[:,overexpressed_to_remove:,:]
    391 # if quantifying single perturbation in multiple different cells, pad original batch and extract embs
    392 if perturb_group == True:
    393     # pad minibatch of original batch to extract embeddings
    394     # truncate to the (model input size - # tokens to overexpress) to ensure comparability
    395     # since max input size of perturb batch will be reduced by # tokens to overexpress

IndexError: too many indices for tensor of dimension 2
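
For readers unfamiliar with this message: PyTorch raises it whenever a tensor is indexed with more indices than it has dimensions. The sketch below reproduces the same IndexError outside Geneformer. The shapes and the name tokens_to_drop are made up for illustration (it stands in for overexpressed_to_remove in the traceback); it only shows that a slice like [:, n:, :] needs a 3-D tensor, and does not claim to identify where the dimension was lost in this run.

```python
import torch

# Hypothetical shapes for illustration: (batch, sequence length, hidden size).
emb_3d = torch.zeros(4, 10, 8)
# The same kind of embedding with the batch dimension missing.
emb_2d = torch.zeros(10, 8)

# Stand-in for overexpressed_to_remove in the traceback (assumed value).
tokens_to_drop = 1

# Three indices on a 3-D tensor: works, drops the first token of every sequence.
sliced = emb_3d[:, tokens_to_drop:, :]
print(sliced.shape)

# Three indices on a 2-D tensor: raises the IndexError seen in the traceback.
try:
    emb_2d[:, tokens_to_drop:, :]
    error_message = None
except IndexError as err:
    error_message = str(err)
print(error_message)
```

So the failing line implies that minibatch_emb had only two dimensions at that point, for example if a batch dimension had been dropped somewhere upstream.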

Thank you for your question. Please update your comment with the arguments you are using to set up the in silico perturber so we can reproduce the error and help troubleshoot.

Thank you for the reminder. Please see below:

import os
os.environ['TRANSFORMERS_CACHE'] = '/mnt/c/Users/pc/Downloads/cache/'
from geneformer import InSilicoPerturber
from geneformer import InSilicoPerturberStats
isp = InSilicoPerturber(perturb_type="overexpress",
                        perturb_rank_shift=None,
                        genes_to_perturb="all",
                        combos=0,
                        anchor_gene=None,
                        model_type="CellClassifier",
                        num_classes=29,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        cell_states_to_model={'state_key': 'cell_type', 'start_state': 'A', 'goal_state': 'B', 'alt_states': []},
                        max_ncells=2000,
                        emb_layer=0,
                        forward_batch_size=16,
                        nproc=20)
isp.perturb_data("/mnt/c/Users/pc/Downloads/GF/230803_geneformer_CellClassifier_bone_marrow_L2048_B8_LR0.00028292192255361916_LSlinear_WU559.5193527108581_E1_Oadamw_F2",
                 "/GD/Sequencing_data/benchmark/transformer.dataset",
                 "/mnt/c/Users/pc/Downloads/GF",
                 "M7")

Thank you for including the setup you are using. I am not encountering this error when running with your arguments. I assume your cell classification fine-tuned model was trained to distinguish 29 classes and that you are using the current updated version. I also assume that the number of genes detected in all of your cells is >2. Assuming all that is true, please email me a link to your dataset and fine-tuned model so I can reproduce the error and help you troubleshoot.

Thank you for being so willing to help. I have emailed the fine-tuned model and the corresponding dataset to you.

Thank you again for being so willing to help. After switching to CPU mode, I got it to work by setting forward_batch_size=400. So the issue might be related to the low batch size I used with 4 GB of VRAM.
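
One possible reading of why the batch size matters, offered as a hypothesis that the thread does not confirm: the progress bars above show 1681 examples, and splitting 1681 examples into minibatches of 16 leaves a final minibatch with a single example. If any step drops a size-1 batch dimension, that last minibatch would become 2-D and fail on a three-index slice. The helper minibatch_sizes below is hypothetical and not part of Geneformer; it only does the arithmetic.

```python
from typing import List


def minibatch_sizes(n_examples: int, forward_batch_size: int) -> List[int]:
    """Return the size of each minibatch when examples are split in order."""
    if n_examples < 0 or forward_batch_size <= 0:
        raise ValueError("n_examples must be >= 0 and forward_batch_size > 0")
    return [
        min(forward_batch_size, n_examples - start)
        for start in range(0, n_examples, forward_batch_size)
    ]


# 1681 is the example count shown in the progress bars of the original post.
sizes_16 = minibatch_sizes(1681, 16)
sizes_400 = minibatch_sizes(1681, 400)

print(sizes_16[-1])   # 1: the final minibatch holds a single example
print(sizes_400[-1])  # 81: no single-example minibatch with batch size 400
```

Under this assumption, any batch size that leaves a remainder of exactly 1 could hit the same error, regardless of whether the run is on CPU or GPU.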

pchiang5 changed discussion status to closed

Thank you for this information!
