ctheodoris/Geneformer · in_silico_perturber UnboundLocalError: local variable 'original_max_len' referenced before assignment

Hello,

Seems I'm facing the same error described in Discussion #182; however, pulling the latest version of Geneformer did not fix this issue:

import gc
import torch
from geneformer import InSilicoPerturber

gc.collect()
torch.cuda.empty_cache()
isp = InSilicoPerturber(perturb_type="delete",
                        perturb_rank_shift=None,
                        # HNF4A: ENSG00000101076
                        genes_to_perturb=["ENSG00000101076"],
                        combos=0,
                        anchor_gene=None,
                        model_type="Pretrained",
                        num_classes=0,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data=None,
                        cell_states_to_model=None,
                        max_ncells=10,
                        emb_layer=-1,
                        forward_batch_size=10,
                        nproc=16,
                        token_dictionary_file = "/home/ubuntu/Geneformer/geneformer/token_dictionary.pkl")

# Perturb data
isp.perturb_data("/home/ubuntu/Geneformer/",
                 "/data/genecorpus_filtered_hep/",
                 "/data/genecorpus_filtered_hep/delete_cell/",
                 "delete_cell_HNF4A")

Filter (num_proc=16): 100%|█████████████████████████████████████████████| 12775/12775 [00:13<00:00, 981.99 examples/s]
Map (num_proc=16): 100%|██████████████████████████████████████████████████| 3000/3000 [00:12<00:00, 231.00 examples/s]
Map (num_proc=16): 100%|█████████████████████████████████████████████████| 3000/3000 [00:00<00:00, 8204.96 examples/s]
Traceback (most recent call last):                                                                                    
  File "<stdin>", line 1, in <module>                                                                                 
  File "/opt/tensorflow/lib/python3.10/site-packages/geneformer/in_silico_perturber.py", line 974, in perturb_data    
    self.in_silico_perturb(model,                                                                                     
  File "/opt/tensorflow/lib/python3.10/site-packages/geneformer/in_silico_perturber.py", line 1052, in in_silico_perturb
    cos_sims_data = quant_cos_sims(model,                                                                             
  File "/opt/tensorflow/lib/python3.10/site-packages/geneformer/in_silico_perturber.py", line 411, in quant_cos_sims  
    attention_mask = gen_attention_mask(original_minibatch, original_max_len)                                         
UnboundLocalError: local variable 'original_max_len' referenced before assignment
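
For anyone unfamiliar with this error class, here is a minimal sketch of how Python raises it in general. I have not traced the Geneformer source, so this is not the actual cause: `pad_batch` and `compute_len` are hypothetical names, and the only point is that a local assigned in just one branch is unbound on every other path.

```python
def pad_batch(batch, compute_len):
    """Hypothetical helper, NOT Geneformer code: pad rows with zeros."""
    if compute_len:
        # max_len is bound only when this branch runs.
        max_len = max(len(row) for row in batch)
    padded = []
    for row in batch:
        # If the branch above was skipped, max_len was never assigned,
        # and this line raises UnboundLocalError.
        padded.append(row + [0] * (max_len - len(row)))
    return padded


def demo():
    """Return the padded batch plus the error name from the failing path."""
    ok = pad_batch([[1, 2, 3], [4]], compute_len=True)
    try:
        pad_batch([[1, 2, 3], [4]], compute_len=False)
    except UnboundLocalError as err:
        return ok, type(err).__name__
    return ok, None
```

If the real code follows a similar pattern, the question would be which condition is skipped for my inputs, but that is a guess on my part.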

When I set max_ncells=30 and forward_batch_size=30, I get a different error (see below). I'm not sure whether the two errors are related, but I thought I'd post both anyway. I ran this code successfully with these parameters on Aug 31, 2023, so the errors may be due to changes made after that date.

Filter (num_proc=16): 100%|█████████████████████████████████████████████| 12775/12775 [00:12<00:00, 986.85 examples/s]
Map (num_proc=16): 100%|███████████████████████████████████████████████████████| 30/30 [00:13<00:00,  2.30 examples/s]
Map (num_proc=16): 100%|██████████████████████████████████████████████████████| 30/30 [00:00<00:00, 172.79 examples/s]
Map (num_proc=16): 100%|██████████████████████████████████████████████████████| 30/30 [00:00<00:00, 170.27 examples/s]
Map (num_proc=16): 100%|██████████████████████████████████████████████████████| 30/30 [00:00<00:00, 177.20 examples/s]
Traceback (most recent call last):                                                                                    
  File "<stdin>", line 1, in <module>                                                                                 
  File "/opt/tensorflow/lib/python3.10/site-packages/geneformer/in_silico_perturber.py", line 974, in perturb_data    
    self.in_silico_perturb(model,                                                                                     
  File "/opt/tensorflow/lib/python3.10/site-packages/geneformer/in_silico_perturber.py", line 1052, in in_silico_perturb
    cos_sims_data = quant_cos_sims(model,                                                                             
  File "/opt/tensorflow/lib/python3.10/site-packages/geneformer/in_silico_perturber.py", line 444, in quant_cos_sims  
    cos_sims += [cos(minibatch_emb, minibatch_comparison).to("cpu")]                                                  
  File "/opt/tensorflow/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl            
    return forward_call(*args, **kwargs)                                                                              
  File "/opt/tensorflow/lib/python3.10/site-packages/torch/nn/modules/distance.py", line 87, in forward               
    return F.cosine_similarity(x1, x2, self.dim, self.eps)                                                            
RuntimeError: The size of tensor a (2047) must match the size of tensor b (2046) at non-singleton dimension 1
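
The same kind of RuntimeError can be reproduced outside Geneformer with two embedding tensors whose sequence lengths differ by one. This is only a sketch under assumptions: the `(1, length, hidden)` shape, `dim=2`, and the small sizes are mine for illustration, and `cosine_or_error` is a hypothetical helper, not part of the library.

```python
import torch


def cosine_or_error(len_a, len_b, hidden=8):
    """Compare two (1, length, hidden) tensors along the hidden dimension.

    Returns (output_shape, None) on success or (None, message) on failure.
    The shapes and dim=2 are assumptions made for this illustration.
    """
    cos = torch.nn.CosineSimilarity(dim=2)
    a = torch.randn(1, len_a, hidden)
    b = torch.randn(1, len_b, hidden)
    try:
        return tuple(cos(a, b).shape), None
    except RuntimeError as err:
        # Sequence lengths that differ cannot be broadcast against each other.
        return None, str(err)


same_shape, same_err = cosine_or_error(6, 6)
diff_shape, diff_err = cosine_or_error(6, 5)
```

With equal lengths the call returns one similarity per position; with lengths 6 and 5 it fails, which suggests that in my run the original and perturbed embeddings ended up one position apart (2047 vs 2046).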
