torch.cuda.OutOfMemoryError

#230
by swang12 - opened

Hi, I was running InSilicoPerturber on each gene one by one in a loop, and I got torch.cuda.OutOfMemoryError on the first gene. I reduced the number of cells to 200, but this didn't resolve the issue. In comparison, when I previously ran InSilicoPerturber for all genes with 2,000 cells, I didn't have this issue. I'm pasting the code and error message below:

import gc
import os
import shutil

import torch
from tqdm import tqdm

from geneformer import InSilicoPerturber, InSilicoPerturberStats

# genes_to_perturb (list of genes) and root_directory are defined earlier in the script
for gene in tqdm(genes_to_perturb):
    # per-gene output directory
    output_directory = f'{root_directory}{gene}/'
    os.mkdir(output_directory)

    # work on a fresh copy of the tokenized dataset for each gene
    shutil.copytree('tokenized_.dataset', 'tokenized_copy.dataset')

    isp = InSilicoPerturber(perturb_type='delete',
                            perturb_rank_shift=None,
                            genes_to_perturb=[gene],
                            combos=0,
                            anchor_gene=None,
                            model_type='Pretrained',
                            num_classes=0,
                            emb_mode='cell_and_gene',
                            cell_emb_style='mean_pool',
                            cell_states_to_model=None,
                            max_ncells=200,
                            emb_layer=-1,
                            forward_batch_size=32,
                            nproc=16)

    isp.perturb_data(model_directory='Geneformer/',
                     input_data_file='tokenized_copy.dataset',
                     output_directory=output_directory,
                     output_prefix='perturbed')

    ispstats = InSilicoPerturberStats(mode='mixture_model',
                                      combos=0,
                                      anchor_gene=None,
                                      cell_states_to_model=None)

    ispstats.get_stats(input_data_directory=output_directory,
                       null_dist_data_directory=None,
                       output_directory=output_directory,
                       output_prefix='stats_emb_mode_gene')

    # clean up the per-gene dataset copy and try to release GPU memory
    shutil.rmtree('tokenized_copy.dataset')
    gc.collect()
    torch.cuda.empty_cache()

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 31.74 GiB total capacity; 7.07 GiB already allocated; 1.25 GiB free; 7.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I was wondering if you could help me resolve this issue. Thank you very much!

Sincerely,
Su Wang

I've also been running into this issue lately: I receive the same error message when running with a large forward_batch_size. When I reduce forward_batch_size to 10, I receive a new error message:

RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 2047 but got size 2004 for tensor number 7 in the list.

However, I'm thinking this may be related to Discussion #85. Have you tried changing emb_mode='gene'?

A few things I've tried that haven't worked:

  1. Setting max_ncells=50 and forward_batch_size=50
  2. torch.cuda.empty_cache() and gc.collect(), as shown above
  3. Modifying in_silico_perturber.py to clear the memory every 500 cells instead of every 1000 cells
  4. Setting os.environ["PYTORCH_CUDA_ALLOC_CONF"]="max_split_size_mb: 512"
  5. Restarting my Python session and only loading in necessary objects

I also make sure to pull the latest version of Geneformer each time. Open to any suggestions anyone has!

Thank you for your interest in Geneformer! A few notes:

  • If you are running the analysis for many genes, it will likely be more efficient to run it with the genes_to_perturb="all" option. In that case each batch contains all gene perturbations for a single cell, which removes the need to pad cells of variable size, so no computation is spent on padding (a minimal sketch follows this list).

  • When you run with emb_mode="cell_and_gene", gene embeddings are output in addition to cell embeddings, which increases the memory requirements.

  • You are only just out of memory, so decreasing the batch size a bit further from 32 may allow your analysis to fit on the GPU. I would suggest successively reducing the batch size and choosing the largest value that does not run out of memory. The cells are sorted so that memory errors are encountered early; a batch size that works at the start should therefore work for the remainder of the run.

  • Fully emptying GPU memory from within a running process is often non-trivial. Rather than looping over genes within a single Python session, I would suggest launching a separate process per gene, for example with xargs, so that the GPU is completely reset between runs (see the per-gene script sketch after this list).

  • If you are using the 12-layer model, you may consider using the 6-layer model (in the outer directory of this repository), which will be more memory-efficient.
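
For reference, here is a rough sketch of the single-run "all genes" setup from the first bullet. The output directory name ('perturb_all/') and output prefixes are placeholders, and the remaining arguments are simply copied from the code in the question, so adjust them for your own setup:

import os
from geneformer import InSilicoPerturber, InSilicoPerturberStats

output_directory = 'perturb_all/'   # placeholder output location
os.makedirs(output_directory, exist_ok=True)

# one run over all genes: each batch holds the perturbations of a single cell,
# so all sequences in a batch have the same length and no padding is needed
isp = InSilicoPerturber(perturb_type='delete',
                        perturb_rank_shift=None,
                        genes_to_perturb='all',
                        combos=0,
                        anchor_gene=None,
                        model_type='Pretrained',
                        num_classes=0,
                        emb_mode='cell_and_gene',
                        cell_emb_style='mean_pool',
                        cell_states_to_model=None,
                        max_ncells=200,
                        emb_layer=-1,
                        forward_batch_size=32,   # reduce if you still hit CUDA OOM, per the advice above
                        nproc=16)

isp.perturb_data(model_directory='Geneformer/',
                 input_data_file='tokenized_.dataset',
                 output_directory=output_directory,
                 output_prefix='perturbed')

ispstats = InSilicoPerturberStats(mode='mixture_model',
                                  combos=0,
                                  anchor_gene=None,
                                  cell_states_to_model=None)

ispstats.get_stats(input_data_directory=output_directory,
                   null_dist_data_directory=None,
                   output_directory=output_directory,
                   output_prefix='stats_all_genes')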
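
And here is a sketch of the separate-process approach from the fourth bullet: a hypothetical per-gene driver script (perturb_one_gene.py below, with a placeholder results directory and gene-list file) that xargs can launch once per gene, so each gene gets a fresh Python process and the GPU is fully reset in between. The InSilicoPerturber arguments are again copied from the question:

# perturb_one_gene.py -- hypothetical per-gene driver mirroring the loop body above
# example invocation (one process per gene):
#   cat genes_to_perturb.txt | xargs -n 1 python perturb_one_gene.py
import os
import shutil
import sys

from geneformer import InSilicoPerturber

gene = sys.argv[1]                                # gene passed in by xargs
output_directory = f'perturb_results/{gene}/'     # placeholder root directory
os.makedirs(output_directory, exist_ok=True)

# fresh copy of the tokenized dataset for this gene, as in the original loop
shutil.copytree('tokenized_.dataset', 'tokenized_copy.dataset')

isp = InSilicoPerturber(perturb_type='delete',
                        perturb_rank_shift=None,
                        genes_to_perturb=[gene],
                        combos=0,
                        anchor_gene=None,
                        model_type='Pretrained',
                        num_classes=0,
                        emb_mode='cell_and_gene',
                        cell_emb_style='mean_pool',
                        cell_states_to_model=None,
                        max_ncells=200,
                        emb_layer=-1,
                        forward_batch_size=32,    # reduce if you still hit CUDA OOM, per the advice above
                        nproc=16)

isp.perturb_data(model_directory='Geneformer/',
                 input_data_file='tokenized_copy.dataset',
                 output_directory=output_directory,
                 output_prefix='perturbed')

# InSilicoPerturberStats can then be run on each output directory, as in the original loop
shutil.rmtree('tokenized_copy.dataset')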

ctheodoris changed discussion status to closed
