How to run the model (InSilicoPerturber) on a different GPU than the one PyTorch is allocated on
Hello,
I am quite new to machine learning and would like some help running the model on a different GPU than the one where PyTorch is allocated. I have 4 GPUs with 24 GB each. PyTorch is currently on GPU 0, reserving ~17 GB of memory:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    On  | 00000000:00:1B.0 Off |                    0 |
|  0%   34C    P0              60W / 300W |  17106MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G                    On  | 00000000:00:1C.0 Off |                    0 |
|  0%   27C    P8              15W / 300W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G                    On  | 00000000:00:1D.0 Off |                    0 |
|  0%   28C    P8              18W / 300W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
|  0%   27C    P8              15W / 300W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
|    0   N/A  N/A      2024      C   python3                                   17098MiB |
+---------------------------------------------------------------------------------------+
I would like to run the model on GPU 1, 2, or 3. I've already tried editing in_silico_perturber.py, replacing every instance of 'cuda' with 'cuda:1'. However, the model still seems to be running on GPU 0 (error posted at bottom).
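For reference, one approach I've seen suggested (not Geneformer-specific) is to hide every GPU except the target one from the process with the CUDA_VISIBLE_DEVICES environment variable, set before torch is first imported, so no source edits are needed:

```python
import os

# Expose only physical GPU 1 to this process. This must run BEFORE
# torch is imported; inside the process, that GPU then appears as
# "cuda:0", so the library's unmodified 'cuda' references still work.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import torch                       # import only after setting the variable
# print(torch.cuda.device_count())   # the process would now see 1 device
```

The same effect can be had from the shell, e.g. `CUDA_VISIBLE_DEVICES=1 python3 my_script.py`.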
These are the parameters I am using for in silico perturbation:
isp = InSilicoPerturber(perturb_type="delete",
perturb_rank_shift=None,
# HNF4A: ENSG00000101076
genes_to_perturb=["ENSG00000101076"],
combos=0,
anchor_gene=None,
model_type="Pretrained",
num_classes=0,
emb_mode="cell",
cell_emb_style="mean_pool",
filter_data=None,
cell_states_to_model=None,
max_ncells=None,
emb_layer=-1,
forward_batch_size=200,
nproc=16,
token_dictionary_file = "/home/ubuntu/Geneformer/geneformer/token_dictionary.pkl")
# Perturb data
isp.perturb_data("/home/ubuntu/Geneformer/",
"/data/genecorpus_filtered_hep/",
"/data/genecorpus_filtered_hep/delete_cell/",
"delete_cell_HNF4A")
Changing forward_batch_size to smaller values still raises the same torch.cuda.OutOfMemoryError:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.49 GiB (GPU 0; 22.19 GiB total capacity; 3.22 GiB already allocated; 5.49 GiB free; 16.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Any advice on what I should try next would be very helpful. Thank you!
Thank you for your interest in Geneformer! The OOM error refers to entities related to PyTorch (e.g. the model), not to PyTorch itself; PyTorch is required on every GPU that runs the model. You should be able to run the code on a single GPU by reducing forward_batch_size until it fits within your resources. Additionally, if you are using the 12L model, you can consider the 6L model, which is less resource-intensive. If you'd like to distribute the job across multiple GPUs, there are several ways to do this, but I would recommend either running separate batches of cells on each GPU or using a method like DeepSpeed if you'd like to distribute the model itself.
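The "separate batches of cells on each GPU" option can be sketched as launching one independent process per GPU, each pinned to its own device via CUDA_VISIBLE_DEVICES. The script name `run_perturber.py` and the `--shard` flag below are hypothetical placeholders for however you split your dataset, not Geneformer APIs:

```python
import os

def make_jobs(n_gpus):
    """Build one (command, environment) pair per GPU. Each job would
    process its own shard of cells on its own device."""
    jobs = []
    for gpu in range(n_gpus):
        # Each process sees only its assigned GPU (as cuda:0 internally).
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        # Placeholder command: replace with your own per-shard script.
        cmd = ["python3", "run_perturber.py", "--shard", str(gpu)]
        jobs.append((cmd, env))
    return jobs

jobs = make_jobs(4)
# To actually launch: subprocess.Popen(cmd, env=env) for each (cmd, env) in jobs.
```

Since the in silico perturbation of each cell is independent, splitting the input dataset into shards and merging the outputs afterwards is straightforward and avoids any model-parallel setup.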