Error in subprocess: concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
#3 opened by BwandoWando
I'm trying to run ModernBERT on an Azure Databricks A100 compute (Standard_NC24ads_A100_v4) and I'm getting the error(s) below. Note that the same code runs flawlessly on my local machine. What I've noticed, though, is that only the subprocess errors out while the main thread stays alive, hence the output.
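For context, BrokenProcessPool is what concurrent.futures raises in the parent when a worker process dies without returning a result. The parent itself keeps running, which is consistent with the observation that the main thread stays alive. The sketch below reproduces that behavior in isolation. It assumes the Linux "fork" start method, and `crash_worker` and `run_in_pool` are hypothetical names. `os._exit` only stands in for whatever is actually killing the worker on the cluster, which this post does not identify.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def crash_worker() -> None:
    # Hypothetical stand-in for a worker being killed abruptly (for example
    # by a signal or a native crash): os._exit skips all Python cleanup,
    # so the pool never receives a result for this task.
    os._exit(1)


def run_in_pool() -> str:
    # Assumption: Linux "fork" start method (Databricks compute runs Linux).
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        future = pool.submit(crash_worker)
        try:
            future.result(timeout=30)
        except BrokenProcessPool as exc:
            # Only the worker died; the parent process is still running
            # and can catch and report the failure.
            return f"parent pid {os.getpid()} caught: {type(exc).__name__}"
    return "no error"


if __name__ == "__main__":
    print(run_in_pool())
```

Once a pool is broken, every pending future fails the same way, so the real cause is usually found in whatever terminated the worker (driver or executor logs), not in the traceback printed by the parent.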
I've installed the recommended transformers version from Git (as of 20-Dec-2024) with: pip install git+https://github.com/huggingface/transformers.git
Here are the library versions on the compute, in case anyone is interested:
- flash-attn==2.7.2.post1
- peft==0.14.0
- tokenizers==0.21.0
- torch==2.5.0
- torchaudio==2.5.0
- torchvision==0.20.0
- transformers @ file:///xxxxxxxx/20241220_modernBERT_transformers/transformers.zip#sha256=07162208eb951e2019e3c3abd116e8e672deff5b72fa1aa7ccfff94da62ccd4f
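To collect the same list directly on a cluster or a local machine for comparison, one option is the standard library's `importlib.metadata`. This is a minimal sketch. The package list simply mirrors the one above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Mirrors the packages listed in this post; adjust as needed.
PACKAGES = [
    "flash-attn",
    "peft",
    "tokenizers",
    "torch",
    "torchaudio",
    "torchvision",
    "transformers",
]


def installed_versions(packages: list[str]) -> dict[str, str]:
    # Map each distribution name to its installed version string,
    # or "not installed" when the distribution cannot be found.
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"- {name}=={ver}")
```

For a package installed from a URL or archive, such as the transformers zip above, this prints the package's version number instead of the `@ file://...` form that `pip freeze` shows. Running it on both machines makes it easier to spot a mismatch between them.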
Here's the nvidia-smi output from within the compute:
- CUDA Version: 12.2
- Driver Version: 535.161.07
- NVIDIA-SMI 535.161.07
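The "CUDA Version" in nvidia-smi is the highest CUDA version the installed driver supports. It is not necessarily the CUDA version PyTorch was built against. A small sketch can report what PyTorch itself sees, using only standard `torch` attributes. `torch_cuda_report` is a hypothetical helper name. The sketch degrades gracefully if torch is not installed.

```python
import importlib.util


def torch_cuda_report() -> dict:
    # Complements nvidia-smi: reports what PyTorch sees, not what the
    # driver supports.
    if importlib.util.find_spec("torch") is None:
        return {"torch": "not installed"}
    import torch

    report = {
        "torch": torch.__version__,
        # CUDA version the wheel was built with; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        # Compute capability as a (major, minor) tuple.
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```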
Can anyone point me in the right direction on how to fix this error?