AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v3

#2
by MLLife - opened

I'm getting this error while deploying Whisper to a SageMaker endpoint; the model cannot be loaded into GPU memory.

@sanchit-gandhi, have you seen this issue before?

Hey @MLLife - could you provide a reproducible code snippet for this error?

Update: I was able to fix the issue by adding these lines to requirements.txt (in this order):
nvgpu==0.9.0
pynvml==11.4.1
The issue seems to be caused by a mismatch between the NVIDIA driver version in SageMaker notebooks (v515) and the one in the endpoint deployment base container image (v475).
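For anyone who wants to confirm the mismatch inside the container, here is a minimal sketch that checks whether the installed NVML library exports the symbol named in the error. It assumes the library can be loaded as `libnvidia-ml.so.1` and that the error comes from a `ctypes` symbol lookup; the helper name `nvml_has_symbol` is made up for illustration.

```python
import ctypes

# Symbol named in the error message at the top of this thread.
SYMBOL = "nvmlDeviceGetComputeRunningProcesses_v3"


def nvml_has_symbol(symbol=SYMBOL, lib_name="libnvidia-ml.so.1"):
    """Hypothetical helper: report whether the NVML library exports `symbol`.

    Returns True or False when the library loads, and None when it
    cannot be loaded at all (for example, on a machine without a driver).
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        # Library not found or not loadable in this environment.
        return None
    # ctypes raises AttributeError for a missing symbol, so hasattr()
    # turns that into a plain boolean.
    return hasattr(lib, symbol)
```

Calling `nvml_has_symbol()` in both the notebook and the endpoint container should show whether the two environments differ: `False` in the container would be consistent with an older driver that lacks the `_v3` function.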

Has this been resolved? I'm having the same issue with a different model

@kofi, try checking the nvgpu and pynvml versions in your initialisation code.

This issue is caused by a mismatch between NVIDIA driver versions. Try running the model in a plain SageMaker notebook first, then pin the package versions that work there in the requirements.txt used for the model build.
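A rough sketch of how the working versions could be captured from the notebook, assuming nvgpu and pynvml are the packages that matter (as in the fix above). The function name `pinned_requirements` is hypothetical; it only uses the standard-library `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages=("nvgpu", "pynvml")):
    """Hypothetical helper: build `name==version` lines for installed packages."""
    lines = []
    for name in packages:
        try:
            # Look up the version installed in the current environment.
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Leave a comment line so a missing package is visible.
            lines.append(f"# {name} is not installed in this environment")
    return lines
```

Running `print("\n".join(pinned_requirements()))` in the notebook where the model loads correctly gives lines that can be pasted into requirements.txt for the endpoint build.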

Hope this helps.
