AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v3
I'm getting this error while deploying Whisper on a SageMaker endpoint; the model fails to load into GPU memory.
@sanchit-gandhi , have you seen this issue before?
Hey @MLLife - could you provide a reproducible code snippet for this error?
Update: I was able to fix the issue by adding these to requirements.txt (in this order):
nvgpu==0.9.0
pynvml==11.4.1
It seems the issue is caused by a mismatch in NVIDIA driver versions between SageMaker notebooks (v515) and the endpoint deployment base container image (v475).
Has this been resolved? I'm having the same issue with a different model
@kofi, in your code initialisation, try checking the nvgpu and pynvml versions.
This issue happens because of a mismatch between NVIDIA driver versions. Try running the model in a plain SageMaker notebook first, then pin the package versions from that notebook in the requirements.txt used for the model build.
Hope this helps.
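For anyone wanting to compare the notebook and the endpoint container as suggested above, here is a minimal sketch of a version check. The helper names `installed_versions` and `driver_version` are made up for illustration and are not part of nvgpu or pynvml; it assumes pynvml may or may not be importable and reports that instead of crashing. Run it in both environments and compare the output.

```python
from importlib import metadata


def installed_versions(packages=("nvgpu", "pynvml")):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return versions


def driver_version():
    """Return the NVIDIA driver version string, or a note if it can't be read."""
    try:
        import pynvml

        pynvml.nvmlInit()
        try:
            version = pynvml.nvmlSystemGetDriverVersion()
        finally:
            pynvml.nvmlShutdown()
    except Exception as exc:  # no pynvml, no driver, or an NVML error
        return f"unavailable ({type(exc).__name__})"
    # Older pynvml releases return bytes, newer ones return str.
    return version.decode() if isinstance(version, bytes) else version


if __name__ == "__main__":
    print("packages:", installed_versions())
    print("driver:", driver_version())
```

If the package versions differ between the two environments, pinning the notebook's versions in requirements.txt is the fix described earlier in this thread.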