Hello, I am trying to run generate.py under llang singularity on GPUs with 64 GB of RAM and am getting an out-of-memory error. Do you have a parallelized script that can split the load across multiple GPUs?