ctheodoris, madhavanvenkatesh
CUDA kernels are incompatible with standard PyTorch device movement under 4-bit/8-bit quantization, necessitating device-specific handling (#416)
b6d28c3 verified