How is it used?
When entering the test, an error appears.
Will they place one to test the operation?
The model is 314B. I don't think so yet, as this is still a raw model. They need to quantize it first, as the model is around 228 GB.
If someone creates a 4-bit quantized model of the 314B, it would be around 60 GB, which means it would still need at least that much VRAM, right?
After all, you need about 30-40 GB of VRAM to run inference with a 70B model.
Yes, probably. Probably even more VRAM.
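As a rough check on the numbers in this thread, here is a minimal sketch of the usual back-of-the-envelope estimate: parameters times bits per parameter, divided by 8 to get bytes. The function name `weight_memory_gb` is made up for illustration, and the estimate covers the weights only. KV cache, activations, and framework overhead are not included, so real VRAM use would be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumes every parameter is stored at the same precision and ignores
    KV cache, activations, and any quantization metadata.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for name, params in [("70B", 70e9), ("314B", 314e9)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: ~{gb:.0f} GB for weights")
```

Under these assumptions a 70B model at 4-bit comes out at about 35 GB, which lines up with the 30-40 GB figure mentioned above, while a 314B model at 4-bit comes out at about 157 GB for the weights alone.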