Updated model card with instructions to quantize the model
#34
by andrewqian123 - opened
Added sample code to the model card showing how to quantize the model down to 4 bits. Some work remains (quantizing the tensors in `model.named_buffers()` and the `embed_tokens` layer), but VRAM usage drops to 3.6 GB from 9+ GB before. It should be a fairly simple addition that can be used optionally.
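The exact code from the PR isn't reproduced here; as a point of comparison, below is a minimal sketch of the standard way to load a model in 4 bits with transformers' `BitsAndBytesConfig` (a different technique from the hand-rolled quantization described above, and `"model-id"` is a placeholder for the actual repo id):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization; compute runs in fp16 to preserve accuracy.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# "model-id" is a placeholder; substitute the repo this model card belongs to.
model = AutoModelForCausalLM.from_pretrained(
    "model-id",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("model-id")
```

Note that bitsandbytes also leaves `embed_tokens` (and buffers) in higher precision by default, which matches the remaining-work items mentioned above.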