Updated model card with instructions to quantize the model

#34

Added sample code to the model card showing how to quantize the model down to 4 bits. A few things remain to be done (quantizing the tensors in model.named_buffers and the embed_tokens layer), but VRAM usage is now 3.6 GB versus 9+ GB before. This should be a fairly simple, optional addition.
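The PR's exact snippet isn't reproduced here. As a minimal sketch of one common way to get this kind of 4-bit quantization, assuming the model is a standard `transformers` causal LM and `bitsandbytes` is installed (the model ID is a placeholder, not the actual repo name):

```python
# Minimal sketch: load a causal LM in 4-bit NF4 via bitsandbytes.
# "your-model-id" is a placeholder, not the actual repo name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at inference
)

tokenizer = AutoTokenizer.from_pretrained("your-model-id")
model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Like the limitation noted above, this approach quantizes only the linear layers, leaving embed_tokens and registered buffers in their original dtype.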

