Updated model card with instructions to quantize the model
#34
by andrewqian123 - opened
Added sample code to the model card showing how to quantize the model down to 4 bits. Some work remains (quantizing the tensors in `model.named_buffers()` and the `embed_tokens` layer), but VRAM usage drops to 3.6 GB from 9+ GB before. It should be a fairly simple addition that can be used optionally.
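The exact code from the PR isn't reproduced here; as a point of comparison, below is a minimal sketch of the standard way to load a model in 4 bits with transformers' `BitsAndBytesConfig` (a different technique from the hand-rolled quantization described above, and `"model-id"` is a placeholder for the actual repo id):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization; compute runs in fp16 to preserve accuracy.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# "model-id" is a placeholder; substitute the repo this model card belongs to.
model = AutoModelForCausalLM.from_pretrained(
    "model-id",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("model-id")
```

Note that bitsandbytes also leaves `embed_tokens` (and buffers) in higher precision by default, which matches the remaining-work items mentioned above.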