dpfried committed
Commit 2955601
1 Parent(s): a98b072

update with description of half precision model

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -35,6 +35,16 @@ pip install git+https://github.com/huggingface/transformers
 
  See [https://github.com/dpfried/incoder](https://github.com/dpfried/incoder) for example code.
 
+ This 6B model comes in two versions: with weights in full precision (float32, branch `main`) and with weights in half precision (float16, branch `float16`). The versions can be loaded as follows:
+
+ - Full-precision (float32): This should be used if you are fine-tuning the model. (Note: this will take a lot of GPU memory, probably multiple GPUs, and we have not tried training the model in `transformers`; it was trained in Fairseq.)
+
+ `model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B")`
+
+ - Half-precision (float16): This can be used if you are only doing inference (i.e. generating from the model). It will use less GPU memory and less RAM when loading the model. With this version, it should be possible to perform inference on a 16 GB GPU (with a batch size of 1, at sequence lengths of at least 256).
+
+ `model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True)`
+
  ## Credits
 
  The model was developed by Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer and Mike Lewis.
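For context on the loading snippets added above, here is a minimal, self-contained sketch of the half-precision inference path. It is not part of the commit: the prompt string, device placement, and generation parameters are illustrative assumptions, not values from the README.

```python
# Minimal sketch (not from the commit): load the float16 branch and generate a completion.
# The prompt and generation settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-6B")

# Half-precision weights from the `float16` branch; intended for inference,
# e.g. on a single ~16 GB GPU with batch size 1.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/incoder-6B",
    revision="float16",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to("cuda")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        top_p=0.95,
        temperature=0.2,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For fine-tuning, the same `from_pretrained` call without the `revision` and `torch_dtype` arguments loads the full-precision `main` branch, as described in the first bullet of the diff.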