dpfried committed
Commit 24f56a4
1 Parent(s): 2955601

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -35,13 +35,13 @@ pip install git+https://github.com/huggingface/transformers
 
 See [https://github.com/dpfried/incoder](https://github.com/dpfried/incoder) for example code.
 
-This 6B model comes in two versions: with weights in full-precision (float32) (branch `main`) and weights in half-precision (float16) (branch `float16`). The versions can be loaded as follows:
+This 6B model comes in two versions: with weights in full-precision (float32, stored on branch `main`) and weights in half-precision (float16, stored on branch `float16`). The versions can be loaded as follows:
 
-- Full-precision (float32): This should be used if you are fine-tuning the model (note: this will take a lot of GPU memory, probably multiple GPUs, and we have not tried training the model in `transformers` --- it was trained in Fairseq)
+*Full-precision* (float32): This should be used if you are fine-tuning the model (note: this will take a lot of GPU memory, probably multiple GPUs, and we have not tried training the model in `transformers` --- it was trained in Fairseq). Load with:
 
 `model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B")`
 
-- Half-precision (float16): This can be used if you are only doing inference (i.e. generating from the model). It will use less GPU memory, and less RAM when loading the model. With this version it should be able to perform inference on a 16 GB GPU (with a batch size of 1, to sequence lengths of at least 256).
+*Half-precision* (float16): This can be used if you are only doing inference (i.e. generating from the model). It will use less GPU memory, and less RAM when loading the model. With this version it should be able to perform inference on a 16 GB GPU (with a batch size of 1, to sequence lengths of at least 256). Load with:
 
 `model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True)`
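
For readers trying the updated instructions, here is a minimal, self-contained sketch that combines the half-precision loading line from the README with a short generation call. The prompt, device placement, and sampling settings are illustrative assumptions, not part of the README or this commit.

```python
# Illustrative sketch of the half-precision (float16) loading path described above.
# Assumes `torch` and a recent `transformers` are installed and a CUDA GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-6B")

# Half-precision weights from the `float16` branch: intended for inference on a ~16 GB GPU.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/incoder-6B",
    revision="float16",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to("cuda")
model.eval()

# For fine-tuning, the README instead loads the full-precision (float32) weights from `main`:
# model = AutoModelForCausalLM.from_pretrained("facebook/incoder-6B")

# Hypothetical prompt; InCoder is a code model, so we prompt with the start of a function.
prompt = "def count_words(filename: str) -> int:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output = model.generate(**inputs, max_length=128, do_sample=True, temperature=0.2, top_p=0.95)
print(tokenizer.decode(output[0]))
```

Here, `low_cpu_mem_usage=True` asks `transformers` to skip materializing a randomly initialized copy of the model before the checkpoint is read, which keeps peak RAM during loading closer to a single copy of the weights.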