loubnabnl (HF staff) committed
Commit 2c0f3e0 · verified · Parent: d914289

Update README.md

Files changed (1):
  1. README.md +0 -64

README.md CHANGED
@@ -29,70 +29,6 @@ To build SmolLM-Instruct, we instruction tuned the models using publicly availab
 
  This is the SmolLM-360M-Instruct.
 
- ### Generation
- ```bash
- pip install transformers
- ```
-
- #### Running the model on CPU/GPU/multi GPU
- * _Using full precision_
- ```python
- # pip install git+https://github.com/huggingface/transformers.git # TODO: merge PR to main
- from transformers import AutoModelForCausalLM, AutoTokenizer
- checkpoint = "HuggingFaceTB/SmolLM-135M"
- device = "cuda" # for GPU usage or "cpu" for CPU usage
- tokenizer = AutoTokenizer.from_pretrained(checkpoint)
- # for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
- model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
- inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
- outputs = model.generate(inputs)
- print(tokenizer.decode(outputs[0]))
- ```
- ```bash
- >>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
- Memory footprint: 12624.81 MB
- ```
- * _Using `torch.bfloat16`_
- ```python
- # pip install accelerate
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
- checkpoint = "HuggingFaceTB/SmolLM-135M"
- tokenizer = AutoTokenizer.from_pretrained(checkpoint)
- # for fp16 use `torch_dtype=torch.float16` instead
- model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)
- inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
- outputs = model.generate(inputs)
- print(tokenizer.decode(outputs[0]))
- ```
- ```bash
- >>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
- Memory footprint: 269.03 MB
- ```
-
- #### Quantized Versions through `bitsandbytes`
- * _Using 8-bit precision (int8)_
-
- ```python
- # pip install bitsandbytes accelerate
- from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
- # to use 4bit use `load_in_4bit=True` instead
- quantization_config = BitsAndBytesConfig(load_in_8bit=True)
- checkpoint = "HuggingFaceTB/SmolLM-135M"
- tokenizer = AutoTokenizer.from_pretrained(checkpoint)
- model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)
- inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
- outputs = model.generate(inputs)
- print(tokenizer.decode(outputs[0]))
- ```
- ```bash
- >>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
- # load_in_8bit
- Memory footprint: 162.87 MB
- # load_in_4bit
- >>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
- Memory footprint: 109.78 MB
- ```
 
  # Limitations