Commit 1d7499b · Parent: c55e703

Update README.md

README.md CHANGED
@@ -9,10 +9,9 @@ How to run with instructions: https://github.com/BBC-Esq
 
 Learn more about the amazing "ctranslate2" technology:
 - https://github.com/OpenNMT/CTranslate2
-- https://pypi.org/project/ctranslate2/
 - https://opennmt.net/CTranslate2/index.html
 
-
+COMPARED to GGML:
 - The VRAM numbers include other programs running and a second monitor so people can get a realistic idea of how much VRAM/RAM is needed.
 - THE FASTER AND HIGHER-QUALITY INT8 "ctranslate2" 7b model uses the same amount of VRAM as the far-inferior 3-bit "k_m" GGML version!!
 
@@ -56,9 +55,9 @@ Information:
 | `int16` | 51.37% | 1.0 | Same as `int8` but with a larger range. |
 | `float16` | 50.00% | 5.3 (e.g. Nvidia 10 Series and Higher) | Suitable for scientific computations; balance between precision and memory. |
 | `bfloat16` | 50.00% | 8.0 (e.g. Nvidia 30 Series and Higher) | Often used in neural network training; larger exponent range than `float16`. |
-| `int8_float32` | 27.47% |
-| `int8_float16` | 26.10% |
-| `int8_bfloat16` | 26.10% |
+| `int8_float32` | 27.47% | test manually (see below) | Combines low-precision integer with high-precision float. Useful for mixed data. |
+| `int8_float16` | 26.10% | test manually (see below) | Combines low-precision integer with medium-precision float. Saves memory. |
+| `int8_bfloat16` | 26.10% | test manually (see below) | Combines low-precision integer with reduced-precision float. Efficient for neural nets. |
 | `int8` | 25% | 1.0 | Lower precision, suitable for whole numbers within a specific range. Often used where memory is crucial. |
 
 | Web Link | Description |
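The size column in the table (percent of the original float32 checkpoint) tracks the bit width of each compute type. A back-of-the-envelope sketch of the ideal ratios; the small deviations in the measured figures (e.g. 51.37% rather than 50% for `int16`) presumably come from metadata and layers that are not quantized:

```python
# Ideal size of each compute type relative to float32 (32-bit) weights.
# Measured figures in the table above run slightly higher, presumably
# because metadata and some non-quantized layers are stored alongside
# the quantized weights.
BITS_PER_WEIGHT = {
    "float32": 32,
    "int16": 16,
    "float16": 16,
    "bfloat16": 16,
    "int8": 8,
}

def relative_size(compute_type: str) -> float:
    """Fraction of the float32 model size a pure weight conversion would use."""
    return BITS_PER_WEIGHT[compute_type] / BITS_PER_WEIGHT["float32"]

for name in ("int16", "float16", "bfloat16", "int8"):
    print(f"{name}: {relative_size(name):.0%} of float32")
# int16 and the two 16-bit floats come out at 50%, int8 at 25%.
```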
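The "test manually" entries in the table mean checking what your own hardware accepts. A minimal sketch of such a check, assuming the `ctranslate2` Python package is installed; its `get_supported_compute_types` helper reports which compute types the given device supports:

```python
def supported_compute_types(device: str = "cpu"):
    """Return the compute types ctranslate2 supports on `device`,
    or None when ctranslate2 is not installed."""
    try:
        import ctranslate2  # pip install ctranslate2
    except ImportError:
        return None
    # get_supported_compute_types is part of ctranslate2's public API;
    # pass "cuda" instead of "cpu" to probe a GPU.
    return set(ctranslate2.get_supported_compute_types(device))

if __name__ == "__main__":
    types = supported_compute_types("cpu")
    print(types if types is not None else "ctranslate2 is not installed")
```

Types such as `int8_bfloat16` only appear in the result on hardware that actually implements them, which is why the table defers to a manual test instead of quoting a single compute-capability number.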