Commit 1d7499b · Parent: c55e703

Update README.md

README.md CHANGED
@@ -9,10 +9,9 @@ How to run with instructions: https://github.com/BBC-Esq
 
 Learn more about the amazing "ctranslate2" technology:
 - https://github.com/OpenNMT/CTranslate2
-- https://pypi.org/project/ctranslate2/
 - https://opennmt.net/CTranslate2/index.html
 
-
+COMPARED to GGML:
 - The VRAM numbers include other programs running and a second monitor so people can get a realistic idea of how much VRAM/RAM is needed.
 - THE FASTER AND HIGHER-QUALITY INT8 "ctranslate2" 7b model uses the same amount of VRAM as the far-inferior 3-bit "k_m" GGML version!!
 
@@ -56,9 +55,9 @@ Information:
 | `int16` | 51.37% | 1.0 | Same as `int8` but with a larger range. |
 | `float16` | 50.00% | 5.3 (e.g. Nvidia 10 Series and Higher) | Suitable for scientific computations; balance between precision and memory. |
 | `bfloat16` | 50.00% | 8.0 (e.g. Nvidia 30 Series and Higher) | Often used in neural network training; larger exponent range than `float16`. |
-| `int8_float32` | 27.47% |
-| `int8_float16` | 26.10% |
-| `int8_bfloat16` | 26.10% |
+| `int8_float32` | 27.47% | test manually (see below) | Combines low-precision integer with high-precision float. Useful for mixed data. |
+| `int8_float16` | 26.10% | test manually (see below) | Combines low-precision integer with medium-precision float. Saves memory. |
+| `int8_bfloat16` | 26.10% | test manually (see below) | Combines low-precision integer with reduced-precision float. Efficient for neural nets. |
 | `int8` | 25% | 1.0 | Lower precision, suitable for whole numbers within a specific range. Often used where memory is crucial. |
 
 | Web Link | Description |
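The size column in the table (percent of the original float32 checkpoint) tracks the bit width of each compute type. A back-of-the-envelope sketch of the ideal ratios; the small deviations in the measured figures (e.g. 51.37% rather than 50% for `int16`) presumably come from metadata and layers that are not quantized:

```python
# Ideal size of each compute type relative to float32 (32-bit) weights.
# Measured figures in the table above run slightly higher, presumably
# because metadata and some non-quantized layers are stored alongside
# the quantized weights.
BITS_PER_WEIGHT = {
    "float32": 32,
    "int16": 16,
    "float16": 16,
    "bfloat16": 16,
    "int8": 8,
}

def relative_size(compute_type: str) -> float:
    """Fraction of the float32 model size a pure weight conversion would use."""
    return BITS_PER_WEIGHT[compute_type] / BITS_PER_WEIGHT["float32"]

for name in ("int16", "float16", "bfloat16", "int8"):
    print(f"{name}: {relative_size(name):.0%} of float32")
# int16 and the two 16-bit floats come out at 50%, int8 at 25%.
```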
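The "test manually" entries in the table mean checking what your own hardware accepts. A minimal sketch of such a check, assuming the `ctranslate2` Python package is installed; its `get_supported_compute_types` helper reports which compute types the given device supports:

```python
def supported_compute_types(device: str = "cpu"):
    """Return the compute types ctranslate2 supports on `device`,
    or None when ctranslate2 is not installed."""
    try:
        import ctranslate2  # pip install ctranslate2
    except ImportError:
        return None
    # get_supported_compute_types is part of ctranslate2's public API;
    # pass "cuda" instead of "cpu" to probe a GPU.
    return set(ctranslate2.get_supported_compute_types(device))

if __name__ == "__main__":
    types = supported_compute_types("cpu")
    print(types if types is not None else "ctranslate2 is not installed")
```

Types such as `int8_bfloat16` only appear in the result on hardware that actually implements them, which is why the table defers to a manual test instead of quoting a single compute-capability number.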