carecodeconnect committed
Commit d589b8b
1 Parent(s): 4a64513

Update README.md

Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -50,6 +50,14 @@ Training resulted in a model capable of generating coherent and contextually rel
 - Datasets: 2.18.0
 - Tokenizers: 0.15.2
 
-## Axolotl Fine-Tuning Details
+## Quantization with llama.cpp
 
-The model was fine-tuned using the Axolotl toolkit, with specific emphasis on low-resource environments. Key aspects of the fine-tuning process include utilizing QLoRA for efficient learning and adapting to the guided meditation domain, employing mixed precision training for enhanced performance, and custom tokenization to fit the unique structure of meditation scripts. The entire process emphasizes resource efficiency and model effectiveness in generating serene and contextually appropriate meditation guides.
+The model was quantized to enhance its efficiency and reduce its size, making it more suitable for deployment in various environments, including those with limited resources. The quantization process was performed using `llama.cpp`, following the steps outlined by Maxime Labonne in [Quantize Llama models with GGUF and llama.cpp](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html).
+
+The process involved:
+- Cloning the `llama.cpp` repository and setting it up with the required dependencies.
+- Downloading the model to be quantized.
+- Using the `llama.cpp/convert.py` script to convert the model to fp16 format, followed by quantization, significantly reducing the model's size while retaining its performance capabilities.
+
+The quantization resulted in a compressed model with a significant reduction in size from 13813.02 MB to 4892.99 MB, enhancing its loading and inference speeds without compromising on the generation quality.
+
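
For anyone who wants to reproduce the quantization steps described in the added section, here is a minimal sketch that drives the same workflow from Python with `subprocess`. It assumes a locally downloaded model directory (the `MODEL_DIR` name is hypothetical), an older llama.cpp layout in which the converter is `convert.py` and the quantizer binary is `quantize` (newer checkouts rename these to `convert_hf_to_gguf.py` and `llama-quantize`), and it uses `Q4_K_M` only as a placeholder, since the README does not say which quantization type was actually applied.

```python
"""Sketch of the llama.cpp GGUF conversion and quantization workflow described above.

Assumptions not taken from the README: MODEL_DIR points at the downloaded
HF-format model, the llama.cpp checkout still ships `convert.py` and a
`quantize` binary, and Q4_K_M stands in for whichever quantization type
was actually used.
"""
import subprocess
from pathlib import Path

MODEL_DIR = Path("guided-meditation-model")   # hypothetical local model directory
LLAMA_CPP = Path("llama.cpp")
QUANT_TYPE = "Q4_K_M"                         # placeholder quantization type

FP16_GGUF = MODEL_DIR / "model.fp16.gguf"
QUANT_GGUF = MODEL_DIR / f"model.{QUANT_TYPE}.gguf"


def run(cmd):
    """Echo a command and run it, raising if it fails."""
    print("$", " ".join(str(c) for c in cmd))
    subprocess.run([str(c) for c in cmd], check=True)


# 1. Clone llama.cpp, build the CLI tools, and install the converter's dependencies.
if not LLAMA_CPP.exists():
    run(["git", "clone", "https://github.com/ggerganov/llama.cpp"])
run(["make", "-C", LLAMA_CPP])
run(["pip", "install", "-r", LLAMA_CPP / "requirements.txt"])

# 2. Convert the downloaded model to an fp16 GGUF file.
run(["python", LLAMA_CPP / "convert.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", FP16_GGUF])

# 3. Quantize the fp16 GGUF down to the (assumed) target type.
run([LLAMA_CPP / "quantize", FP16_GGUF, QUANT_GGUF, QUANT_TYPE])

print(f"Wrote {QUANT_GGUF} ({QUANT_GGUF.stat().st_size / 2**20:.2f} MB)")
```

Apart from the renamed scripts and the switch from `make` to CMake in recent llama.cpp releases, the overall flow is unchanged: produce an fp16 GGUF first, then quantize it down to the target type.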
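
As a quick sanity check on the sizes reported in the added section, going from 13813.02 MB to 4892.99 MB is roughly a 2.8x compression, i.e. the quantized file is about 65% smaller:

```python
original_mb, quantized_mb = 13813.02, 4892.99           # sizes reported in the README diff
ratio = original_mb / quantized_mb                      # ~2.82x compression
saved = 1 - quantized_mb / original_mb                  # ~64.6% size reduction
print(f"{ratio:.2f}x compression, {saved:.1%} smaller")
```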
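
The section removed by this commit summarized QLoRA fine-tuning with Axolotl. As background, the sketch below shows what a QLoRA configuration of that kind can look like; every value (base model, dataset path, LoRA rank, batch sizes) is a placeholder, and the keys follow common Axolotl QLoRA examples rather than the configuration actually used for this model.

```python
"""Hypothetical Axolotl QLoRA config, echoing the fine-tuning section removed above.

All values are placeholders; the keys follow common Axolotl QLoRA examples
and are not taken from this repository.
"""
import yaml  # pip install pyyaml

qlora_config = {
    "base_model": "mistralai/Mistral-7B-v0.1",      # placeholder base model
    "load_in_4bit": True,                           # QLoRA: 4-bit base weights
    "adapter": "qlora",
    "lora_r": 32,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "lora_target_linear": True,
    "sequence_len": 2048,
    "bf16": True,                                   # mixed-precision training
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "num_epochs": 3,
    "datasets": [
        {"path": "meditation_scripts.jsonl",        # placeholder dataset
         "type": "completion"}
    ],
    "output_dir": "./qlora-out",
}

with open("qlora.yml", "w") as f:
    yaml.safe_dump(qlora_config, f, sort_keys=False)

# Then launch training with Axolotl, e.g.:
#   accelerate launch -m axolotl.cli.train qlora.yml
```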