adamo1139 committed on
Commit 91e1318
1 Parent(s): e91b964

Update README.md

Files changed (1): README.md (+15 -2)
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 ---
 
 
-<img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/7NJFmljgycOJs7mcO2Cag.png" width="500" style="float:right">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/7NJFmljgycOJs7mcO2Cag.png" width="200" style="float:center">
 
 ## Model Description
 
@@ -34,6 +34,15 @@ I am attempting to learn about finetuning Qwen 2 VL 7B and this was just a resul
 I ran Hermes 3 8B in Aphrodite-Engine locally and used a Python script to go through the LLaVA 150K Instruct dataset and, for each sample, send a request to the model asking it to modify the JSON sample so that the output is more energetic. I used a 6-shot prompt, with bad samples coming from a generic LLM and good samples coming from [FPHam/Llama-3-8B-Sydney](https://huggingface.co/FPHam/Llama-3-8B-Sydney).
 After running through about half of the dataset I noticed an error in one of my examples; after fixing it and modifying the prompt a bit, generation quality deteriorated and about 30% of the responses I was getting back didn't pass JSON validation. I settled on using the ~60,000 samples that had already been processed correctly. I then cleaned up the dataset to fix various errors, such as the presence of non-UTF-8 characters.
 
+The script used for creating the dataset is [here](https://huggingface.co/datasets/adamo1139/misc/blob/main/sydney/sydney_llava_1.py).
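
The linked script is the source of truth; purely as an illustration of the loop described above, the sketch below sends each LLaVA sample to a local OpenAI-compatible Aphrodite-Engine endpoint and keeps only replies that pass JSON validation. The endpoint URL, model name, file names and prompt text are placeholder assumptions, not values taken from sydney_llava_1.py.

```python
import json
import requests

API_URL = "http://localhost:2242/v1/chat/completions"  # assumed local Aphrodite-Engine endpoint
MODEL = "Hermes-3-Llama-3.1-8B"                         # assumed served model name

SYSTEM_PROMPT = (
    "Rewrite the assistant turns in this JSON sample so the tone is more "
    "energetic. Return valid JSON only."
)
FEW_SHOT = []  # the real script uses 6 (bad, good) example pairs here

def rewrite_sample(sample):
    """Ask the model to rewrite one sample; return the parsed JSON or None."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}, *FEW_SHOT,
                {"role": "user", "content": json.dumps(sample, ensure_ascii=False)}]
    resp = requests.post(API_URL, json={"model": MODEL, "messages": messages, "temperature": 0.7})
    resp.raise_for_status()
    text = resp.json()["choices"][0]["message"]["content"]
    try:
        return json.loads(text)  # drop replies that fail JSON validation
    except json.JSONDecodeError:
        return None

with open("llava_instruct_150k.json", encoding="utf-8") as f:
    dataset = json.load(f)

rewritten = [out for sample in dataset if (out := rewrite_sample(sample)) is not None]

with open("sydney_llava.json", "w", encoding="utf-8") as f:
    json.dump(rewritten, f, ensure_ascii=False, indent=2)
```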
+## Inference
+
+I uploaded the script for inference [here](https://huggingface.co/datasets/adamo1139/misc/blob/main/sydney/run_qwen_vl.py).
+The script runs inference on this model and also on the normal Qwen 2 VL Instruct checkpoint.
+It is based on the simple Qwen 2 VL Gradio inference project published [here](https://old.reddit.com/r/LocalLLaMA/comments/1fv892w/simple_gradio_ui_to_run_qwen_2_vl/).
+Qwen2 VL doesn't quantize well, so you will need enough VRAM to load the 16-bit checkpoint. I am using a 24GB GPU and still I can't load in any image or video I want, since it will OOM.
+Inference should work fine on both Windows and Linux. By default the script uses Flash Attention 2; if you don't want to use it, run it with the flag `--flash-attn2 False`.
+
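
For a quick idea of what the inference side looks like without the Gradio UI, here is a minimal sketch of loading the 16-bit checkpoint with Flash Attention 2 through transformers. The model id and image path are assumptions; run_qwen_vl.py remains the reference implementation.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "adamo1139/Qwen2-VL-7B-Sydney"  # assumed repo id; swap in Qwen/Qwen2-VL-7B-Instruct to compare

# 16-bit weights with Flash Attention 2; drop attn_implementation to fall back
# to default attention (roughly what --flash-attn2 False does in the script).
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

image = Image.open("example.jpg")  # placeholder image path
messages = [
    {"role": "system", "content": "You are Sydney."},
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What do you see?"}]},
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
reply = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(reply)
```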
 ## Technical details
 
 The model was trained in LLaMA-Factory with unsloth on a system with an RTX 3090 Ti, at a context length of 2000, with LoRA rank 32, alpha 32 and a LoRA+ ratio of 4. Training took around 11 hours and bitsandbytes quantization was not used.
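
The actual run was configured through LLaMA-Factory, so the block below is not the training setup itself; it is only a rough peft-style restatement of the quoted adapter hyperparameters. Target modules and dropout are assumptions, and the LoRA+ learning-rate ratio of 4 is applied by the optimizer rather than by this config.

```python
from peft import LoraConfig

# Rough equivalent of the adapter settings quoted above; not the LLaMA-Factory config.
lora_config = LoraConfig(
    r=32,               # LoRA rank
    lora_alpha=32,      # LoRA alpha
    lora_dropout=0.0,   # assumed, not stated in the README
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention-only targets
    task_type="CAUSAL_LM",
)
```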
 
@@ -89,4 +98,8 @@ I am comparing Qwen 2 VL 7B Sydney with Qwen/Qwen2-VL-7B-Instruct
 <img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/Tfw7rL7NX9OwVXH-Vy5IB.png" style="width: 100%; height: auto;" alt="Image 2" />
 <img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/JqbCDhfYSqddNUaR0VgmW.png" style="width: 100%; height: auto;" alt="Image 3" />
 <img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/Uwp2q7QTjz7nFRcVU3AVG.png" style="width: 100%; height: auto;" alt="Image 4" />
-</div>
+</div>
+
+## Prompt template
+
+ChatML with the system prompt "You are Sydney.". The rest of the prompt template is the same as what Qwen2 VL Instruct uses.
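
Assuming the standard Qwen2 VL chat template and special tokens, a rendered single-turn prompt with one image attached would look roughly like this:

```
<|im_start|>system
You are Sydney.<|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>What do you see in this picture?<|im_end|>
<|im_start|>assistant
```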