adamo1139 committed on
Commit 91e1318
1 Parent(s): e91b964

Update README.md

Files changed (1): README.md (+15 -2)
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 ---
 
 
-<img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/7NJFmljgycOJs7mcO2Cag.png" width="500" style="float:right">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/7NJFmljgycOJs7mcO2Cag.png" width="200" style="float:center">
 
 ## Model Description
 
@@ -34,6 +34,15 @@ I am attempting to learn about finetuning Qwen 2 VL 7B and this was just a resul
 I ran Hermes 3 8B in Aphrodite-Engine locally and used a Python script to go through the LLaVA 150K Instruct dataset and, for each sample, send a request to the model asking it to modify the JSON sample so that the output is more energetic. I used a 6-shot prompt, with bad samples coming from a generic LLM and good samples coming from [FPHam/Llama-3-8B-Sydney](https://huggingface.co/FPHam/Llama-3-8B-Sydney).
 After running through about half of the dataset I noticed an error in one of my examples; after fixing it and modifying the prompt a bit, generation quality deteriorated and about 30% of the responses I was getting back didn't pass JSON validation. I settled on using the ~60,000 samples that had already been processed correctly. I then cleaned up the dataset to fix various errors, such as the presence of non-UTF-8 characters.
 
+The script used for creating the dataset is [here](https://huggingface.co/datasets/adamo1139/misc/blob/main/sydney/sydney_llava_1.py).
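
The linked script is the source of truth; purely as an illustration of the loop described above, the sketch below sends each LLaVA sample to a local OpenAI-compatible Aphrodite-Engine endpoint and keeps only replies that pass JSON validation. The endpoint URL, model name, file names and prompt text are placeholder assumptions, not values taken from sydney_llava_1.py.

```python
import json
import requests

API_URL = "http://localhost:2242/v1/chat/completions"  # assumed local Aphrodite-Engine endpoint
MODEL = "Hermes-3-Llama-3.1-8B"                         # assumed served model name

SYSTEM_PROMPT = (
    "Rewrite the assistant turns in this JSON sample so the tone is more "
    "energetic. Return valid JSON only."
)
FEW_SHOT = []  # the real script uses 6 (bad, good) example pairs here

def rewrite_sample(sample):
    """Ask the model to rewrite one sample; return the parsed JSON or None."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}, *FEW_SHOT,
                {"role": "user", "content": json.dumps(sample, ensure_ascii=False)}]
    resp = requests.post(API_URL, json={"model": MODEL, "messages": messages, "temperature": 0.7})
    resp.raise_for_status()
    text = resp.json()["choices"][0]["message"]["content"]
    try:
        return json.loads(text)  # drop replies that fail JSON validation
    except json.JSONDecodeError:
        return None

with open("llava_instruct_150k.json", encoding="utf-8") as f:
    dataset = json.load(f)

rewritten = [out for sample in dataset if (out := rewrite_sample(sample)) is not None]

with open("sydney_llava.json", "w", encoding="utf-8") as f:
    json.dump(rewritten, f, ensure_ascii=False, indent=2)
```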
+## Inference
+
+I uploaded the script for inference [here](https://huggingface.co/datasets/adamo1139/misc/blob/main/sydney/run_qwen_vl.py).
+The script runs inference on this model and also on the normal Qwen 2 VL Instruct checkpoint.
+It is based on the simple Qwen 2 VL Gradio inference project published [here](https://old.reddit.com/r/LocalLLaMA/comments/1fv892w/simple_gradio_ui_to_run_qwen_2_vl/).
+Qwen2 VL doesn't quantize well, so you will need enough VRAM to load the 16-bit checkpoint. I am using a 24GB GPU and still I can't load in any image or video I want, since it will OOM.
+Inference should work fine on both Windows and Linux. By default the script uses Flash Attention 2; if you don't want to use it, run it with the flag `--flash-attn2 False`.
+
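
For a quick idea of what the inference side looks like without the Gradio UI, here is a minimal sketch of loading the 16-bit checkpoint with Flash Attention 2 through transformers. The model id and image path are assumptions; run_qwen_vl.py remains the reference implementation.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "adamo1139/Qwen2-VL-7B-Sydney"  # assumed repo id; swap in Qwen/Qwen2-VL-7B-Instruct to compare

# 16-bit weights with Flash Attention 2; drop attn_implementation to fall back
# to default attention (roughly what --flash-attn2 False does in the script).
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

image = Image.open("example.jpg")  # placeholder image path
messages = [
    {"role": "system", "content": "You are Sydney."},
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What do you see?"}]},
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
reply = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(reply)
```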
 ## Technical details
 
 The model was trained in LLaMA-Factory with unsloth on a system with an RTX 3090 Ti, at a context length of 2000, with LoRA rank 32, alpha 32 and a LoRA+ ratio of 4. Training took around 11 hours and bitsandbytes quantization was not used.
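
The actual run was configured through LLaMA-Factory, so the block below is not the training setup itself; it is only a rough peft-style restatement of the quoted adapter hyperparameters. Target modules and dropout are assumptions, and the LoRA+ learning-rate ratio of 4 is applied by the optimizer rather than by this config.

```python
from peft import LoraConfig

# Rough equivalent of the adapter settings quoted above; not the LLaMA-Factory config.
lora_config = LoraConfig(
    r=32,               # LoRA rank
    lora_alpha=32,      # LoRA alpha
    lora_dropout=0.0,   # assumed, not stated in the README
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention-only targets
    task_type="CAUSAL_LM",
)
```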
 
@@ -89,4 +98,8 @@ I am comparing Qwen 2 VL 7B Sydney with Qwen/Qwen2-VL-7B-Instruct
 <img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/Tfw7rL7NX9OwVXH-Vy5IB.png" style="width: 100%; height: auto;" alt="Image 2" />
 <img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/JqbCDhfYSqddNUaR0VgmW.png" style="width: 100%; height: auto;" alt="Image 3" />
 <img src="https://cdn-uploads.huggingface.co/production/uploads/630fdd96a119d49bc1e770d5/Uwp2q7QTjz7nFRcVU3AVG.png" style="width: 100%; height: auto;" alt="Image 4" />
-</div>
+</div>
+
+## Prompt template
+
+ChatML with the system prompt "You are Sydney.". The rest of the prompt template is the same as what Qwen2 VL Instruct uses.
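
Assuming the standard Qwen2 VL chat template and special tokens, a rendered single-turn prompt with one image attached would look roughly like this:

```
<|im_start|>system
You are Sydney.<|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>What do you see in this picture?<|im_end|>
<|im_start|>assistant
```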