JosefAlbers committed
Commit 0d1c555
1 Parent(s): 00619d6
Update README.md
README.md
CHANGED
@@ -9,7 +9,8 @@ tags:
- llm
- phi
---
-
+
+# Phi-3-Vision for Apple MLX

This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offering a comprehensive solution for a variety of text and image processing tasks. With a focus on simplicity and efficiency, it provides a straightforward, minimalistic integration of the VLM. Essential functionality, such as generating quantized model weights, quantizing the KV cache during inference, LoRA/QLoRA training, and model benchmarking, is encapsulated in a single file for convenient access and use.
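For orientation, the single-file module described above is imported as `phi_3_vision_mlx`; the sketch below is limited to the two entry points that appear elsewhere in this diff (`chatui` and `test_lora`) and makes no guesses about the quantization, training, or benchmarking calls that this commit does not show.

```python
# Minimal sketch, limited to the entry points visible in this diff.
from phi_3_vision_mlx import chatui, test_lora

# Launch the interactive chat UI (the `phi3v` command in the Quick Start
# hunk below is its command-line equivalent), ...
chatui()

# ... or evaluate a LoRA adapter on the dataset referenced later in this
# README.
test_lora(dataset_path="JosefAlbers/akemiH_MedQA_Reason")
```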
@@ -27,6 +28,28 @@ This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offe

## Quick Start

+**1. Install Phi-3 Vision MLX:**
+
+```bash
+git clone https://github.com/JosefAlbers/Phi-3-Vision-MLX.git
+```
+
+**2. Launch Phi-3 Vision MLX:**
+
+```bash
+phi3v
+```
+
+Or,
+
+```python
+from phi_3_vision_mlx import chatui
+
+chatui()
+```
+
+## Usage
+
### **VLM Agent** (WIP)

The VLM's understanding of both text and visuals enables interactive generation and modification of plots and images, opening up new possibilities for GUI development and data visualization.
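A practical note on the Quick Start hunk above: `phi3v` is a console entry point, so cloning the repository by itself will usually not put the command on your PATH. An install step such as `pip install -e .` from inside the cloned `Phi-3-Vision-MLX` directory is an assumption here rather than part of this commit, but something of that sort is typically needed before `phi3v` or `from phi_3_vision_mlx import ...` works outside the repository folder.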
@@ -189,7 +212,7 @@ Generation: 8.56 tokens-per-sec (100 tokens / 11.6 sec)
### **LoRA Testing** (WIP)

```python
-# from phi_3_vision_mlx import
+from phi_3_vision_mlx import test_lora

test_lora(dataset_path="JosefAlbers/akemiH_MedQA_Reason")
```
@@ -321,4 +344,4 @@ This project is licensed under the [MIT License](LICENSE).

## Citation

-<a href="https://zenodo.org/doi/10.5281/zenodo.11403221"><img src="https://zenodo.org/badge/806709541.svg" alt="DOI"></a>
+<a href="https://zenodo.org/doi/10.5281/zenodo.11403221"><img src="https://zenodo.org/badge/806709541.svg" alt="DOI"></a>