JosefAlbers committed on
Commit
00619d6
1 Parent(s): d7001e1

Update README.md

Files changed (1)
  1. README.md +18 -1
README.md CHANGED
@@ -1,3 +1,14 @@
1
  # Phi-3-Vision VLM Model for Apple MLX: An All-in-One Port
2
 
3
  This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offering a comprehensive solution for various text and image processing tasks. With a focus on simplicity and efficiency, this implementation offers a straightforward and minimalistic integration of the VLM model. It seamlessly incorporates essential functionalities such as generating quantized model weights, optimizing KV cache quantization during inference, facilitating LoRA/QLoRA training, and conducting model benchmarking, all encapsulated within a single file for convenient access and usage.
@@ -28,7 +39,13 @@ chatui()
28
 
29
  ![Alt text](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/chatui_2.png)
30
 
31
- ### **Image Captioning**
32
 
33
  ```python
34
  # from phi_3_vision_mlx import chat
 
1
+ ---
2
+ license: mit
3
+ language:
4
+ - en
5
+ library_name: mlx
6
+ tags:
7
+ - vqa
8
+ - vlm
9
+ - llm
10
+ - phi
11
+ ---
12
  # Phi-3-Vision VLM Model for Apple MLX: An All-in-One Port
13
 
14
  This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offering a comprehensive solution for various text and image processing tasks. With a focus on simplicity and efficiency, this implementation offers a straightforward and minimalistic integration of the VLM model. It seamlessly incorporates essential functionalities such as generating quantized model weights, optimizing KV cache quantization during inference, facilitating LoRA/QLoRA training, and conducting model benchmarking, all encapsulated within a single file for convenient access and usage.
 
39
 
40
  ![Alt text](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/chatui_2.png)
41
 
42
+ ### **Visual Question Answering (VQA)**
43
+
44
+ Simply drag and drop screenshot images from the clipboard into the chatui textbox, or upload image files, for VQA.
45
+
46
+ ![Alt text](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/chatui_caption.png)
47
+
48
+ Or,
49
 
50
  ```python
51
  # from phi_3_vision_mlx import chat