JosefAlbers/Phi-3-vision-128k-instruct-mlx

JosefAlbers committed on Jun 16

Commit 00619d6 • 1 Parent(s): d7001e1

Update README.md

Files changed (1)

README.md +18 -1

@@ -1,3 +1,14 @@
 # Phi-3-Vision VLM Model for Apple MLX: An All-in-One Port
 This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offering a comprehensive solution for various text and image processing tasks. With a focus on simplicity and efficiency, this implementation offers a straightforward and minimalistic integration of the VLM model. It seamlessly incorporates essential functionalities such as generating quantized model weights, optimizing KV cache quantization during inference, facilitating LoRA/QLoRA training, and conducting model benchmarking, all encapsulated within a single file for convenient access and usage.
@@ -28,7 +39,13 @@ chatui()
 ![Alt text](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/chatui_2.png)
-### **Image Captioning**
 ```python
 # from phi_3_vision_mlx import chat

+---
+license: mit
+language:
+- en
+library_name: mlx
+tags:
+- vqa
+- vlm
+- llm
+- phi
+---
 # Phi-3-Vision VLM Model for Apple MLX: An All-in-One Port
 This project brings the powerful phi-3-vision VLM to Apple's MLX framework, offering a comprehensive solution for various text and image processing tasks. With a focus on simplicity and efficiency, this implementation offers a straightforward and minimalistic integration of the VLM model. It seamlessly incorporates essential functionalities such as generating quantized model weights, optimizing KV cache quantization during inference, facilitating LoRA/QLoRA training, and conducting model benchmarking, all encapsulated within a single file for convenient access and usage.
 ![Alt text](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/chatui_2.png)
+### **Visual Question Answering (VQA)**
+Simply drag and drop screenshot images from the clipboard into the chatui textbox, or upload image files, for VQA.
+![Alt text](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/chatui_caption.png)
+Or,
 ```python
 # from phi_3_vision_mlx import chat