Akbartus committed on
Commit
9c2bf4f
·
verified ·
1 Parent(s): fa1be75

Update README.md

Files changed (1)
  1. README.md +54 -1
README.md CHANGED
@@ -7,5 +7,58 @@ sdk: docker
  pinned: false
  short_description: Api endpoint for SMOL VLM 256M
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🧠 SmolVLM-256M: Vision + Language Inference API
+
+ This Space demonstrates how to deploy and serve the **SmolVLM-256M-Instruct** multimodal language model using a Docker-based backend. The API provides OpenAI-style `chat/completions` endpoints for image + text understanding, similar to how ChatGPT Vision works.
+ An example frontend app can be found here: https://text-rec-api.glitch.me/
+
+ ## 🚀 Docker Setup
+
+ This Space uses a custom Dockerfile that downloads and launches the SmolVLM model with vision support using [llama.cpp](https://github.com/ggerganov/llama.cpp).
+
+ ### Dockerfile
+
+ ```Dockerfile
+ FROM ghcr.io/ggml-org/llama.cpp:full
+
+ # Install wget
+ RUN apt-get update && apt-get install -y wget
+
+ # Download the GGUF model file
+ RUN wget "https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/SmolVLM-256M-Instruct-Q8_0.gguf" -O /smoll.gguf
+
+ # Download the mmproj (multimodal projection) file
+ RUN wget "https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/mmproj-SmolVLM-256M-Instruct-Q8_0.gguf" -O /mmproj.gguf
+
+ # Run the server on port 7860 with a default limit of 512 generated tokens (-n) and 2 CPU threads (-t)
+ CMD [ "--server", "-m", "/smoll.gguf", "--mmproj", "/mmproj.gguf", "--port", "7860", "--host", "0.0.0.0", "-n", "512", "-t", "2" ]
+ ```
+ ## 🧠 API Usage
+
+ The server exposes a `POST /v1/chat/completions` endpoint compatible with the OpenAI API format.
+
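As a rough illustration of calling this endpoint from a client, the Python sketch below builds a POST request and reads the reply. The base URL is a placeholder, the helper names `build_request`, `post_chat`, and `extract_reply` are made up for this example, and the response shape (`choices[0].message.content`) is assumed from the OpenAI chat completions format rather than taken from this README.

```python
import json
import urllib.request

# Placeholder address (assumption): point this at your own Space or local container.
BASE_URL = "http://localhost:7860"


def build_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a JSON POST request for the /v1/chat/completions endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_chat(payload: dict, base_url: str = BASE_URL, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON response body."""
    request = build_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_reply(completion: dict) -> str:
    """Return the assistant text, assuming an OpenAI-style response body."""
    return completion["choices"][0]["message"]["content"]


# Example call (requires a running server, so it is left commented out):
# payload = {
#     "model": "SmolVLM-256M-Instruct",
#     "messages": [{"role": "user", "content": "Say hello."}],
# }
# print(extract_reply(post_chat(payload)))
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect without a live server.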
+ ### Request Format
+
+ Send a JSON payload structured like this:
+
+ ```json
+ {
+   "model": "SmolVLM-256M-Instruct",
+   "messages": [
+     {
+       "role": "user",
+       "content": [
+         { "type": "text", "text": "What is in this image?" },
+         {
+           "type": "image_url",
+           "image_url": {
+             "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQABAAD..."
+           }
+         }
+       ]
+     }
+   ]
+ }
+ ```
+
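The request body above can be assembled programmatically. The following Python sketch, which is only one possible way to do it, base64-encodes raw image bytes into a `data:` URL and wraps it in the message structure shown in the JSON example. The function names `image_to_data_url` and `build_payload` are hypothetical helpers introduced for illustration, and the default MIME type is an assumption that suits JPEG input.

```python
import base64


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a chat payload with one text part and one image part."""
    return {
        "model": "SmolVLM-256M-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": image_to_data_url(image_bytes, mime_type)},
                    },
                ],
            }
        ],
    }
```

For example, `build_payload("What is in this image?", open("photo.jpg", "rb").read())` would produce a body of the same shape, where `photo.jpg` is a placeholder file name. JPEG files begin with the bytes `FF D8 FF`, which is why the encoded string in the example starts with `/9j/`.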