pinned: false
short_description: API endpoint for SmolVLM-256M
---

# SmolVLM-256M: Vision + Language Inference API

This Space demonstrates how to deploy and serve the **SmolVLM-256M-Instruct** multimodal language model using a Docker-based backend. The API provides OpenAI-style `chat/completions` endpoints for image + text understanding, similar to how ChatGPT Vision works.

An example frontend app can be found here: https://text-rec-api.glitch.me/

## Docker Setup

This Space uses a custom Dockerfile that downloads and launches the SmolVLM model with vision support using [llama.cpp](https://github.com/ggerganov/llama.cpp).

### Dockerfile

```Dockerfile
FROM ghcr.io/ggml-org/llama.cpp:full

# Install wget
RUN apt update && apt install wget -y

# Download the GGUF model file
RUN wget "https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/SmolVLM-256M-Instruct-Q8_0.gguf" -O /smoll.gguf

# Download the mmproj (multimodal projection) file
RUN wget "https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/mmproj-SmolVLM-256M-Instruct-Q8_0.gguf" -O /mmproj.gguf

# Run the server on port 7860 with moderate generation settings
CMD [ "--server", "-m", "/smoll.gguf", "--mmproj", "/mmproj.gguf", "--port", "7860", "--host", "0.0.0.0", "-n", "512", "-t", "2" ]
```

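To try the container outside the Space, the usual Docker workflow applies; the `smolvlm-api` image tag below is just an illustrative name, not something the Space defines.

```shell
# Build the image from the Dockerfile above (tag name is an arbitrary choice)
docker build -t smolvlm-api .

# Run it locally, exposing the server port on the host
docker run --rm -p 7860:7860 smolvlm-api
```

Once the container is up, the API is reachable at `http://localhost:7860`.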
## API Usage

The server exposes a `POST /v1/chat/completions` endpoint compatible with the OpenAI API format.

### Request Format

Send a JSON payload structured like this:

```json
{
  "model": "SmolVLM-256M-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQABAAD..."
          }
        }
      ]
    }
  ]
}
```
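The request above can be built and sent from Python using only the standard library. This is a minimal client sketch: the `build_payload`/`ask` helper names and the `space_url` parameter are illustrative, not part of the Space itself.

```python
import base64
import json
from urllib import request


def build_payload(image_bytes: bytes, question: str) -> dict:
    """Build an OpenAI-style chat/completions payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "SmolVLM-256M-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
    }


def ask(space_url: str, image_bytes: bytes, question: str) -> str:
    """POST the payload to the Space and return the assistant's reply text."""
    body = json.dumps(build_payload(image_bytes, question)).encode("utf-8")
    req = request.Request(
        f"{space_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        data = json.load(resp)
    # Standard OpenAI-style response shape
    return data["choices"][0]["message"]["content"]
```

For example, `ask("http://localhost:7860", open("photo.jpg", "rb").read(), "What is in this image?")` sends the image and returns the model's description.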