Moondream is a small vision language model designed to run efficiently on edge devices.


This repository contains the latest (2025-03-27) release of Moondream, as well as historical releases. The model is updated frequently, so we recommend specifying a revision as shown below if you're using it in a production application.

Usage

from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-03-27",
    trust_remote_code=True,
    # Uncomment to run on GPU.
    # device_map={"": "cuda"}
)

# Load the image to analyze (the path is a placeholder; use your own file).
image = Image.open("path/to/image.jpg")

# Captioning
print("Short caption:")
print(model.caption(image, length="short")["caption"])

print("\nNormal caption:")
# Streaming generation example; stream=True is supported for caption() and query()
for t in model.caption(image, length="normal", stream=True)["caption"]:
    print(t, end="", flush=True)
print()  # end the streamed line

# Visual Querying
print("\nVisual query: 'How many people are in the image?'")
print(model.query(image, "How many people are in the image?")["answer"])

# Object Detection
print("\nObject detection: 'face'")
objects = model.detect(image, "face")["objects"]
print(f"Found {len(objects)} face(s)")

# Pointing
print("\nPointing: 'person'")
points = model.point(image, "person")["points"]
print(f"Found {len(points)} person(s)")
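The example above only counts the results of detect() and point(). To draw or crop them you need pixel coordinates. The sketch below assumes, without confirmation from this card, that each detected object is a dict with `x_min`, `y_min`, `x_max`, `y_max` keys and each point a dict with `x`, `y` keys, all normalized to the range 0 to 1; check the actual output of the revision you pin before relying on it. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: detect() boxes and point() points use coordinates normalized
# to [0, 1] relative to the image size; verify against your pinned revision.


def _to_px(value: float, size: int) -> int:
    """Scale a normalized coordinate to pixels, clamped to the image bounds."""
    return max(0, min(size, round(value * size)))


def boxes_to_pixels(
    objects: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int, int, int]]:
    """Hypothetical helper: normalized boxes -> (left, top, right, bottom) in pixels."""
    return [
        (
            _to_px(obj["x_min"], width),
            _to_px(obj["y_min"], height),
            _to_px(obj["x_max"], width),
            _to_px(obj["y_max"], height),
        )
        for obj in objects
    ]


def points_to_pixels(
    points: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Hypothetical helper: normalized points -> (x, y) in pixels."""
    return [(_to_px(p["x"], width), _to_px(p["y"], height)) for p in points]


# Sample data shaped like the assumed output (not real model output).
sample_objects = [{"x_min": 0.25, "y_min": 0.5, "x_max": 0.75, "y_max": 1.0}]
sample_points = [{"x": 0.5, "y": 0.25}]
print(boxes_to_pixels(sample_objects, 640, 480))  # [(160, 240, 480, 480)]
print(points_to_pixels(sample_points, 640, 480))  # [(320, 120)]
```

With a real image, PIL's `image.size` gives `(width, height)`, so `boxes_to_pixels(objects, *image.size)` works directly, and each box tuple can be passed to `ImageDraw.Draw(image).rectangle(box, outline="red")`. Clamping keeps slightly out-of-range predictions inside the image.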

Changelog

2025-03-27

  1. Added support for long-form captioning
  2. Added open-vocabulary image tagging
  3. Improved counting accuracy (e.g. CountBenchQA increased from 80 to 86.4)
  4. Improved text understanding (e.g. OCRBench increased from 58.3 to 61.2)
  5. Improved object detection, especially for small objects (e.g. COCO mAP up from 30.5 to 51.2)
  6. Fixed a token streaming bug affecting multi-byte Unicode characters
  7. Added support for gpt-fast-style compile() in the HF Transformers implementation
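Item 6 does not describe how the streaming bug was fixed. As a generic illustration only (not Moondream's code), the sketch below shows one common cause of this class of bug: decoding each streamed chunk of UTF-8 bytes on its own corrupts a character whose bytes are split across two chunks, whereas Python's incremental decoder (`codecs.getincrementaldecoder`) buffers the incomplete bytes until the rest arrive. The function names are hypothetical.

```python
import codecs
from typing import Iterable, Iterator


def naive_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode each chunk independently; a split multi-byte character becomes U+FFFD."""
    for chunk in chunks:
        yield chunk.decode("utf-8", errors="replace")


def incremental_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Buffer incomplete UTF-8 sequences until the remaining bytes arrive."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:  # nothing to emit while a character is still incomplete
            yield text
    # Flush; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "é" is two bytes in UTF-8 (0xC3 0xA9); split them across two chunks.
chunks = [b"caf\xc3", b"\xa9 ok"]
print("".join(naive_stream(chunks)))        # two replacement characters instead of "é"
print("".join(incremental_stream(chunks)))  # café ok
```

The same idea can apply at the token level: byte-level tokenizers may emit tokens that end partway through a character, so a streamer should hold back text until it decodes cleanly.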

Model size: 1.93B params, tensor type FP16 (Safetensors)