Quantization of the original Molmo-7B-D-0924 model using bitsandbytes.
This model differs from the one located here in that it includes modified source code that reduces dependencies while achieving the same results, and it works out of the box.
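For reference, a bitsandbytes quantization like this one is typically produced by loading the original model with a BitsAndBytesConfig and saving the result. The exact settings used for this repo are not stated, so the 4-bit NF4 configuration below is an assumption, not the confirmed recipe:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed settings: 4-bit NF4 with bfloat16 compute; the actual repo may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924",   # the original model this repo quantizes
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
)

# Serializing 4-bit weights requires reasonably recent transformers/bitsandbytes versions.
model.save_pretrained("Molmo-7B-D-0924-bnb-4bit")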
NOTE:
The example script below requires an Nvidia GPU and that you pip install the CUDA libraries into your virtual environment. This is NOT NECESSARY if you install CUDA systemwide (as most people do). If you install CUDA systemwide, simply remove the set_cuda_paths function from the example script, but make sure that you've installed a proper version of CUDA and a compatible version of the PyTorch libraries.
COMPATIBLE CUDA AND PYTORCH 2.2.2 VERSIONS
PyTorch is only tested with specific versions of CUDA. When using PyTorch 2.2.2, the following CUDA library versions are required:
pip install nvidia-cublas-cu12==12.1.3.1
pip install nvidia-cuda-runtime-cu12==12.1.105
pip install nvidia-cuda-nvrtc-cu12==12.1.105
pip install nvidia-cufft-cu12==11.0.2.54
pip install nvidia-cudnn-cu12==8.9.2.26
- Then install torch==2.2.2, torchvision==0.17.2, and torchaudio==2.2.2 by visiting each of these three links and creating a pip install command based on the link for your Python version and platform.
For example, for Windows using Python 3.11 you would use the following:
pip install https://download.pytorch.org/whl/cu121/torch-2.2.2%2Bcu121-cp311-cp311-win_amd64.whl#sha256=efbcfdd4399197d06b32f7c0e1711c615188cdd65427b933648c7478fb880b3f
pip install https://download.pytorch.org/whl/cu121/torchvision-0.17.2%2Bcu121-cp311-cp311-win_amd64.whl#sha256=10ad542aab6b47dbe73c441381986d50a7ed5021cbe01d593a14477ec1f067a0
pip install https://download.pytorch.org/whl/cu121/torchaudio-2.2.2%2Bcu121-cp311-cp311-win_amd64.whl#sha256=c7dee68cd3d2b889bab71d4a0c345bdc3ea2fe79a62b921a6b49292c605b6071
COMPATIBLE CUDA AND PYTORCH 2.5.1 VERSIONS
PyTorch is only tested with specific versions of CUDA. When using PyTorch 2.5.1, the following CUDA library versions are required:
pip install nvidia-cublas-cu12==12.4.5.8
pip install nvidia-cuda-runtime-cu12==12.4.127
pip install nvidia-cuda-nvrtc-cu12==12.4.127
pip install nvidia-cufft-cu12==11.2.1.3
pip install nvidia-cudnn-cu12==9.1.0.70
- Then install torch==2.5.1, torchvision==0.20.1, and torchaudio==2.5.1 by visiting each of these three links and creating a pip install command based on the link for your Python version and platform.
For example, for Windows using Python 3.11 you would use the following:
pip install https://download.pytorch.org/whl/cu124/torch-2.5.1%2Bcu124-cp311-cp311-win_amd64.whl#sha256=6c8a7003ef1327479ede284b6e5ab3527d3900c2b2d401af15bcc50f2245a59f
pip install https://download.pytorch.org/whl/cu124/torchvision-0.20.1%2Bcu124-cp311-cp311-win_amd64.whl#sha256=15796b453a99ed0f0cbc249d129685ddc88157310135fb3addaf738a15db5306
pip install https://download.pytorch.org/whl/cu124/torchaudio-2.5.1%2Bcu124-cp311-cp311-win_amd64.whl#sha256=b3d75f4e6efc5412fe78c7f2787ee4f39cea1317652e1a47785879cde109f5c4
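After installing either stack, you can verify that the CUDA-enabled build is active before running the example script:

import torch

print(torch.__version__)          # e.g. 2.2.2+cu121 or 2.5.1+cu124
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # should print True on a working Nvidia setup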
Example script (process single image):
import sys
import os
from pathlib import Path

def set_cuda_paths():
    # Prepend the pip-installed CUDA DLL directories to CUDA_PATH and PATH so
    # the libraries inside the virtual environment are found at runtime.
    venv_base = Path(sys.executable).parent.parent
    nvidia_base_path = venv_base / 'Lib' / 'site-packages' / 'nvidia'
    cuda_path = nvidia_base_path / 'cuda_runtime' / 'bin'
    cublas_path = nvidia_base_path / 'cublas' / 'bin'
    cudnn_path = nvidia_base_path / 'cudnn' / 'bin'
    nvrtc_path = nvidia_base_path / 'cuda_nvrtc' / 'bin'
    paths_to_add = [
        str(cuda_path),
        str(cublas_path),
        str(cudnn_path),
        str(nvrtc_path),
    ]
    env_vars = ['CUDA_PATH', 'PATH']
    for env_var in env_vars:
        current_value = os.environ.get(env_var, '')
        new_value = os.pathsep.join(paths_to_add + [current_value] if current_value else paths_to_add)
        os.environ[env_var] = new_value

# Must run before importing torch so the CUDA DLLs can be located.
set_cuda_paths()

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_path = r"[INSERT THE PATH TO THE FOLDER HOLDING THE MODEL FILES HERE]"

class VisionModel:
    def __init__(self):
        self.model = None
        self.processor = None
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def initialize_model_and_processor(self):
        # trust_remote_code is required because Molmo ships custom modeling code.
        self.processor = AutoProcessor.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype='auto',
            device_map='auto'
        )

    def process_single_image(self, image_path):
        image = Image.open(image_path)
        if image.mode != "RGB":
            image = image.convert("RGB")
        text = "Describe this image in as much detail as possible, but be succinct and don't repeat yourself."
        inputs = self.processor.process(images=[image], text=text)
        # Move every tensor to the target device and add a batch dimension of 1.
        inputs = {k: v.to(self.device).unsqueeze(0) for k, v in inputs.items()}
        output = self.model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=500, stop_strings=["<|endoftext|>"]),
            tokenizer=self.processor.tokenizer
        )
        # Decode only the newly generated tokens, skipping the prompt.
        generated_text = self.processor.tokenizer.decode(output[0, inputs['input_ids'].size(1):], skip_special_tokens=True)
        print(f"\nGenerated Text:\n{generated_text}\n")

if __name__ == "__main__":
    image_path = r"[INSERT THE PATH TO THE IMAGE YOU WANT TO PROCESS HERE]"
    vision_model = VisionModel()
    vision_model.initialize_model_and_processor()
    vision_model.process_single_image(image_path)
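To caption several images, the same class can be reused so the model is only loaded once. A minimal sketch (the folder placeholder and extension filter are additions, not part of the original script):

from pathlib import Path

image_dir = Path(r"[INSERT THE PATH TO A FOLDER OF IMAGES HERE]")
vision_model = VisionModel()
vision_model.initialize_model_and_processor()  # load the model a single time
for image_path in sorted(image_dir.iterdir()):
    if image_path.suffix.lower() in {".jpg", ".jpeg", ".png"}:
        print(f"Processing {image_path.name}")
        vision_model.process_single_image(str(image_path))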