My Experience with Mistral AI's Nemo Instruct Model

#1
by JLouisBiz - opened

Computer configuration:

  • Manufacturer: Dell Inc.
  • Product Name: 0VHWTR, Version: A02
  • Memory: 16 GB DDR3 RAM
  • GPU: 4GB VRAM with NVIDIA Corporation GP107 [GeForce GTX 1050 Ti]

I am running Mistral successfully as:

$ ./Mistral-Nemo-Instruct-2407.Q4_K_M.llamafile -ngl 15

with the following status:

llm_load_tensors: offloading 15 repeating layers to GPU
llm_load_tensors: offloaded 15/41 layers to GPU
llm_load_tensors:        CPU buffer size =  7123.30 MiB
llm_load_tensors:      CUDA0 buffer size =  2368.36 MiB

I am getting 2.44 tokens per second, which is not much, but it is functional. I am fine with it, as for the time being I do not have a better configuration. I started using it today because, on this low-end configuration, I need better-quality text scanning and replacement. I can't wait to switch to a 24 GB GPU and 128 GB of RAM, with a better configuration.
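
From the log above, the 2368.36 MiB CUDA buffer for 15 layers works out to roughly 158 MiB per layer, which gives a rough way to pick `-ngl` for a different card. A minimal sketch of that arithmetic, assuming the per-layer cost stays about the same (in reality it varies with quantization and context size, and the 512 MiB headroom for KV cache and CUDA runtime is a guess, not a measurement):

```python
# Rough -ngl estimate from the buffer sizes llamafile reported above.
CUDA_BUFFER_MIB = 2368.36   # reported CUDA0 buffer size for 15 offloaded layers
OFFLOADED_LAYERS = 15
TOTAL_LAYERS = 41           # from "offloaded 15/41 layers"

per_layer_mib = CUDA_BUFFER_MIB / OFFLOADED_LAYERS  # about 158 MiB per layer

def layers_that_fit(vram_mib: float, reserve_mib: float = 512.0) -> int:
    """Estimate -ngl for a VRAM budget, reserving assumed headroom for the
    KV cache and CUDA runtime overhead."""
    usable = max(vram_mib - reserve_mib, 0.0)
    return min(int(usable // per_layer_mib), TOTAL_LAYERS)

print(layers_that_fit(4096))    # a 4 GB card like the GTX 1050 Ti
print(layers_that_fit(24576))   # a hypothetical 24 GB card
```

By this estimate a 24 GB card would hold all 41 layers with plenty of room to spare, which is consistent with why the upgrade should help so much.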

I wonder whether function calling can be used through llamafile. Does anybody know?
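
For anyone experimenting with this: llamafile embeds the llama.cpp server, which exposes an OpenAI-compatible `/v1/chat/completions` endpoint on port 8080 by default, so one way to probe function calling is to send an OpenAI-style `tools` request and see what comes back. Whether a given llamafile release actually acts on the `tools` field is version-dependent; the sketch below only shows the request shape, and the `get_weather` tool is a made-up example:

```python
import json
import urllib.request

# OpenAI-style chat request with a hypothetical function/tool definition.
payload = {
    "model": "Mistral-Nemo-Instruct-2407",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the llamafile server is running locally:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))

print(json.dumps(payload, indent=2))
```

If the server honors the field, the response should contain a `tool_calls` entry instead of plain text; if not, the model will just answer in prose.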

Thanks to all the contributors at Mozilla and on the Mistral team who made some of their models free software.
