My Experience with Mistral AI's Nemo Instruct Model

#1
by JLouisBiz - opened

Computer configuration:

  • Manufacturer: Dell Inc.
  • Product Name: 0VHWTR, Version: A02
  • Memory: 16 GB DDR3 RAM
  • GPU: 4GB VRAM with NVIDIA Corporation GP107 [GeForce GTX 1050 Ti]

I am running Mistral successfully as:

$ ./Mistral-Nemo-Instruct-2407.Q4_K_M.llamafile -ngl 15

with the following status:

llm_load_tensors: offloading 15 repeating layers to GPU
llm_load_tensors: offloaded 15/41 layers to GPU
llm_load_tensors:        CPU buffer size =  7123.30 MiB
llm_load_tensors:      CUDA0 buffer size =  2368.36 MiB

I am getting 2.44 tokens per second, which is not much, but it is functional. I am fine with it, as for the time being I do not have a better configuration. I started using it today because, on this low-end configuration, I need better-quality text scanning and replacement. I can't wait to switch to a 24 GB GPU and 128 GB of RAM, with a better configuration.
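
From the log above, the 2368.36 MiB CUDA buffer for 15 layers works out to roughly 158 MiB per layer, which gives a rough way to pick `-ngl` for a different card. A minimal sketch of that arithmetic, assuming the per-layer cost stays about the same (in reality it varies with quantization and context size, and the 512 MiB headroom for KV cache and CUDA runtime is a guess, not a measurement):

```python
# Rough -ngl estimate from the buffer sizes llamafile reported above.
CUDA_BUFFER_MIB = 2368.36   # reported CUDA0 buffer size for 15 offloaded layers
OFFLOADED_LAYERS = 15
TOTAL_LAYERS = 41           # from "offloaded 15/41 layers"

per_layer_mib = CUDA_BUFFER_MIB / OFFLOADED_LAYERS  # about 158 MiB per layer

def layers_that_fit(vram_mib: float, reserve_mib: float = 512.0) -> int:
    """Estimate -ngl for a VRAM budget, reserving assumed headroom for the
    KV cache and CUDA runtime overhead."""
    usable = max(vram_mib - reserve_mib, 0.0)
    return min(int(usable // per_layer_mib), TOTAL_LAYERS)

print(layers_that_fit(4096))    # a 4 GB card like the GTX 1050 Ti
print(layers_that_fit(24576))   # a hypothetical 24 GB card
```

By this estimate a 24 GB card would hold all 41 layers with plenty of room to spare, which is consistent with why the upgrade should help so much.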

I wonder whether function calling can be used through llamafile. Does anybody know?
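
For anyone experimenting with this: llamafile embeds the llama.cpp server, which exposes an OpenAI-compatible `/v1/chat/completions` endpoint on port 8080 by default, so one way to probe function calling is to send an OpenAI-style `tools` request and see what comes back. Whether a given llamafile release actually acts on the `tools` field is version-dependent; the sketch below only shows the request shape, and the `get_weather` tool is a made-up example:

```python
import json
import urllib.request

# OpenAI-style chat request with a hypothetical function/tool definition.
payload = {
    "model": "Mistral-Nemo-Instruct-2407",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the llamafile server is running locally:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))

print(json.dumps(payload, indent=2))
```

If the server honors the field, the response should contain a `tool_calls` entry instead of plain text; if not, the model will just answer in prose.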

Thanks to all the contributors at Mozilla and on the Mistral team who made some of their models free software.
