Benchmarking models
To use `bench-TriLMs.sh`, you need to:

- Place it in a `llama.cpp` checkout
- Have `cmake`, `gcc`, and the other dependencies of `llama.cpp`
- If you want to benchmark on GPUs, the script checks whether `nvidia-smi` is present, and you'll also need the necessary compile-time dependencies
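The prerequisite checks above can be sketched in shell. This is a hedged illustration of the pattern, not the script's actual code; the tool list is taken from the requirements listed here.

```shell
# Sketch: verify the build dependencies named above before running the benchmark
for tool in cmake gcc; do
  if command -v "$tool" > /dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done

# GPU benchmarking only makes sense when nvidia-smi is available
if command -v nvidia-smi > /dev/null 2>&1; then
  echo "GPU benchmarking available"
fi
```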
The script will automatically download the models and quantize different variants.
It will then produce two result files, one called `results-$(date +%s).json` and the other called `results-$(date +%s)-cpuinfo.txt`. Both will use the exact same timestamp.
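Because both files share the same Unix timestamp, pairing a results file with its CPU info file is a simple name transformation. A minimal sketch (the helper name is an assumption, not part of the script):

```python
from pathlib import Path

def matching_cpuinfo(json_path: str) -> str:
    # Hypothetical helper: map "results-1234567890.json"
    # to its sibling "results-1234567890-cpuinfo.txt"
    stem = Path(json_path).stem  # drops the ".json" suffix
    return f"{stem}-cpuinfo.txt"

print(matching_cpuinfo("results-1234567890.json"))
# -> results-1234567890-cpuinfo.txt
```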
The intention is to eventually read the produced `.json` in a Python script with

```python
from __future__ import annotations
from typing import Any
import json

with open("results-1234567890.json") as f:
    data: list[list[dict[str, Any]]] = json.loads("[" + f.read() + "]")

# Then use that data
...
```
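The bracket wrapping in the snippet above suggests the file holds comma-separated JSON arrays without an enclosing outer array; wrapping the contents in `[` and `]` turns them into one valid JSON document. A small self-contained illustration with simulated file contents (the field name `n` is made up for the example):

```python
import json
from typing import Any

# Simulated file contents: comma-separated arrays with no outer brackets
# (an assumption based on the wrapping done in the snippet above)
raw = '[{"n": 1}],\n[{"n": 2}]'

# Wrapping in brackets yields a single parseable JSON array of arrays
data: list[list[dict[str, Any]]] = json.loads("[" + raw + "]")
print(len(data))  # -> 2
```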