Benchmarking models

To use bench-TriLMs.sh, you need to:

Place it in a llama.cpp checkout.
Have cmake, gcc, and the other dependencies of llama.cpp installed.
To benchmark on GPUs, also have the necessary compile-time dependencies installed. The script checks whether nvidia-smi is present.

The script will automatically download the models and quantize different variants.

It will then produce two result files: results-$(date +%s).json and results-$(date +%s)-cpuinfo.txt. Both use the exact same timestamp, so the two files from one run share the same name prefix.
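
Because both files share a timestamp, a results file can be matched to its CPU info file by name alone. The following is a minimal sketch of that pairing, assuming both files sit in the same directory. The helper names cpuinfo_path_for and find_result_pairs are hypothetical and are not part of bench-TriLMs.sh.

```python
from __future__ import annotations

from pathlib import Path


def cpuinfo_path_for(results_json: Path) -> Path:
    """Map results-<timestamp>.json to results-<timestamp>-cpuinfo.txt."""
    # Path.stem drops the final ".json", leaving "results-<timestamp>".
    return results_json.with_name(results_json.stem + "-cpuinfo.txt")


def find_result_pairs(directory: Path) -> list[tuple[Path, Path]]:
    """Return (json, cpuinfo) pairs where both files exist, sorted by filename."""
    pairs: list[tuple[Path, Path]] = []
    for results_json in sorted(directory.glob("results-*.json")):
        cpuinfo = cpuinfo_path_for(results_json)
        # Skip results whose companion file is missing (assumed possible
        # if a run was interrupted or files were moved).
        if cpuinfo.is_file():
            pairs.append((results_json, cpuinfo))
    return pairs
```

Calling find_result_pairs(Path(".")) in the directory where the script was run should list every complete pair of result files.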

The intention is to eventually read the produced .json file in a Python script like this:

from __future__ import annotations

from typing import Any
import json

with open("results-1234567890.json") as f:
    data: list[list[dict[str, Any]]] = json.loads("[" + f.read() + "]")

    # Then use that data
    ...
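
As one possible way to use that data, the sketch below flattens the nested lists and keeps the highest value seen per group. It assumes each entry is a llama-bench style record with "model_filename" and "avg_ts" keys. Those key names are an assumption about the file contents, so they are passed as parameters, and entries lacking either key are skipped. The function names flatten_runs and best_by_key are hypothetical.

```python
from __future__ import annotations

from typing import Any


def flatten_runs(data: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Merge the per-run lists into one flat list of entries."""
    return [entry for run in data for entry in run]


def best_by_key(
    entries: list[dict[str, Any]],
    group_key: str = "model_filename",  # assumed key name
    value_key: str = "avg_ts",  # assumed key name
) -> dict[str, float]:
    """Return the largest value_key seen for each distinct group_key."""
    best: dict[str, float] = {}
    for entry in entries:
        # Ignore entries that do not carry both assumed keys.
        if group_key not in entry or value_key not in entry:
            continue
        group = str(entry[group_key])
        value = float(entry[value_key])
        if group not in best or value > best[group]:
            best[group] = value
    return best
```

With data loaded as above, best_by_key(flatten_runs(data)) gives one number per group, which is a convenient starting point for a table or plot.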