	# Benchmarking models

	To use `bench-TriLMs.sh`, you need to:

	- Place it in a `llama.cpp` checkout
	- Have `cmake`, `gcc`, and the other dependencies of `llama.cpp` installed
	- To benchmark on GPUs, have `nvidia-smi` available (the script checks for it), along with the compile-time dependencies needed for a GPU build
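
The prerequisites above can be checked before launching the script. This is a minimal sketch, assuming the tools are found on `PATH` under the names used in the list. `missing_tools` is a hypothetical helper, not part of `bench-TriLMs.sh`, and `llama.cpp` may need other dependencies that this check does not cover.

```python
from __future__ import annotations

import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Build tools named in the list above.
    missing = missing_tools(["cmake", "gcc"])
    if missing:
        print("missing build tools:", ", ".join(missing))
    # The script does its own nvidia-smi check; this only mirrors it.
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found")
```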

	The script will automatically download the models and quantize different variants.

	It will then produce two result files: `results-$(date +%s).json` and `results-$(date +%s)-cpuinfo.txt`. Both file names use the exact same timestamp (seconds since the Unix epoch).
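
Because the two file names share a timestamp, one can be derived from the other. The following is a sketch that assumes only the naming pattern described above. `cpuinfo_path_for` and `latest_results` are hypothetical helpers, not something the script provides.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches names like results-1234567890.json (the pattern described above).
RESULTS_RE = re.compile(r"^results-(\d+)\.json$")


def timestamp_of(results: Path) -> int:
    """Extract the Unix timestamp embedded in a results file name."""
    match = RESULTS_RE.match(results.name)
    if match is None:
        raise ValueError(f"unexpected results file name: {results.name}")
    return int(match.group(1))


def cpuinfo_path_for(results: Path) -> Path:
    """Return the cpuinfo file that shares the timestamp of `results`."""
    return results.with_name(f"results-{timestamp_of(results)}-cpuinfo.txt")


def latest_results(directory: Path) -> Path | None:
    """Return the results file with the largest timestamp, or None if absent."""
    candidates = [
        path for path in directory.glob("results-*.json")
        if RESULTS_RE.match(path.name)
    ]
    return max(candidates, key=timestamp_of, default=None)
```

Comparing the timestamps as integers rather than as strings keeps the ordering correct even if the number of digits differs between file names.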

	The intention is to eventually read the produced `.json` file in a Python script along these lines:

	```python3
	from __future__ import annotations

	from typing import Any
	import json

	with open("results-1234567890.json") as f:
	    # Wrap the contents in brackets so they parse as a single JSON list
	    data: list[list[dict[str, Any]]] = json.loads("[" + f.read() + "]")

	# Then use that data
	...
	```
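
What "use that data" looks like depends on the fields in the file, which this document does not describe. The sketch below is schema-agnostic and relies only on the `list[list[dict]]` shape from the snippet above: it flattens the nested lists and counts how often each key appears, as a first step toward discovering the available fields. `parse_results`, `flatten`, and `key_counts` are hypothetical names.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Any


def parse_results(text: str) -> list[list[dict[str, Any]]]:
    """Parse file contents with the same bracket-wrapping as the snippet above."""
    return json.loads("[" + text + "]")


def flatten(data: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Merge the inner lists into one flat list of records."""
    return [record for group in data for record in group]


def key_counts(records: list[dict[str, Any]]) -> Counter[str]:
    """Count how many records contain each key."""
    return Counter(key for record in records for key in record)
```

A typical use would be `key_counts(flatten(parse_results(f.read())))` on an opened results file, then picking the fields of interest from the keys it reports.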