This folder contains the evaluation suite for the DuckDB-Text2SQL model.

Please install the dependencies listed in the requirements.txt file located in the parent folder.

## Setup

To evaluate against the benchmark dataset, first set up the test-suite evaluation harness:
```
mkdir metrics
cd metrics
git clone git@github.com:ElementAI/test-suite-sql-eval.git test_suite_sql_eval
cd ..
```
In the `test_suite_sql_eval` folder, add a new remote and check out the latest duckdb-only branch (640a12975abf75a94e917caca149d56dbc6bcdd7) to evaluate against DuckDB.
```
git remote add till https://github.com/tdoehmen/test-suite-sql-eval.git
git fetch till
git checkout till/duckdb-only
```
Next, prepare the docs for retrieval.
```
mkdir docs
cd docs
git clone https://github.com/duckdb/duckdb-web.git
cd ..
```
#### Dataset

The benchmark dataset is located in the `data/` folder and includes all databases (`data/databases`), table schemas (`data/tables.json`), and examples (`data/dev.json`).
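To get a quick feel for the benchmark, a minimal sketch like the following (assuming both files are JSON arrays of records) loads the examples and schemas and prints their sizes:

```python
import json

# Load the benchmark examples and table schemas shipped with the suite
# (paths are relative to the DuckDB-NSQL main folder).
with open("eval/data/dev.json") as f:
    examples = json.load(f)
with open("eval/data/tables.json") as f:
    tables = json.load(f)

print(f"{len(examples)} evaluation examples")
print(f"{len(tables)} schema entries")
# Peek at the first example's fields without assuming their names.
print(sorted(examples[0].keys()))
```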
#### Eval

Start a manifest session with the model you want to evaluate.
```bash
python -m manifest.api.app \
    --model_type huggingface \
    --model_generation_type text-generation \
    --model_name_or_path motherduckdb/DuckDB-NSQL-7B-v0.1 \
    --fp16 \
    --device 0
```
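Before kicking off predictions, you can verify that this server is reachable at the connection string used below (`http://localhost:5000`). The sketch is only a connectivity probe; it does not assume any particular endpoint path:

```python
import urllib.request
import urllib.error

# Probe the manifest server started in the previous step.
try:
    urllib.request.urlopen("http://localhost:5000", timeout=5)
    print("Server is reachable at http://localhost:5000")
except urllib.error.HTTPError as e:
    # An HTTP error still means a server answered (e.g. 404 on the root path).
    print(f"Server responded with HTTP {e.code}; it appears to be up")
except urllib.error.URLError as e:
    print(f"Could not reach the server: {e}")
```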
Then, from the `DuckDB-NSQL` main folder, run:
```bash
python eval/predict.py \
    predict \
    eval/data/dev.json \
    eval/data/tables.json \
    --output-dir output/ \
    --stop-tokens ';' \
    --stop-tokens '--' \
    --stop-tokens '```' \
    --stop-tokens '###' \
    --overwrite-manifest \
    --manifest-client huggingface \
    --manifest-connection http://localhost:5000 \
    --prompt-format duckdbinst
```
This will format the prompts using the duckdbinst style.
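For intuition only, an instruction-style text-to-SQL prompt pairs the schema with the natural-language question. The snippet below is a hypothetical illustration, not the actual duckdbinst template used by the eval code:

```python
# Hypothetical illustration of an instruction-style text-to-SQL prompt.
# The real duckdbinst template is defined in the eval code and may differ.
schema = "CREATE TABLE taxi (trip_id INTEGER, fare DOUBLE, tip DOUBLE);"
question = "What is the average tip for trips with a fare above 20?"

prompt = (
    "Here is the database schema that the SQL query will run on:\n"
    f"{schema}\n\n"
    f"Question: {question}\n"
    "Write a DuckDB SQL query that answers the question."
)
print(prompt)
```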
To evaluate the predictions, first make sure the DuckDB `httpfs` extension can be installed and loaded by running the following in a Python shell:
```python
try:
    import duckdb

    # Install and load the httpfs extension (HTTP(S)/S3 file access).
    con = duckdb.connect()
    con.install_extension("httpfs")
    con.load_extension("httpfs")
except Exception as e:
    print(f"Error loading duckdb extensions: {e}")
```
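You can optionally confirm the extension status afterwards by querying DuckDB's built-in `duckdb_extensions()` table function:

```python
import duckdb

con = duckdb.connect()
# Report whether the httpfs extension is installed and currently loaded.
status = con.execute(
    "SELECT extension_name, installed, loaded "
    "FROM duckdb_extensions() WHERE extension_name = 'httpfs'"
).fetchall()
print(status)
```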
Then, run the evaluation script:
```bash
python eval/evaluate.py \
    evaluate \
    --gold eval/data/dev.json \
    --db eval/data/databases/ \
    --tables eval/data/tables.json \
    --output-dir output/ \
    --pred [PREDICTION_FILE]
```
All output information is written to the prediction file in the specified `--output-dir`. In that file, `query` is the gold SQL and `pred` is the model's prediction.
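To inspect results programmatically, a small sketch along these lines (assuming the prediction file is a JSON array of records; switch to line-by-line parsing if it turns out to be JSON-lines, and substitute your actual file name) prints gold and predicted SQL side by side:

```python
import json

# "output/predictions.json" is a placeholder; use the prediction file
# produced in your output-dir.
with open("output/predictions.json") as f:
    records = json.load(f)

for rec in records[:5]:
    print("gold:", rec["query"])  # gold SQL
    print("pred:", rec["pred"])   # predicted SQL
    print("-" * 40)
```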