This folder contains the evaluation suite for the DuckDB-Text2SQL model.
Please install the dependencies listed in the `requirements.txt` file located in the parent folder.
## Setup
To evaluate against the benchmark dataset, you first need to set up the test-suite evaluation script:
```bash
mkdir metrics
cd metrics
git clone git@github.com:ElementAI/test-suite-sql-eval.git test_suite_sql_eval
cd ..
```
To evaluate against DuckDB, you need to add a new remote in the `test_suite_sql_eval` folder and check out the latest `duckdb-only` branch (640a12975abf75a94e917caca149d56dbc6bcdd7):
```bash
cd metrics/test_suite_sql_eval
git remote add till https://github.com/tdoehmen/test-suite-sql-eval.git
git fetch till
git checkout till/duckdb-only
cd ../..
```
Next, prepare the docs for retrieval.
```bash
mkdir docs
cd docs
git clone https://github.com/duckdb/duckdb-web.git
cd ..
```
#### Dataset
The benchmark dataset is located in the `data/` folder and includes all databases (`data/databases`), table schemas (`data/tables.json`), and examples (`data/dev.json`).
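If you want to inspect the dataset before running the evaluation, both files are plain JSON. A minimal sketch, run from this folder and assuming `dev.json` is a list of examples and `tables.json` a list of schemas:
```python
import json

# Load the benchmark examples and table schemas (both are plain JSON files)
with open("data/dev.json") as f:
    examples = json.load(f)
with open("data/tables.json") as f:
    schemas = json.load(f)

print(f"{len(examples)} examples, {len(schemas)} database schemas")
# Show which fields the first example exposes
print(sorted(examples[0].keys()))
```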
#### Eval
Start a manifest session with the model you want to evaluate.
```bash
python -m manifest.api.app \
--model_type huggingface \
--model_generation_type text-generation \
--model_name_or_path motherduckdb/DuckDB-NSQL-7B-v0.1 \
--fp16 \
--device 0
```
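Before launching predictions, you may want to confirm the manifest server is reachable. A minimal, hypothetical check (not part of the eval scripts) that only tests whether port 5000 accepts connections:
```python
import socket

# Try a TCP connection to the manifest server started above (default port 5000)
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(2)
    reachable = sock.connect_ex(("localhost", 5000)) == 0

print("manifest server is listening" if reachable else "manifest server is not reachable")
```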
Then, from the `DuckDB-NSQL` main folder, run:
```bash
python eval/predict.py \
predict \
eval/data/dev.json \
eval/data/tables.json \
--output-dir output/ \
--stop-tokens ';' \
--stop-tokens '--' \
--stop-tokens '```' \
--stop-tokens '###' \
--overwrite-manifest \
--manifest-client huggingface \
--manifest-connection http://localhost:5000 \
--prompt-format duckdbinst
```
This formats the prompts in the `duckdbinst` style.
To evaluate the predictions, first run the following in a Python shell to pre-install the DuckDB `httpfs` extension:
```python
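# Install and load the httpfs extension so it is cached and ready for the evaluation run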
try:
import duckdb
con = duckdb.connect()
con.install_extension("httpfs")
con.load_extension("httpfs")
except Exception as e:
print(f"Error loading duckdb extensions: {e}")
```
Then, run the evaluation script:
```bash
python eval/evaluate.py \
evaluate \
--gold eval/data/dev.json \
--db eval/data/databases/ \
--tables eval/data/tables.json \
--output-dir output/ \
--pred [PREDICTION_FILE]
```
All output information is located in the prediction file inside the `output-dir`. There, `query` is the gold query and `pred` is the predicted query.
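For a quick look at the results, the prediction file can be inspected directly. A minimal sketch, assuming the file is newline-delimited JSON (the file name below is a placeholder, substitute the actual file produced in your output directory):
```python
import json

# Path to the prediction file written by eval/predict.py (placeholder name)
pred_file = "output/PREDICTION_FILE.json"

# Assumes newline-delimited JSON; if the file is a single JSON list,
# use json.load(f) instead.
with open(pred_file) as f:
    records = [json.loads(line) for line in f if line.strip()]

# Compare the gold query against the model prediction for a few examples
for r in records[:3]:
    print("gold:", r["query"])
    print("pred:", r["pred"])
    print("---")
```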