This folder contains the evaluation suite for the DuckDB-Text2SQL model.
Please install the dependencies listed in the requirements.txt file located in the parent folder.
Setup
To evaluate against the benchmark dataset, first set up the test-suite evaluation script:
mkdir metrics
cd metrics
git clone [email protected]:ElementAI/test-suite-sql-eval.git test_suite_sql_eval
cd ..
To evaluate against DuckDB, add a new remote inside the test_suite_sql_eval folder and check out the latest duckdb-only branch (commit 640a12975abf75a94e917caca149d56dbc6bcdd7):
git remote add till https://github.com/tdoehmen/test-suite-sql-eval.git
git fetch till
git checkout till/duckdb-only
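If you want to double-check that the checkout landed on the pinned commit, a quick sanity check from inside the test_suite_sql_eval folder (this snippet is only a convenience, not part of the benchmark tooling):
import subprocess

# Verify HEAD matches the pinned duckdb-only commit from above.
EXPECTED = "640a12975abf75a94e917caca149d56dbc6bcdd7"
head = subprocess.run(
    ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
).stdout.strip()
assert head == EXPECTED, f"unexpected commit: {head}"
print("test_suite_sql_eval is on the expected commit")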
Next, prepare the docs for retrieval.
mkdir docs
cd docs
git clone https://github.com/duckdb/duckdb-web.git
cd ..
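The prediction step later uses these docs for retrieval. As a rough sketch of what that means (the actual loading logic lives in this repo's retrieval code; treating docs/duckdb-web as a tree of markdown files is an assumption about that repo's layout):
from pathlib import Path

# Hypothetical sketch: gather markdown files from the cloned duckdb-web
# repo as candidate documents for a retrieval index.
docs_root = Path("docs/duckdb-web")
md_files = sorted(docs_root.rglob("*.md"))
print(f"found {len(md_files)} markdown files for retrieval")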
Dataset
The benchmark dataset is located in the data/ folder and includes all databases (data/databases), table schemas (data/tables.json), and examples (data/dev.json).
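To get a quick feel for the data, you can inspect the files directly. A minimal sketch, assuming both files are JSON arrays (the field names are whatever the JSON actually contains, so the keys are printed rather than assumed):
import json

# Load the benchmark files named above and report their shape.
with open("eval/data/dev.json") as f:
    examples = json.load(f)
with open("eval/data/tables.json") as f:
    tables = json.load(f)

print(f"{len(examples)} examples, {len(tables)} schema entries")
print("example keys:", sorted(examples[0].keys()))
print("schema keys:", sorted(tables[0].keys()))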
Eval
Start a manifest session with the model you want to evaluate.
python -m manifest.api.app \
--model_type huggingface \
--model_generation_type text-generation \
--model_name_or_path motherduckdb/DuckDB-NSQL-7B-v0.1 \
--fp16 \
--device 0
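Before running the full prediction pass, you can confirm the session is reachable with the manifest client (a minimal sketch; the prompt is a throwaway placeholder):
from manifest import Manifest

# Connect to the manifest session started above and run a dummy prompt.
manifest = Manifest(
    client_name="huggingface",
    client_connection="http://localhost:5000",
)
print(manifest.run("SELECT 1;", max_tokens=16))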
Then, from the DuckDB-NSQL main folder, run:
python eval/predict.py \
predict \
eval/data/dev.json \
eval/data/tables.json \
--output-dir output/ \
--stop-tokens ';' \
--stop-tokens '--' \
--stop-tokens '```' \
--stop-tokens '###' \
--overwrite-manifest \
--manifest-client huggingface \
--manifest-connection http://localhost:5000 \
--prompt-format duckdbinst
This will format the prompt using the duckdbinst style.
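The repeated --stop-tokens flags ask the generation client to cut each completion at the first occurrence of any of those markers. Conceptually it behaves like the sketch below (illustrative only; the real truncation happens inside the prediction pipeline):
# Illustrative truncation at the first stop token, mirroring the
# behavior requested by the --stop-tokens flags above.
STOP_TOKENS = [";", "--", "```", "###"]

def truncate_at_stop(text: str) -> str:
    cut = len(text)
    for tok in STOP_TOKENS:
        idx = text.find(tok)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("SELECT 1; -- comment"))  # prints "SELECT 1"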
To evaluate the prediction, first run the following in a Python shell:
try:
    # Pre-install and load the httpfs extension used during evaluation.
    import duckdb
    con = duckdb.connect()
    con.install_extension("httpfs")
    con.load_extension("httpfs")
except Exception as e:
    print(f"Error loading duckdb extensions: {e}")
Then, run the evaluation script:
python eval/evaluate.py \
evaluate \
--gold eval/data/dev.json \
--db eval/data/databases/ \
--tables eval/data/tables.json \
--output-dir output/ \
--pred [PREDICTION_FILE]
To view the output, look at the prediction file in the [output-dir]: query holds the gold SQL and pred holds the predicted SQL.
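To eyeball a few pairs, something like this works (it assumes the prediction file is newline-delimited JSON with query and pred fields, per the description above; the file name is a placeholder):
import json

# Print the first few gold/predicted pairs from a prediction file.
# "predictions.json" is a placeholder; use your actual [PREDICTION_FILE].
with open("output/predictions.json") as f:
    rows = [json.loads(line) for line in f if line.strip()]

for row in rows[:3]:
    print("gold:", row["query"])
    print("pred:", row["pred"])
    print("-" * 40)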