This folder contains the evaluation suite for the DuckDB-Text2SQL model.

Please install the dependencies listed in `requirements.txt` in the parent folder.
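
For example, assuming a standard pip setup and that this folder sits directly under the repository root (adjust the path otherwise):

```bash
# install the evaluation dependencies from the parent folder's requirements.txt
pip install -r ../requirements.txt
```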

## Setup
To evaluate against the benchmark dataset, you first need to set up the test-suite-sql-eval evaluation harness.

```
mkdir metrics
cd metrics
git clone git@github.com:ElementAI/test-suite-sql-eval.git test_suite_sql_eval
cd ..
```

To evaluate against DuckDB, add a new remote inside the test-suite-sql-eval folder and check out the duckdb-only branch (commit 640a12975abf75a94e917caca149d56dbc6bcdd7).

```
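# run these commands inside the metrics/test_suite_sql_eval clone from the previous step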
git remote add till https://github.com/tdoehmen/test-suite-sql-eval.git
git fetch till
git checkout till/duckdb-only
```

Next, prepare the docs for retrieval.
```
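# clone the DuckDB documentation (duckdb-web), which is used for retrieval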
mkdir docs
cd docs
git clone https://github.com/duckdb/duckdb-web.git
cd ..
```

#### Dataset
The benchmark dataset is located in the `data/` folder and includes all databases (`data/databases`), table schemas (`data/tables.json`), and examples (`data/dev.json`).
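
A quick way to sanity-check that everything is in place (run from the `DuckDB-NSQL` main folder; this assumes `tables.json` and `dev.json` are plain JSON lists, so adjust the snippet if their layout differs):

```bash
# count the databases, table schemas, and examples in the benchmark dataset
ls eval/data/databases | wc -l
python -c "import json; print(len(json.load(open('eval/data/tables.json'))), 'schemas')"
python -c "import json; print(len(json.load(open('eval/data/dev.json'))), 'examples')"
```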

#### Eval
Start a manifest session with the model you want to evaluate.

```bash
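# serve the model locally via Manifest; the prediction step below connects to it at http://localhost:5000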
python -m manifest.api.app \
    --model_type huggingface \
    --model_generation_type text-generation \
    --model_name_or_path motherduckdb/DuckDB-NSQL-7B-v0.1 \
    --fp16 \
    --device 0
```
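
Before launching predictions, you can optionally check that the server is reachable (this only verifies that something is listening on the port; the response for the root path depends on the Manifest version):

```bash
# print the HTTP status code returned by the local Manifest server
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000
```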

Then, from the `DuckDB-NSQL` main folder, run:

```bash
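# generate SQL predictions for each example in dev.json via the local Manifest server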
python eval/predict.py \
    predict \
    eval/data/dev.json \
    eval/data/tables.json \
    --output-dir output/ \
    --stop-tokens ';' \
    --stop-tokens '--' \
    --stop-tokens '```' \
    --stop-tokens '###' \
    --overwrite-manifest \
    --manifest-client huggingface \
    --manifest-connection http://localhost:5000 \
    --prompt-format duckdbinst
```
This formats the prompts using the `duckdbinst` style.

To evaluate the predictions, first run the following in a Python shell to make sure the DuckDB `httpfs` extension is installed and can be loaded:

```python
try:
    import duckdb

    con = duckdb.connect()
    con.install_extension("httpfs")
    con.load_extension("httpfs")
except Exception as e:
    print(f"Error loading duckdb extensions: {e}")
```

Then, run the evaluation script:

```bash
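# score the predictions against the gold queries using the test-suite-sql-eval harness prepared above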
python eval/evaluate.py \
    evaluate \
    --gold eval/data/dev.json \
    --db eval/data/databases/ \
    --tables eval/data/tables.json \
    --output-dir output/ \
    --pred [PREDICTION_FILE]
```

All of the results are written to the prediction file in the `[output-dir]` directory; in that file, `query` is the gold SQL and `pred` is the predicted SQL.
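
For example, a quick way to eyeball a few results (this assumes the prediction file is newline-delimited JSON with `query` and `pred` fields; adjust the parsing if it is written as a single JSON array):

```bash
# print the gold query and the prediction for the first few examples
head -n 3 [output-dir]/[PREDICTION_FILE] | python -c "
import json, sys
for line in sys.stdin:
    ex = json.loads(line)
    print('gold:', ex.get('query'))
    print('pred:', ex.get('pred'))
    print()
"
```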