duckdb-nsql-hub/DuckDB-SQL-Eval

tdoehmen committed on Oct 3, 2024

Commit e9713ec (1 parent: ea4a3ce)

added test suite

Files changed (18)

duckdb-nsql/eval/metrics/test_suite_sql_eval +0 -1
duckdb-nsql/eval/metrics/test_suite_sql_eval/README.md +144 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/__init__.py +0 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/alter_michigan_databases.sh +66 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/classical_provenance.ipynb +301 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/classical_test.pkl +3 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/database/readme.txt +2 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluate_classical.py +202 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation.py +1210 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/academic_gold.txt +196 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/classical_test_gold.txt +0 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/gold.txt +453 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/predict.txt +453 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/exec_eval.py +313 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/parse.py +252 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/process_sql.py +644 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/tables.json +0 -0
duckdb-nsql/eval/metrics/test_suite_sql_eval/tmp/readme.txt +1 -0

duckdb-nsql/eval/metrics/test_suite_sql_eval DELETED

	@@ -1 +0,0 @@
-Subproject commit 640a12975abf75a94e917caca149d56dbc6bcdd7

duckdb-nsql/eval/metrics/test_suite_sql_eval/README.md ADDED

	@@ -0,0 +1,144 @@

+# Semantic Evaluation for Text-to-SQL with Test Suites
+This repo contains the test suite evaluation metric for 11 text-to-SQL tasks. Compared to other current metrics, test suite accuracy efficiently computes a tighter upper bound on semantic accuracy. It was proposed in our EMNLP 2020 paper: [Semantic Evaluation for Text-to-SQL with Distilled Test Suites](https://arxiv.org/abs/2010.02840). It is now the official metric of [Spider](https://yale-lily.github.io/spider), [SParC](https://yale-lily.github.io/sparc), and [CoSQL](https://yale-lily.github.io/cosql), and is also available for Academic, ATIS, Advising, Geography, IMDB, Restaurants, Scholar, and Yelp (building on the amazing work by [Catherine and Jonathan](https://github.com/jkkummerfeld/text2sql-data)).
+Notice: Please refer to [Ruiqi's repo](https://github.com/ruiqi-zhong/TestSuiteEval) for the code to generate neighbor queries and random databases as defined in the paper. We look forward to similar evaluations in other semantic parsing domains.
+## Setting Up
+To run the test suite (execution) evaluation, first download the test suites (databases) for the 11 text-to-SQL tasks from [here](https://drive.google.com/file/d/1mkCx2GOFIqNesD4y8TDAO1yX1QZORP5w/view?usp=sharing), and put them in the `database/` directory.
+You also need to install sqlparse and nltk to run the evaluation.
+```
+pip3 install sqlparse
+pip3 install nltk
+```
+## Official Evaluation for Spider, SParC, and CoSQL
+We will report the test suite accuracy for the official [Spider](https://yale-lily.github.io/spider), [SParC](https://yale-lily.github.io/sparc), and [CoSQL](https://yale-lily.github.io/cosql) leaderboards (starting Oct. 2020). The original exact set match accuracy will be reported as a reference.
+Below is an example command to calculate the test suite accuracy for the development sets of Spider, SParC, and CoSQL.
+```
+python3 evaluation.py --gold [gold file] --pred [predicted file] --etype [evaluation type] --db [database dir] --table [table file] --plug_value --keep_distinct --progress_bar_for_each_datapoint
+arguments:
+    [gold file]        gold file where each line is `a gold SQL \t db_id` for Spider, SParC, and CoSQL, and interactions are separated by one empty line for SParC and CoSQL. See an example at evaluation_examples/gold.txt
+    [predicted file]   predicted sql file where each line is a predicted SQL, and interactions are separated by one empty line. See an example at evaluation_examples/predict.txt
+    [database dir]     the directory that contains all the databases and test suites
+    [table file]       tables.json file which includes foreign key info of each database.
+    [evaluation type]  "exec" for test suite accuracy (default), "match" for the original exact set match accuracy, and "all" for both
+    --plug_value       whether to plug in the gold value into the predicted query; suitable if your model does not predict values.
+    --keep_distinct    whether to keep distinct keyword during evaluation. default is false.
+    --progress_bar_for_each_datapoint   whether to print progress bar of running test inputs for each datapoint
+```
+#### Test Suite Execution Accuracy without Values
+If your system does NOT predict values in the SQL queries, you should add the `--plug_value` flag, which will extract the values used in the gold query and plug them into the predicted query.
+```
+python3 evaluation.py \
+    --gold [gold file] \
+    --pred [predicted file] \
+    --db [database dir] \
+    --etype exec \
+    --plug_value
+```
+To also compute the original set match accuracy:
+```
+python3 evaluation.py \
+    --gold [gold file] \
+    --pred [predicted file] \
+    --db [database dir] \
+    --table [table file] \
+    --etype all \
+    --plug_value
+```
+#### Test Suite Execution Accuracy with Values
+We encourage people to report performance with value predictions, that is, without the `--plug_value` argument.
+```
+python3 evaluation.py \
+    --gold [gold file] \
+    --pred [predicted file] \
+    --db [database dir] \
+    --etype exec
+```
+#### Other Arguments
+If `--keep_distinct` is included, the distinct keywords will NOT be removed during evaluation. To make a fair comparison with the original exact set match metric, `--keep_distinct` should not be added.
+Include `--progress_bar_for_each_datapoint` if you suspect that the execution got stuck on a specific test input; it will print the progress of running on each test input.
+## Evaluation for Other Classical Text-to-SQL Datasets
+*UPDATE:* we fixed the issue mentioned in https://github.com/taoyds/test-suite-sql-eval/issues/1 . We also added additional features to evaluate on a subset and cache the results to speed up evaluation.
+The prior work on classical text-to-sql datasets (ATIS, Academic, Advising, Geography, IMDB, Restaurants, Scholar, Yelp) usually reports the exact string match accuracy and execution accuracy over a single database content, which either exaggerates or deflates the real semantic accuracy.
+The test sets for the classical text-to-sql datasets are adopted from [this repo](https://github.com/jkkummerfeld/text2sql-data). We used the test split if one is defined, and the entire dataset otherwise. We also rewrote the SQL queries to conform with the style in the Spider dataset.
+All the test datapoints are saved in `classical_test.pkl`. Each test datapoint is represented as a dictionary with the following keys and values:
+- `db_id`: which of the eight original classical datasets the datapoint belongs to. `database/[db_id]/[db_id].sqlite` contains an empty database with the associated schema.
+- `query`: the ground truth SQL query (or any semantically equivalent variant) the model needs to predict.
+- `variables`: the constants that are used in the SQL query. We also include a field called `ancestor_of_occuring_column`, where we find all the columns that contain this value and recursively find their `ancestor column` (if a column refers to a parent column/has a foreign key reference). This field is especially useful if your algorithm originally uses database content to help generate model predictions.
+- `testsuite`: a set of database paths on which we compare denotations.
+- `texts`: the associated natural language descriptions, with the constant value extracted.
+- `orig_id`: the original data id from Jonathan's repo. It is a tuple of two elements (db_id, idx), referring to the idx^th element of the list encoded by text2sql-data/data/[db_id].json .
+You can evaluate your model in whatever configurations you want. For example, you may choose to plug in the values into the text and ask the model itself to figure out which constants the user has given;
+or you can relax the modelling assumption and assume the model has oracle access to the ground truth constant value; or you can further relax the assumption of knowing which "ancestor column" contains the constant provided.
+However, in any case, you **SHOULD NOT** change the gold query, since test suite generation is dependent on it.
+The `judge` function in evaluate_classical.py contains what you need to evaluate a single model prediction.
+It takes in the ground truth information of a datapoint (an element in `classical_test.pkl`, represented as a dictionary) and a model prediction (as a string) and returns True/False, indicating whether the prediction is semantically correct.
+If you have made a model prediction for every datapoint and written the predictions to a `.txt` file (one prediction per line), you can use the following example command to calculate the accuracy:
+```
+python3 evaluate_classical.py --gold [gold file] --pred [predicted file] --out_file [output file] --num_processes [process number]
+arguments:
+    [gold file]        path to the gold file. The default is classical_test.pkl, so this argument is optional.
+    [predicted file]   the path to the predicted file. See an example at evaluation_examples/classical_test_gold.txt
+    [output file]      the output file path. e.g. goldclassicaltest.pkl
+    [process number]   number of processes to use. By default, it is set to cpu_count() // 3, so this argument is optional.
+    [subset]           which subset to evaluate on. can be one of {atis,advising,academic,imdb,restaurants,geography,scholar,yelp,full}
+    [disable_cache]    by default, previously computed results are reused and the current results are cached. Use this flag to disable caching.
+```
+Here is an example command that evaluates the gold prediction file:
+```
+python3 evaluate_classical.py --pred=evaluation_examples/classical_test_gold.txt --out_file=all_eval_results.json
+```
+You can also choose to evaluate only on a subset of the datapoints, for example
+```
+python3 evaluate_classical.py --pred=evaluation_examples/academic_gold.txt --subset=academic --out_file=out/out_academic_test.json
+```
+By default, the evaluation script saves the evaluation results in cache.pkl and reuses them in later runs (since these evaluations take a long time to run).
+Use the ``disable_cache`` flag to turn this behavior off.
+The process through which data are transformed can be seen in classical_provenance.ipynb.
+## Citation
+```
+@InProceedings{ruiqi20,
+  author =  {Ruiqi Zhong and Tao Yu and Dan Klein},
+  title =   {Semantic Evaluation for Text-to-SQL with Distilled Test Suites},
+  year =    {2020},
+  booktitle =   {The 2020 Conference on Empirical Methods in Natural Language Processing},
+  publisher = {Association for Computational Linguistics},
+}
+```
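
The test suite idea described in the README above can be illustrated with a minimal sketch. It is not the repository's `judge` function: it uses two toy in-memory SQLite databases, a hypothetical `journal` table, and a simplified result comparison (rows compared as a multiset unless the gold query contains `ORDER BY`). The helper names `run`, `passes_test_suite`, and `make_db` are assumptions made for illustration only.

```python
import sqlite3
from collections import Counter


def run(conn, sql):
    """Execute sql and return ("result", rows) or ("exception", error)."""
    try:
        return "result", conn.execute(sql).fetchall()
    except sqlite3.Error as err:
        return "exception", err


def passes_test_suite(gold_sql, pred_sql, databases):
    """True only if pred_sql matches gold_sql on every database in the suite."""
    order_matters = "order by" in gold_sql.lower()
    for conn in databases:
        gold_flag, gold_rows = run(conn, gold_sql)
        if gold_flag != "result":
            continue  # skip databases on which the gold query itself fails
        pred_flag, pred_rows = run(conn, pred_sql)
        if pred_flag != "result":
            return False
        if order_matters:
            if gold_rows != pred_rows:
                return False
        elif Counter(gold_rows) != Counter(pred_rows):
            return False
    return True


def make_db(rows):
    """Build a toy in-memory database (hypothetical schema and contents)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE journal (name TEXT, homepage TEXT)")
    conn.executemany("INSERT INTO journal VALUES (?, ?)", rows)
    return conn


db1 = make_db([("PVLDB", "homepage_pvldb")])
db2 = make_db([("PVLDB", "homepage_pvldb"), ("TODS", "homepage_tods")])

gold = "SELECT homepage FROM journal WHERE name = 'PVLDB'"
wrong = "SELECT homepage FROM journal"  # drops the filter

print(passes_test_suite(gold, wrong, [db1]))       # single database
print(passes_test_suite(gold, wrong, [db1, db2]))  # two-database suite
```

On the single database the wrong query happens to return the same rows as the gold query, so it is accepted. Adding a second database whose contents distinguish the two queries rejects it, which is why execution accuracy over one database content can overstate semantic accuracy.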

duckdb-nsql/eval/metrics/test_suite_sql_eval/__init__.py ADDED

File without changes

duckdb-nsql/eval/metrics/test_suite_sql_eval/alter_michigan_databases.sh ADDED

	@@ -0,0 +1,66 @@

+#!/usr/bin/env bash
+set -e
+DATABASE_DIR=.
+copy_databases () {
+  db=$1
+  # Copy to *_test directory
+  altered=$DATABASE_DIR/${db}_test
+  cp -r "$DATABASE_DIR/$db" "$altered"
+  # Rename .sqlite files
+  cd "$altered"
+  for f in ${db}*.sqlite
+  do
+    mv "$f" "${db}_test${f#${db}}"
+  done
+  cd -
+}
+alter_yelp () {
+  for f in `ls $DATABASE_DIR/yelp_test/*.sqlite`
+  do
+    echo "ALTER TABLE neighbourhood RENAME TO neighborhood" | sqlite3 "$f"
+    echo "ALTER TABLE neighborhood RENAME COLUMN neighbourhood_name TO neighborhood_name" | sqlite3 "$f"
+  done
+}
+alter_imdb () {
+  for f in `ls $DATABASE_DIR/imdb_test/*.sqlite`
+  do
+    echo "ALTER TABLE cast RENAME TO cast2" | sqlite3 "$f"
+  done
+}
+alter_academic () {
+  :
+}
+alter_geo () {
+  :
+}
+alter_scholar () {
+  :
+}
+# geo is an exception in that we want to change the name from "geography" to "geo_test"
+# the easiest way to achieve this is to copy "geography" to "geo" first
+if [ ! -d $DATABASE_DIR/geo ]
+then
+  cp -r $DATABASE_DIR/geography $DATABASE_DIR/geo
+  mv $DATABASE_DIR/geo/geography.sqlite $DATABASE_DIR/geo/geo.sqlite
+fi
+for DB in imdb yelp academic geo scholar
+do
+  echo $DB
+  if [ ! -d "$DATABASE_DIR/${DB}_test" ]
+  then
+    copy_databases $DB
+    alter_"$DB"
+  else
+    echo "$DATABASE_DIR/${DB}_test already exists"
+  fi
+done
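
The effect of the `alter_yelp` step in the script above can be checked on a toy database. The sketch below is an assumption-laden illustration, not part of the commit: it uses Python's `sqlite3` module instead of the `sqlite3` command line tool, a hypothetical two-column `neighbourhood` table, and made-up row contents. `ALTER TABLE ... RENAME COLUMN` requires SQLite 3.25 or newer.

```python
import sqlite3

# Toy stand-in for one of the yelp_test/*.sqlite files (hypothetical contents).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE neighbourhood (business_id TEXT, neighbourhood_name TEXT)"
)
conn.execute("INSERT INTO neighbourhood VALUES ('b1', 'Downtown')")

# The same two statements the script pipes into the sqlite3 CLI.
conn.execute("ALTER TABLE neighbourhood RENAME TO neighborhood")
conn.execute(
    "ALTER TABLE neighborhood "
    "RENAME COLUMN neighbourhood_name TO neighborhood_name"
)

# Inspect the resulting schema and confirm the data survived the rename.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
columns = [row[1] for row in conn.execute("PRAGMA table_info(neighborhood)")]
rows = conn.execute("SELECT neighborhood_name FROM neighborhood").fetchall()

print(tables, columns, rows)
```

Both renames change only the schema, so existing rows remain queryable under the new names.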

duckdb-nsql/eval/metrics/test_suite_sql_eval/classical_provenance.ipynb ADDED

	@@ -0,0 +1,301 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import sqlparse\n",
+    "import pickle as pkl\n",
+    "dataset_names = ['academic', 'atis', 'advising', 'geography', 'imdb', 'restaurants', 'scholar', 'yelp']\n",
+    "\n",
+    "# these datasets are small, so we use the full set. \n",
+    "new_split_defined = {'restaurants', 'academic', 'imdb', 'yelp'} "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# loading the original datasets from the paper:\n",
+    "# Improving Text-to-SQL Evaluation Methodology\n",
+    "\n",
+    "# a dataset is a list of dictionaries\n",
+    "# in the original dictionary, each datapoint might consist of several natural language sentences or SQL\n",
+    "orig_datasets = []\n",
+    "for dataset_name in dataset_names:\n",
+    "    orig_dataset = json.load(open('text2sql-data/data/%s.json' % dataset_name))\n",
+    "    for idx, d in enumerate(orig_dataset):\n",
+    "        \n",
+    "        d['orig_id'] = (dataset_name, idx)\n",
+    "        \n",
+    "        # fixing annotations here\n",
+    "        \n",
+    "        # change \"company_name\" to producer name, otherwise there is no variable to replace\n",
+    "        if dataset_name == 'imdb' and idx == 27:\n",
+    "            d['sql'][0] = 'SELECT MOVIEalias0.TITLE FROM COMPANY AS COMPANYalias0 , COPYRIGHT AS COPYRIGHTalias0 , MOVIE AS MOVIEalias0 WHERE COMPANYalias0.NAME = \"producer_name0\" AND COPYRIGHTalias0.CID = COMPANYalias0.ID AND MOVIEalias0.MID = COPYRIGHTalias0.MSID AND MOVIEalias0.RELEASE_YEAR > movie_release_year0 ;'\n",
+    "    \n",
+    "        # removing the extra space surrounding the variable actor_name0\n",
+    "        if dataset_name == 'imdb' and idx == 78:\n",
+    "            d['sql'][0] = 'SELECT MAX( DERIVED_TABLEalias0.DERIVED_FIELDalias0 ) FROM ( SELECT COUNT( DISTINCT ( MOVIEalias0.TITLE ) ) AS DERIVED_FIELDalias0 FROM ACTOR AS ACTORalias0 , CAST AS CASTalias0 , MOVIE AS MOVIEalias0 WHERE ACTORalias0.NAME = \"actor_name0\" AND CASTalias0.AID = ACTORalias0.AID AND MOVIEalias0.MID = CASTalias0.MSID GROUP BY MOVIEalias0.RELEASE_YEAR ) AS DERIVED_TABLEalias0 ;'\n",
+    "    \n",
+    "        # there was a scoping error; changed AUTHORalias1 to AUTHORalias0, PUBLICATIONalias1 to PUBLICATIONalias0\n",
+    "        if dataset_name == 'academic' and idx == 182:\n",
+    "            d['sql'][0] = 'SELECT DERIVED_FIELDalias0 FROM ( SELECT AUTHORalias0.NAME AS DERIVED_FIELDalias0 , COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) AS DERIVED_FIELDalias1 FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = \"conference_name0\" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ) AS DERIVED_TABLEalias0 , ( SELECT AUTHORalias1.NAME AS DERIVED_FIELDalias2 , COUNT( DISTINCT ( PUBLICATIONalias1.TITLE ) ) AS DERIVED_FIELDalias3 FROM AUTHOR AS AUTHORalias1 , CONFERENCE AS CONFERENCEalias1 , PUBLICATION AS PUBLICATIONalias1 , WRITES AS WRITESalias1 WHERE CONFERENCEalias1.NAME = \"conference_name1\" AND PUBLICATIONalias1.CID = CONFERENCEalias1.CID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias1.PID GROUP BY AUTHORalias1.NAME ) AS DERIVED_TABLEalias1 WHERE DERIVED_TABLEalias0.DERIVED_FIELDalias1 > DERIVED_TABLEalias1.DERIVED_FIELDalias3 AND DERIVED_TABLEalias1.DERIVED_FIELDalias2 = DERIVED_TABLEalias0.DERIVED_FIELDalias0 ;'\n",
+    "        \n",
+    "        # wrong number of arguments to function COUNT(), change from \",\" to \"||\" for sqlite3 to recognize and execute\n",
+    "        if dataset_name == 'advising' and idx == 107:\n",
+    "            d['sql'][0] = 'SELECT COUNT( DISTINCT COURSEalias1.DEPARTMENT || COURSEalias0.NUMBER ) FROM COURSE AS COURSEalias0 , COURSE AS COURSEalias1 , COURSE_PREREQUISITE AS COURSE_PREREQUISITEalias0 , STUDENT_RECORD AS STUDENT_RECORDalias0 WHERE COURSEalias0.COURSE_ID = COURSE_PREREQUISITEalias0.PRE_COURSE_ID AND COURSEalias1.COURSE_ID = COURSE_PREREQUISITEalias0.COURSE_ID AND COURSEalias1.DEPARTMENT = \"department0\" AND COURSEalias1.NUMBER = number0 AND STUDENT_RECORDalias0.COURSE_ID = COURSEalias0.COURSE_ID AND STUDENT_RECORDalias0.STUDENT_ID = 1 ;'\n",
+    "        \n",
+    "        # there was no example given for level1 and hence replacing variable with values leads to errors\n",
+    "        if dataset_name == 'advising' and idx == 132:\n",
+    "            d['variables'][0]['example'] = '300'\n",
+    "        \n",
+    "        # cannot use count and order without group by; added grouping by actor_id\n",
+    "        if dataset_name == 'imdb' and idx == 79:\n",
+    "            d['sql'][0] = 'SELECT ACTORalias0.NAME FROM ACTOR AS ACTORalias0 , CAST AS CASTalias0 , MOVIE AS MOVIEalias0 WHERE CASTalias0.AID = ACTORalias0.AID AND MOVIEalias0.MID = CASTalias0.MSID GROUP BY ACTORalias0.AID ORDER BY COUNT( DISTINCT ( MOVIEalias0.TITLE ) ) DESC LIMIT 1 ;'\n",
+    "    \n",
+    "        # cannot use count and order without group by; added grouping by actor_id\n",
+    "        if dataset_name == 'imdb' and idx == 80:\n",
+    "            d['sql'][0] = 'SELECT ACTORalias0.NAME FROM ACTOR AS ACTORalias0 , CAST AS CASTalias0 , DIRECTED_BY AS DIRECTED_BYalias0 , DIRECTOR AS DIRECTORalias0 , MOVIE AS MOVIEalias0 WHERE CASTalias0.AID = ACTORalias0.AID AND DIRECTORalias0.DID = DIRECTED_BYalias0.DID AND MOVIEalias0.MID = CASTalias0.MSID AND MOVIEalias0.MID = DIRECTED_BYalias0.MSID GROUP BY ACTORalias0.AID ORDER BY COUNT( DISTINCT ( MOVIEalias0.TITLE ) ) DESC LIMIT 1 ;'\n",
+    "        \n",
+    "        # table has \"u\" in the neighborhood spelling.\n",
+    "        n_before, n_after = 'NEIGHBORHOOD', 'NEIGHBOURHOOD'\n",
+    "        if dataset_name == 'yelp':\n",
+    "            d['sql'][0] = d['sql'][0].replace(n_before, n_after)\n",
+    "        \n",
+    "        if dataset_name == 'yelp' and idx == 42:\n",
+    "            d['sql'][0] = 'SELECT NEIGHBOURHOODalias0.NEIGHBOURHOOD_NAME FROM BUSINESS AS BUSINESSalias0 , NEIGHBOURHOOD AS NEIGHBOURHOODalias0 , REVIEW AS REVIEWalias0 , USER AS USERalias0 WHERE NEIGHBOURHOODalias0.BUSINESS_ID = BUSINESSalias0.BUSINESS_ID AND REVIEWalias0.BUSINESS_ID = BUSINESSalias0.BUSINESS_ID AND USERalias0.NAME = \"user_name0\" AND USERalias0.USER_ID = REVIEWalias0.USER_ID ;'\n",
+    "\n",
+    "    orig_datasets.extend(orig_dataset)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "There are 3509 datapoints in the new testset\n"
+     ]
+    }
+   ],
+   "source": [
+    "# we create the new testset here\n",
+    "new_testset = []\n",
+    "for d in orig_datasets:\n",
+    "    orig_id = d['orig_id']\n",
+    "    db_id, idx = orig_id\n",
+    "    \n",
+    "    # we only incorporate the test split if the dataset is large enough\n",
+    "    # otherwise we incorporate the entire dataset\n",
+    "    if d['query-split'] != 'test' and db_id not in new_split_defined:\n",
+    "        continue\n",
+    "    sql = d['sql'][0]\n",
+    "    instance_variables = d['variables']\n",
+    "    instance_name2examples = {d['name']: d['example'] for d in instance_variables}\n",
+    "    \n",
+    "    # we create a new datapoint for each natural language query\n",
+    "    for sentence in d['sentences']:\n",
+    "        new_datapoint = {\n",
+    "            'text': sentence['text'],\n",
+    "            'query': sql,\n",
+    "            'variables': instance_variables,\n",
+    "            'orig_id': orig_id,\n",
+    "            'db_id': db_id,\n",
+    "            'db_path': 'database/{db_id}/{db_id}.sqlite'.format(db_id=db_id)\n",
+    "        }\n",
+    "        new_testset.append(new_datapoint)\n",
+    "print('There are %d datapoints in the new testset' % len(new_testset))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import re\n",
+    "\n",
+    "# this block implements a function that extracts variable names from text and sql\n",
+    "# later we use it to ensure that every variable is replaced\n",
+    "\n",
+    "variable_pattern = re.compile('^[a-z_]+[0-9]+$')\n",
+    "\n",
+    "def extract_variable_names(t):\n",
+    "    tokens = t.replace('\"', '').replace('%', '').split(' ')\n",
+    "    var_names = {v for v in tokens if variable_pattern.match(v) and 'alias' not in v}\n",
+    "    return var_names\n",
+    "\n",
+    "test = False\n",
+    "if test:\n",
+    "    sql = 'SELECT BUSINESSalias0.NAME FROM BUSINESS AS BUSINESSalias0 , REVIEW AS REVIEWalias0 WHERE REVIEWalias0.BUSINESS_ID = BUSINESSalias0.BUSINESS_ID AND REVIEWalias0.MONTH = \"review_month0\" GROUP BY BUSINESSalias0.NAME ORDER BY COUNT( DISTINCT ( REVIEWalias0.TEXT ) ) DESC LIMIT 1 ;'\n",
+    "    print(extract_variable_names(sql))\n",
+    "    text = 'return me the homepage of journal_name0 .'\n",
+    "    print(extract_variable_names(text))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# this block removes extra space surrounding variable names\n",
+    "def remove_extra_space_around_variable(t):\n",
+    "    var_names = extract_variable_names(t)\n",
+    "    result = str(t)\n",
+    "    for v in var_names:\n",
+    "        result = result.replace('\" ' + v + ' \"', v)\n",
+    "    return result"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "set()\n"
+     ]
+    }
+   ],
+   "source": [
+    "problematic = set()\n",
+    "\n",
+    "for datapoint in new_testset:\n",
+    "    orig_id = datapoint['orig_id']\n",
+    "    \n",
+    "    # remove extra whitespace surrounding the text\n",
+    "    datapoint['text'] = remove_extra_space_around_variable(datapoint['text'])\n",
+    "    \n",
+    "    # there should not be extra whitespace surrounding the sql variables\n",
+    "    if datapoint['query'] != remove_extra_space_around_variable(datapoint['query']):\n",
+    "        problematic.add(orig_id)\n",
+    "\n",
+    "    text_vars = extract_variable_names(datapoint['text'])\n",
+    "    sql_vars = extract_variable_names(datapoint['query'])\n",
+    "    \n",
+    "    instance_variables = {d['name']: d for d in datapoint['variables']}\n",
+    "    \n",
+    "    # we ensure that all the variables in the sql query and the text can be replaced\n",
+    "    # by some variable in the variable dictionary\n",
+    "    if len(text_vars - instance_variables.keys()) != 0 or len(sql_vars - instance_variables.keys()):\n",
+    "        problematic.add(orig_id)\n",
+    "        \n",
+    "    # replace the variables with the examples in the variable dictionary\n",
+    "    for text_var in text_vars:\n",
+    "        datapoint['text'] = datapoint['text'].replace(text_var, instance_variables[text_var]['example'])\n",
+    "    \n",
+    "    for sql_var in sql_vars:\n",
+    "        datapoint['query'] = datapoint['query'].replace(sql_var, instance_variables[sql_var]['example'])\n",
+    "\n",
+    "# we can trace back which datapoints do not satisfy the assumption,\n",
+    "# then go back and fix it manually\n",
+    "print(problematic)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[{'db_id': 'academic',\n",
+      "  'db_path': 'database/academic/academic.sqlite',\n",
+      "  'orig_id': ('academic', 0),\n",
+      "  'query': 'SELECT JOURNALalias0.HOMEPAGE FROM JOURNAL AS JOURNALalias0 WHERE '\n",
+      "           'JOURNALalias0.NAME = \"PVLDB\" ;',\n",
+      "  'text': 'return me the homepage of PVLDB .',\n",
+      "  'variables': [{'example': 'PVLDB',\n",
+      "                 'location': 'both',\n",
+      "                 'name': 'journal_name0',\n",
+      "                 'type': 'journal_name'}]},\n",
+      " {'db_id': 'academic',\n",
+      "  'db_path': 'database/academic/academic.sqlite',\n",
+      "  'orig_id': ('academic', 1),\n",
+      "  'query': 'SELECT AUTHORalias0.HOMEPAGE FROM AUTHOR AS AUTHORalias0 WHERE '\n",
+      "           'AUTHORalias0.NAME = \"H. V. Jagadish\" ;',\n",
+      "  'text': 'return me the homepage of H. V. Jagadish .',\n",
+      "  'variables': [{'example': 'H. V. Jagadish',\n",
+      "                 'location': 'both',\n",
+      "                 'name': 'author_name0',\n",
+      "                 'type': 'author_name'}]},\n",
+      " {'db_id': 'academic',\n",
+      "  'db_path': 'database/academic/academic.sqlite',\n",
+      "  'orig_id': ('academic', 2),\n",
+      "  'query': 'SELECT PUBLICATIONalias0.ABSTRACT FROM PUBLICATION AS '\n",
+      "           'PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = \"Making database '\n",
+      "           'systems usable\" ;',\n",
+      "  'text': 'return me the abstract of Making database systems usable .',\n",
+      "  'variables': [{'example': 'Making database systems usable',\n",
+      "                 'location': 'both',\n",
+      "                 'name': 'publication_title0',\n",
+      "                 'type': 'publication_title'}]},\n",
+      " {'db_id': 'academic',\n",
+      "  'db_path': 'database/academic/academic.sqlite',\n",
+      "  'orig_id': ('academic', 3),\n",
+      "  'query': 'SELECT PUBLICATIONalias0.YEAR FROM PUBLICATION AS '\n",
+      "           'PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = \"Making database '\n",
+      "           'systems usable\" ;',\n",
+      "  'text': 'return me the year of Making database systems usable',\n",
+      "  'variables': [{'example': 'Making database systems usable',\n",
+      "                 'location': 'both',\n",
+      "                 'name': 'publication_title0',\n",
+      "                 'type': 'publication_title'}]},\n",
+      " {'db_id': 'academic',\n",
+      "  'db_path': 'database/academic/academic.sqlite',\n",
+      "  'orig_id': ('academic', 3),\n",
+      "  'query': 'SELECT PUBLICATIONalias0.YEAR FROM PUBLICATION AS '\n",
+      "           'PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = \"Making database '\n",
+      "           'systems usable\" ;',\n",
+      "  'text': 'return me the year of Making database systems usable .',\n",
+      "  'variables': [{'example': 'Making database systems usable',\n",
+      "                 'location': 'both',\n",
+      "                 'name': 'publication_title0',\n",
+      "                 'type': 'publication_title'}]}]\n"
+     ]
+    }
+   ],
+   "source": [
+    "from pprint import pprint\n",
+    "\n",
+    "pprint(new_testset[:5])"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
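
The variable substitution performed in the notebook above can be condensed into a short standalone sketch. `extract_variable_names` follows the notebook cell; `fill_variables` is a hypothetical helper introduced here for illustration, and the template strings are reconstructed from the first datapoint shown in the notebook output (variable `journal_name0`, example value `PVLDB`).

```python
import re

# A variable token is lowercase letters/underscores followed by digits,
# e.g. journal_name0.
variable_pattern = re.compile(r"^[a-z_]+[0-9]+$")


def extract_variable_names(t):
    """Return the set of variable placeholders found in a text or SQL string."""
    tokens = t.replace('"', "").replace("%", "").split(" ")
    return {v for v in tokens if variable_pattern.match(v) and "alias" not in v}


def fill_variables(t, variables):
    """Replace each placeholder with its example value (hypothetical helper)."""
    name2example = {v["name"]: v["example"] for v in variables}
    for name in extract_variable_names(t):
        t = t.replace(name, name2example[name])
    return t


variables = [{"name": "journal_name0", "example": "PVLDB"}]
sql_template = (
    "SELECT JOURNALalias0.HOMEPAGE FROM JOURNAL AS JOURNALalias0 "
    'WHERE JOURNALalias0.NAME = "journal_name0" ;'
)
text_template = "return me the homepage of journal_name0 ."

filled_sql = fill_variables(sql_template, variables)
filled_text = fill_variables(text_template, variables)
print(filled_sql)
print(filled_text)
```

Table aliases such as `JOURNALalias0` are left alone because the pattern only accepts lowercase tokens and the function additionally skips anything containing `alias`.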

duckdb-nsql/eval/metrics/test_suite_sql_eval/classical_test.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:256f797cea587044881ceffb408185fd2dbd70682c53c788ef02f3cf59dad1ab
+size 3607809

duckdb-nsql/eval/metrics/test_suite_sql_eval/database/readme.txt ADDED

	@@ -0,0 +1,2 @@

+Please download the database from the Google Drive link mentioned in the repo-level README and decompress it in this directory.
+After this step, "test-suite-sql-eval/database/atis/atis.sqlite" should be a valid file path.

duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluate_classical.py ADDED

	@@ -0,0 +1,202 @@

+import argparse
+from typing import List, Dict, Any, Tuple
+import pickle as pkl
+import tqdm
+from .exec_eval import exec_on_db, result_eq
+import os
+from collections import defaultdict
+import time
+from multiprocessing import cpu_count, Pool, Manager
+from itertools import repeat
+NUM_PROCESSES = cpu_count() // 3
+if NUM_PROCESSES == 0:
+    NUM_PROCESSES = 1
+MULTIPLICATIVE_OVERHEAD = 3
+ADDITIVE_OVERHEAD = 30
+GOLD_TIMEOUT = 100
+cache_path = "cache.pkl"
+m = Manager()
+cache = m.dict()
+def load_predictions(f_path: str) -> List[str]:
+    preds = []
+    with open(f_path, "r") as in_file:
+        for l in in_file:
+            preds.append(l.strip())
+    return preds
+def acc(l, idxes=None):
+    if idxes is None:
+        idxes = list(range(len(l)))
+    c = 0
+    for idx in idxes:
+        if l[idx]:
+            c += 1
+    return float(c) / len(idxes)
+# the input is a tuple of gold_dict, model prediction and whether to use cache
+# and the output is whether the model prediction passes the entire test suite
+def judge(args: Tuple[Dict[str, Any], str, bool]) -> bool:
+    gold_dict, pred, use_cache = args
+    testsuite_paths = gold_dict["testsuite"]
+    gold_query = gold_dict["query"]
+    order_matters = "order by" in gold_query.lower()
+    db_path = gold_dict["db_path"]
+    # if already computed sometime before
+    # and cache allowed, directly return the result
+    k = (db_path, gold_query, pred)
+    if use_cache and k in cache:
+        return cache[k]
+    pass_all_testcase = True
+    for testcase_path in testsuite_paths:
+        start = time.time()
+        flg, gold_result = exec_on_db(testcase_path, gold_query, timeout=GOLD_TIMEOUT)
+        duration = time.time() - start
+        timeout = ADDITIVE_OVERHEAD + MULTIPLICATIVE_OVERHEAD * duration
+        if flg != "result":
+            print("Warning: executing gold query results in an exception")
+            continue
+        flg, pred_result = exec_on_db(testcase_path, pred, timeout=int(timeout))
+        if flg != "result":
+            pass_all_testcase = False
+            break
+        if not result_eq(gold_result, pred_result, order_matters):
+            pass_all_testcase = False
+            break
+    # save the results in the cache
+    if use_cache:
+        cache[k] = pass_all_testcase
+    return pass_all_testcase
+# cache is a dictionary
+# the key is a ternary tuple (empty_database_path, SQL1, SQL2)
+# the value is whether SQL1 and SQL2 are equivalent, judged by the test suites
+def load_cache() -> Dict[Tuple[str, str, str], bool]:
+    if os.path.exists(cache_path):
+        d = m.dict(pkl.load(open(cache_path, "rb")))
+        for k, v in d.items():
+            cache[k] = v
+    return cache
+# dump the cache
+def save_cache():
+    pkl.dump(dict(cache), open(cache_path, "wb"))
+def main(
+    preds: List[str],
+    gold_file: str = "classical_test.pkl",
+    verbose: bool = True,
+    num_processes: int = NUM_PROCESSES,
+    subset: str = "full",
+    use_cache: bool = True,
+) -> List[bool]:
+    with open(gold_file, "rb") as f:
+        gold_dicts = pkl.load(f)
+    if subset != "full":
+        gold_dicts = [
+            d
+            for d in gold_dicts
+            if d["db_path"] == "database/{db_id}/{db_id}.sqlite".format(db_id=subset)
+        ]
+    assert len(gold_dicts) == len(
+        preds
+    ), "number of gold and prediction should be equal"
+    group_name2idxes = defaultdict(list)
+    for idx, gold_dict in enumerate(gold_dicts):
+        group_name2idxes[gold_dict["db_id"]].append(idx)
+    with Pool(num_processes) as pool:
+        result = list(
+            tqdm.tqdm(
+                pool.imap(judge, zip(gold_dicts, preds, repeat(use_cache, len(preds)))),
+                total=len(gold_dicts),
+            )
+        )
+    if verbose:
+        print("overall accuracy: ", acc(result))
+        for group, idxes in group_name2idxes.items():
+            print("accuracy for ", group, acc(result, idxes))
+    return result
+if __name__ == "__main__":
+    start = time.time()
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--gold",
+        dest="gold",
+        type=str,
+        default="classical_test.pkl",
+        help="the path to the gold queries",
+    )
+    parser.add_argument(
+        "--pred", dest="pred", type=str, help="the path to the predicted queries"
+    )
+    parser.add_argument(
+        "--out_file", type=str, required=True, help="the output file path"
+    )
+    parser.add_argument(
+        "--num_processes",
+        type=int,  # without this, a value passed on the command line stays a string
+        default=NUM_PROCESSES,
+        help="number of processes to use",
+    )
+    parser.add_argument(
+        "--subset",
+        default="full",
+        choices=(
+            "atis",
+            "advising",
+            "academic",
+            "imdb",
+            "restaurants",
+            "geography",
+            "scholar",
+            "yelp",
+            "full",
+        ),
+        help="which subset to evaluate on.",
+    )
+    parser.add_argument(
+        "--disable_cache",
+        default=False,
+        action="store_true",
+        help="by default, previously computed results are reused and new results are cached. "
+        "use this flag to disable caching.",
+    )
+    args = parser.parse_args()
+    preds = load_predictions(args.pred)
+    assert not os.path.exists(args.out_file), (
+        "output file path %s already exists" % args.out_file
+    )
+    use_cache = not args.disable_cache
+    if use_cache:
+        load_cache()
+    result = main(
+        preds=preds,
+        gold_file=args.gold,
+        verbose=True,
+        num_processes=args.num_processes,
+        subset=args.subset,
+        use_cache=use_cache,
+    )
+    with open(args.out_file, "wb") as f:
+        pkl.dump(result, f)
+    print("total time used: ", time.time() - start)
+    if use_cache:
+        save_cache()
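As a reading aid for `judge` above, here is a minimal, self-contained sketch of the same "pass every test case" loop. It uses in-memory SQLite databases in place of the test-suite files. The names `make_db`, `run`, and `passes_all` are hypothetical, the timeout handling is omitted, and the row comparison is a simplified stand-in for `result_eq`, so treat it as an illustration rather than the script's actual behavior.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_db(rows: List[Tuple[str, int]]) -> sqlite3.Connection:
    """Build a tiny in-memory database that plays the role of one test case."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", rows)
    return conn


def run(conn: sqlite3.Connection, query: str):
    """Return ("result", rows) on success, ("exception", []) on failure."""
    try:
        return "result", conn.execute(query).fetchall()
    except sqlite3.Error:
        return "exception", []


def passes_all(gold: str, pred: str, suite, cache: Dict[Tuple[str, str], bool]) -> bool:
    key = (gold, pred)
    if key in cache:  # reuse an earlier verdict for the same pair
        return cache[key]
    order_matters = "order by" in gold.lower()
    ok = True
    for conn in suite:
        flag, gold_rows = run(conn, gold)
        if flag != "result":  # skip test cases where the gold query itself fails
            continue
        flag, pred_rows = run(conn, pred)
        if flag != "result":
            ok = False
            break
        if order_matters:
            same = gold_rows == pred_rows
        else:
            same = sorted(gold_rows) == sorted(pred_rows)
        if not same:
            ok = False
            break
    cache[key] = ok
    return ok


suite = [make_db([("ann", 40), ("bob", 20)]), make_db([("cy", 30)])]
cache: Dict[Tuple[str, str], bool] = {}
gold = "SELECT name FROM person WHERE age > 30"
print(passes_all(gold, "SELECT name FROM person WHERE age >= 30", suite, cache))
```

The two queries agree on the first database and only diverge on the second, which is why a prediction has to pass every database in the suite before it is accepted.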

duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation.py ADDED

	@@ -0,0 +1,1210 @@

+################################
+# val: number(float)/string(str)/sql(dict)
+# col_unit: (agg_id, col_id, isDistinct(bool))
+# val_unit: (unit_op, col_unit1, col_unit2)
+# table_unit: (table_type, col_unit/sql)
+# cond_unit: (not_op, op_id, val_unit, val1, val2)
+# condition: [cond_unit1, 'and'/'or', cond_unit2, ...]
+# sql {
+#   'select': (isDistinct(bool), [(agg_id, val_unit), (agg_id, val_unit), ...])
+#   'from': {'table_units': [table_unit1, table_unit2, ...], 'conds': condition}
+#   'where': condition
+#   'groupBy': [col_unit1, col_unit2, ...]
+#   'orderBy': ('asc'/'desc', [val_unit1, val_unit2, ...])
+#   'having': condition
+#   'limit': None/limit value
+#   'intersect': None/sql
+#   'except': None/sql
+#   'union': None/sql
+# }
+################################
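To make the grammar in the comment block concrete, here is a hand-built example of what a parsed query could look like. It is only a sketch: the column and table identifiers (`"__all__"`, `"__singer.age__"`, `"__singer__"`) and the helper `col` are assumptions made for illustration, and the real values come from `get_sql` in `process_sql.py`. The operator tuples are copied from this file.

```python
AGG_OPS = ("none", "max", "min", "count", "sum", "avg")
UNIT_OPS = ("none", "-", "+", "*", "/")
WHERE_OPS = ("not", "between", "=", ">", "<", ">=", "<=", "!=", "in", "like", "is", "exists")


def col(name):
    """Build a val_unit that wraps one plain column (no aggregation, no arithmetic)."""
    col_unit = (AGG_OPS.index("none"), name, False)
    return (UNIT_OPS.index("none"), col_unit, None)


# Hypothetical parse of:
#   SELECT count(*) FROM singer WHERE age > 30 OR name LIKE "A%"
example_sql = {
    "select": (False, [(AGG_OPS.index("count"), col("__all__"))]),
    "from": {"table_units": [("table_unit", "__singer__")], "conds": []},
    "where": [
        (False, WHERE_OPS.index(">"), col("__singer.age__"), 30.0, None),
        "or",
        (False, WHERE_OPS.index("like"), col("__singer.name__"), '"A%"', None),
    ],
    "groupBy": [],
    "orderBy": [],
    "having": [],
    "limit": None,
    "intersect": None,
    "except": None,
    "union": None,
}

# A condition alternates cond_units and connectors, so slicing separates them.
cond_units = example_sql["where"][::2]
connectors = example_sql["where"][1::2]
print([unit[1] for unit in cond_units], connectors)
```

This alternating layout is what helpers such as `condition_has_or` and `condition_has_like` rely on when they slice with `[1::2]` and `[::2]`.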
+import os
+import json
+import sqlite3
+import argparse
+from .process_sql import get_schema, Schema, get_sql
+from .exec_eval import eval_exec_match
+LEVELS = ["easy", "medium", "hard", "duckdb", "ddl", "all"]
+TURNS = ["turn 1", "turn 2", "turn 3", "turn 4", "turn > 4"]
+PARTIAL_TYPES = [
+    "select",
+    "select(no AGG)",
+    "where",
+    "where(no OP)",
+    "group(no Having)",
+    "group",
+    "order",
+    "and/or",
+    "IUEN",
+    "keywords",
+]
+# Flag to disable value evaluation
+DISABLE_VALUE = True
+# Flag to disable distinct in select evaluation
+DISABLE_DISTINCT = True
+CLAUSE_KEYWORDS = (
+    "select",
+    "from",
+    "where",
+    "group",
+    "order",
+    "limit",
+    "intersect",
+    "union",
+    "except",
+)
+JOIN_KEYWORDS = ("join", "on", "as")
+WHERE_OPS = (
+    "not",
+    "between",
+    "=",
+    ">",
+    "<",
+    ">=",
+    "<=",
+    "!=",
+    "in",
+    "like",
+    "is",
+    "exists",
+)
+UNIT_OPS = ("none", "-", "+", "*", "/")
+AGG_OPS = ("none", "max", "min", "count", "sum", "avg")
+TABLE_TYPE = {
+    "sql": "sql",
+    "table_unit": "table_unit",
+}
+COND_OPS = ("and", "or")
+SQL_OPS = ("intersect", "union", "except")
+ORDER_OPS = ("desc", "asc")
+HARDNESS = {
+    "component1": ("where", "group", "order", "limit", "join", "or", "like"),
+    "component2": ("except", "union", "intersect"),
+}
+def condition_has_or(conds):
+    return "or" in conds[1::2]
+def condition_has_like(conds):
+    return WHERE_OPS.index("like") in [cond_unit[1] for cond_unit in conds[::2]]
+def condition_has_sql(conds):
+    for cond_unit in conds[::2]:
+        val1, val2 = cond_unit[3], cond_unit[4]
+        if val1 is not None and type(val1) is dict:
+            return True
+        if val2 is not None and type(val2) is dict:
+            return True
+    return False
+def val_has_op(val_unit):
+    return val_unit[0] != UNIT_OPS.index("none")
+def has_agg(unit):
+    return unit[0] != AGG_OPS.index("none")
+def accuracy(count, total):
+    if count == total:
+        return 1
+    return 0
+def recall(count, total):
+    if count == total:
+        return 1
+    return 0
+def F1(acc, rec):
+    if (acc + rec) == 0:
+        return 0
+    return (2.0 * acc * rec) / (acc + rec)
+def get_scores(count, pred_total, label_total):
+    if pred_total != label_total:
+        return 0, 0, 0
+    elif count == pred_total:
+        return 1, 1, 1
+    return 0, 0, 0
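Note that `get_scores` is all-or-nothing: a clause gets credit only when the prediction and the label have the same number of components and every predicted component matched. The short example below copies the function from the diff and runs it on three cases to make that visible.

```python
def get_scores(count, pred_total, label_total):
    # copied from the diff above
    if pred_total != label_total:
        return 0, 0, 0
    elif count == pred_total:
        return 1, 1, 1
    return 0, 0, 0


full_match = get_scores(2, 2, 2)      # both components matched
partial_match = get_scores(1, 2, 2)   # one of two matched: no partial credit
size_mismatch = get_scores(2, 3, 2)   # prediction has an extra component
print(full_match, partial_match, size_mismatch)
```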
+def eval_sel(pred, label):
+    pred_sel = pred["select"][1]
+    label_sel = label["select"][1]
+    label_wo_agg = [unit[1] for unit in label_sel]
+    pred_total = len(pred_sel)
+    label_total = len(label_sel)
+    cnt = 0
+    cnt_wo_agg = 0
+    for unit in pred_sel:
+        if unit in label_sel:
+            cnt += 1
+            label_sel.remove(unit)
+        if unit[1] in label_wo_agg:
+            cnt_wo_agg += 1
+            label_wo_agg.remove(unit[1])
+    return label_total, pred_total, cnt, cnt_wo_agg
+def eval_where(pred, label):
+    pred_conds = [unit for unit in pred["where"][::2]]
+    label_conds = [unit for unit in label["where"][::2]]
+    label_wo_agg = [unit[2] for unit in label_conds]
+    pred_total = len(pred_conds)
+    label_total = len(label_conds)
+    cnt = 0
+    cnt_wo_agg = 0
+    for unit in pred_conds:
+        if unit in label_conds:
+            cnt += 1
+            label_conds.remove(unit)
+        if unit[2] in label_wo_agg:
+            cnt_wo_agg += 1
+            label_wo_agg.remove(unit[2])
+    return label_total, pred_total, cnt, cnt_wo_agg
+def eval_group(pred, label):
+    pred_cols = [unit[1] for unit in pred["groupBy"]]
+    label_cols = [unit[1] for unit in label["groupBy"]]
+    pred_total = len(pred_cols)
+    label_total = len(label_cols)
+    cnt = 0
+    pred_cols = [pred.split(".")[1] if "." in pred else pred for pred in pred_cols]
+    label_cols = [
+        label.split(".")[1] if "." in label else label for label in label_cols
+    ]
+    for col in pred_cols:
+        if col in label_cols:
+            cnt += 1
+            label_cols.remove(col)
+    return label_total, pred_total, cnt
+def eval_having(pred, label):
+    pred_total = label_total = cnt = 0
+    if len(pred["groupBy"]) > 0:
+        pred_total = 1
+    if len(label["groupBy"]) > 0:
+        label_total = 1
+    pred_cols = [unit[1] for unit in pred["groupBy"]]
+    label_cols = [unit[1] for unit in label["groupBy"]]
+    if (
+        pred_total == label_total == 1
+        and pred_cols == label_cols
+        and pred["having"] == label["having"]
+    ):
+        cnt = 1
+    return label_total, pred_total, cnt
+def eval_order(pred, label):
+    pred_total = label_total = cnt = 0
+    if len(pred["orderBy"]) > 0:
+        pred_total = 1
+    if len(label["orderBy"]) > 0:
+        label_total = 1
+    if (
+        len(label["orderBy"]) > 0
+        and pred["orderBy"] == label["orderBy"]
+        and (
+            (pred["limit"] is None and label["limit"] is None)
+            or (pred["limit"] is not None and label["limit"] is not None)
+        )
+    ):
+        cnt = 1
+    return label_total, pred_total, cnt
+def eval_and_or(pred, label):
+    pred_ao = pred["where"][1::2]
+    label_ao = label["where"][1::2]
+    pred_ao = set(pred_ao)
+    label_ao = set(label_ao)
+    if pred_ao == label_ao:
+        return 1, 1, 1
+    # return order is (label_total, pred_total, cnt), matching the other eval_* helpers
+    return len(label_ao), len(pred_ao), 0
+def get_nestedSQL(sql):
+    nested = []
+    for cond_unit in sql["from"]["conds"][::2] + sql["where"][::2] + sql["having"][::2]:
+        if type(cond_unit[3]) is dict:
+            nested.append(cond_unit[3])
+        if type(cond_unit[4]) is dict:
+            nested.append(cond_unit[4])
+    if sql["intersect"] is not None:
+        nested.append(sql["intersect"])
+    if sql["except"] is not None:
+        nested.append(sql["except"])
+    if sql["union"] is not None:
+        nested.append(sql["union"])
+    return nested
+def eval_nested(pred, label):
+    label_total = 0
+    pred_total = 0
+    cnt = 0
+    if pred is not None:
+        pred_total += 1
+    if label is not None:
+        label_total += 1
+    if pred is not None and label is not None:
+        partial_scores = Evaluator.eval_partial_match(pred, label)
+        cnt += Evaluator.eval_exact_match(pred, label, partial_scores)
+    return label_total, pred_total, cnt
+def eval_IUEN(pred, label):
+    lt1, pt1, cnt1 = eval_nested(pred["intersect"], label["intersect"])
+    lt2, pt2, cnt2 = eval_nested(pred["except"], label["except"])
+    lt3, pt3, cnt3 = eval_nested(pred["union"], label["union"])
+    label_total = lt1 + lt2 + lt3
+    pred_total = pt1 + pt2 + pt3
+    cnt = cnt1 + cnt2 + cnt3
+    return label_total, pred_total, cnt
+def get_keywords(sql):
+    res = set()
+    if len(sql["where"]) > 0:
+        res.add("where")
+    if len(sql["groupBy"]) > 0:
+        res.add("group")
+    if len(sql["having"]) > 0:
+        res.add("having")
+    if len(sql["orderBy"]) > 0:
+        res.add(sql["orderBy"][0])
+        res.add("order")
+    if sql["limit"] is not None:
+        res.add("limit")
+    if sql["except"] is not None:
+        res.add("except")
+    if sql["union"] is not None:
+        res.add("union")
+    if sql["intersect"] is not None:
+        res.add("intersect")
+    # or keyword
+    ao = sql["from"]["conds"][1::2] + sql["where"][1::2] + sql["having"][1::2]
+    if len([token for token in ao if token == "or"]) > 0:
+        res.add("or")
+    cond_units = sql["from"]["conds"][::2] + sql["where"][::2] + sql["having"][::2]
+    # not keyword
+    if len([cond_unit for cond_unit in cond_units if cond_unit[0]]) > 0:
+        res.add("not")
+    # in keyword
+    if (
+        len(
+            [
+                cond_unit
+                for cond_unit in cond_units
+                if cond_unit[1] == WHERE_OPS.index("in")
+            ]
+        )
+        > 0
+    ):
+        res.add("in")
+    # like keyword
+    if (
+        len(
+            [
+                cond_unit
+                for cond_unit in cond_units
+                if cond_unit[1] == WHERE_OPS.index("like")
+            ]
+        )
+        > 0
+    ):
+        res.add("like")
+    return res
+def eval_keywords(pred, label):
+    pred_keywords = get_keywords(pred)
+    label_keywords = get_keywords(label)
+    pred_total = len(pred_keywords)
+    label_total = len(label_keywords)
+    cnt = 0
+    for k in pred_keywords:
+        if k in label_keywords:
+            cnt += 1
+    return label_total, pred_total, cnt
+def count_agg(units):
+    return len([unit for unit in units if has_agg(unit)])
+def count_component1(sql):
+    count = 0
+    if len(sql["where"]) > 0:
+        count += 1
+    if len(sql["groupBy"]) > 0:
+        count += 1
+    if len(sql["orderBy"]) > 0:
+        count += 1
+    if sql["limit"] is not None:
+        count += 1
+    if len(sql["from"]["table_units"]) > 0:  # JOIN
+        count += len(sql["from"]["table_units"]) - 1
+    ao = sql["from"]["conds"][1::2] + sql["where"][1::2] + sql["having"][1::2]
+    count += len([token for token in ao if token == "or"])
+    cond_units = sql["from"]["conds"][::2] + sql["where"][::2] + sql["having"][::2]
+    count += len(
+        [
+            cond_unit
+            for cond_unit in cond_units
+            if cond_unit[1] == WHERE_OPS.index("like")
+        ]
+    )
+    return count
+def count_component2(sql):
+    nested = get_nestedSQL(sql)
+    return len(nested)
+def count_others(sql):
+    count = 0
+    # number of aggregation
+    agg_count = count_agg(sql["select"][1])
+    agg_count += count_agg(sql["where"][::2])
+    agg_count += count_agg(sql["groupBy"])
+    if len(sql["orderBy"]) > 0:
+        agg_count += count_agg(
+            [unit[1] for unit in sql["orderBy"][1] if unit[1]]
+            + [unit[2] for unit in sql["orderBy"][1] if unit[2]]
+        )
+    agg_count += count_agg(sql["having"])
+    if agg_count > 1:
+        count += 1
+    # number of select columns
+    if len(sql["select"][1]) > 1:
+        count += 1
+    # number of where conditions
+    if len(sql["where"]) > 1:
+        count += 1
+    # number of group by clauses
+    if len(sql["groupBy"]) > 1:
+        count += 1
+    return count
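The three counters above feed the thresholds in `Evaluator.eval_hardness`. Restated as a standalone function over plain integers (the name `hardness` and its argument names are hypothetical), the rule looks like this. In this file `evaluate_one` takes its bucket from the `category` argument instead, so the rule only applies when `eval_hardness` is called directly.

```python
def hardness(comp1: int, comp2: int, others: int) -> str:
    """Map the three component counts to a hardness bucket."""
    if comp1 <= 1 and others == 0 and comp2 == 0:
        return "easy"
    if (others <= 2 and comp1 <= 1 and comp2 == 0) or (
        comp1 <= 2 and others < 2 and comp2 == 0
    ):
        return "medium"
    if (
        (others > 2 and comp1 <= 2 and comp2 == 0)
        or (2 < comp1 <= 3 and others <= 2 and comp2 == 0)
        or (comp1 <= 1 and others == 0 and comp2 <= 1)
    ):
        return "hard"
    return "extra"


for counts in [(1, 0, 0), (2, 0, 1), (3, 0, 0), (1, 1, 0), (4, 0, 0)]:
    print(counts, hardness(*counts))
```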
+class Evaluator:
+    """A simple evaluator"""
+    def __init__(
+        self,
+        db_dir,
+        kmaps,
+        etype,
+        plug_value,
+        keep_distinct,
+        progress_bar_for_each_datapoint
+    ):
+        self.db_dir = db_dir
+        self.kmaps = kmaps
+        self.etype = etype
+        self.plug_value = plug_value
+        self.keep_distinct = keep_distinct
+        self.progress_bar_for_each_datapoint = progress_bar_for_each_datapoint
+        self.db_paths = {}
+        self.schemas = {}
+        self.scores = {}
+        for turn in TURNS:
+            self.scores[turn] = {"count": 0, "exact": 0.0}
+            self.scores[turn]["exec"] = 0
+        for level in LEVELS:
+            self.scores[level] = {"count": 0, "partial": {}, "exact": 0.0}
+            self.scores[level]["exec"] = 0
+            for type_ in PARTIAL_TYPES:
+                self.scores[level]["partial"][type_] = {
+                    "acc": 0.0,
+                    "rec": 0.0,
+                    "f1": 0.0,
+                    "acc_count": 0,
+                    "rec_count": 0,
+                }
+    def eval_hardness(self, sql):
+        count_comp1_ = count_component1(sql)
+        count_comp2_ = count_component2(sql)
+        count_others_ = count_others(sql)
+        if count_comp1_ <= 1 and count_others_ == 0 and count_comp2_ == 0:
+            return "easy"
+        elif (count_others_ <= 2 and count_comp1_ <= 1 and count_comp2_ == 0) or (
+            count_comp1_ <= 2 and count_others_ < 2 and count_comp2_ == 0
+        ):
+            return "medium"
+        elif (
+            (count_others_ > 2 and count_comp1_ <= 2 and count_comp2_ == 0)
+            or (2 < count_comp1_ <= 3 and count_others_ <= 2 and count_comp2_ == 0)
+            or (count_comp1_ <= 1 and count_others_ == 0 and count_comp2_ <= 1)
+        ):
+            return "hard"
+        else:
+            return "extra"
+    @classmethod
+    def eval_exact_match(cls, pred, label, partial_scores):
+        for key, score in partial_scores.items():
+            if score["f1"] != 1:
+                return 0
+        if len(label["from"]["table_units"]) > 0:
+            label_tables = sorted(label["from"]["table_units"])
+            pred_tables = sorted(pred["from"]["table_units"])
+            return label_tables == pred_tables
+        return 1
+    @classmethod
+    def eval_partial_match(cls, pred, label):
+        res = {}
+        label_total, pred_total, cnt, cnt_wo_agg = eval_sel(pred, label)
+        acc, rec, f1 = get_scores(cnt, pred_total, label_total)
+        res["select"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        acc, rec, f1 = get_scores(cnt_wo_agg, pred_total, label_total)
+        res["select(no AGG)"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        label_total, pred_total, cnt, cnt_wo_agg = eval_where(pred, label)
+        acc, rec, f1 = get_scores(cnt, pred_total, label_total)
+        res["where"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        acc, rec, f1 = get_scores(cnt_wo_agg, pred_total, label_total)
+        res["where(no OP)"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        label_total, pred_total, cnt = eval_group(pred, label)
+        acc, rec, f1 = get_scores(cnt, pred_total, label_total)
+        res["group(no Having)"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        label_total, pred_total, cnt = eval_having(pred, label)
+        acc, rec, f1 = get_scores(cnt, pred_total, label_total)
+        res["group"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        label_total, pred_total, cnt = eval_order(pred, label)
+        acc, rec, f1 = get_scores(cnt, pred_total, label_total)
+        res["order"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        label_total, pred_total, cnt = eval_and_or(pred, label)
+        acc, rec, f1 = get_scores(cnt, pred_total, label_total)
+        res["and/or"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        label_total, pred_total, cnt = eval_IUEN(pred, label)
+        acc, rec, f1 = get_scores(cnt, pred_total, label_total)
+        res["IUEN"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        label_total, pred_total, cnt = eval_keywords(pred, label)
+        acc, rec, f1 = get_scores(cnt, pred_total, label_total)
+        res["keywords"] = {
+            "acc": acc,
+            "rec": rec,
+            "f1": f1,
+            "label_total": label_total,
+            "pred_total": pred_total,
+        }
+        return res
+    def evaluate_one(self, db_name, gold, predicted, setup_sql,
+                     validate_sql, turn_scores, idx, category):
+        if db_name not in self.db_paths:
+            db_path = os.path.join(self.db_dir, db_name, db_name + ".duckdb")
+            self.db_paths[db_name] = db_path
+            self.schemas[db_name] = Schema(get_schema(db_path))
+        if idx > 3:
+            idx = "> 4"
+        else:
+            idx += 1
+        turn_id = "turn " + str(idx)
+        hardness = category
+        self.scores[turn_id]["count"] += 1
+        self.scores[hardness]["count"] += 1
+        self.scores["all"]["count"] += 1
+        if self.etype in ['all', 'match']:
+            schema = self.schemas[db_name]
+            g_sql = get_sql(schema, gold)
+            try:
+                p_sql = get_sql(schema, predicted)
+            except Exception:
+                # If p_sql is not valid, then we will use an empty sql to evaluate with the correct sql
+                p_sql = {
+                    "except": None,
+                    "from": {"conds": [], "table_units": []},
+                    "groupBy": [],
+                    "having": [],
+                    "intersect": None,
+                    "limit": None,
+                    "orderBy": [],
+                    "select": [False, []],
+                    "union": None,
+                    "where": [],
+                }
+        if self.etype in ["all", "exec"]:
+            exec_score = eval_exec_match(
+                db=self.db_paths[db_name],
+                p_str=predicted,
+                g_str=gold,
+                setup_sql=setup_sql,
+                validate_sql=validate_sql,
+                plug_value=self.plug_value,
+                keep_distinct=self.keep_distinct,
+                progress_bar_for_each_datapoint=self.progress_bar_for_each_datapoint,
+            )
+            if exec_score:
+                self.scores[hardness]["exec"] += 1
+                self.scores[turn_id]["exec"] += 1
+                self.scores["all"]["exec"] += 1
+                turn_scores["exec"].append(1)
+            else:
+                turn_scores["exec"].append(0)
+        if self.etype in ["all", "match"]:
+            # rebuild sql for value evaluation
+            kmap = self.kmaps[db_name]
+            g_valid_col_units = build_valid_col_units(
+                g_sql["from"]["table_units"], schema
+            )
+            g_sql = rebuild_sql_val(g_sql)
+            g_sql = rebuild_sql_col(g_valid_col_units, g_sql, kmap)
+            p_valid_col_units = build_valid_col_units(
+                p_sql["from"]["table_units"], schema
+            )
+            p_sql = rebuild_sql_val(p_sql)
+            p_sql = rebuild_sql_col(p_valid_col_units, p_sql, kmap)
+            partial_scores = self.eval_partial_match(p_sql, g_sql)
+            exact_score = self.eval_exact_match(p_sql, g_sql, partial_scores)
+            if exact_score == 0:
+                turn_scores["exact"].append(0)
+                print("{} pred: {}".format(hardness, predicted))
+                print("{} gold: {}".format(hardness, gold))
+                print("")
+            else:
+                turn_scores["exact"].append(1)
+            self.scores[turn_id]["exact"] += exact_score
+            self.scores[hardness]["exact"] += exact_score
+            self.scores["all"]["exact"] += exact_score
+            for type_ in PARTIAL_TYPES:
+                if partial_scores[type_]["pred_total"] > 0:
+                    self.scores[hardness]["partial"][type_]["acc"] += partial_scores[
+                        type_
+                    ]["acc"]
+                    self.scores[hardness]["partial"][type_]["acc_count"] += 1
+                if partial_scores[type_]["label_total"] > 0:
+                    self.scores[hardness]["partial"][type_]["rec"] += partial_scores[
+                        type_
+                    ]["rec"]
+                    self.scores[hardness]["partial"][type_]["rec_count"] += 1
+                self.scores[hardness]["partial"][type_]["f1"] += partial_scores[type_][
+                    "f1"
+                ]
+                if partial_scores[type_]["pred_total"] > 0:
+                    self.scores["all"]["partial"][type_]["acc"] += partial_scores[type_][
+                        "acc"
+                    ]
+                    self.scores["all"]["partial"][type_]["acc_count"] += 1
+                if partial_scores[type_]["label_total"] > 0:
+                    self.scores["all"]["partial"][type_]["rec"] += partial_scores[type_][
+                        "rec"
+                    ]
+                    self.scores["all"]["partial"][type_]["rec_count"] += 1
+                self.scores["all"]["partial"][type_]["f1"] += partial_scores[type_]["f1"]
+        result = {
+            "predictSQL": predicted,
+            "goldSQL": gold,
+        }
+        if self.etype in ['all', 'match']:
+            result.update({
+                "hardness": hardness,
+                "exact": exact_score,
+                "partial": partial_scores,
+            })
+        if self.etype in ['all', 'exec']:
+            result['exec'] = exec_score
+        return result
+    def finalize(self):
+        scores = self.scores
+        for turn in TURNS:
+            if scores[turn]["count"] == 0:
+                continue
+            if self.etype in ["all", "exec"]:
+                scores[turn]["exec"] /= scores[turn]["count"]
+            if self.etype in ["all", "match"]:
+                scores[turn]["exact"] /= scores[turn]["count"]
+        for level in LEVELS:
+            if scores[level]["count"] == 0:
+                continue
+            if self.etype in ["all", "exec"]:
+                scores[level]["exec"] /= scores[level]["count"]
+            if self.etype in ["all", "match"]:
+                scores[level]["exact"] /= scores[level]["count"]
+                for type_ in PARTIAL_TYPES:
+                    if scores[level]["partial"][type_]["acc_count"] == 0:
+                        scores[level]["partial"][type_]["acc"] = 0
+                    else:
+                        scores[level]["partial"][type_]["acc"] = (
+                                scores[level]["partial"][type_]["acc"]
+                                / scores[level]["partial"][type_]["acc_count"]
+                                * 1.0
+                        )
+                    if scores[level]["partial"][type_]["rec_count"] == 0:
+                        scores[level]["partial"][type_]["rec"] = 0
+                    else:
+                        scores[level]["partial"][type_]["rec"] = (
+                                scores[level]["partial"][type_]["rec"]
+                                / scores[level]["partial"][type_]["rec_count"]
+                                * 1.0
+                        )
+                    if (
+                            scores[level]["partial"][type_]["acc"] == 0
+                            and scores[level]["partial"][type_]["rec"] == 0
+                    ):
+                        scores[level]["partial"][type_]["f1"] = 1
+                    else:
+                        scores[level]["partial"][type_]["f1"] = (
+                                2.0
+                                * scores[level]["partial"][type_]["acc"]
+                                * scores[level]["partial"][type_]["rec"]
+                                / (
+                                        scores[level]["partial"][type_]["rec"]
+                                        + scores[level]["partial"][type_]["acc"]
+                                )
+                        )
+def isValidSQL(sql, db):
+    conn = sqlite3.connect(db)
+    try:
+        conn.cursor().execute(sql)
+    except Exception:
+        return False
+    finally:
+        # always release the connection, whether or not the query ran
+        conn.close()
+    return True
+def print_formated_s(row_name, l, element_format):
+    template = "{:20} " + " ".join([element_format] * len(l))
+    print(template.format(row_name, *l))
+def print_scores(scores, etype, include_turn_acc=True):
+    turns = TURNS
+    levels = ["easy", "medium", "hard", "duckdb", "ddl", "all"]
+    if include_turn_acc:
+        levels.append("joint_all")
+    partial_types = PARTIAL_TYPES
+    print_formated_s("", levels, "{:20}")
+    counts = [scores[level]["count"] for level in levels]
+    print_formated_s("count", counts, "{:<20d}")
+    if etype in ["all", "exec"]:
+        print("=====================   EXECUTION ACCURACY     =====================")
+        exec_scores = [scores[level]["exec"] for level in levels]
+        print_formated_s("execution", exec_scores, "{:<20.3f}")
+    if etype in ["all", "match"]:
+        print("\n====================== EXACT MATCHING ACCURACY =====================")
+        exact_scores = [scores[level]["exact"] for level in levels]
+        print_formated_s("exact match", exact_scores, "{:<20.3f}")
+        print("\n---------------------PARTIAL MATCHING ACCURACY----------------------")
+        for type_ in partial_types:
+            this_scores = [scores[level]["partial"][type_]["acc"] for level in levels]
+            print_formated_s(type_, this_scores, "{:<20.3f}")
+        print("---------------------- PARTIAL MATCHING RECALL ----------------------")
+        for type_ in partial_types:
+            this_scores = [scores[level]["partial"][type_]["rec"] for level in levels]
+            print_formated_s(type_, this_scores, "{:<20.3f}")
+        print("---------------------- PARTIAL MATCHING F1 --------------------------")
+        for type_ in partial_types:
+            this_scores = [scores[level]["partial"][type_]["f1"] for level in levels]
+            print_formated_s(type_, this_scores, "{:<20.3f}")
+    if include_turn_acc:
+        print()
+        print()
+        print_formated_s("", turns, "{:20}")
+        counts = [scores[turn]["count"] for turn in turns]
+        print_formated_s("count", counts, "{:<20d}")
+        if etype in ["all", "exec"]:
+            print(
+                "=====================   TURN EXECUTION ACCURACY     ====================="
+            )
+            exec_scores = [scores[turn]["exec"] for turn in turns]
+            print_formated_s("execution", exec_scores, "{:<20.3f}")
+        if etype in ["all", "match"]:
+            print(
+                "\n====================== TURN EXACT MATCHING ACCURACY ====================="
+            )
+            exact_scores = [scores[turn]["exact"] for turn in turns]
+            print_formated_s("exact match", exact_scores, "{:<20.3f}")
+def evaluate(
+    gold,
+    predict,
+    db_dir,
+    etype,
+    kmaps,
+    plug_value,
+    keep_distinct,
+    progress_bar_for_each_datapoint,
+):
+    with open(gold) as f:
+        glist = []
+        gseq_one = []
+        for l in f.readlines():
+            if len(l.strip()) == 0:
+                glist.append(gseq_one)
+                gseq_one = []
+            else:
+                lstrip = l.strip().split("\t")
+                gseq_one.append(lstrip)
+        # include the last session
+        # this was previously ignored in the SParC evaluation script
+        # which might lead to slight differences in scores
+        if len(gseq_one) != 0:
+            glist.append(gseq_one)
+    # spider formatting indicates that there is only one "single turn"
+    # do not report "turn accuracy" for SPIDER
+    include_turn_acc = len(glist) > 1
+    with open(predict) as f:
+        plist = []
+        pseq_one = []
+        for l in f.readlines():
+            if len(l.strip()) == 0:
+                plist.append(pseq_one)
+                pseq_one = []
+            else:
+                pseq_one.append(l.strip().split("\t"))
+        if len(pseq_one) != 0:
+            plist.append(pseq_one)
+    assert len(plist) == len(glist), "number of sessions must be equal"
+    evaluator = Evaluator(db_dir, kmaps, etype, plug_value, keep_distinct, progress_bar_for_each_datapoint)
+    results = []
+    for i, (p, g) in enumerate(zip(plist, glist)):
+        if (i + 1) % 10 == 0:
+            print("Evaluating %dth prediction" % (i + 1))
+        evaluator.scores["joint_all"]["count"] += 1
+        turn_scores = {"exec": [], "exact": []}
+        for idx, (p_turn, g_turn) in enumerate(zip(p, g)):
+            # use distinct names so the session-level p and g are not shadowed
+            p_str = p_turn[0]
+            p_str = p_str.replace("value", "1")
+            g_str, db_name = g_turn
+            results.append(evaluator.evaluate_one(db_name, g_str, p_str, "", "", turn_scores, idx, ""))
+        if all(v == 1 for v in turn_scores["exec"]):
+            evaluator.scores["joint_all"]["exec"] += 1
+        if all(v == 1 for v in turn_scores["exact"]):
+            evaluator.scores["joint_all"]["exact"] += 1
+    evaluator.finalize()
+    print_scores(evaluator.scores, etype, include_turn_acc=include_turn_acc)
+    return {
+        "per_item": results,
+        "total_scores": evaluator.scores
+    }
+# Rebuild SQL functions for value evaluation
+def rebuild_cond_unit_val(cond_unit):
+    if cond_unit is None or not DISABLE_VALUE:
+        return cond_unit
+    not_op, op_id, val_unit, val1, val2 = cond_unit
+    if type(val1) is not dict:
+        val1 = None
+    else:
+        val1 = rebuild_sql_val(val1)
+    if type(val2) is not dict:
+        val2 = None
+    else:
+        val2 = rebuild_sql_val(val2)
+    return not_op, op_id, val_unit, val1, val2
+def rebuild_condition_val(condition):
+    if condition is None or not DISABLE_VALUE:
+        return condition
+    res = []
+    for idx, it in enumerate(condition):
+        if idx % 2 == 0:
+            res.append(rebuild_cond_unit_val(it))
+        else:
+            res.append(it)
+    return res
+def rebuild_sql_val(sql):
+    if sql is None or not DISABLE_VALUE:
+        return sql
+    sql["from"]["conds"] = rebuild_condition_val(sql["from"]["conds"])
+    sql["having"] = rebuild_condition_val(sql["having"])
+    sql["where"] = rebuild_condition_val(sql["where"])
+    sql["intersect"] = rebuild_sql_val(sql["intersect"])
+    sql["except"] = rebuild_sql_val(sql["except"])
+    sql["union"] = rebuild_sql_val(sql["union"])
+    return sql
+# Rebuild SQL functions for foreign key evaluation
+def build_valid_col_units(table_units, schema):
+    col_ids = [
+        table_unit[1]
+        for table_unit in table_units
+        if table_unit[0] == TABLE_TYPE["table_unit"]
+    ]
+    prefixs = [col_id[:-2] for col_id in col_ids]
+    valid_col_units = []
+    for value in schema.idMap.values():
+        if "." in value and value[: value.index(".")] in prefixs:
+            valid_col_units.append(value)
+    return valid_col_units
+def rebuild_col_unit_col(valid_col_units, col_unit, kmap):
+    if col_unit is None:
+        return col_unit
+    agg_id, col_id, distinct = col_unit
+    if col_id in kmap and col_id in valid_col_units:
+        col_id = kmap[col_id]
+    if DISABLE_DISTINCT:
+        distinct = None
+    return agg_id, col_id, distinct
+def rebuild_val_unit_col(valid_col_units, val_unit, kmap):
+    if val_unit is None:
+        return val_unit
+    unit_op, col_unit1, col_unit2 = val_unit
+    col_unit1 = rebuild_col_unit_col(valid_col_units, col_unit1, kmap)
+    col_unit2 = rebuild_col_unit_col(valid_col_units, col_unit2, kmap)
+    return unit_op, col_unit1, col_unit2
+def rebuild_table_unit_col(valid_col_units, table_unit, kmap):
+    if table_unit is None:
+        return table_unit
+    table_type, col_unit_or_sql = table_unit
+    if isinstance(col_unit_or_sql, tuple):
+        col_unit_or_sql = rebuild_col_unit_col(valid_col_units, col_unit_or_sql, kmap)
+    return table_type, col_unit_or_sql
+def rebuild_cond_unit_col(valid_col_units, cond_unit, kmap):
+    if cond_unit is None:
+        return cond_unit
+    not_op, op_id, val_unit, val1, val2 = cond_unit
+    val_unit = rebuild_val_unit_col(valid_col_units, val_unit, kmap)
+    return not_op, op_id, val_unit, val1, val2
+def rebuild_condition_col(valid_col_units, condition, kmap):
+    for idx in range(len(condition)):
+        if idx % 2 == 0:
+            condition[idx] = rebuild_cond_unit_col(
+                valid_col_units, condition[idx], kmap
+            )
+    return condition
+def rebuild_select_col(valid_col_units, sel, kmap):
+    if sel is None:
+        return sel
+    distinct, _list = sel
+    new_list = []
+    for it in _list:
+        agg_id, val_unit = it
+        new_list.append((agg_id, rebuild_val_unit_col(valid_col_units, val_unit, kmap)))
+    if DISABLE_DISTINCT:
+        distinct = None
+    return distinct, new_list
+def rebuild_from_col(valid_col_units, from_, kmap):
+    if from_ is None:
+        return from_
+    from_["table_units"] = [
+        rebuild_table_unit_col(valid_col_units, table_unit, kmap)
+        for table_unit in from_["table_units"]
+    ]
+    from_["conds"] = rebuild_condition_col(valid_col_units, from_["conds"], kmap)
+    return from_
+def rebuild_group_by_col(valid_col_units, group_by, kmap):
+    if group_by is None:
+        return group_by
+    return [
+        rebuild_col_unit_col(valid_col_units, col_unit, kmap) for col_unit in group_by
+    ]
+def rebuild_order_by_col(valid_col_units, order_by, kmap):
+    if order_by is None or len(order_by) == 0:
+        return order_by
+    direction, val_units = order_by
+    new_val_units = [
+        rebuild_val_unit_col(valid_col_units, val_unit, kmap) for val_unit in val_units
+    ]
+    return direction, new_val_units
+def rebuild_sql_col(valid_col_units, sql, kmap):
+    if sql is None:
+        return sql
+    sql["select"] = rebuild_select_col(valid_col_units, sql["select"], kmap)
+    sql["from"] = rebuild_from_col(valid_col_units, sql["from"], kmap)
+    sql["where"] = rebuild_condition_col(valid_col_units, sql["where"], kmap)
+    sql["groupBy"] = rebuild_group_by_col(valid_col_units, sql["groupBy"], kmap)
+    sql["orderBy"] = rebuild_order_by_col(valid_col_units, sql["orderBy"], kmap)
+    sql["having"] = rebuild_condition_col(valid_col_units, sql["having"], kmap)
+    sql["intersect"] = rebuild_sql_col(valid_col_units, sql["intersect"], kmap)
+    sql["except"] = rebuild_sql_col(valid_col_units, sql["except"], kmap)
+    sql["union"] = rebuild_sql_col(valid_col_units, sql["union"], kmap)
+    return sql
+def build_foreign_key_map(entry):
+    cols_orig = entry["column_names_original"]
+    tables_orig = entry["table_names_original"]
+    # rebuild cols corresponding to idmap in Schema
+    cols = []
+    for col_orig in cols_orig:
+        if col_orig[0] >= 0:
+            t = tables_orig[col_orig[0]]
+            c = col_orig[1]
+            cols.append("__" + t.lower() + "." + c.lower() + "__")
+        else:
+            cols.append("__all__")
+    def keyset_in_list(k1, k2, k_list):
+        for k_set in k_list:
+            if k1 in k_set or k2 in k_set:
+                return k_set
+        new_k_set = set()
+        k_list.append(new_k_set)
+        return new_k_set
+    foreign_key_list = []
+    foreign_keys = entry["foreign_keys"]
+    for fkey in foreign_keys:
+        key1, key2 = fkey
+        key_set = keyset_in_list(key1, key2, foreign_key_list)
+        key_set.add(key1)
+        key_set.add(key2)
+    foreign_key_map = {}
+    for key_set in foreign_key_list:
+        sorted_list = sorted(list(key_set))
+        midx = sorted_list[0]
+        for idx in sorted_list:
+            foreign_key_map[cols[idx]] = cols[midx]
+    return foreign_key_map
+def build_foreign_key_map_from_json(table):
+    with open(table) as f:
+        data = json.load(f)
+    tables = {}
+    for entry in data:
+        tables[entry["db_id"]] = build_foreign_key_map(entry)
+    return tables
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--gold", dest="gold", type=str, help="the path to the gold queries"
+    )
+    parser.add_argument(
+        "--pred", dest="pred", type=str, help="the path to the predicted queries"
+    )
+    parser.add_argument(
+        "--db",
+        dest="db",
+        type=str,
+        help="the directory that contains all the databases and test suites",
+    )
+    parser.add_argument(
+        "--table", dest="table", type=str, help="the tables.json schema file"
+    )
+    parser.add_argument(
+        "--etype",
+        dest="etype",
+        type=str,
+        default="exec",
+        help="evaluation type, exec for test suite accuracy, match for the original exact set match accuracy",
+        choices=("all", "exec", "match"),
+    )
+    parser.add_argument(
+        "--plug_value",
+        default=False,
+        action="store_true",
+        help="whether to plug in the gold value into the predicted query; suitable if your model does not predict values.",
+    )
+    parser.add_argument(
+        "--keep_distinct",
+        default=False,
+        action="store_true",
+        help="whether to keep distinct keyword during evaluation. default is false.",
+    )
+    parser.add_argument(
+        "--progress_bar_for_each_datapoint",
+        default=False,
+        action="store_true",
+        help="whether to print progress bar of running test inputs for each datapoint",
+    )
+    args = parser.parse_args()
+    # only evaluating exact match needs this argument
+    kmaps = None
+    if args.etype in ["all", "match"]:
+        assert (
+            args.table is not None
+        ), "table argument must be non-None if exact set match is evaluated"
+        kmaps = build_foreign_key_map_from_json(args.table)
+    evaluate(
+        args.gold,
+        args.pred,
+        args.db,
+        args.etype,
+        kmaps,
+        args.plug_value,
+        args.keep_distinct,
+        args.progress_bar_for_each_datapoint,
+    )
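
As a reading aid for `evaluate` above: the gold and prediction files are split into sessions on blank lines, and each non-blank line is split on tabs (a gold line carries the SQL and the database name). The sketch below mirrors that parsing. `read_sessions` is a hypothetical helper, not part of this commit, and it works on an in-memory string instead of a file path; the sample queries and database names are made up.

```python
def read_sessions(text):
    """Split text into blank-line-separated sessions of tab-split lines.

    Hypothetical helper for illustration; it follows the two loops in evaluate().
    """
    sessions, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current session, as in the original loop.
            sessions.append(current)
            current = []
        else:
            current.append(line.strip().split("\t"))
    # Keep the last session even when the text does not end with a blank line.
    if current:
        sessions.append(current)
    return sessions


# Two sessions: the first has two turns, the second has one.
gold_text = "SELECT 1\tdb_a\nSELECT 2\tdb_a\n\nSELECT 3\tdb_b\n"
sessions = read_sessions(gold_text)
for session in sessions:
    print(session)
```

A file with no blank lines yields a single session, which is why `include_turn_acc` is false for Spider-style input.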

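To make `build_foreign_key_map` in `evaluation.py` concrete: it groups columns that are linked by foreign keys and maps every column in a group to the group's lowest-index column, named in the `__table.column__` form. The sketch below re-implements that grouping in a compact way on a made-up two-table schema entry. The function name `foreign_key_map` and the toy table and column names are assumptions for illustration only; the entry keys are the ones the original function reads.

```python
def foreign_key_map(entry):
    """Compact re-implementation of the grouping logic, for illustration."""
    tables = entry["table_names_original"]
    # Column names in the "__table.column__" form; index -1 is the "*" column.
    cols = [
        "__all__" if t_idx < 0 else "__%s.%s__" % (tables[t_idx].lower(), name.lower())
        for t_idx, name in entry["column_names_original"]
    ]
    groups = []
    for k1, k2 in entry["foreign_keys"]:
        # Reuse the first group that already contains either key, else start one.
        for group in groups:
            if k1 in group or k2 in group:
                break
        else:
            group = set()
            groups.append(group)
        group.update((k1, k2))
    fk_map = {}
    for group in groups:
        canonical = min(group)  # lowest column index represents the group
        for idx in group:
            fk_map[cols[idx]] = cols[canonical]
    return fk_map


# Toy schema entry: Writes.aid (column 3) references Author.aid (column 1).
toy_entry = {
    "table_names_original": ["Author", "Writes"],
    "column_names_original": [[-1, "*"], [0, "aid"], [0, "name"], [1, "aid"], [1, "pid"]],
    "foreign_keys": [[3, 1]],
}
fk_map = foreign_key_map(toy_entry)
print(fk_map)
```

Here both `__author.aid__` and `__writes.aid__` map to `__author.aid__`, so exact set match treats the two columns as interchangeable; columns that take part in no foreign key are absent from the map.
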
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/academic_gold.txt ADDED

	@@ -0,0 +1,196 @@

+SELECT JOURNALalias0.HOMEPAGE FROM JOURNAL AS JOURNALalias0 WHERE JOURNALalias0.NAME = "PVLDB" ;
+SELECT AUTHORalias0.HOMEPAGE FROM AUTHOR AS AUTHORalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" ;
+SELECT PUBLICATIONalias0.ABSTRACT FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT PUBLICATIONalias0.YEAR FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT PUBLICATIONalias0.YEAR FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.YEAR > 2000 ;
+SELECT CONFERENCEalias0.HOMEPAGE FROM CONFERENCE AS CONFERENCEalias0 WHERE CONFERENCEalias0.NAME = "VLDB" ;
+SELECT KEYWORDalias0.KEYWORD FROM KEYWORD AS KEYWORDalias0 ;
+SELECT ORGANIZATIONalias0.NAME FROM ORGANIZATION AS ORGANIZATIONalias0 ;
+SELECT ORGANIZATIONalias0.NAME FROM ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.CONTINENT = "North America" ;
+SELECT ORGANIZATIONalias0.HOMEPAGE FROM ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" ;
+SELECT PUBLICATIONalias0.REFERENCE_NUM FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT PUBLICATIONalias0.REFERENCE_NUM FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT PUBLICATIONalias0.CITATION_NUM FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT PUBLICATIONalias0.CITATION_NUM FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.CITATION_NUM > 200 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR = 2010 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2010 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR = 2002 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR < 2002 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR < 2002 AND PUBLICATIONalias0.YEAR > 1995 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ( PUBLICATIONalias0.YEAR < 1995 OR PUBLICATIONalias0.YEAR > 2002 ) AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT DOMAINalias0.NAME FROM DOMAIN AS DOMAINalias0 , DOMAIN_JOURNAL AS DOMAIN_JOURNALalias0 , JOURNAL AS JOURNALalias0 WHERE DOMAINalias0.DID = DOMAIN_JOURNALalias0.DID AND JOURNALalias0.JID = DOMAIN_JOURNALalias0.JID AND JOURNALalias0.NAME = "PVLDB" ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT ORGANIZATIONalias0.NAME FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+SELECT CONFERENCEalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT JOURNALalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT DOMAINalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ;
+SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
+SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 ;
+SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT DOMAINalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_CONFERENCE AS DOMAIN_CONFERENCEalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND DOMAIN_CONFERENCEalias0.CID = CONFERENCEalias0.CID AND DOMAINalias0.DID = DOMAIN_CONFERENCEalias0.DID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT KEYWORDalias0.KEYWORD FROM DOMAIN AS DOMAINalias0 , DOMAIN_KEYWORD AS DOMAIN_KEYWORDalias0 , KEYWORD AS KEYWORDalias0 WHERE DOMAINalias0.DID = DOMAIN_KEYWORDalias0.DID AND DOMAINalias0.NAME = "Databases" AND KEYWORDalias0.KID = DOMAIN_KEYWORDalias0.KID ;
+SELECT PUBLICATIONalias0.TITLE FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Natural Language" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT KEYWORDalias0.KEYWORD FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT KEYWORDalias0.KEYWORD FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT KEYWORDalias0.KEYWORD FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND KEYWORDalias0.KEYWORD = "User Study" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND KEYWORDalias0.KEYWORD = "Keyword search" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND KEYWORDalias0.KEYWORD = "Information Retrieval" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT ORGANIZATIONalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+SELECT ORGANIZATIONalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.CONTINENT = "North America" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
+SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.CITATION_NUM > 200 ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CITATION_NUM > 200 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.CITATION_NUM > 200 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.YEAR > 2000 ;
+SELECT PUBLICATIONalias0.TITLE FROM DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID AND PUBLICATIONalias0.YEAR > 2000 ;
+SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 ;
+SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.YEAR > 2000 ;
+SELECT COUNT( DISTINCT ( CONFERENCEalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( JOURNALalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) , PUBLICATIONalias0.YEAR FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY PUBLICATIONalias0.YEAR ;
+SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.YEAR , SUM( PUBLICATIONalias0.CITATION_NUM ) FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" GROUP BY PUBLICATIONalias0.YEAR ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias1.TITLE ) ) FROM CITE AS CITEalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias1 WHERE PUBLICATIONalias0.PID = CITEalias0.CITED AND PUBLICATIONalias0.TITLE = "Making database systems usable" AND PUBLICATIONalias1.PID = CITEalias0.CITING AND PUBLICATIONalias1.YEAR < 2010 ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.YEAR > 2000 ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM KEYWORD AS KEYWORDalias0 ;
+SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM DOMAIN AS DOMAINalias0 , DOMAIN_KEYWORD AS DOMAIN_KEYWORDalias0 , KEYWORD AS KEYWORDalias0 WHERE DOMAINalias0.DID = DOMAIN_KEYWORDalias0.DID AND DOMAINalias0.NAME = "Databases" AND KEYWORDalias0.KID = DOMAIN_KEYWORDalias0.KID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Natural Language" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND KEYWORDalias0.KEYWORD = "User Study" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND KEYWORDalias0.KEYWORD = "Keyword search" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND KEYWORDalias0.KEYWORD = "Information Retrieval" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Natural Language" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+SELECT COUNT( DISTINCT ( ORGANIZATIONalias0.NAME ) ) FROM ORGANIZATION AS ORGANIZATIONalias0 ;
+SELECT COUNT( DISTINCT ( ORGANIZATIONalias0.NAME ) ) FROM ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.CONTINENT = "North America" ;
+SELECT COUNT( DISTINCT ( ORGANIZATIONalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+SELECT COUNT( DISTINCT ( ORGANIZATIONalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.CONTINENT = "North America" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR < 2000 ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR < 2000 ;
+SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
+SELECT PUBLICATIONalias0.CITATION_NUM FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
+SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR = 2005 ;
+SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR < 2005 ;
+SELECT PUBLICATIONalias0.YEAR , SUM( PUBLICATIONalias0.CITATION_NUM ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID GROUP BY PUBLICATIONalias0.YEAR ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) , PUBLICATIONalias0.YEAR FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID GROUP BY PUBLICATIONalias0.YEAR ;
+SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ;
+SELECT PUBLICATIONalias0.CITATION_NUM FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ;
+SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR = 2005 ;
+SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR < 2005 ;
+SELECT PUBLICATIONalias0.YEAR , SUM( PUBLICATIONalias0.CITATION_NUM ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID GROUP BY PUBLICATIONalias0.YEAR ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) , PUBLICATIONalias0.YEAR FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID GROUP BY PUBLICATIONalias0.YEAR ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , AUTHOR AS AUTHORalias2 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 , WRITES AS WRITESalias2 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND AUTHORalias2.NAME = "Divesh Srivastava" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID AND WRITESalias2.AID = AUTHORalias2.AID AND WRITESalias2.PID = PUBLICATIONalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , ORGANIZATION AS ORGANIZATIONalias0 , ORGANIZATION AS ORGANIZATIONalias1 WHERE ( AUTHORalias1.NAME = "H. V. Jagadish" OR AUTHORalias1.NAME = "Divesh Srivastava" ) AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND ORGANIZATIONalias0.OID = AUTHORalias1.OID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Yunyao Li" AND PUBLICATIONalias0.YEAR > 2005 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Yunyao Li" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Yunyao Li" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2005 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND PUBLICATIONalias0.YEAR < 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , CITE AS CITEalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias1 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.PID = CITEalias0.CITING AND PUBLICATIONalias1.PID = CITEalias0.CITED AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias1.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND PUBLICATIONalias0.YEAR < 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , AUTHOR AS AUTHORalias2 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 , WRITES AS WRITESalias2 WHERE AUTHORalias0.NAME = "Cong Yu" AND AUTHORalias1.NAME = "H. V. Jagadish" AND AUTHORalias2.NAME = "Yunyao Li" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID AND WRITESalias2.AID = AUTHORalias2.AID AND WRITESalias2.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , CITE AS CITEalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias1 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.PID = CITEalias0.CITING AND PUBLICATIONalias1.PID = CITEalias0.CITED AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias1.PID ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND PUBLICATIONalias0.CITATION_NUM > 200 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT COUNT( * ) FROM ( SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ) AS DERIVED_TABLEalias0 ;
+SELECT COUNT( * ) FROM ( SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ) AS DERIVED_TABLEalias0 ;
+SELECT COUNT( * ) FROM ( SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ) AS DERIVED_TABLEalias0 ;
+SELECT COUNT( * ) FROM ( SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ) AS DERIVED_TABLEalias0 ;
+SELECT COUNT( * ) FROM ( SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ) AS DERIVED_TABLEalias0 ;
+SELECT COUNT( * ) FROM ( SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ) AS DERIVED_TABLEalias0 ;
+SELECT COUNT( * ) FROM ( SELECT KEYWORDalias0.KEYWORD FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 100 ) AS DERIVED_TABLEalias0 ;
+SELECT COUNT( * ) FROM ( SELECT KEYWORDalias0.KEYWORD FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 100 ) AS DERIVED_TABLEalias0 ;
+SELECT COUNT( * ) FROM ( SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ) AS DERIVED_TABLEalias0 ;
+SELECT KEYWORDalias0.KEYWORD FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT KEYWORDalias0.KEYWORD FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY KEYWORDalias0.KEYWORD ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY SUM( PUBLICATIONalias0.CITATION_NUM ) DESC LIMIT 1 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY SUM( PUBLICATIONalias0.CITATION_NUM ) DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "Divesh Srivastava" AND AUTHORalias1.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT CONFERENCEalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY CONFERENCEalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
+SELECT CONFERENCEalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY CONFERENCEalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT JOURNALalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY JOURNALalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
+SELECT JOURNALalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY JOURNALalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.YEAR > 2000 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID AND PUBLICATIONalias0.YEAR > 2000 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
+SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ;
+SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ;
+SELECT KEYWORDalias0.KEYWORD FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 100 ;
+SELECT KEYWORDalias0.KEYWORD FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 100 ;
+SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+SELECT DERIVED_FIELDalias0 FROM ( SELECT AUTHORalias0.NAME AS DERIVED_FIELDalias0 , COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) AS DERIVED_FIELDalias1 FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ) AS DERIVED_TABLEalias0 , ( SELECT AUTHORalias1.NAME AS DERIVED_FIELDalias2 , COUNT( DISTINCT ( PUBLICATIONalias1.TITLE ) ) AS DERIVED_FIELDalias3 FROM AUTHOR AS AUTHORalias1 , CONFERENCE AS CONFERENCEalias1 , PUBLICATION AS PUBLICATIONalias1 , WRITES AS WRITESalias1 WHERE CONFERENCEalias1.NAME = "ICDE" AND PUBLICATIONalias1.CID = CONFERENCEalias1.CID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias1.PID GROUP BY AUTHORalias1.NAME ) AS DERIVED_TABLEalias1 WHERE DERIVED_TABLEalias0.DERIVED_FIELDalias1 > DERIVED_TABLEalias1.DERIVED_FIELDalias3 AND DERIVED_TABLEalias1.DERIVED_FIELDalias2 = DERIVED_TABLEalias0.DERIVED_FIELDalias0 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING SUM( PUBLICATIONalias0.CITATION_NUM ) > 5000 ;
+SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING SUM( PUBLICATIONalias0.CITATION_NUM ) > 5000 ;

duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/classical_test_gold.txt ADDED

The diff for this file is too large to render.

duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/gold.txt ADDED

	@@ -0,0 +1,453 @@

+SELECT * FROM AIRLINES	flight_2
+SELECT * FROM AIRLINES WHERE Airline  =  "JetBlue Airways"	flight_2
+SELECT Country FROM AIRLINES WHERE Airline  =  "JetBlue Airways"	flight_2
+SELECT Abbreviation FROM AIRLINES	flight_2
+SELECT Abbreviation FROM AIRLINES WHERE Airline  =  "JetBlue Airways"	flight_2
+SELECT Airline ,  Abbreviation FROM AIRLINES	flight_2
+SELECT Airline ,  Abbreviation FROM AIRLINES WHERE Country  =  "USA"	flight_2
+SELECT * FROM AIRPORTS WHERE city  =  "Anthony"	flight_2
+SELECT AirportCode ,  AirportName FROM AIRPORTS WHERE city  =  "Anthony"	flight_2
+SELECT * FROM AIRLINES	flight_2
+SELECT count(*) FROM AIRLINES	flight_2
+SELECT * FROM AIRPORTS	flight_2
+SELECT count(*) FROM AIRPORTS	flight_2
+SELECT * FROM FLIGHTS	flight_2
+SELECT count(*) FROM FLIGHTS	flight_2
+SELECT Airline FROM AIRLINES	flight_2
+SELECT Airline FROM AIRLINES WHERE Abbreviation  =  "UAL"	flight_2
+SELECT airline FROM AIRLINES WHERE Country  =  "USA"	flight_2
+SELECT count(*) FROM AIRLINES WHERE Country  =  "USA"	flight_2
+SELECT City ,  Country FROM AIRPORTS	flight_2
+SELECT City ,  Country FROM AIRPORTS WHERE AirportName  =  "Alton"	flight_2
+SELECT AirportName FROM AIRPORTS	flight_2
+SELECT AirportName FROM AIRPORTS WHERE AirportCode  =  "AKO"	flight_2
+SELECT AirportName FROM AIRPORTS	flight_2
+SELECT AirportName FROM AIRPORTS WHERE City = "Aberdeen"	flight_2
+SELECT * FROM FLIGHTS WHERE SourceAirport  =  "APG"	flight_2
+SELECT count(*) FROM FLIGHTS WHERE SourceAirport  =  "APG"	flight_2
+SELECT * FROM FLIGHTS WHERE DestAirport  =  "ATO"	flight_2
+SELECT count(*) FROM FLIGHTS WHERE DestAirport  =  "ATO"	flight_2
+SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport  =  T2.AirportCode WHERE T2.City  =  "Aberdeen"	flight_2
+SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport  =  T2.AirportCode WHERE T2.City  =  "Aberdeen"	flight_2
+SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport  =  T2.AirportCode WHERE T2.City  =  "Aberdeen"	flight_2
+SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport  =  T2.AirportCode WHERE T2.City  =  "Aberdeen"	flight_2
+SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport  =  T2.AirportCode WHERE T2.City  =  "Aberdeen"	flight_2
+SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport  =  T2.AirportCode JOIN AIRPORTS AS T3 ON T1.SourceAirport  =  T3.AirportCode WHERE T2.City  =  "Ashley" AND T3.City  =  "Aberdeen"	flight_2
+SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport  =  T2.AirportCode JOIN AIRPORTS AS T3 ON T1.SourceAirport  =  T3.AirportCode WHERE T2.City  =  "Ashley" AND T3.City  =  "Aberdeen"	flight_2
+SELECT * FROM FLIGHTS AS T1 JOIN AIRLINES AS T2 ON T1.Airline  =  T2.uid WHERE T2.Airline = "JetBlue Airways"	flight_2
+SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRLINES AS T2 ON T1.Airline  =  T2.uid WHERE T2.Airline = "JetBlue Airways"	flight_2
+SELECT * FROM AIRLINES WHERE Airline  =  "United Airlines"	flight_2
+SELECT count(*) FROM AIRLINES WHERE Airline  =  "United Airlines"	flight_2
+SELECT count(*) FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T2.Airline  =  T1.uid WHERE T1.Airline  =  "United Airlines" AND T2.DestAirport  =  "ASY"	flight_2
+SELECT * FROM AIRLINES WHERE Airline  =  "United Airlines"	flight_2
+SELECT * FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T2.Airline  =  T1.uid WHERE T1.Airline  =  "United Airlines" AND T2.SourceAirport  =  "AHD"	flight_2
+SELECT count(*) FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T2.Airline  =  T1.uid WHERE T1.Airline  =  "United Airlines" AND T2.SourceAirport  =  "AHD"	flight_2
+SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport  =  T2.AirportCode JOIN AIRLINES AS T3 ON T3.uid  =  T1.Airline WHERE T2.City  =  "Aberdeen" AND T3.Airline  =  "United Airlines"	flight_2
+SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport  =  T2.AirportCode JOIN AIRLINES AS T3 ON T3.uid  =  T1.Airline WHERE T2.City  =  "Aberdeen" AND T3.Airline  =  "United Airlines"	flight_2
+SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.DestAirport	flight_2
+SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.DestAirport GROUP BY T1.City ORDER BY count(*) DESC	flight_2
+SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.DestAirport GROUP BY T1.City ORDER BY count(*) DESC LIMIT 1	flight_2
+SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.SourceAirport	flight_2
+SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.SourceAirport GROUP BY T1.City ORDER BY count(*) DESC	flight_2
+SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.SourceAirport GROUP BY T1.City ORDER BY count(*) DESC LIMIT 1	flight_2
+SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.DestAirport	flight_2
+SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.DestAirport OR T1.AirportCode  =  T2.SourceAirport	flight_2
+SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.DestAirport OR T1.AirportCode  =  T2.SourceAirport GROUP BY T1.AirportCode ORDER BY count(*) DESC LIMIT 1	flight_2
+SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.DestAirport	flight_2
+SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.DestAirport OR T1.AirportCode  =  T2.SourceAirport	flight_2
+SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode  =  T2.DestAirport OR T1.AirportCode  =  T2.SourceAirport GROUP BY T1.AirportCode ORDER BY count(*) LIMIT 1	flight_2
+SELECT count(*) ,  T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline GROUP BY T1.Airline	flight_2
+SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline GROUP BY T1.Airline ORDER BY count(*) DESC LIMIT 1	flight_2
+SELECT Abbreviation ,  Country FROM AIRLINES	flight_2
+SELECT T1.Abbreviation ,  T1.Country FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline GROUP BY T1.Airline ORDER BY count(*)	flight_2
+SELECT T1.Abbreviation ,  T1.Country FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline GROUP BY T1.Airline ORDER BY count(*) LIMIT 1	flight_2
+SELECT * FROM FLIGHTS WHERE SourceAirport  =  "AHD"	flight_2
+SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline WHERE T2.SourceAirport  =  "AHD"	flight_2
+SELECT * FROM FLIGHTS WHERE DestAirport  =  "AHD"	flight_2
+SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline WHERE T2.DestAirport  =  "AHD"	flight_2
+SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline WHERE T2.SourceAirport  =  "APG"	flight_2
+SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline WHERE T2.SourceAirport  =  "APG" INTERSECT SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline WHERE T2.SourceAirport  =  "CVO"	flight_2
+SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline WHERE T2.SourceAirport  =  "CVO"	flight_2
+SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline WHERE T2.SourceAirport  =  "CVO" EXCEPT SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline WHERE T2.SourceAirport  =  "APG"	flight_2
+SELECT DISTINCT Airline FROM AIRLINES	flight_2
+SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline GROUP BY T1.Airline HAVING count(*)  >  10	flight_2
+SELECT DISTINCT Airline FROM AIRLINES	flight_2
+SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid  =  T2.Airline GROUP BY T1.Airline HAVING count(*)  <  200	flight_2
+SELECT FlightNo FROM FLIGHTS	flight_2
+SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRLINES AS T2 ON T2.uid  =  T1.Airline WHERE T2.Airline  =  "United Airlines"	flight_2
+SELECT FlightNo FROM FLIGHTS	flight_2
+SELECT FlightNo FROM FLIGHTS WHERE SourceAirport  =  "APG"	flight_2
+SELECT FlightNo FROM FLIGHTS	flight_2
+SELECT FlightNo FROM FLIGHTS WHERE DestAirport  =  "APG"	flight_2
+SELECT FlightNo FROM FLIGHTS	flight_2
+SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport   =  T2.AirportCode	flight_2
+SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport   =  T2.AirportCode WHERE T2.City  =  "Aberdeen"	flight_2
+SELECT FlightNo FROM FLIGHTS	flight_2
+SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport   =  T2.AirportCode	flight_2
+SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport   =  T2.AirportCode WHERE T2.City  =  "Aberdeen"	flight_2
+SELECT * FROM Flights AS T1 JOIN Airports AS T2 ON T1.DestAirport  =  T2.AirportCode WHERE T2.city  =  "Aberdeen"	flight_2
+SELECT * FROM Flights AS T1 JOIN Airports AS T2 ON T1.DestAirport  =  T2.AirportCode WHERE T2.city  =  "Aberdeen" OR T2.city  =  "Abilene"	flight_2
+SELECT count(*) FROM Flights AS T1 JOIN Airports AS T2 ON T1.DestAirport  =  T2.AirportCode WHERE T2.city  =  "Aberdeen" OR T2.city  =  "Abilene"	flight_2
+SELECT SourceAirport FROM Flights	flight_2
+SELECT SourceAirport FROM Flights UNION SELECT DestAirport FROM Flights	flight_2
+SELECT AirportName FROM Airports WHERE AirportCode NOT IN (SELECT SourceAirport FROM Flights UNION SELECT DestAirport FROM Flights)	flight_2
+SELECT * FROM pets	pets_1
+SELECT * FROM pets WHERE weight  >  10	pets_1
+SELECT count(*) FROM pets WHERE weight  >  10	pets_1
+SELECT * FROM pets ORDER BY pet_age	pets_1
+SELECT weight FROM pets ORDER BY pet_age	pets_1
+SELECT weight FROM pets ORDER BY pet_age LIMIT 1	pets_1
+SELECT DISTINCT petType FROM pets	pets_1
+SELECT max(weight) ,  petType FROM pets GROUP BY petType	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid WHERE T1.age  >  20	pets_1
+SELECT count(*) FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid WHERE T1.age  >  20	pets_1
+SELECT * FROM student WHERE sex  =  'F'	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid WHERE T1.sex  =  'F'	pets_1
+SELECT count(*) FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T2.petid  =  T3.petid WHERE T1.sex  =  'F' AND T3.pettype  =  'dog'	pets_1
+SELECT DISTINCT pettype FROM pets	pets_1
+SELECT count(DISTINCT pettype) FROM pets	pets_1
+SELECT DISTINCT T1.Fname FROM student AS T1	pets_1
+SELECT DISTINCT T1.Fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat'	pets_1
+SELECT DISTINCT T1.Fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat' OR T3.pettype  =  'dog'	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'dog'	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat' INTERSECT SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'dog'	pets_1
+SELECT T1.Fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat' INTERSECT SELECT T1.Fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'dog'	pets_1
+SELECT * FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat')	pets_1
+SELECT major FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat')	pets_1
+SELECT major ,  age FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat')	pets_1
+SELECT stuid FROM student	pets_1
+SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat'	pets_1
+SELECT stuid FROM student EXCEPT SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat'	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'dog'	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'dog' EXCEPT SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat'	pets_1
+SELECT T1.fname ,  T1.age FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'dog' EXCEPT SELECT T1.fname ,  T1.age FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pettype  =  'cat'	pets_1
+SELECT * FROM pets ORDER BY pet_age LIMIT 1	pets_1
+SELECT pettype FROM pets ORDER BY pet_age LIMIT 1	pets_1
+SELECT pettype ,  weight FROM pets ORDER BY pet_age LIMIT 1	pets_1
+SELECT petid FROM pets	pets_1
+SELECT petid FROM pets WHERE pet_age  >  1	pets_1
+SELECT petid ,  weight FROM pets WHERE pet_age  >  1	pets_1
+SELECT DISTINCT pettype FROM pets	pets_1
+SELECT max(pet_age) ,  pettype FROM pets GROUP BY pettype	pets_1
+SELECT avg(pet_age) ,  pettype FROM pets GROUP BY pettype	pets_1
+SELECT * FROM pets	pets_1
+SELECT avg(weight) ,  pettype FROM pets GROUP BY pettype	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid	pets_1
+SELECT DISTINCT T1.fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid	pets_1
+SELECT DISTINCT T1.fname ,  T1.age FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid WHERE T1.Lname  =  'Smith'	pets_1
+SELECT T2.petid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid WHERE T1.Lname  =  'Smith'	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid	pets_1
+SELECT count(*) ,  T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid GROUP BY T1.stuid	pets_1
+SELECT T1.fname ,  T1.sex FROM student AS T1	pets_1
+SELECT T1.fname ,  T1.sex FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid GROUP BY T1.stuid HAVING count(*)  >  1	pets_1
+SELECT petid FROM pets WHERE pet_age  =  3 AND pettype  =  'cat'	pets_1
+SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pet_age  =  3 AND T3.pettype  =  'cat'	pets_1
+SELECT T1.lname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid JOIN pets AS T3 ON T3.petid  =  T2.petid WHERE T3.pet_age  =  3 AND T3.pettype  =  'cat'	pets_1
+SELECT * FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid)	pets_1
+SELECT avg(age) FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid  =  T2.stuid)	pets_1
+SELECT Name FROM country	world_1
+SELECT Name FROM country WHERE IndepYear  >  1950	world_1
+SELECT count(*) FROM country	world_1
+SELECT count(*) FROM country WHERE GovernmentForm  =  "Republic"	world_1
+SELECT * FROM country WHERE Region  =  "Caribbean"	world_1
+SELECT SurfaceArea FROM country WHERE Region  =  "Caribbean"	world_1
+SELECT sum(SurfaceArea) FROM country WHERE Region  =  "Caribbean"	world_1
+SELECT Continent FROM country	world_1
+SELECT Continent FROM country WHERE Name  =  "Anguilla"	world_1
+SELECT Region FROM country	world_1
+SELECT Region FROM country AS T1 JOIN city AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Name  =  "Kabul"	world_1
+SELECT LANGUAGE FROM countrylanguage	world_1
+SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.Name  =  "Aruba"	world_1
+SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.Name  =  "Aruba" ORDER BY Percentage DESC LIMIT 1	world_1
+SELECT Population ,  LifeExpectancy FROM country	world_1
+SELECT Population ,  LifeExpectancy FROM country WHERE Name  =  "Brazil"	world_1
+SELECT Region FROM country WHERE Name  =  "Angola"	world_1
+SELECT Population FROM country WHERE Name  =  "Angola"	world_1
+SELECT LifeExpectancy FROM country	world_1
+SELECT LifeExpectancy FROM country WHERE Region  =  "Central Africa"	world_1
+SELECT avg(LifeExpectancy) FROM country WHERE Region  =  "Central Africa"	world_1
+SELECT Name FROM country WHERE Continent  =  "Asia"	world_1
+SELECT Name FROM country WHERE Continent  =  "Asia" ORDER BY LifeExpectancy LIMIT 1	world_1
+SELECT sum(Population) FROM country WHERE Continent  =  "Asia"	world_1
+SELECT max(GNP) FROM country WHERE Continent  =  "Asia"	world_1
+SELECT * FROM country WHERE Continent  =  "Africa"	world_1
+SELECT * FROM country WHERE Continent  =  "Africa" AND GovernmentForm  =  "Republic"	world_1
+SELECT avg(LifeExpectancy) FROM country WHERE Continent  =  "Africa" AND GovernmentForm  =  "Republic"	world_1
+SELECT * FROM country WHERE Continent  =  "Asia" OR Continent  =  "Europe"	world_1
+SELECT SurfaceArea FROM country WHERE Continent  =  "Asia" OR Continent  =  "Europe"	world_1
+SELECT sum(SurfaceArea) FROM country WHERE Continent  =  "Asia" OR Continent  =  "Europe"	world_1
+SELECT Population FROM city WHERE District  =  "Gelderland"	world_1
+SELECT sum(Population) FROM city WHERE District  =  "Gelderland"	world_1
+SELECT * FROM country	world_1
+SELECT * FROM country WHERE GovernmentForm  =  "US Territory"	world_1
+SELECT avg(GNP) ,  sum(population) FROM country WHERE GovernmentForm  =  "US Territory"	world_1
+SELECT DISTINCT LANGUAGE FROM countrylanguage	world_1
+SELECT count(DISTINCT LANGUAGE) FROM countrylanguage	world_1
+SELECT DISTINCT GovernmentForm FROM country WHERE Continent  =  "Africa"	world_1
+SELECT count(DISTINCT GovernmentForm) FROM country WHERE Continent  =  "Africa"	world_1
+SELECT * FROM country WHERE Name  =  "Aruba"	world_1
+SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.Name  =  "Aruba"	world_1
+SELECT COUNT(T2.Language) FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.Name  =  "Aruba"	world_1
+SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.Name  =  "Afghanistan"	world_1
+SELECT COUNT(*) FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.Name  =  "Afghanistan" AND IsOfficial  =  "T"	world_1
+SELECT count(*) ,  T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode GROUP BY T1.Name	world_1
+SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode GROUP BY T1.Name ORDER BY COUNT(*) DESC LIMIT 1	world_1
+SELECT COUNT(*) ,  T1.Continent FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode GROUP BY T1.Continent	world_1
+SELECT T1.Continent FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode GROUP BY T1.Continent ORDER BY COUNT(*) DESC LIMIT 1	world_1
+SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English"	world_1
+SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" INTERSECT SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "Dutch"	world_1
+SELECT COUNT(*) FROM (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" INTERSECT SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "Dutch")	world_1
+SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English"	world_1
+SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" INTERSECT SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "French"	world_1
+SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.IsOfficial  =  "T"	world_1
+SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" AND T2.IsOfficial  =  "T"	world_1
+SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" AND T2.IsOfficial  =  "T" INTERSECT SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "French" AND T2.IsOfficial  =  "T"	world_1
+SELECT T1.name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "Chinese"	world_1
+SELECT DISTINCT T1.Continent FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "Chinese"	world_1
+SELECT COUNT( DISTINCT Continent) FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "Chinese"	world_1
+SELECT DISTINCT Region FROM country	world_1
+SELECT DISTINCT T1.Region FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" OR T2.Language  =  "Dutch"	world_1
+SELECT T2.Language ,  T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE IsOfficial  =  "T"	world_1
+SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" AND IsOfficial  =  "T" UNION SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "Dutch" AND IsOfficial  =  "T"	world_1
+SELECT DISTINCT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.Continent  =  "Asia"	world_1
+SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.Continent  =  "Asia" GROUP BY T2.Language ORDER BY COUNT (*) DESC LIMIT 1	world_1
+SELECT * FROM country WHERE GovernmentForm  =  "Republic"	world_1
+SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.GovernmentForm  =  "Republic" GROUP BY T2.Language HAVING COUNT(*)  =  1	world_1
+SELECT T1.Name FROM city AS T1 JOIN countrylanguage AS T2 ON T1.CountryCode  =  T2.CountryCode WHERE T2.Language  =  "English"	world_1
+SELECT T1.Name ,  T1.Population FROM city AS T1 JOIN countrylanguage AS T2 ON T1.CountryCode  =  T2.CountryCode WHERE T2.Language  =  "English" ORDER BY T1.Population DESC LIMIT 1	world_1
+SELECT Name ,  Population ,  LifeExpectancy FROM country WHERE Continent  =  "Asia"	world_1
+SELECT Name ,  Population ,  LifeExpectancy FROM country WHERE Continent  =  "Asia" ORDER BY SurfaceArea DESC LIMIT 1	world_1
+SELECT T2.Language ,  T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.IsOfficial  =  "T"	world_1
+SELECT * FROM country WHERE Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" AND T2.IsOfficial  =  "T")	world_1
+SELECT avg(LifeExpectancy) FROM country WHERE Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English" AND T2.IsOfficial  =  "T")	world_1
+SELECT Name FROM country WHERE Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English")	world_1
+SELECT sum(Population) FROM country WHERE Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  "English")	world_1
+SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.HeadOfState  =  "Beatrix"	world_1
+SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.HeadOfState  =  "Beatrix"	world_1
+SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T1.HeadOfState  =  "Beatrix" AND T2.IsOfficial  =  "T"	world_1
+SELECT T1.Name FROM country AS t1	world_1
+SELECT T1.Name FROM country AS t1 WHERE  IndepYear  <  1930	world_1
+SELECT count(DISTINCT T2.Language) FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE  IndepYear  <  1930 AND T2.IsOfficial  =  "T"	world_1
+SELECT * FROM country WHERE Continent  =  "Europe"	world_1
+SELECT min(SurfaceArea) FROM country WHERE Continent  =  "Europe"	world_1
+SELECT Name FROM country WHERE SurfaceArea  >  (SELECT min(SurfaceArea) FROM country WHERE Continent  =  "Europe")	world_1
+SELECT min(population) FROM country WHERE Continent  =  "Asia"	world_1
+SELECT Name FROM country WHERE Continent  =  "Africa"  AND population  <  (SELECT max(population) FROM country WHERE Continent  =  "Asia")	world_1
+SELECT min(population) FROM country WHERE Continent  =  "Africa"	world_1
+SELECT Name FROM country WHERE Continent  =  "Asia"  AND population  >  (SELECT min(population) FROM country WHERE Continent  =  "Africa")	world_1
+SELECT CountryCode FROM countrylanguage	world_1
+SELECT CountryCode FROM countrylanguage EXCEPT SELECT CountryCode FROM countrylanguage WHERE LANGUAGE  =  "English"	world_1
+SELECT DISTINCT CountryCode FROM countrylanguage	world_1
+SELECT DISTINCT CountryCode FROM countrylanguage WHERE LANGUAGE ! =  "English"	world_1
+SELECT Code FROM country WHERE GovernmentForm ! =  "Republic"	world_1
+SELECT Code FROM country WHERE GovernmentForm ! =  "Republic" EXCEPT SELECT CountryCode FROM countrylanguage WHERE LANGUAGE  =  "English"	world_1
+SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.IsOfficial  =  'T' AND T2.Language  =  'English'	world_1
+SELECT Name FROM country WHERE Continent  =  'Europe' AND Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.IsOfficial  =  'T' AND T2.Language  =  'English')	world_1
+SELECT DISTINCT T2.Name FROM country AS T1 JOIN city AS T2 ON T2.CountryCode  =  T1.Code WHERE T1.Continent  =  'Europe' AND T1.Name NOT IN (SELECT T3.Name FROM country AS T3 JOIN countrylanguage AS T4 ON T3.Code  =  T4.CountryCode WHERE T4.IsOfficial  =  'T' AND T4.Language  =  'English')	world_1
+SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.Language  =  'Chinese' AND T1.Continent  =  "Asia"	world_1
+SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode WHERE T2.IsOfficial  =  'T' AND T2.Language  =  'Chinese' AND T1.Continent  =  "Asia"	world_1
+SELECT DISTINCT T3.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode JOIN city AS T3 ON T1.Code  =  T3.CountryCode WHERE T2.IsOfficial  =  'T' AND T2.Language  =  'Chinese' AND T1.Continent  =  "Asia"	world_1
+SELECT * FROM country ORDER BY Population LIMIT 1	world_1
+SELECT Name ,  SurfaceArea ,  IndepYear FROM country ORDER BY Population LIMIT 1	world_1
+SELECT * FROM country ORDER BY SurfaceArea DESC LIMIT 1	world_1
+SELECT Name ,  population ,  HeadOfState FROM country ORDER BY SurfaceArea DESC LIMIT 1	world_1
+SELECT Name FROM country	world_1
+SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode GROUP BY T1.Name HAVING COUNT(*)  >  2	world_1
+SELECT COUNT(T2.Language) ,  T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code  =  T2.CountryCode GROUP BY T1.Name HAVING COUNT(*)  >  2	world_1
+SELECT avg(Population) FROM city	world_1
+SELECT count(*) ,  District FROM city WHERE Population  >  (SELECT avg(Population) FROM city) GROUP BY District	world_1
+SELECT * FROM country GROUP BY GovernmentForm HAVING avg(LifeExpectancy)  >  72	world_1
+SELECT sum(Population) ,  GovernmentForm FROM country GROUP BY GovernmentForm HAVING avg(LifeExpectancy)  >  72	world_1
+SELECT Continent FROM country GROUP BY Continent HAVING avg(LifeExpectancy)  <  72	world_1
+SELECT sum(Population) ,  avg(LifeExpectancy) ,  Continent FROM country GROUP BY Continent HAVING avg(LifeExpectancy)  <  72	world_1
+SELECT * FROM country ORDER BY SurfaceArea DESC LIMIT 5	world_1
+SELECT Name ,  SurfaceArea FROM country ORDER BY SurfaceArea DESC LIMIT 5	world_1
+SELECT * FROM country ORDER BY Population DESC	world_1
+SELECT Name FROM country ORDER BY Population DESC LIMIT 3	world_1
+SELECT * FROM country ORDER BY Population	world_1
+SELECT Name FROM country ORDER BY Population DESC LIMIT 3	world_1
+SELECT * FROM country WHERE continent  =  "Asia"	world_1
+SELECT count(*) FROM country WHERE continent  =  "Asia"	world_1
+SELECT * FROM country WHERE continent  =  "Europe"	world_1
+SELECT Name FROM country WHERE continent  =  "Europe" AND Population  =  "80000"	world_1
+SELECT * FROM country WHERE Continent  =  "North America"	world_1
+SELECT * FROM country WHERE Continent  =  "North America" AND SurfaceArea  >  3000	world_1
+SELECT sum(Population) ,  avg(SurfaceArea) FROM country WHERE Continent  =  "North America" AND SurfaceArea  >  3000	world_1
+SELECT name FROM city	world_1
+SELECT name FROM city WHERE Population BETWEEN 160000 AND 90000	world_1
+SELECT LANGUAGE FROM countrylanguage	world_1
+SELECT LANGUAGE FROM countrylanguage GROUP BY LANGUAGE ORDER BY count(*) DESC LIMIT 1	world_1
+SELECT Directed_by FROM Cartoon WHERE Title = "Day of the Dark Knight!"	tvshow
+SELECT Channel FROM Cartoon WHERE Title = "Day of the Dark Knight!"	tvshow
+SELECT Title FROM Cartoon WHERE Directed_by = "Ben Jones" OR Directed_by = "Brandon Vietti"	tvshow
+SELECT * FROM TV_Channel WHERE Country = "Italy"	tvshow
+SELECT * FROM TV_Channel WHERE Country = "Poland"	tvshow
+SELECT Country ,  count(*) FROM TV_Channel GROUP BY Country ORDER BY count(*) DESC LIMIT 1	tvshow
+SELECT Channel FROM Cartoon WHERE Title = "The Eyes of Despero!"	tvshow
+SELECT series_name FROM TV_Channel WHERE id IN (SELECT Channel FROM Cartoon WHERE Title = "The Eyes of Despero!")	tvshow
+SELECT count(DISTINCT series_name) ,  count(DISTINCT content) FROM TV_Channel	tvshow
+SELECT Package_Option FROM TV_Channel WHERE series_name = "Rock TV"	tvshow
+SELECT Language FROM TV_Channel WHERE series_name = "Rock TV"	tvshow
+SELECT LANGUAGE ,  count(*) FROM TV_Channel GROUP BY LANGUAGE ORDER BY count(*) ASC LIMIT 1	tvshow
+SELECT Written_by FROM Cartoon WHERE Title = "The Rise of the Blue Beetle!"	tvshow
+SELECT Directed_by FROM Cartoon WHERE Title = "The Rise of the Blue Beetle!"	tvshow
+SELECT T1.series_name FROM TV_Channel AS T1 JOIN Cartoon AS T2 ON T1.id = T2.Channel WHERE T2.Title = "The Rise of the Blue Beetle!"	tvshow
+SELECT Country FROM TV_Channel WHERE series_name = "Sky Radio"	tvshow
+SELECT Content FROM TV_Channel WHERE series_name = "Sky Radio"	tvshow
+SELECT T2.Title FROM TV_Channel AS T1 JOIN Cartoon AS T2 ON T1.id = T2.Channel WHERE T1.series_name = "Sky Radio"	tvshow
+SELECT Rating FROM TV_series WHERE Episode = "Double Down"	tvshow
+SELECT Rating FROM TV_series WHERE Episode = "Keepers"	tvshow
+SELECT Episode ,  Rating FROM TV_series ORDER BY Rating DESC LIMIT 3	tvshow
+SELECT Weekly_Rank FROM TV_series WHERE Episode = "Emily"	tvshow
+SELECT Share FROM TV_series WHERE Episode = "Emily"	tvshow
+SELECT max(SHARE) , min(SHARE) FROM TV_series	tvshow
+SELECT Rating FROM TV_series WHERE Episode = "A Love of a Lifetime"	tvshow
+SELECT Weekly_Rank FROM TV_series WHERE Episode = "A Love of a Lifetime"	tvshow
+SELECT T1.series_name FROM TV_Channel AS T1 JOIN TV_series AS T2 ON T1.id = T2.Channel WHERE T2.Episode = "A Love of a Lifetime"	tvshow
+SELECT Content FROM TV_Channel WHERE series_name = "Sky Radio"	tvshow
+SELECT Language FROM TV_Channel WHERE series_name = "Sky Radio"	tvshow
+SELECT T2.Episode FROM TV_Channel AS T1 JOIN TV_series AS T2 ON T1.id = T2.Channel WHERE T1.series_name = "Sky Radio"	tvshow
+SELECT Original_air_date FROM Cartoon WHERE Title = "Fall of the Blue Beetle!"	tvshow
+SELECT Production_code FROM Cartoon WHERE Title = "Fall of the Blue Beetle!"	tvshow
+SELECT production_code ,  channel FROM cartoon ORDER BY original_air_date LIMIT 1	tvshow
+SELECT Title FROM Cartoon WHERE Directed_by = "Ben Jones"	tvshow
+SELECT Title FROM Cartoon WHERE Written_by = "Todd Casey"	tvshow
+SELECT T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.Written_by  =  'Todd Casey'	tvshow
+SELECT T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.Written_by  =  'Steven Melching'	tvshow
+SELECT country FROM TV_Channel EXCEPT SELECT T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.written_by  =  'Todd Casey'	tvshow
+SELECT Directed_by FROM Cartoon WHERE Title = "Deep Cover for Batman!"	tvshow
+SELECT Production_code FROM Cartoon WHERE Title = "Deep Cover for Batman!"	tvshow
+SELECT T1.series_name ,  T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.directed_by  =  'Michael Chang' INTERSECT SELECT T1.series_name ,  T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.directed_by  =  'Ben Jones'	tvshow

duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/predict.txt ADDED

	@@ -0,0 +1,453 @@

+select * from airlines
+select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
+select Country from airports where Country = 'terminal'
+select * from airlines
+select AirportName from airports where AirportName = 'terminal'
+select T1.Airline,T2.AirportName from airlines as T1 join airports as T2
+select * from airlines as T1 join airports as T2 where T1.Airline = 'terminal' and T2.AirportName = 'terminal'
+select * from airports where AirportName = 'terminal'
+select T1.Airline,T2.AirportName from airlines as T1 join airports as T2
+select * from airlines
+select count(*) from airports where Country = 'terminal'
+select * from airlines
+select count(*) from airports where Country = 'terminal'
+select * from airlines
+select count(*) from airports where Country = 'terminal'
+select Airline from airlines
+select AirportName from airports where AirportName = 'terminal'
+select T1.Airline from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
+select count(*) from airports where Country = 'terminal'
+select Country,City from airports
+select AirportCode,AirportName,City from airports where Country = 'terminal'
+select AirportName from airports
+select AirportName,AirportCode from airports where AirportName = 'terminal'
+select AirportName from airports
+select AirportName from airports where AirportName = 'terminal'
+select * from flights where DestAirport = 'terminal'
+select count(*) from airports where Country = 'terminal'
+select City from airports where Country = 'terminal'
+select count(*) from airports where Country = 'terminal'
+select * from flights where SourceAirport = 'terminal'
+select count(*) from airports where Country = 'terminal'
+select * from airlines
+select count(*) from airports where Country = 'terminal'
+select * from flights where DestAirport = 'terminal'
+select City,AirportName from airports where Country = 'terminal'
+select count(*) from airports where Country = 'terminal'
+select * from airlines
+select count(*) from airports where Country = 'terminal'
+select * from airlines
+select count(*) from airports where Country = 'terminal'
+select count(*) from airports where Country = 'terminal'
+select * from airlines
+select AirportName from airports where Country = 'terminal'
+select count(*) from airports where Country = 'terminal'
+select T2.AirportName,T1.Airline from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
+select count(*) from airports where Country = 'terminal'
+select City from airports
+select count(*) from airlines group by uid
+select Country from airports group by Country order by count(*) desc limit 1
+select City from airports
+select count(*) from airlines group by uid
+select Country from airports group by Country order by count(*) desc limit 1
+select AirportCode from airports where AirportName = 'terminal'
+select AirportCode from airports
+select FlightNo,count(*) from flights group by DestAirport order by count(*) desc limit 1
+select AirportCode from airports where AirportName = 'terminal'
+select AirportCode from airports
+select * from airlines group by uid order by count(*) asc limit 1
+select count(*) from airlines group by uid
+select Airline from airlines group by uid order by count(*) desc limit 1
+select T2.Country,T1.Country from airlines as T1 join airports as T2
+select T1.CountryAbbrev,count(*) from airports as T1 join flights as T2 on T1.AirportCode = T2.DestAirport order by T2.FlightNo asc
+select Country from airports group by Country order by count(*) asc limit 1
+select City from airports where Country = 'terminal'
+select Airline from airlines where Airline = 'terminal'
+select AirportName from airports where AirportName = 'terminal'
+select Airline from airlines where Airline = 'terminal'
+select * from airports where AirportName = 'terminal'
+select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
+select * from airports where AirportName = 'terminal'
+select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
+select * from airlines
+select Country from airports group by Country having count(*) > 'terminal'
+select * from airlines
+select Country from airlines where Airline = 'terminal'
+select FlightNo from flights
+select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
+select FlightNo from flights
+select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
+select FlightNo from flights
+select T3.FlightNo,T1.Airline from airlines as T1 join airports as T2 join flights as T3 where T2.AirportName = 'terminal'
+select FlightNo from flights
+select T2.AirportName,T1.Airline from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
+select AirportName from airports where AirportName = 'terminal'
+select FlightNo from flights
+select T2.AirportName,T1.Airline from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
+select City from airports where Country = 'terminal'
+select City from airports where Country = 'terminal'
+select AirportName from airports where AirportName = 'terminal'
+select count(*) from airports where Country = 'terminal'
+select AirportName from airports
+select AirportName from airports
+select * from airports where AirportName like 'terminal'
+select * from Pets where weight = 'terminal'
+select * from Pets group by PetID having count(*) > 'terminal'
+select count(*) from Student where LName = 'terminal'
+select * from Student where Age = 'terminal'
+select T3.weight,T1.Age from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID group by T1.Sex
+select * from Student order by Age asc limit 1
+select PetType from Pets
+select PetType,count(*) from Pets group by PetType
+select * from Student where Age > 'terminal'
+select count(*) from Student where LName = 'terminal'
+select * from Student where Age = 'terminal'
+select T1.Fname,* from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.pet_age = 'terminal'
+select count(*) from Student where Age > 'terminal'
+select PetType from Pets
+select count(*) from Student
+select Fname from Student group by StuID
+select T1.Fname from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.weight = 'terminal'
+select T1.Fname from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.PetType = 'terminal'
+select * from Pets where PetType = 'terminal'
+select T1.LName,T1.Fname from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.PetType = 'terminal'
+select Fname from Student where Sex = 'terminal'
+select * from Student where Fname = 'terminal'
+select * from Student where Fname = 'terminal'
+select T1.Age,count(*) from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.pet_age = 'terminal'
+select T1.StuID from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID group by T2.StuID
+select T3.PetID from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T1.Fname = 'terminal'
+select StuID from Student
+select * from Student
+select T1.Fname,* from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.PetType = 'terminal'
+select T1.Fname,T1.Age from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.PetID = 'terminal'
+select * from Student order by Age desc limit 1
+select pet_age,PetType from Pets
+select T2.weight,count(*) from Has_Pet as T1 join Pets as T2 on T1.PetID = T2.PetID group by T1.PetID
+select T3.PetID from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T1.Fname = 'terminal'
+select weight from Pets where pet_age > 'terminal'
+select T2.weight,count(*) from Has_Pet as T1 join Pets as T2 on T1.PetID = T2.PetID group by T1.PetID
+select PetType from Pets
+select Age,count(*) from Student group by Sex
+select avg(Age) from Student
+select * from Pets where weight = 'terminal'
+select avg(weight),PetType from Pets group by PetType
+select * from Student
+select Fname from Student
+select Fname,Age from Student
+select * from Student
+select Fname from Student where LName = 'terminal' and Fname = 'terminal'
+select Sex from Student where LName = 'terminal'
+select * from Student
+select LName,count(*) from Student group by StuID
+select LName,Fname from Student
+select T1.Fname from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.weight > 'terminal'
+select StuID from Student where Sex = 'terminal'
+select * from Student where Fname = 'terminal'
+select LName from Student where Sex = 'terminal'
+select * from Pets where PetID not in (select PetID from Pets)
+select avg(Age) from Student where LName = 'terminal' and Fname = 'terminal'
+select Name from country
+select T1.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.IndepYear > 'terminal'
+select count(*) from city
+select Code2,count(*) from country where Population > 'terminal'
+select Region from country where Region = 'terminal'
+select T2.Region,T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code group by T2.Region
+select SurfaceArea from country where SurfaceArea > (select avg(SurfaceArea) from country)
+select Region,Continent from country group by Region
+select T1.Name,T2.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select Name from country
+select T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select Language from countrylanguage
+select Language from countrylanguage where Language = 'terminal'
+select GNPOld from country group by GNP order by count(*) desc limit 1
+select LifeExpectancy,Population from country
+select Name from country where Region = 'terminal'
+select T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select T1.Population,T2.Population from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select Population from country
+select GNP,Continent from country where Name = 'terminal'
+select avg(T3.Percentage) from city as T1 join country as T2 on T1.CountryCode = T2.Code join countrylanguage as T3 on T2.Code = T3.CountryCode where T1.District = 'terminal' and T2.GovernmentForm = 'terminal'
+select Name from country where Region = 'terminal'
+select Population from country order by Population asc limit 1
+select count(*) from city where District = 'terminal'
+select count(*) from city group by ID order by count(*) desc limit 1
+select Continent from country where Region = 'terminal'
+select HeadOfState from country where Region = 'terminal' intersect select Region from country where Region = 'terminal'
+select avg(T3.Percentage) from city as T1 join country as T2 on T1.CountryCode = T2.Code join countrylanguage as T3 on T2.Code = T3.CountryCode where T2.Name = 'terminal' and T1.Name = 'terminal'
+select Region from country where Population > 'terminal' and Population >= 'terminal'
+select SurfaceArea,Region from country group by Region
+select sum(SurfaceArea) from country where SurfaceArea = 'terminal'
+select T2.HeadOfState,T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select sum(T1.Population) from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Name = 'terminal'
+select Language from countrylanguage
+select * from country where Name = 'terminal'
+select avg(Population) from country where Name = 'terminal'
+select Language from countrylanguage
+select count(*) from country where Population > 'terminal'
+select HeadOfState from country where Region = 'terminal'
+select count(*) from country where Population > 'terminal'
+select * from countrylanguage where Language = 'terminal'
+select Language from countrylanguage where Language = 'terminal'
+select count(*) from country where Population > 'terminal'
+select Language from countrylanguage where Language = 'terminal'
+select count(*),count(T2.CountryCode) from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode where T1.IndepYear = 'terminal' and T1.Population > 'terminal'
+select count(CountryCode),count(*) from countrylanguage group by CountryCode
+select Continent from country group by Region order by count(*) desc limit 1
+select count(CountryCode),count(*) from countrylanguage group by CountryCode
+select T2.Language,T1.GNPOld from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode group by T1.GNP order by count(*) desc limit 1
+select Continent from country where Name = 'terminal'
+select T1.Name,T2.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select count(*) from country where Population > 'terminal'
+select Name from country where Region = 'terminal'
+select Name from country where Region = 'terminal' intersect select Name from country where Region = 'terminal'
+select Language from countrylanguage group by Language
+select Name from country where Region = 'terminal'
+select name from sqlite_sequence where name = 'terminal' intersect select name from sqlite_sequence where name = 'terminal'
+select Name from country where Region = 'terminal'
+select CountryCode from city
+select count(*) from country where Population > 'terminal'
+select Region from country
+select Name from city where Population = 'terminal'
+select T1.Name,T2.Language from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode group by T1.Name
+select Name from country where Code like 'terminal' and Code = 'terminal'
+select Language from countrylanguage where Language = 'terminal'
+select Continent from country group by Region order by count(*) asc limit 1
+select T2.HeadOfState from city as T1 join country as T2 on T1.CountryCode = T2.Code where T1.District = 'terminal'
+select Continent from country group by Region
+select T1.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select Name from country order by Population desc limit 1
+select T1.Population,T2.LifeExpectancy,T2.Population from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select T1.name,T2.Name from sqlite_sequence as T1 join country as T2 where T2.Region = 'terminal'
+select Language from countrylanguage group by Language
+select count(*) from country where HeadOfState != 'terminal'
+select Continent,avg(Population) from country where Region = 'terminal'
+select Name from country where Region != 'terminal'
+select count(*) from country where Population > 'terminal'
+select T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select T2.Language from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode where T1.Population = 'terminal'
+select T1.name from sqlite_sequence as T1 join country as T2 where T2.Population = 'terminal'
+select Name from country
+select Name from country where Capital > 'terminal'
+select count(T3.CountryCode),T1.CountryCode from city as T1 join country as T2 on T1.CountryCode = T2.Code join countrylanguage as T3 on T2.Code = T3.CountryCode where T2.IndepYear = 'terminal'
+select * from city
+select SurfaceArea from country where SurfaceArea > (select min(SurfaceArea) from country)
+select Continent from country where SurfaceArea > (select avg(Population) from country)
+select max(Population) from country where Name = 'terminal'
+select Continent from country where Population > (select avg(Population) from country)
+select Population from country where Name = 'terminal'
+select Continent from country where Population > (select avg(Population) from country)
+select Region from country
+select Code from country where Name = 'terminal' except select Code from country where Name = 'terminal'
+select Region from country
+select T1.name,T2.Name from sqlite_sequence as T1 join country as T2 where T2.Region = 'terminal'
+select CountryCode from city
+select T1.name,T2.Name from sqlite_sequence as T1 join country as T2 where T2.Region = 'terminal'
+select Name from country where Region = 'terminal'
+select Name from country except select Name from country
+select T1.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Population > 'terminal'
+select Continent from country where Name = 'terminal'
+select T2.Name,T1.name from sqlite_sequence as T1 join country as T2 where T2.Region = 'terminal'
+select District from city
+select Continent from country order by Population asc limit 1
+select T2.name,T3.Population,T1.Population from city as T1 join sqlite_sequence as T2 join country as T3 on T1.CountryCode = T3.Code where T3.Region = 'terminal'
+select Population from country order by Population desc limit 1
+select T1.Population,T2.Population from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select Name from country
+select LocalName from country group by Name having count(*) >= 'terminal'
+select count(CountryCode),count(*) from countrylanguage group by CountryCode
+select avg(Population),District from city group by District
+select count(CountryCode),count(*) from countrylanguage group by CountryCode
+select Continent from country where SurfaceArea > 'terminal' intersect select Continent from country where LifeExpectancy > 'terminal'
+select count(*),Code2 from country where Population > 'terminal'
+select Continent from country where Population > 'terminal'
+select avg(Population) from country
+select HeadOfState from country order by SurfaceArea desc limit 1
+select T2.Name,T1.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.SurfaceArea > 'terminal'
+select Continent from country order by Population desc
+select Name from country group by Name order by count(*) desc limit 1
+select Continent from country order by Population desc
+select T1.Name from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode order by T2.Percentage asc limit 1
+select Region from country where Population = 'terminal'
+select count(*) from country where Population > 'terminal'
+select Region from country where Population = 'terminal'
+select Name from country where Population > 'terminal'
+select Continent from country where Name = 'terminal'
+select T1.name,T2.Name from sqlite_sequence as T1 join country as T2 where T2.Capital > 'terminal'
+select avg(T1.Population),avg(T2.Population) from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
+select Name from city
+select * from city where Population > 'terminal'
+select T1.GNPOld,T2.Language from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode
+select Continent from country group by Region order by count(*) desc limit 1
+select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T2.Written_by = 'terminal' and T1.series_name = 'terminal'
+select Episode from TV_series where Episode = 'terminal'
+select Episode from TV_series where Episode like 'terminal' and Episode = 'terminal'
+select * from Cartoon where Directed_by = 'terminal'
+select T3.Written_by,T3.Directed_by from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel where T1.Share = 'terminal'
+select Episode,count(*) from TV_series group by Episode order by count(*) desc limit 1
+select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T2.Title = 'terminal' and T1.series_name = 'terminal'
+select Episode from TV_series group by Episode
+select T3.Title,T2.Episode from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel join Cartoon as T3 on T1.id = T3.Channel where T1.Language = 'terminal'
+select Package_Option from TV_Channel where Content = 'terminal'
+select Episode from TV_series where Share = 'terminal'
+select Title,count(*) from Cartoon where id not in (select id from Cartoon) group by id
+select Title from Cartoon where Written_by = 'terminal'
+select Directed_by from Cartoon where Directed_by = 'terminal'
+select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T1.series_name = 'terminal'
+select count(*) from Cartoon where Title = 'terminal'
+select T1.Content from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'
+select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T1.series_name = 'terminal'
+select Rating from TV_series where Episode = 'terminal'
+select Rating from TV_series where Episode = 'terminal'
+select T1.Hight_definition_TV,T2.Episode from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel order by T2.Rating desc limit 1
+select Rating from TV_series where Episode = 'terminal'
+select Share from TV_series where Share = 'terminal'
+select max(Share),min(Share),18_49_Rating_Share from TV_series
+select Episode from TV_series where Episode = 'terminal'
+select T1.Rating,T3.Title from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel order by T1.Rating desc limit 1
+select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T1.series_name = 'terminal'
+select T1.Content from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'
+select T1.Language from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'
+select T1.Language from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'
+select Title from Cartoon where Title = 'terminal'
+select T2.Production_code from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T1.series_name = 'terminal'
+select Rating,Episode from TV_series group by Episode
+select T3.Directed_by from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel where T1.Episode = 'terminal'
+select Directed_by from Cartoon where Title = 'terminal'
+select * from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T1.series_name = 'terminal' and T2.Episode = 'terminal'
+select * from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel where T3.Title = 'terminal' and T1.Episode = 'terminal'
+select Episode from TV_series where Episode like 'terminal'
+select Title from Cartoon where Title = 'terminal'
+select T3.Production_code,T1.Episode from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel where T3.Title = 'terminal'
+select T1.series_name,T2.Episode from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'

duckdb-nsql/eval/metrics/test_suite_sql_eval/exec_eval.py ADDED Viewed

	@@ -0,0 +1,313 @@

+import os
+import re
+import duckdb
+import asyncio
+import threading
+from typing import Tuple, Any, List, Set
+from itertools import product
+from collections import defaultdict
+import tqdm
+import random
+import time
+import pickle as pkl
+import subprocess
+from itertools import chain
+import shutil
+from pathlib import Path
+from .parse import get_all_preds_for_execution, remove_distinct
+threadLock = threading.Lock()
+TIMEOUT = 60
+TMP_DIR = "_tmp"
+EXEC_TMP_DIR = os.path.join(os.path.dirname(__file__), "tmp")
+def permute_tuple(element: Tuple, perm: Tuple) -> Tuple:
+    assert len(element) == len(perm)
+    return tuple([element[i] for i in perm])
+def unorder_row(row: Tuple) -> Tuple:
+    return tuple(sorted(row, key=lambda x: str(x) + str(type(x))))
+def tuple_sublists(row: Tuple) -> Tuple:
+    new_row = []
+    for item in row:
+        if isinstance(item, list):
+            new_row.append(tuple(item))
+        elif isinstance(item, dict):
+            new_row.append(tuple(sorted(item.items(), key=lambda x: x[0])))
+        else:
+            new_row.append(item)
+    new_row = tuple(new_row)
+    return new_row
+# unorder each row in the table
+# [result_1 and result_2 have the same bag of unordered rows]
+# is a necessary condition of
+# [result_1 and result_2 are equivalent in denotation]
+def quick_rej(result1: List[Tuple], result2: List[Tuple], order_matters: bool) -> bool:
+    s1 = [unorder_row(row) for row in result1]
+    s2 = [unorder_row(row) for row in result2]
+    if order_matters:
+        return s1 == s2
+    else:
+        return set(s1) == set(s2)
+# return whether two bags (multisets) of rows are equivalent
+def multiset_eq(l1: List, l2: List) -> bool:
+    if len(l1) != len(l2):
+        return False
+    d = defaultdict(int)
+    for e in l1:
+        d[e] = d[e] + 1
+    for e in l2:
+        d[e] = d[e] - 1
+        if d[e] < 0:
+            return False
+    return True
+def get_constraint_permutation(tab1_sets_by_columns: List[Set], result2: List[Tuple]):
+    num_cols = len(result2[0])
+    perm_constraints = [{i for i in range(num_cols)} for _ in range(num_cols)]
+    if num_cols <= 3:
+        return product(*perm_constraints)
+    # we sample 20 rows and constrain the space of permutations
+    for _ in range(20):
+        random_tab2_row = random.choice(result2)
+        for tab1_col in range(num_cols):
+            for tab2_col in set(perm_constraints[tab1_col]):
+                if random_tab2_row[tab2_col] not in tab1_sets_by_columns[tab1_col]:
+                    perm_constraints[tab1_col].remove(tab2_col)
+    return product(*perm_constraints)
+# check whether two denotations are equivalent
+def result_eq(result1: List[Tuple], result2: List[Tuple], order_matters: bool) -> bool:
+    if len(result1) == 0 and len(result2) == 0:
+        return True
+    # if the lengths differ, they are definitely different bags of rows
+    if len(result1) != len(result2):
+        return False
+    num_cols = len(result1[0])
+    # if the results do not have the same number of columns, they are different
+    if len(result2[0]) != num_cols:
+        return False
+    result1 = [tuple_sublists(row) for row in result1]
+    result2 = [tuple_sublists(row) for row in result2]
+    # unorder each row and compare whether the denotations are the same
+    # this already rejects most pairs of denotations that are different
+    if not quick_rej(result1, result2, order_matters):
+        return False
+    # the rest of the problem is in fact more complicated than one might think
+    # we want to find a permutation of column order and a permutation of row order,
+    # s.t. result_1 is the same as result_2
+    # we return true if we can find such column & row permutations
+    # and false if we cannot
+    tab1_sets_by_columns = [{row[i] for row in result1} for i in range(num_cols)]
+    # on a high level, we enumerate all possible column permutations that might make result_1 == result_2
+    # we decrease the size of the column permutation space by the function get_constraint_permutation
+    # if any of the permutations makes result_1 and result_2 equivalent, then they are equivalent
+    for perm in get_constraint_permutation(tab1_sets_by_columns, result2):
+        if len(perm) != len(set(perm)):
+            continue
+        if num_cols == 1:
+            result2_perm = result2
+        else:
+            result2_perm = [permute_tuple(element, perm) for element in result2]
+        if order_matters:
+            if result1 == result2_perm:
+                return True
+        else:
+            # in fact the first condition must hold if the second condition holds
+            # but the first is way more efficient implementation-wise
+            # and we use it to quickly reject impossible candidates
+            if set(result1) == set(result2_perm) and multiset_eq(result1, result2_perm):
+                return True
+    return False
+def replace_cur_year(query: str) -> str:
+    return re.sub(
+        r"YEAR\s*\(\s*CURDATE\s*\(\s*\)\s*\)\s*", "2020", query, flags=re.IGNORECASE
+    )
+class WithDuckDBConnectionInTmpDir(object):
+    def __init__(self, databases_file, tmp_dir):
+        if not os.path.exists(databases_file):
+            raise Exception("Database not found: %s" % databases_file)
+        os.makedirs(tmp_dir)
+        shutil.copy(databases_file, tmp_dir)
+        self.tmp_dbfile = Path(databases_file).name
+        self.tmp_dir = tmp_dir
+        self.original_wd = os.getcwd()
+    def __enter__(self):
+        os.chdir(self.tmp_dir)
+        self.con = duckdb.connect(self.tmp_dbfile)
+        return self.con
+    def __exit__(self, *args):
+        self.con.close()
+        os.chdir(self.original_wd)
+        shutil.rmtree(self.tmp_dir)
+async def exec_on_db_(
+    duckdb_path: str, query: str, setup_sql: str, validate_sql: str
+) -> Tuple[str, Any]:
+    # query = replace_cur_year(query)
+    try:
+        with WithDuckDBConnectionInTmpDir(duckdb_path, TMP_DIR) as connection:
+            if setup_sql is not None:
+                print("Running Setup SQL:" + setup_sql)
+                connection.execute(setup_sql)
+            ddb_benchmark_result_rel = connection.sql(query)
+            if ddb_benchmark_result_rel is not None:
+                connection.execute(
+                    "CREATE TABLE ddb_benchmark_result AS SELECT * FROM ddb_benchmark_result_rel"
+                )
+            else:
+                connection.execute("CREATE TABLE ddb_benchmark_result(empty TEXT)")
+            print("Running Validation SQL:" + validate_sql)
+            result = connection.execute(validate_sql).fetchall()
+            return "result", result
+    except Exception as e:
+        return "exception", e
+async def exec_on_db(
+    duckdb_path: str,
+    query: str,
+    setup_sql: str,
+    validate_sql: str,
+    timeout: int = TIMEOUT,
+) -> Tuple[str, Any]:
+    try:
+        return await asyncio.wait_for(
+            exec_on_db_(duckdb_path, query, setup_sql, validate_sql), timeout
+        )
+    except asyncio.TimeoutError:
+        return ("exception", TimeoutError)
+    except Exception as e:
+        return ("exception", e)
+# postprocess the model predictions to avoid execution errors
+# e.g. removing spaces between ">" and "="
+def postprocess(query: str) -> str:
+    query = query.replace("> =", ">=").replace("< =", "<=").replace("! =", "!=")
+    return query
+# approximate whether p_str and g_str are semantically equivalent
+# db is the database path
+# we are going to evaluate whether they are equivalent in all the databases
+# that are in the same directory as db
+# returns 1 if denotationally equivalent on every database
+# 0 otherwise
+# the meaning of each auxiliary argument can be seen in the parser definition in evaluation.py
+def eval_exec_match(
+    db: str,
+    p_str: str,
+    g_str: str,
+    setup_sql: str,
+    validate_sql: str,
+    plug_value: bool,
+    keep_distinct: bool,
+    progress_bar_for_each_datapoint: bool,
+) -> int:
+    # post-process the prediction.
+    # e.g. removing spaces between ">" and "="
+    p_str, g_str = postprocess(p_str), postprocess(g_str)
+    if not keep_distinct:
+        try:
+            # if sqlparse can't parse p_str, we should not even try to execute it
+            p_str = remove_distinct(p_str)
+        except Exception as e:
+            return 0
+        g_str = remove_distinct(g_str)
+    # we decide whether two denotations are equivalent based on "bag semantics"
+    # https://courses.cs.washington.edu/courses/cse444/10sp/lectures/lecture16.pdf
+    # if there is order by in query, then we assume order of the rows matter
+    # order by might also be used to find the max/min instead of sorting,
+    # but in that case the result mostly only contains one row and hence order_matters does not make a difference
+    order_matters = "order by" in g_str.lower()
+    # find all databases in the same directory
+    db_dir = os.path.dirname(db)
+    db_paths = [
+        os.path.join(db_dir, basename)
+        for basename in os.listdir(db_dir)
+        if ".duckdb" in basename
+    ]
+    preds = [p_str]
+    # if plug in value (i.e. we do not consider value prediction correctness)
+    # enumerate all ways to plug in values in the gold query to the model predictions
+    # otherwise, we only evaluate the predicted query with its own value prediction
+    if plug_value:
+        _, preds = get_all_preds_for_execution(g_str, p_str)
+        # we did not add this line in our EMNLP work
+        # this reduces "false negatives" when value is substituted
+        preds = chain([p_str], preds)
+    for pred in preds:
+        pred_passes = 1
+        # compare the gold and predicted denotations on each database in the directory
+        # wrap with progress bar if required
+        if progress_bar_for_each_datapoint:
+            ranger = tqdm.tqdm(db_paths)
+        else:
+            ranger = db_paths
+        for db_path in ranger:
+            g_flag, g_denotation = asyncio.run(
+                exec_on_db(
+                    db_path, g_str, setup_sql=setup_sql, validate_sql=validate_sql
+                )
+            )
+            p_flag, p_denotation = asyncio.run(
+                exec_on_db(
+                    db_path, pred, setup_sql=setup_sql, validate_sql=validate_sql
+                )
+            )
+            # we should expect the gold to be successfully executed on the database
+            assert (
+                g_flag != "exception"
+            ), f"gold query {g_str} has error {g_denotation} on database file {db_path}"
+            # wrong if execution fails
+            if p_flag == "exception":
+                pred_passes = 0
+            # if denotations are not equivalent, the prediction must be wrong
+            elif not result_eq(g_denotation, p_denotation, order_matters=order_matters):
+                pred_passes = 0
+            if pred_passes == 0:
+                break
+        # the model prediction has the same denotation as the gold for all databases
+        if pred_passes == 1:
+            return 1
+    # none of the predictions passed
+    return 0
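
The comparison performed by `result_eq` above can be illustrated with a much smaller sketch. This is a simplified illustration, not the code from `exec_eval.py`: the helper name `bags_equal_up_to_column_order` is hypothetical, it tries every column permutation instead of pruning the search with sampled rows, and it assumes all cell values are hashable.

```python
from collections import Counter
from itertools import permutations
from typing import List, Tuple


def bags_equal_up_to_column_order(
    gold: List[Tuple], pred: List[Tuple], order_matters: bool = False
) -> bool:
    """Hypothetical helper: True if some column permutation of `pred` matches `gold`."""
    # Different row counts can never be the same bag of rows.
    if len(gold) != len(pred):
        return False
    if not gold:
        return True
    num_cols = len(gold[0])
    if len(pred[0]) != num_cols:
        return False
    # Brute force over column orders; fine for the narrow results in this sketch.
    for perm in permutations(range(num_cols)):
        permuted = [tuple(row[i] for i in perm) for row in pred]
        if order_matters:
            # With ORDER BY in the gold query, row order must match as well.
            if gold == permuted:
                return True
        elif Counter(gold) == Counter(permuted):
            # Bag semantics: duplicates count, row order does not.
            return True
    return False


gold = [("Asia", 3), ("Europe", 2), ("Asia", 3)]
pred = [(2, "Europe"), (3, "Asia"), (3, "Asia")]
print(bags_equal_up_to_column_order(gold, pred))
print(bags_equal_up_to_column_order(gold, pred, order_matters=True))
```

In this example the two results hold the same rows with swapped columns and a different row order, so they match under bag semantics but not when row order is enforced.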

duckdb-nsql/eval/metrics/test_suite_sql_eval/parse.py ADDED Viewed

	@@ -0,0 +1,252 @@

+import re
+import sqlparse
+from typing import List, Tuple, Set, Iterator, Dict, Any, Union
+from sqlparse.sql import Comparison, Identifier
+from sqlparse.tokens import Whitespace
+import itertools
+from collections import namedtuple
+Token = namedtuple("Token", ["ttype", "value"])
+VALUE_NUM_SYMBOL = "VALUERARE"
+QUOTE_CHARS = {"`", "'", '"'}
+def tokenize(query: str) -> List[Token]:
+    tokens = list([Token(t.ttype, t.value) for t in sqlparse.parse(query)[0].flatten()])
+    return tokens
+def join_tokens(tokens: List[Token]) -> str:
+    return "".join([x.value for x in tokens]).strip().replace("  ", " ")
+def round_trip_test(query: str) -> None:
+    tokens = tokenize(query)
+    reconstructed = "".join([token.value for token in tokens])
+    assert query == reconstructed, "Round trip test fails for string %s" % query
+def postprocess(query: str) -> str:
+    query = query.replace("> =", ">=").replace("< =", "<=").replace("! =", "!=")
+    return query
+# strip_query, reformat_query and replace values
+# were implemented by Yu Tao for processing CoSQL
+def strip_query(query: str) -> Tuple[List[str], List[str]]:
+    query_keywords, all_values = [], []
+    # then replace all stuff enclosed by "" with a numerical value to get it marked as {VALUE}
+    # Tao's implementation is commented out here.
+    """
+    str_1 = re.findall("\"[^\"]*\"", query)
+    str_2 = re.findall("\'[^\']*\'", query)
+    values = str_1 + str_2
+        """
+    toks = sqlparse.parse(query)[0].flatten()
+    values = [
+        t.value
+        for t in toks
+        if t.ttype == sqlparse.tokens.Literal.String.Single
+        or t.ttype == sqlparse.tokens.Literal.String.Symbol
+    ]
+    for val in values:
+        all_values.append(val)
+        query = query.replace(val.strip(), VALUE_NUM_SYMBOL)
+    query_tokenized = query.split()
+    float_nums = re.findall(r"[-+]?\d*\.\d+", query)
+    all_values += [qt for qt in query_tokenized if qt in float_nums]
+    query_tokenized = [
+        VALUE_NUM_SYMBOL if qt in float_nums else qt for qt in query_tokenized
+    ]
+    query = " ".join(query_tokenized)
+    int_nums = [i.strip() for i in re.findall(r"[^tT]\d+", query)]
+    all_values += [qt for qt in query_tokenized if qt in int_nums]
+    query_tokenized = [
+        VALUE_NUM_SYMBOL if qt in int_nums else qt for qt in query_tokenized
+    ]
+    # print int_nums, query, query_tokenized
+    for tok in query_tokenized:
+        if "." in tok:
+            table = re.findall(r"[Tt]\d+\.", tok)
+            if len(table) > 0:
+                to = tok.replace(".", " . ").split()
+                to = [t.lower() for t in to if len(t) > 0]
+                query_keywords.extend(to)
+            else:
+                query_keywords.append(tok.lower())
+        elif len(tok) > 0:
+            query_keywords.append(tok.lower())
+    return query_keywords, all_values
+def reformat_query(query: str) -> str:
+    query = query.strip().replace(";", "").replace("\t", "")
+    query = " ".join(
+        [t.value for t in tokenize(query) if t.ttype != sqlparse.tokens.Whitespace]
+    )
+    t_stars = ["t1.*", "t2.*", "t3.*", "T1.*", "T2.*", "T3.*"]
+    for ts in t_stars:
+        query = query.replace(ts, "*")
+    return query
+def replace_values(sql: str) -> Tuple[List[str], Set[str]]:
+    sql = sqlparse.format(sql, reindent=False, keyword_case="upper")
+    # sql = re.sub(r"(<=|>=|!=|=|<|>|,)", r" \1 ", sql)
+    sql = re.sub(r"(T\d+\.)\s", r"\1", sql)
+    query_toks_no_value, values = strip_query(sql)
+    return query_toks_no_value, set(values)
+# extract the non-value tokens and the set of values
+# from a sql query
+def extract_query_values(sql: str) -> Tuple[List[str], Set[str]]:
+    reformated = reformat_query(query=sql)
+    query_value_replaced, values = replace_values(reformated)
+    return query_value_replaced, values
+# plug in the values into query with value slots
+def plugin(query_value_replaced: List[str], values_in_order: List[str]) -> str:
+    q_length = len(query_value_replaced)
+    query_w_values = query_value_replaced[:]
+    value_idx = [
+        idx
+        for idx in range(q_length)
+        if query_value_replaced[idx] == VALUE_NUM_SYMBOL.lower()
+    ]
+    assert len(value_idx) == len(values_in_order)
+    for idx, value in zip(value_idx, values_in_order):
+        query_w_values[idx] = value
+    return " ".join(query_w_values)
+# a generator generating all possible ways of
+# filling values into predicted query
+def plugin_all_permutations(
+    query_value_replaced: List[str], values: Set[str]
+) -> Iterator[str]:
+    num_slots = len([v for v in query_value_replaced if v == VALUE_NUM_SYMBOL.lower()])
+    for values in itertools.product(*[list(values) for _ in range(num_slots)]):
+        yield plugin(query_value_replaced, list(values))
+# given the gold query and the model prediction
+# extract values from the gold, extract predicted sql with value slots
+# return 1) number of possible ways to plug in gold values and 2) an iterator of predictions with value plugged in
+def get_all_preds_for_execution(gold: str, pred: str) -> Tuple[int, Iterator[str]]:
+    _, gold_values = extract_query_values(gold)
+    pred_query_value_replaced, _ = extract_query_values(pred)
+    num_slots = len(
+        [v for v in pred_query_value_replaced if v == VALUE_NUM_SYMBOL.lower()]
+    )
+    num_alternatives = len(gold_values) ** num_slots
+    return (
+        num_alternatives,
+        plugin_all_permutations(pred_query_value_replaced, gold_values),
+    )
+def remove_distinct(s):
+    toks = [t.value for t in list(sqlparse.parse(s)[0].flatten())]
+    return "".join([t for t in toks if t.lower() != "distinct"])
+def extract_all_comparison_from_node(node: Token) -> List[Comparison]:
+    comparison_list = []
+    if hasattr(node, "tokens"):
+        for t in node.tokens:
+            comparison_list.extend(extract_all_comparison_from_node(t))
+    if type(node) == Comparison:
+        comparison_list.append(node)
+    return comparison_list
+def extract_all_comparison(query: str) -> List[Comparison]:
+    tree = sqlparse.parse(query)[0]
+    comparison_list = extract_all_comparison_from_node(tree)
+    return comparison_list
+def extract_toks_from_comparison(comparison_node: Comparison) -> List[Token]:
+    tokens = [t for t in comparison_node.tokens if t.ttype != Whitespace]
+    return tokens
+def extract_info_from_comparison(comparison_node: Comparison) -> Dict[str, Any]:
+    tokens = extract_toks_from_comparison(comparison_node)
+    left, op, right = tokens
+    returned_dict = {"left": left, "op": op.value, "right": right}
+    if type(left) != Identifier:
+        return returned_dict
+    table = None
+    if len(left.tokens) == 3 and re.match("^[tT][0-9]$", left.tokens[0].value) is None:
+        table = left.tokens[0].value.lower()
+    col = left.tokens[-1].value
+    if type(right) == Identifier:
+        if len(right.tokens) == 1 and type(right.tokens[0]) == sqlparse.sql.Token:
+            right_val = right.tokens[0].value
+        else:
+            return returned_dict
+    elif type(right) == sqlparse.sql.Token:
+        right_val = right.value
+    else:
+        return returned_dict
+    returned_dict["table_col"], returned_dict["val"] = (
+        (table, col.upper()),
+        process_str_value(right_val),
+    )
+    return returned_dict
+def extract_all_comparison_from_query(query: str) -> List[Dict[str, Any]]:
+    comparison_list = extract_all_comparison(query)
+    return [extract_info_from_comparison(c) for c in comparison_list]
+def extract_typed_value_in_comparison_from_query(
+    query: str,
+) -> List[Tuple[Tuple[Union[str, None], str], str]]:
+    cmps = extract_all_comparison_from_query(query)
+    typed_values = [
+        (cmp["table_col"], cmp["val"]) for cmp in cmps if "table_col" in cmp
+    ]
+    for table, col, val1, val2 in re.findall(
+        r"(?:([^\.\s]*)\.)?([^\.\s]+) between ([^\s;]+) and ([^\s;]+)",
+        query,
+        re.IGNORECASE,
+    ):
+        if table == "":
+            table = None
+        else:
+            table = table.lower()
+        col = col.upper()
+        for v in [val1, val2]:
+            typed_values.append(((table, col), v))
+    return typed_values
+def process_str_value(v: str) -> str:
+    if len(v) > 0 and v[0] in QUOTE_CHARS:
+        v = v[1:]
+    if len(v) > 0 and v[-1] in QUOTE_CHARS:
+        v = v[:-1]
+    for c in QUOTE_CHARS:
+        v = v.replace(c + c, c)
+    return v
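
The value plugging done by `plugin_all_permutations` and `get_all_preds_for_execution` above can be sketched in isolation. The snippet below is a minimal illustration under stated assumptions: `fill_slots` and `SLOT` are hypothetical names, the query is assumed to be already tokenized with value slots marked, and the values are sorted only to make the enumeration order deterministic.

```python
import itertools
from typing import Iterator, List, Set

# Lower-cased placeholder, mirroring VALUE_NUM_SYMBOL.lower() in parse.py.
SLOT = "valuerare"


def fill_slots(tokens: List[str], values: Set[str]) -> Iterator[str]:
    """Hypothetical helper: yield every way of filling the slots with the given values."""
    slot_positions = [i for i, tok in enumerate(tokens) if tok == SLOT]
    # Each slot may take any value, so there are len(values) ** len(slots) candidates.
    for combo in itertools.product(sorted(values), repeat=len(slot_positions)):
        filled = list(tokens)
        for pos, value in zip(slot_positions, combo):
            filled[pos] = value
        yield " ".join(filled)


tokens = (
    "select name from country where region = valuerare and population > valuerare"
).split()
candidates = list(fill_slots(tokens, {"'Asia'", "1000"}))
print(len(candidates))  # 2 values ** 2 slots = 4
```

Because the number of candidates grows exponentially with the number of slots, the evaluator stops at the first candidate whose denotation matches the gold query on every database.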

duckdb-nsql/eval/metrics/test_suite_sql_eval/process_sql.py ADDED Viewed

	@@ -0,0 +1,644 @@

+################################
+# Assumptions:
+#   1. sql is correct
+#   2. only table name has alias
+#   3. only one intersect/union/except
+#
+# val: number(float)/string(str)/sql(dict)
+# col_unit: (agg_id, col_id, isDistinct(bool))
+# val_unit: (unit_op, col_unit1, col_unit2)
+# table_unit: (table_type, col_unit/sql)
+# cond_unit: (not_op, op_id, val_unit, val1, val2)
+# condition: [cond_unit1, 'and'/'or', cond_unit2, ...]
+# sql {
+#   'select': (isDistinct(bool), [(agg_id, val_unit), (agg_id, val_unit), ...])
+#   'from': {'table_units': [table_unit1, table_unit2, ...], 'conds': condition}
+#   'where': condition
+#   'groupBy': [col_unit1, col_unit2, ...]
+#   'orderBy': ('asc'/'desc', [val_unit1, val_unit2, ...])
+#   'having': condition
+#   'limit': None/limit value
+#   'intersect': None/sql
+#   'except': None/sql
+#   'union': None/sql
+# }
+################################
+import json
+import duckdb
+from nltk import word_tokenize
+CLAUSE_KEYWORDS = (
+    "select",
+    "from",
+    "where",
+    "group",
+    "order",
+    "limit",
+    "intersect",
+    "union",
+    "except",
+)
+JOIN_KEYWORDS = ("join", "on", "as")
+WHERE_OPS = (
+    "not",
+    "between",
+    "=",
+    ">",
+    "<",
+    ">=",
+    "<=",
+    "!=",
+    "in",
+    "like",
+    "is",
+    "exists",
+)
+UNIT_OPS = ("none", "-", "+", "*", "/")
+AGG_OPS = ("none", "max", "min", "count", "sum", "avg")
+TABLE_TYPE = {
+    "sql": "sql",
+    "table_unit": "table_unit",
+}
+COND_OPS = ("and", "or")
+SQL_OPS = ("intersect", "union", "except")
+ORDER_OPS = ("desc", "asc")
+class Schema:
+    """
+    Simple schema which maps table&column to a unique identifier
+    """
+    def __init__(self, schema):
+        self._schema = schema
+        self._idMap = self._map(self._schema)
+    @property
+    def schema(self):
+        return self._schema
+    @property
+    def idMap(self):
+        return self._idMap
+    def _map(self, schema):
+        idMap = {"*": "__all__"}
+        id = 1
+        for key, vals in schema.items():
+            for val in vals:
+                idMap[key.lower() + "." + val.lower()] = (
+                    "__" + key.lower() + "." + val.lower() + "__"
+                )
+                id += 1
+        for key in schema:
+            idMap[key.lower()] = "__" + key.lower() + "__"
+            id += 1
+        return idMap
+def get_schema(db):
+    """
+    Get database's schema, which is a dict with table name as key
+    and list of column names as value
+    :param db: database path
+    :return: schema dict
+    """
+    schema = {}
+    conn = duckdb.connect(db)
+    # fetch table names
+    res = conn.execute("show tables").fetchall()
+    tables = [r[0] for r in res]
+    # fetch table info
+    for table in tables:
+        res = conn.execute("PRAGMA table_info({})".format(table))
+        schema[table] = [str(col[1].lower()) for col in res.fetchall()]
+    # release the database file handle once the schema has been read
+    conn.close()
+    return schema
+def get_schema_from_json(fpath):
+    with open(fpath) as f:
+        data = json.load(f)
+    schema = {}
+    for entry in data:
+        table = str(entry["table"].lower())
+        cols = [str(col["column_name"].lower()) for col in entry["col_data"]]
+        schema[table] = cols
+    return schema
+def tokenize(string):
+    string = str(string)
+    string = string.replace(
+        "'", '"'
+    )  # ensures all string values wrapped by "" problem??
+    quote_idxs = [idx for idx, char in enumerate(string) if char == '"']
+    assert len(quote_idxs) % 2 == 0, "Unexpected quote"
+    # keep string value as token
+    vals = {}
+    for i in range(len(quote_idxs) - 1, -1, -2):
+        qidx1 = quote_idxs[i - 1]
+        qidx2 = quote_idxs[i]
+        val = string[qidx1 : qidx2 + 1]
+        key = "__val_{}_{}__".format(qidx1, qidx2)
+        string = string[:qidx1] + key + string[qidx2 + 1 :]
+        vals[key] = val
+    toks = [word.lower() for word in word_tokenize(string)]
+    # replace with string value token
+    for i in range(len(toks)):
+        if toks[i] in vals:
+            toks[i] = vals[toks[i]]
+    # merge "!", ">", "<" followed by "=" into a single !=, >=, <= token
+    eq_idxs = [idx for idx, tok in enumerate(toks) if tok == "="]
+    eq_idxs.reverse()
+    prefix = ("!", ">", "<")
+    for eq_idx in eq_idxs:
+        pre_tok = toks[eq_idx - 1]
+        if pre_tok in prefix:
+            toks = toks[: eq_idx - 1] + [pre_tok + "="] + toks[eq_idx + 1 :]
+    return toks
+def scan_alias(toks):
+    """Scan for 'as' tokens and build a map from each alias to the name it refers to"""
+    as_idxs = [idx for idx, tok in enumerate(toks) if tok == "as"]
+    alias = {}
+    for idx in as_idxs:
+        alias[toks[idx + 1]] = toks[idx - 1]
+    return alias
+def get_tables_with_alias(schema, toks):
+    tables = scan_alias(toks)
+    for key in schema:
+        assert key not in tables, "Alias {} has the same name as a table".format(key)
+        tables[key] = key
+    return tables
+def parse_col(toks, start_idx, tables_with_alias, schema, default_tables=None):
+    """
+        :returns next idx, column id
+    """
+    tok = toks[start_idx]
+    if tok == "*":
+        return start_idx + 1, schema.idMap[tok]
+    if "." in tok:  # if token is a composite
+        alias, col = tok.split(".")
+        key = tables_with_alias[alias] + "." + col
+        return start_idx + 1, schema.idMap[key]
+    assert (
+        default_tables is not None and len(default_tables) > 0
+    ), "Default tables should not be None or empty"
+    for alias in default_tables:
+        table = tables_with_alias[alias]
+        if tok in schema.schema[table]:
+            key = table + "." + tok
+            return start_idx + 1, schema.idMap[key]
+    assert False, "Error col: {}".format(tok)
+def parse_col_unit(toks, start_idx, tables_with_alias, schema, default_tables=None):
+    """
+        :returns next idx, (agg_op id, col_id)
+    """
+    idx = start_idx
+    len_ = len(toks)
+    isBlock = False
+    isDistinct = False
+    if toks[idx] == "(":
+        isBlock = True
+        idx += 1
+    if toks[idx] in AGG_OPS:
+        agg_id = AGG_OPS.index(toks[idx])
+        idx += 1
+        assert idx < len_ and toks[idx] == "("
+        idx += 1
+        if toks[idx] == "distinct":
+            idx += 1
+            isDistinct = True
+        idx, col_id = parse_col(toks, idx, tables_with_alias, schema, default_tables)
+        assert idx < len_ and toks[idx] == ")"
+        idx += 1
+        return idx, (agg_id, col_id, isDistinct)
+    if toks[idx] == "distinct":
+        idx += 1
+        isDistinct = True
+    agg_id = AGG_OPS.index("none")
+    idx, col_id = parse_col(toks, idx, tables_with_alias, schema, default_tables)
+    if isBlock:
+        assert toks[idx] == ")"
+        idx += 1  # skip ')'
+    return idx, (agg_id, col_id, isDistinct)
+def parse_val_unit(toks, start_idx, tables_with_alias, schema, default_tables=None):
+    idx = start_idx
+    len_ = len(toks)
+    isBlock = False
+    if toks[idx] == "(":
+        isBlock = True
+        idx += 1
+    col_unit1 = None
+    col_unit2 = None
+    unit_op = UNIT_OPS.index("none")
+    idx, col_unit1 = parse_col_unit(
+        toks, idx, tables_with_alias, schema, default_tables
+    )
+    if idx < len_ and toks[idx] in UNIT_OPS:
+        unit_op = UNIT_OPS.index(toks[idx])
+        idx += 1
+        idx, col_unit2 = parse_col_unit(
+            toks, idx, tables_with_alias, schema, default_tables
+        )
+    if isBlock:
+        assert toks[idx] == ")"
+        idx += 1  # skip ')'
+    return idx, (unit_op, col_unit1, col_unit2)
+def parse_table_unit(toks, start_idx, tables_with_alias, schema):
+    """
+        :returns next idx, table id, table name
+    """
+    idx = start_idx
+    len_ = len(toks)
+    key = tables_with_alias[toks[idx]]
+    if idx + 1 < len_ and toks[idx + 1] == "as":
+        idx += 3
+    else:
+        idx += 1
+    return idx, schema.idMap[key], key
+def parse_value(toks, start_idx, tables_with_alias, schema, default_tables=None):
+    idx = start_idx
+    len_ = len(toks)
+    isBlock = False
+    if toks[idx] == "(":
+        isBlock = True
+        idx += 1
+    if toks[idx] == "select":
+        idx, val = parse_sql(toks, idx, tables_with_alias, schema)
+    elif '"' in toks[idx]:  # token is a string value
+        val = toks[idx]
+        idx += 1
+    else:
+        try:
+            val = float(toks[idx])
+            idx += 1
+        except ValueError:  # token is not a number, so parse it as a column unit
+            end_idx = idx
+            while (
+                end_idx < len_
+                and toks[end_idx] != ","
+                and toks[end_idx] != ")"
+                and toks[end_idx] != "and"
+                and toks[end_idx] not in CLAUSE_KEYWORDS
+                and toks[end_idx] not in JOIN_KEYWORDS
+            ):
+                end_idx += 1
+            idx, val = parse_col_unit(
+                toks[start_idx:end_idx], 0, tables_with_alias, schema, default_tables
+            )
+            idx = end_idx
+    if isBlock:
+        assert toks[idx] == ")"
+        idx += 1
+    return idx, val
+def parse_condition(toks, start_idx, tables_with_alias, schema, default_tables=None):
+    idx = start_idx
+    len_ = len(toks)
+    conds = []
+    while idx < len_:
+        idx, val_unit = parse_val_unit(
+            toks, idx, tables_with_alias, schema, default_tables
+        )
+        not_op = False
+        if toks[idx] == "not":
+            not_op = True
+            idx += 1
+        assert (
+            idx < len_ and toks[idx] in WHERE_OPS
+        ), "Error condition: idx: {}, tok: {}".format(idx, toks[idx])
+        op_id = WHERE_OPS.index(toks[idx])
+        idx += 1
+        val1 = val2 = None
+        if op_id == WHERE_OPS.index(
+            "between"
+        ):  # between..and... special case: dual values
+            idx, val1 = parse_value(
+                toks, idx, tables_with_alias, schema, default_tables
+            )
+            assert toks[idx] == "and"
+            idx += 1
+            idx, val2 = parse_value(
+                toks, idx, tables_with_alias, schema, default_tables
+            )
+        else:  # normal case: single value
+            idx, val1 = parse_value(
+                toks, idx, tables_with_alias, schema, default_tables
+            )
+            val2 = None
+        conds.append((not_op, op_id, val_unit, val1, val2))
+        if idx < len_ and (
+            toks[idx] in CLAUSE_KEYWORDS
+            or toks[idx] in (")", ";")
+            or toks[idx] in JOIN_KEYWORDS
+        ):
+            break
+        if idx < len_ and toks[idx] in COND_OPS:
+            conds.append(toks[idx])
+            idx += 1  # skip and/or
+    return idx, conds
+def parse_select(toks, start_idx, tables_with_alias, schema, default_tables=None):
+    idx = start_idx
+    len_ = len(toks)
+    assert toks[idx] == "select", "'select' not found"
+    idx += 1
+    isDistinct = False
+    if idx < len_ and toks[idx] == "distinct":
+        idx += 1
+        isDistinct = True
+    val_units = []
+    while idx < len_ and toks[idx] not in CLAUSE_KEYWORDS:
+        agg_id = AGG_OPS.index("none")
+        if toks[idx] in AGG_OPS:
+            agg_id = AGG_OPS.index(toks[idx])
+            idx += 1
+        idx, val_unit = parse_val_unit(
+            toks, idx, tables_with_alias, schema, default_tables
+        )
+        val_units.append((agg_id, val_unit))
+        if idx < len_ and toks[idx] == ",":
+            idx += 1  # skip ','
+    return idx, (isDistinct, val_units)
+def parse_from(toks, start_idx, tables_with_alias, schema):
+    """
+    Assumes that all table units in the from clause are combined with join
+    """
+    assert "from" in toks[start_idx:], "'from' not found"
+    len_ = len(toks)
+    idx = toks.index("from", start_idx) + 1
+    default_tables = []
+    table_units = []
+    conds = []
+    while idx < len_:
+        isBlock = False
+        if toks[idx] == "(":
+            isBlock = True
+            idx += 1
+        if toks[idx] == "select":
+            idx, sql = parse_sql(toks, idx, tables_with_alias, schema)
+            table_units.append((TABLE_TYPE["sql"], sql))
+        else:
+            if idx < len_ and toks[idx] == "join":
+                idx += 1  # skip join
+            idx, table_unit, table_name = parse_table_unit(
+                toks, idx, tables_with_alias, schema
+            )
+            table_units.append((TABLE_TYPE["table_unit"], table_unit))
+            default_tables.append(table_name)
+        if idx < len_ and toks[idx] == "on":
+            idx += 1  # skip on
+            idx, this_conds = parse_condition(
+                toks, idx, tables_with_alias, schema, default_tables
+            )
+            if len(conds) > 0:
+                conds.append("and")
+            conds.extend(this_conds)
+        if isBlock:
+            assert toks[idx] == ")"
+            idx += 1
+        if idx < len_ and (toks[idx] in CLAUSE_KEYWORDS or toks[idx] in (")", ";")):
+            break
+    return idx, table_units, conds, default_tables
+def parse_where(toks, start_idx, tables_with_alias, schema, default_tables):
+    idx = start_idx
+    len_ = len(toks)
+    if idx >= len_ or toks[idx] != "where":
+        return idx, []
+    idx += 1
+    idx, conds = parse_condition(toks, idx, tables_with_alias, schema, default_tables)
+    return idx, conds
+def parse_group_by(toks, start_idx, tables_with_alias, schema, default_tables):
+    idx = start_idx
+    len_ = len(toks)
+    col_units = []
+    if idx >= len_ or toks[idx] != "group":
+        return idx, col_units
+    idx += 1
+    assert toks[idx] == "by"
+    idx += 1
+    while idx < len_ and not (toks[idx] in CLAUSE_KEYWORDS or toks[idx] in (")", ";")):
+        idx, col_unit = parse_col_unit(
+            toks, idx, tables_with_alias, schema, default_tables
+        )
+        col_units.append(col_unit)
+        if idx < len_ and toks[idx] == ",":
+            idx += 1  # skip ','
+        else:
+            break
+    return idx, col_units
+def parse_order_by(toks, start_idx, tables_with_alias, schema, default_tables):
+    idx = start_idx
+    len_ = len(toks)
+    val_units = []
+    order_type = "asc"  # default type is 'asc'
+    if idx >= len_ or toks[idx] != "order":
+        return idx, val_units
+    idx += 1
+    assert toks[idx] == "by"
+    idx += 1
+    while idx < len_ and not (toks[idx] in CLAUSE_KEYWORDS or toks[idx] in (")", ";")):
+        idx, val_unit = parse_val_unit(
+            toks, idx, tables_with_alias, schema, default_tables
+        )
+        val_units.append(val_unit)
+        if idx < len_ and toks[idx] in ORDER_OPS:
+            order_type = toks[idx]
+            idx += 1
+        if idx < len_ and toks[idx] == ",":
+            idx += 1  # skip ','
+        else:
+            break
+    return idx, (order_type, val_units)
+def parse_having(toks, start_idx, tables_with_alias, schema, default_tables):
+    idx = start_idx
+    len_ = len(toks)
+    if idx >= len_ or toks[idx] != "having":
+        return idx, []
+    idx += 1
+    idx, conds = parse_condition(toks, idx, tables_with_alias, schema, default_tables)
+    return idx, conds
+def parse_limit(toks, start_idx):
+    idx = start_idx
+    len_ = len(toks)
+    if idx < len_ and toks[idx] == "limit":
+        idx += 2
+        # tokens produced by tokenize() are strings, so this check is always true and any limit value is recorded as 1
+        if type(toks[idx - 1]) != int:
+            return idx, 1
+        return idx, int(toks[idx - 1])
+    return idx, None
+def parse_sql(toks, start_idx, tables_with_alias, schema):
+    isBlock = False  # indicate whether this is a block of sql/sub-sql
+    len_ = len(toks)
+    idx = start_idx
+    sql = {}
+    if toks[idx] == "(":
+        isBlock = True
+        idx += 1
+    # parse from clause in order to get default tables
+    from_end_idx, table_units, conds, default_tables = parse_from(
+        toks, start_idx, tables_with_alias, schema
+    )
+    sql["from"] = {"table_units": table_units, "conds": conds}
+    # select clause
+    _, select_col_units = parse_select(
+        toks, idx, tables_with_alias, schema, default_tables
+    )
+    idx = from_end_idx
+    sql["select"] = select_col_units
+    # where clause
+    idx, where_conds = parse_where(toks, idx, tables_with_alias, schema, default_tables)
+    sql["where"] = where_conds
+    # group by clause
+    idx, group_col_units = parse_group_by(
+        toks, idx, tables_with_alias, schema, default_tables
+    )
+    sql["groupBy"] = group_col_units
+    # having clause
+    idx, having_conds = parse_having(
+        toks, idx, tables_with_alias, schema, default_tables
+    )
+    sql["having"] = having_conds
+    # order by clause
+    idx, order_col_units = parse_order_by(
+        toks, idx, tables_with_alias, schema, default_tables
+    )
+    sql["orderBy"] = order_col_units
+    # limit clause
+    idx, limit_val = parse_limit(toks, idx)
+    sql["limit"] = limit_val
+    idx = skip_semicolon(toks, idx)
+    if isBlock:
+        assert toks[idx] == ")"
+        idx += 1  # skip ')'
+    idx = skip_semicolon(toks, idx)
+    # intersect/union/except clause
+    for op in SQL_OPS:  # initialize IUE
+        sql[op] = None
+    if idx < len_ and toks[idx] in SQL_OPS:
+        sql_op = toks[idx]
+        idx += 1
+        idx, IUE_sql = parse_sql(toks, idx, tables_with_alias, schema)
+        sql[sql_op] = IUE_sql
+    return idx, sql
+def load_data(fpath):
+    with open(fpath) as f:
+        data = json.load(f)
+    return data
+def get_sql(schema, query):
+    toks = tokenize(query)
+    tables_with_alias = get_tables_with_alias(schema.schema, toks)
+    _, sql = parse_sql(toks, 0, tables_with_alias, schema)
+    return sql
+def skip_semicolon(toks, start_idx):
+    idx = start_idx
+    while idx < len(toks) and toks[idx] == ";":
+        idx += 1
+    return idx

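Note on column resolution in process_sql.py: `Schema._map`, `scan_alias`, and `parse_col` together turn a column token into an identifier such as `__singer.name__`, using the alias map for qualified names and the from-clause tables for unqualified ones. The following is a minimal sketch of that lookup under simplifying assumptions: the helper names `build_id_map` and `resolve_column` are hypothetical, the example schema is made up, and the tokens come from a plain `str.split()` rather than the real tokenizer.

```python
def build_id_map(schema):
    """Map "*", every table, and every "table.column" to an identifier string."""
    id_map = {"*": "__all__"}
    for table, cols in schema.items():
        t = table.lower()
        id_map[t] = "__" + t + "__"
        for col in cols:
            key = t + "." + col.lower()
            id_map[key] = "__" + key + "__"
    return id_map


def scan_alias(toks):
    """Map each alias (token after 'as') to the token before 'as'."""
    return {toks[i + 1]: toks[i - 1] for i, tok in enumerate(toks) if tok == "as"}


def resolve_column(tok, aliases, schema, id_map, default_tables):
    """Resolve a column token to its identifier, trying default tables in order."""
    if tok == "*":
        return id_map["*"]
    if "." in tok:  # qualified name such as t1.name
        alias, col = tok.split(".")
        return id_map[aliases[alias] + "." + col]
    for alias in default_tables:  # unqualified: first table that has the column wins
        table = aliases[alias]
        if tok in schema[table]:
            return id_map[table + "." + tok]
    raise KeyError("unknown column: " + tok)


# Hypothetical schema and query, for illustration only.
schema = {"singer": ["singer_id", "name"], "concert": ["concert_id", "singer_id"]}
id_map = build_id_map(schema)
toks = "select t1.name from singer as t1 join concert as t2 on t1.singer_id = t2.singer_id".split()
aliases = scan_alias(toks)
aliases.update({table: table for table in schema})  # every table also maps to itself

qualified = resolve_column("t1.name", aliases, schema, id_map, ["singer", "concert"])
unqualified = resolve_column("concert_id", aliases, schema, id_map, ["singer", "concert"])
ambiguous = resolve_column("singer_id", aliases, schema, id_map, ["singer", "concert"])
print(qualified, unqualified, ambiguous)
```

An unqualified column that exists in several from-clause tables resolves to the first table listed, which is why `singer_id` maps to the `singer` table here.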
duckdb-nsql/eval/metrics/test_suite_sql_eval/tables.json ADDED Viewed

The diff for this file is too large to render. See raw diff

duckdb-nsql/eval/metrics/test_suite_sql_eval/tmp/readme.txt ADDED Viewed

	@@ -0,0 +1 @@


+ This folder contains tmp files that are used in executing SQLs on the database.