Simon Sorg committed on
Commit
8cfbd9e
1 Parent(s): dc6b696

feat: add readme

  1. README.md +66 -15
README.md CHANGED
@@ -12,25 +12,73 @@ pinned: false
12
 
13
  # Metric Card for Valid Efficiency Score
14
 
15
- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
16
-
17
  ## Metric Description
18
- *Give a brief overview of this metric, including what task(s) it is usually used for, if any.*
 
19
 
20
  ## How to Use
21
- *Give general statement of how to use the metric*
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
22
 
23
- *Provide simplest possible example for using the metric*
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
24
 
25
  ### Inputs
26
- *List all input arguments in the format below*
27
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
 
 
 
28
 
29
  ### Output Values
 
30
 
31
- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
32
-
33
- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*
34
 
35
  #### Values from Popular Papers
36
  *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
@@ -39,10 +87,13 @@ pinned: false
39
  *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
40
 
41
  ## Limitations and Bias
42
- *Note any known limitations or biases that the metric has, with links and references if possible.*
 
43
 
44
  ## Citation
45
- *Cite the source where this metric was introduced.*
46
-
47
- ## Further References
48
- *Add any useful further references.*
 
 
 

  # Metric Card for Valid Efficiency Score

  ## Metric Description
+ This metric measures the efficiency of the valid SQL queries generated by a model. A generated query is valid when executing it returns the same result as the reference query. Each valid query is scored by the execution time of the reference query relative to that of the generated query, invalid queries score zero, and the scores are averaged over all generated queries.
+ It is used for the BIRD benchmark.
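
For orientation, the BIRD paper cited in this card defines the Valid Efficiency Score along the following lines. This is a sketch of the paper's definition as commonly stated, not a description of this module's internals; in particular, whether the module applies the same square-root weighting is an assumption.

```latex
\[
\mathrm{VES} = \frac{1}{N} \sum_{n=1}^{N} \mathbb{1}(V_n, \hat{V}_n) \cdot R(Y_n, \hat{Y}_n),
\qquad
R(Y_n, \hat{Y}_n) = \sqrt{\frac{E(Y_n)}{E(\hat{Y}_n)}}
\]
% N:              number of examples
% Y_n, \hat{Y}_n: reference and predicted SQL query for example n
% V_n, \hat{V}_n: result sets returned by executing Y_n and \hat{Y}_n
% \mathbb{1}:     1 if the two result sets are equal, 0 otherwise
% E(\cdot):       execution time of a query
```

A prediction that returns the wrong result contributes 0, while a correct prediction that runs faster than the reference contributes more than 1.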

  ## How to Use
+ ```
+ from evaluate import load
+
+ module = load("Luckiestone/valid_efficiency_score")
+
+ # sql_queries_pred, sql_queries_ref, and execute are defined as in the example below
+ results = module.compute(predictions=sql_queries_pred, references=sql_queries_ref, execute=execute)
+ print(results)
+ # Output has the form {"ves": <float>}, e.g. {"ves": 1.0}
+ ```
+ ### Example
+ ```
+ from evaluate import load
+ import sqlite3
+
+ module = load("Luckiestone/valid_efficiency_score")
+
+ # Create a connection to the database
+ database_path = "database.sqlite"
+ connection = sqlite3.connect(database_path)
+ # Create a cursor for running statements
+ cursor = connection.cursor()
+
+ # Create the table if it does not exist yet
+ cursor.execute('''CREATE TABLE IF NOT EXISTS Player
+                   (PlayerID INTEGER PRIMARY KEY,
+                    PlayerName TEXT NOT NULL);''')

+ # Insert two rows of data (OR IGNORE keeps the script re-runnable)
+ cursor.execute("INSERT OR IGNORE INTO Player VALUES (1, 'Cristiano Ronaldo')")
+ cursor.execute("INSERT OR IGNORE INTO Player VALUES (2, 'Lionel Messi')")
+
+ def execute(sql_query):
+     # Execute the SQL query and return all result rows
+     cursor.execute(sql_query)
+     result = cursor.fetchall()
+     return result
+
+ sql_queries_pred = [
+     "SELECT COUNT(*) FROM Player WHERE PlayerName = 'Cristiano Ronaldo'",
+     "SELECT COUNT(*) FROM Player WHERE PlayerName = 'Lionel Messi'"
+ ]
+
+ sql_queries_ref = [
+     "SELECT COUNT(*) FROM Player WHERE PlayerName = 'Cristiano Ronaldo'",
+     "SELECT COUNT(*) FROM Player WHERE PlayerName = 'Lionel Messi'"
+ ]
+
+ # Compute the score
+ results = module.compute(predictions=sql_queries_pred, references=sql_queries_ref, execute=execute)
+ print(results)
+ ```

  ### Inputs
+ - **predictions** *(list of string): SQL queries generated by the model.*
+ - **references** *(list of string): Reference SQL queries from the test set.*
+ - **execute** *(callable): Function that takes a SQL query string, executes it, and returns the results.*
+ - **filter_func** *(callable, optional): Function that filters the results of the SQL queries.*
+ - **num_executions** *(int, optional): Number of times to execute each SQL query.*
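
To make the roles of these inputs concrete, here is a minimal, self-contained sketch of how a score of this kind could be computed. It is not the module's implementation: `ves_sketch` and `_timed` are hypothetical names, and the default of 5 executions, the order-insensitive result comparison, the square-root weighting, and applying `filter_func` to both result lists are all assumptions made for illustration.

```python
import sqlite3
import time
from statistics import mean


def _timed(execute, sql, runs):
    """Run `sql` `runs` times; return the last result and the mean wall-clock time."""
    times = []
    rows = None
    for _ in range(runs):
        start = time.perf_counter()
        rows = execute(sql)
        times.append(time.perf_counter() - start)
    return rows, mean(times)


def ves_sketch(predictions, references, execute, filter_func=None, num_executions=5):
    """Hypothetical illustration only; the real module may differ in every detail."""
    scores = []
    for pred_sql, ref_sql in zip(predictions, references):
        ref_rows, ref_time = _timed(execute, ref_sql, num_executions)
        try:
            pred_rows, pred_time = _timed(execute, pred_sql, num_executions)
        except Exception:
            scores.append(0.0)  # a prediction that fails to run counts as invalid
            continue
        if filter_func is not None:  # assumption: applied to both result lists
            ref_rows, pred_rows = filter_func(ref_rows), filter_func(pred_rows)
        if set(pred_rows) == set(ref_rows):
            # Assumption: square root of the time ratio, as in the BIRD paper
            scores.append((ref_time / max(pred_time, 1e-9)) ** 0.5)
        else:
            scores.append(0.0)
    return {"ves": mean(scores) if scores else 0.0}


# In-memory database so the sketch leaves no file behind
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, PlayerName TEXT NOT NULL)")
connection.executemany(
    "INSERT INTO Player VALUES (?, ?)",
    [(1, "Cristiano Ronaldo"), (2, "Lionel Messi")],
)


def execute(sql_query):
    return connection.execute(sql_query).fetchall()


result = ves_sketch(
    predictions=[
        "SELECT PlayerName FROM Player WHERE PlayerID = 1",
        "SELECT PlayerName FROM Playr",  # misspelled table name: fails, scores 0
    ],
    references=[
        "SELECT PlayerName FROM Player WHERE PlayerID = 1",
        "SELECT PlayerName FROM Player",
    ],
    execute=execute,
)
print(result)
```

Because the score depends on measured execution times, the printed value changes slightly from run to run.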

  ### Output Values
+ - **ves** *(float): Valid efficiency score.* Higher scores are better. The score nominally ranges from 0 to 1. However, when the predictions return exactly the reference results and, due to timing jitter, execute faster than the references, the score can exceed 1.
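
The following arithmetic sketch shows how a score above 1 can arise. The timings are made-up numbers, and the square-root weighting follows the BIRD paper's definition; whether this module uses exactly that weighting is an assumption.

```python
from math import sqrt

# Hypothetical per-query measurements:
# (results match?, reference time in ms, predicted time in ms)
measurements = [
    (True, 2.0, 2.0),   # same speed as the reference -> 1.0
    (True, 4.0, 1.0),   # four times faster than the reference -> sqrt(4) = 2.0
    (False, 2.0, 0.5),  # wrong result -> 0.0, regardless of speed
]

per_query = [
    sqrt(ref_ms / pred_ms) if correct else 0.0
    for correct, ref_ms, pred_ms in measurements
]
ves_all = sum(per_query) / len(per_query)
ves_correct_only = sum(per_query[:2]) / 2

print(per_query)         # [1.0, 2.0, 0.0]
print(ves_all)           # 1.0
print(ves_correct_only)  # 1.5
```

With only the two correct queries, the average is 1.5, which is how a fully accurate set of predictions can score above 1.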


  #### Values from Popular Papers
  *Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
 
  *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*

  ## Limitations and Bias
+ The metric is limited to SQL queries. It is also slow to compute, because every SQL query must be executed.
+ Furthermore, the results are non-deterministic: query execution times vary from run to run, even though the metric averages over multiple executions.
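
The run-to-run variation can be observed directly by timing one query several times. The sketch below is illustrative only: the table contents, the query, the `time_query` helper, and the choice of 20 runs are all assumptions, and it measures wall-clock time with `time.perf_counter`, which may differ from what the module does.

```python
import sqlite3
import time
from statistics import mean, stdev

# In-memory database with a small synthetic table
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, PlayerName TEXT NOT NULL)")
connection.executemany(
    "INSERT INTO Player VALUES (?, ?)",
    [(i, f"Player {i}") for i in range(1, 1001)],
)


def time_query(sql_query, runs):
    """Return the wall-clock time of each of `runs` executions, in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        connection.execute(sql_query).fetchall()
        timings.append(time.perf_counter() - start)
    return timings


timings = time_query("SELECT COUNT(*) FROM Player WHERE PlayerName LIKE '%7%'", runs=20)
print(f"mean: {mean(timings):.6f}s, stdev: {stdev(timings):.6f}s")
```

A non-zero standard deviation here is the jitter that makes repeated evaluations return slightly different scores; averaging over more executions narrows it but does not remove it.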

  ## Citation
+ @article{li2023can,
+   title={Can {LLM} already serve as a database interface? A big bench for large-scale database grounded text-to-{SQL}s},
+   author={Li, Jinyang and Hui, Binyuan and Qu, Ge and Li, Binhua and Yang, Jiaxi and Li, Bowen and Wang, Bailin and Qin, Bowen and Cao, Rongyu and Geng, Ruiying and others},
+   journal={arXiv preprint arXiv:2305.03111},
+   year={2023}
+ }