Spaces:

LuckiestOne
/

valid_efficiency_score

Runtime error

App Files Files Community

Simon Sorg commited on Jan 10, 2024

Commit

8cfbd9e

1 Parent(s): dc6b696

feat: add readme

Browse files

Files changed (1) hide show

README.md +66 -15

README.md CHANGED Viewed

@@ -12,25 +12,73 @@ pinned: false
 # Metric Card for Valid Efficiency Score
-***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
 ## Metric Description
-*Give a brief overview of this metric, including what task(s) it is usually used for, if any.*
 ## How to Use
-*Give general statement of how to use the metric*
-*Provide simplest possible example for using the metric*
 ### Inputs
-*List all input arguments in the format below*
-- **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
 ### Output Values
-*Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
-*State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*
 #### Values from Popular Papers
 *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
@@ -39,10 +87,13 @@ pinned: false
 *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
 ## Limitations and Bias
-*Note any known limitations or biases that the metric has, with links and references if possible.*
 ## Citation
-*Cite the source where this metric was introduced.*
-## Further References
-*Add any useful further references.*

 # Metric Card for Valid Efficiency Score
 ## Metric Description
+This metric measures the efficiency of the SQL queries generated by a model. It is defined as the ratio of the number of correct results to the number of SQL queries generated. The metric is computed by executing the SQL queries and comparing the results to the expected results.
+It is used for the BIRD benchmark.
 ## How to Use
+```
+from evaluate import load
+module = load("Luckiestone/valid_efficiency_score")
+results = module.compute(predictions=sql_queries_pred, references=sql_queries_ref, execute=execute)
+print(results)
+>>> {"ves": 1.0}
+```
+### Example
+```
+from evaluate import load
+import sqlite3
+module = load("Luckiestone/valid_efficiency_score")
+# Create connection to the database
+database_path = "database.sqlite"
+connection = sqlite3.connect(database_path)
+# Cursor
+cursor = connection.cursor()
+# Create table
+cursor.execute('''CREATE TABLE IF NOT EXISTS Player
+             (PlayerID INTEGER PRIMARY KEY,
+             PlayerName TEXT NOT NULL);''')
+# Insert a row of data
+cursor.execute("INSERT INTO Player VALUES (1, 'Cristiano Ronaldo')")
+cursor.execute("INSERT INTO Player VALUES (2, 'Lionel Messi')")
+def execute(sql_query):
+    # Execute the SQL query
+    cursor.execute(sql_query)
+    result = cursor.fetchall()
+    return result
+sql_queries_pred = [
+    "SELECT COUNT(*) FROM Player WHERE PlayerName = 'Cristiano Ronaldo'",
+    "SELECT COUNT(*) FROM Player WHERE PlayerName = 'Lionel Messi'"
+]
+sql_queries_ref = [
+    "SELECT COUNT(*) FROM Player WHERE PlayerName = 'Cristiano Ronaldo'",
+    "SELECT COUNT(*) FROM Player WHERE PlayerName = 'Lionel Messi'"
+]
+# Compute the score
+results = module.compute(predictions=sql_queries_pred, references=sql_queries_ref, execute=execute,)
+print(results)
+```
 ### Inputs
+- **predictions** *(string): SQL queries generated.*
+- **references** *(string): SQL queries from the test set.*
+- **execute** *(callable): Function that executes the SQL queries and returns the results.*
+- **filter_func** *(callable, optional): Function that filters the results of the SQL queries.*
+- **num_executions** *(int, optional): Number of times to execute each SQL query.*
 ### Output Values
+- **ves** *(float): Valid efficiency score.* Higher scores are better. Technically ranges from 0 to 1, but if the predictions are exactly accurate and, due to some jittering, the time to execute the predictions is smaller than the time to execute the references, the score can be greater than 1.
 #### Values from Popular Papers
 *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
 *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
 ## Limitations and Bias
+The metric is limited to SQL queries. It is also quite slow to compute, as it requires executing the SQL queries.
+Furthermore, the results are non-deterministic, as the time to execute the SQL queries can vary, even though we average over multiple executions.
 ## Citation
+@article{li2023can,
+  title={Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls},
+  author={Li, Jinyang and Hui, Binyuan and Qu, Ge and Li, Binhua and Yang, Jiaxi and Li, Bowen and Wang, Bailin and Qin, Bowen and Cao, Rongyu and Geng, Ruiying and others},
+  journal={arXiv preprint arXiv:2305.03111},
+  year={2023}
+}