Spaces: TheFinAI/IJCAI-2024-FinLLM-Learderboard
Jimin Huang committed on Jun 2, 2024

Commit 2f1ff79 (1 parent: b44eb8b)

feat: modify leaderboard

Files changed (2)

app.py CHANGED

@@ -10,6 +10,7 @@ TASK1_COLS = [
     ("Acc", "number"),
     ("F1", "number"),
     ("MCC", "number"),
+    ("DTL", "number"),
 ]
 TASK2_COLS = [
@@ -19,6 +20,7 @@ TASK2_COLS = [
     ("Rouge-L", "number"),
     ("BertScore", "number"),
     ("BartScore", "number"),
+    ("DTL", "number"),
 ]
 TASK3_COLS = [
@@ -88,12 +90,17 @@ Our leaderboard incorporates a comprehensive evaluation using diverse metrics li
 - **Dataset:** 291 data points.
 - **Evaluation Metrics:** Sharpe Ratio (final ranking metric), Cumulative Return, Daily and Annualized Volatility, Maximum Drawdown.
+**Model Cheating Detection: Data Leakage Test (DLT)**
+To measure the risk of data leakage from the test set used in training, we introduce the Data Leakage Test (DLT). The DLT calculates the difference in perplexity between the training set and the test set. A larger difference indicates a lower likelihood of model cheating, while a smaller difference suggests a higher likelihood.
 For more details, refer to our [Challenge page](https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-agentscen/shared-task-finllm?authuser=0).
 """
 def create_data_interface(df):
     headers = df.columns
+    print (headers)
     types = ["str"] + ["number"] * (len(headers) - 1)
     return gr.components.Dataframe(
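
Note on the DLT description added in this change: the text defines the score only as "the difference in perplexity between the training set and the test set". The sketch below is one possible reading, not the organizers' implementation. It assumes perplexity is the exponential of the mean negative log-likelihood per token, and that the difference is taken as test minus train; neither the sign convention nor the input format is stated in the commit. The function names `perplexity` and `data_leakage_test` are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity as exp(mean negative log-likelihood), natural log assumed."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def data_leakage_test(train_logprobs, test_logprobs):
    """Hypothetical DLT score: test-set perplexity minus training-set perplexity.

    Assumption: the sign (test minus train) is our choice; the commit only
    says "difference". A value near zero would mean the model finds the test
    set about as predictable as its training data.
    """
    return perplexity(test_logprobs) - perplexity(train_logprobs)


# Toy inputs: every training token has probability 0.5, every test token 0.25.
train = [math.log(0.5)] * 4
test = [math.log(0.25)] * 4
score = data_leakage_test(train, test)
```

Under this reading, a model that has seen the test set would score it almost as confidently as the training set, so the gap shrinks, which matches the description's claim that a smaller difference suggests a higher likelihood of leakage.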

task1_result.csv CHANGED

+[email protected],0.7626,0.5237,0.7427,38.9031
+[email protected],0.7575,0.5174,0.7555
+[email protected],0.7544,0.5149,0.7581,2.2565
+[email protected],0.7513,0.5018,0.7406
+[email protected],0.7286,0.4554,0.7008
+catmemo,0.711,0.4199,0.6818
+[email protected],0.709,0.4166,0.6941
+[email protected],0.7079,0.4141,0.69
+[email protected],0.4933,0.0141,0.5905
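
Note on the rows above: only two of them carry a fifth field, so any loader has to tolerate rows of different widths. A minimal sketch of one way to do that with the standard `csv` module follows. The CSV header is not shown in this diff, so the sketch treats the numeric fields positionally and does not name them. The helper `parse_rows`, the constant `EXPECTED_NUMERIC_COLS`, and the team name `team_a` are hypothetical; `team_a` stands in for a name that is obscured in the diff.

```python
import csv
import io

# Assumption: three score columns plus the newly added fourth value.
EXPECTED_NUMERIC_COLS = 4


def parse_rows(text, width=EXPECTED_NUMERIC_COLS):
    """Parse 'name,v1,v2,...' rows, padding short rows with None."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        name = record[0]
        values = [float(v) for v in record[1:]]
        values += [None] * (width - len(values))  # no-op when already full
        rows.append((name, values))
    return rows


sample = (
    "team_a,0.7626,0.5237,0.7427,38.9031\n"  # placeholder name, five fields
    "catmemo,0.711,0.4199,0.6818\n"          # four fields, last value missing
)
rows = parse_rows(sample)
```

Padding with `None` keeps every row the same width, so a missing value stays visibly missing instead of being read as zero.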