Spaces: hebrew-llm-leaderboard/leaderboard

Shaltiel committed on May 9

Commit 531a096 • 1 Parent(s): cf85943

Added SNLI score

Files changed (1)

src/about.py +44 -4

@@ -12,10 +12,11 @@ class Task:
 # ---------------------------------------------------
 class Tasks(Enum):
     # task_key in the json file, metric_key in the json file, name to display in the leaderboard
-    task0 = Task("custom|heq-qa-tlnls|0", "heq_tlnls", "QA TLNLS (HeQ)")
-    task1 = Task("custom|sentiment-acc|0", "sentiment_acc", "Sentiment Acc (Mafat)")
-    task2 = Task("custom|winograd-acc|0", "winograd_acc", "Winograd (Binary) Acc (V. Schwartz)")
-    task3 = Task("custom|he-en-trans-bleu|0", "sentence_bleu", "Translation BLEU")
+    task0 = Task("custom|snli-acc|0", "snli_acc", "SNLI Accuracy")
+    task1 = Task("custom|heq-qa-tlnls|0", "heq_tlnls", "QA TLNLS (HeQ)")
+    task2 = Task("custom|sentiment-acc|0", "sentiment_acc", "Sentiment Acc (Mafat)")
+    task3 = Task("custom|winograd-acc|0", "winograd_acc", "Winograd (Binary) Acc (V. Schwartz)")
+    task4 = Task("custom|he-en-trans-bleu|0", "sentence_bleu", "Translation BLEU")
 NUM_FEWSHOT = 0 # Change with your few shot
 # ---------------------------------------------------
@@ -170,6 +171,45 @@ English: Some sentence to translate to Hebrew <br/>
 Hebrew:
 </blockquote>
+5. SNLI Accuracy
+    - **Source**: We took a sample of documents from the test-subset of the official SNLI corpus.
+    - **Scoring**: We compute the accuracy score on the predictions, expecting either "סתירה", "התאמה", or "כלום".
+    - **Number of examples**: There are a total of 210 examples - 70 from each class - where each example was translated using [Dicta's translation engine](https://translate.dicta.org.il), and then manually reviewed and corrected as needed.
+    - **Few-Shot Format**: For every prompt, we provide 12 few-shot examples, 4 from each category.
+    For example:
+<blockquote dir="rtl" style='text-align: right; background-color: #f0f0f0'>
+<p>
+הנחת יסוד: נער מנגן בחצוצרתו במהלך הופעה עם להקתו.<br/>
+השערה: לאף אחד אין חצוצרה.<br/>
+תשובה: סתירה<br/>
+...
+הנחת יסוד: הנערה לבושה במעיל חום, בעודה פוסעת בשלג.<br/>
+השערה: הגברת הלובשת מעיל מחפשת את כלבה האובד.<br/>
+תשובה: כלום<br/>
+...
+הנחת יסוד: ספינת־פאר בה אנשים עולים ויורדים.<br/>
+השערה: אנשים עולים ויורדים מספינות.<br/>
+תשובה: התאמה<br/>
+...
+הנחת יסוד: הנחה חדשה<br/>
+השערה: השערה חדשה<br/>
+תשובה:
+</p>
+</blockquote>
 """
 EVALUATION_QUEUE_TEXT = """
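
The SNLI section added in this commit describes a few-shot prompt layout and an accuracy score over three Hebrew labels. The sketch below is one possible reading of that description, not the leaderboard's actual evaluation code: the helper names `build_prompt` and `snli_accuracy`, the use of newlines between fields (the diff only shows the HTML rendering with `<br/>`), and exact string matching after whitespace stripping are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# The three labels named in the "Scoring" bullet. Judging by the examples in
# the diff they correspond to contradiction, entailment, and neutral.
LABELS = ("סתירה", "התאמה", "כלום")

# (premise, hypothesis, label); a hypothetical container, not from the repo.
Example = Tuple[str, str, str]


def build_prompt(few_shot: Sequence[Example], premise: str, hypothesis: str) -> str:
    """Lay out solved examples followed by an unanswered query.

    Field names ("הנחת יסוד" = premise, "השערה" = hypothesis, "תשובה" = answer)
    are copied from the diff; the separators are an assumption.
    """
    blocks = [
        f"הנחת יסוד: {p}\nהשערה: {h}\nתשובה: {label}"
        for p, h, label in few_shot
    ]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"הנחת יסוד: {premise}\nהשערה: {hypothesis}\nתשובה:")
    return "\n\n".join(blocks)


def snli_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that equal the gold label after stripping whitespace."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for pred, ref in zip(predictions, gold) if pred.strip() == ref)
    return correct / len(gold)
```

With the setup described in the diff, `few_shot` would hold 12 examples (4 per label) and `gold` would hold the 210 test labels. Any prediction outside `LABELS` simply counts as wrong under this matching rule.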