Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
danielz02
committed on
Shorten perspective display name
Browse files- src/display/about.py +13 -13
- src/leaderboard/read_evals.py +1 -1
src/display/about.py
CHANGED
|
@@ -13,12 +13,12 @@ class Task:
|
|
| 13 |
class Tasks(Enum):
|
| 14 |
# task_key in the json file, metric_key in the json file, name to display in the leaderboard
|
| 15 |
task0 = Task("toxicity", "aggregated-results", "Toxicity")
|
| 16 |
-
task1 = Task("stereotype", "aggregated-results", "Stereotype Bias")
|
| 17 |
-
task2 = Task("adv", "aggregated-results", "Adversarial Robustness")
|
| 18 |
-
task3 = Task("ood", "aggregated-results", "OoD Robustness")
|
| 19 |
-
task4 = Task("adv_demo", "aggregated-results", "Robustness to Adversarial Demonstrations")
|
| 20 |
task5 = Task("privacy", "aggregated-results", "Privacy")
|
| 21 |
-
task6 = Task("ethics", "aggregated-results", "Machine Ethics")
|
| 22 |
task7 = Task("fairness", "aggregated-results", "Fairness")
|
| 23 |
|
| 24 |
|
|
@@ -41,14 +41,14 @@ limitations, and potential risks involved in deploying these state-of-the-art La
|
|
| 41 |
|
| 42 |
This project is organized around the following eight primary perspectives of trustworthiness, including:
|
| 43 |
|
| 44 |
-
Toxicity
|
| 45 |
-
Stereotype and bias
|
| 46 |
-
Adversarial robustness
|
| 47 |
-
Out-of-Distribution Robustness
|
| 48 |
-
Privacy
|
| 49 |
-
Robustness to Adversarial Demonstrations
|
| 50 |
-
Machine Ethics
|
| 51 |
-
Fairness
|
| 52 |
|
| 53 |
## Reproducibility
|
| 54 |
To reproduce our results, checkout https://github.com/AI-secure/DecodingTrust
|
|
|
|
| 13 |
class Tasks(Enum):
|
| 14 |
# task_key in the json file, metric_key in the json file, name to display in the leaderboard
|
| 15 |
task0 = Task("toxicity", "aggregated-results", "Toxicity")
|
| 16 |
+
task1 = Task("stereotype", "aggregated-results", "Stereotype")
|
| 17 |
+
task2 = Task("adv", "aggregated-results", "Adv Robustness")
|
| 18 |
+
task3 = Task("ood", "aggregated-results", "OoD")
|
| 19 |
+
task4 = Task("adv_demo", "aggregated-results", "Adv Demo")
|
| 20 |
task5 = Task("privacy", "aggregated-results", "Privacy")
|
| 21 |
+
task6 = Task("ethics", "aggregated-results", "Ethics")
|
| 22 |
task7 = Task("fairness", "aggregated-results", "Fairness")
|
| 23 |
|
| 24 |
|
|
|
|
| 41 |
|
| 42 |
This project is organized around the following eight primary perspectives of trustworthiness, including:
|
| 43 |
|
| 44 |
+
+ Toxicity
|
| 45 |
+
+ Stereotype and bias
|
| 46 |
+
+ Adversarial robustness
|
| 47 |
+
+ Out-of-Distribution Robustness
|
| 48 |
+
+ Privacy
|
| 49 |
+
+ Robustness to Adversarial Demonstrations
|
| 50 |
+
+ Machine Ethics
|
| 51 |
+
+ Fairness
|
| 52 |
|
| 53 |
## Reproducibility
|
| 54 |
To reproduce our results, checkout https://github.com/AI-secure/DecodingTrust
|
src/leaderboard/read_evals.py
CHANGED
|
@@ -37,7 +37,6 @@ class EvalResult:
|
|
| 37 |
data = json.load(fp)
|
| 38 |
|
| 39 |
config = data.get("config")
|
| 40 |
-
print(config)
|
| 41 |
|
| 42 |
# Precision
|
| 43 |
precision = Precision.from_str(config.get("model_dtype"))
|
|
@@ -97,6 +96,7 @@ class EvalResult:
|
|
| 97 |
try:
|
| 98 |
with open(request_file, "r") as f:
|
| 99 |
request = json.load(f)
|
|
|
|
| 100 |
self.model_type = ModelType.from_str(request.get("model_type", ""))
|
| 101 |
self.weight_type = WeightType[request.get("weight_type", "Original")]
|
| 102 |
self.license = request.get("license", "?")
|
|
|
|
| 37 |
data = json.load(fp)
|
| 38 |
|
| 39 |
config = data.get("config")
|
|
|
|
| 40 |
|
| 41 |
# Precision
|
| 42 |
precision = Precision.from_str(config.get("model_dtype"))
|
|
|
|
| 96 |
try:
|
| 97 |
with open(request_file, "r") as f:
|
| 98 |
request = json.load(f)
|
| 99 |
+
print("Read Request", request)
|
| 100 |
self.model_type = ModelType.from_str(request.get("model_type", ""))
|
| 101 |
self.weight_type = WeightType[request.get("weight_type", "Original")]
|
| 102 |
self.license = request.get("license", "?")
|