gupta-tanish/Ultrafeedback-llama3-8b-instruct-1vs3-selection-swepo-on-policy-iteration2 Viewer • Updated 2 days ago • 63.1k • 10
gupta-tanish/Ultrafeedback-llama3-8b-Instruct-optimal-selection-1vs7_total_responses_24 Viewer • Updated 3 days ago • 60.8k • 8
gupta-tanish/Ultrafeedback-llama3-8b-Instruct-optimal-selection-1vs7_total_responses_16 Viewer • Updated 3 days ago • 60.8k • 9
gupta-tanish/Ultrafeedback-mistral-7b-instruct-v0.2-1vs3-optimal-selection Viewer • Updated 4 days ago • 62.2k • 11
gupta-tanish/Ultrafeedback-mistral-7b-instruct-1vs3-kmeans-selection Viewer • Updated 4 days ago • 62.2k • 9
gupta-tanish/Ultrafeedback-llama3-8b-instruct-1vs3-optimal-selection Viewer • Updated 5 days ago • 62.2k • 17
gupta-tanish/Ultrafeedback-llama3-8b-instruct-1vs3-kmeans-selection Viewer • Updated 5 days ago • 62.2k • 23
gupta-tanish/Ultrafeedback-mistral-7b-instruct-v0.2-1vs3-simpo-selection Viewer • Updated 5 days ago • 62.7k • 26
gupta-tanish/Ultrafeedback-llama3-8b-instruct-top2vsbottom2-selection Viewer • Updated 5 days ago • 63.1k • 29
gupta-tanish/Ultrafeedback-mistral-7b-instruct-v0.2-top2vsbottom2-selection Viewer • Updated 6 days ago • 25.1k • 50