Commit History
b9777d9  fix: description text
fa8bb65  Add WikiQA
7fdb5f5  fix: show partial results even if some evaluations haven't finished
b61f534  fix: read request information even if eval is running
9a10727  Update app.py (verified)
8874217  switch to flat inflection benchmark
a5bd804  add wrapping to leaderboard
80793c6  add submission instructions to about page
117d89c  remove submit tab
9b8b426  Update app.py (verified)
fdb1fcf  debug restart interval (verified)
0be9d2f  fix: type hints for styling function
90021e9  Factor out floating point styling to a function
5e8e87c  fix: filtering support for models missing details
dcb54b6  remove intro text and citation block
67a665c  add benchmark descriptions and links to About page
7fcf611  Increase floating point number in benchmark metrics
56926f2  add winogrande and arc-challenge
2bd1158  show private models by default
4ec9008  skip model detail validation for OAI/Anthropic models
b1416b0  fix typo in metric name
9e6a3bf  remove debug prints
a0ee03a  fix metric name
105e1f2  add debug prints
24c8d00  revert to correct usage of ModelDetails (without api)
1e9c5dd  remove swp
ee4b341  debug print
a5c094b  debug print (verified)
decb818  debug print (verified)
6a989eb  debug print (verified)
427f12d  debug print (verified)
ea10299  debug print (verified)
e8f05cc  Added empty default for api in ModelDetails (verified)
20fd601  Added model API to submission screen (verified)
9ef7f1a  add Icelandic evals (verified)
da87917  switch to mideind's fork of Eval Harness (verified)
96f9cbe  Change metric string (verified)
ab6318a  Comment out winogrande for debugging (verified)
839d7dc  Add task (verified)
4d276e3  Change title (verified)
2a3757e  Change title (verified)
72a1baf  Change title (verified)
bd503b0  Make name for HF token explicit (verified)
c9a0e12  Fix repo names (verified)
d7e7ffd  Update src/envs.py (verified)
bcc83eb  Update requirements.txt (verified)
d0f181a  Update README.md (verified)
84582a1  Update app.py (verified)
c1b8a96  doc (Clémentine)