---
license: apache-2.0
---

<style>
@import url('https://fonts.googleapis.com/css2?family=Vollkorn:ital,wght@0,400..900;1,400..900&display=swap');

img{
  user-select: none;
  transition: all 0.2s ease;
  border-radius: .5rem;
}

img:hover{
  transform: rotate(2deg);
  filter: invert(100%);
}
</style>

<div style="background-color: transparent; border-radius: .5rem; padding: 2rem; font-family: monospace; font-size: .85rem; text-align: justify;">

![cubby](https://huggingface.co/appvoid/cubby/resolve/main/cubby.webp)

This is a passthrough merge of arco with an experimental model. It improved on the ARC Challenge, falling only 1.2 points short of modern 3b baseline performance.

If you need multilingual, general-knowledge, or trivially simple question answering, choose qwen or llama. If you need to solve trivially simple English tasks with a model half the size, choose arco.

#### prompt

there is intentionally no prompt format set.

#### benchmarks

zero-shot results from state-of-the-art small language models

| Parameters | Model | MMLU | ARC-C | HellaSwag | PIQA | Winogrande | Average |
| ---------- | -------- | ----- | ----- | --------- | ----- | ---------- | ------- |
| 0.5b | qwen 2 | 44.13 | 28.92 | 49.05 | 69.31 | 56.99 | 49.68 |
| 0.3b | smollm | 25.52 | 37.71 | 56.41 | 71.93 | 59.27 | 50.17 |
| 0.5b | danube 3 | 24.81 | 36.18 | 60.46 | 73.78 | 61.01 | 51.25 |
| 0.5b | qwen 2.5 | **47.29** | 31.83 | 52.17 | 70.29 | 57.06 | 51.72 |
| 0.5b | arco | 26.17 | 37.29 | 62.88 | 74.37 | **62.27** | 52.60 |
| 0.5b | arco 2 | 25.51 | **38.82** | **63.02** | **74.70** | 61.25 | **52.66** |
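
The Average column appears to be the unweighted mean of the five zero-shot scores in each row; the card does not state this, so treat it as an assumption. A minimal sketch that recomputes it from the table's values (the names `SCORES`, `BENCHMARKS`, and `average` are hypothetical, for illustration only) reproduces every reported average to within 0.01, with the last digit presumably reflecting rounding of the underlying scores:

```python
# Hypothetical sanity check: recompute the "Average" column, ASSUMING it is
# the unweighted mean of the five zero-shot benchmark scores.
BENCHMARKS = ("MMLU", "ARC-C", "HellaSwag", "PIQA", "Winogrande")

# Per-model scores copied from the benchmarks table, in BENCHMARKS order.
SCORES = {
    "qwen 2":   (44.13, 28.92, 49.05, 69.31, 56.99),
    "smollm":   (25.52, 37.71, 56.41, 71.93, 59.27),
    "danube 3": (24.81, 36.18, 60.46, 73.78, 61.01),
    "qwen 2.5": (47.29, 31.83, 52.17, 70.29, 57.06),
    "arco":     (26.17, 37.29, 62.88, 74.37, 62.27),
    "arco 2":   (25.51, 38.82, 63.02, 74.70, 61.25),
}


def average(scores):
    """Return the unweighted mean of a sequence of benchmark scores."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Print each model's recomputed average, rounded to two decimals.
    for model, scores in SCORES.items():
        print(f"{model:<10} {average(scores):.2f}")
```

An unweighted mean gives each benchmark equal say regardless of how many questions it contains, which is a common convention for leaderboard-style summaries.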

#### supporters

<a href="https://ko-fi.com/appvoid" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 34px !important; margin-top: -4px;width: 128px !important; filter: contrast(2) grayscale(100%) brightness(100%);" ></a>

#### trivia

arco also stands for "arc optimized", hence the focus on this reasoning benchmark.

</div>