A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Paper • 2504.07086 • Published
Nice! I'm working on a CPU-only inference layer and have been looking for a good, simple alternative to build on.
I would love to see q8 and q4 quantization support! A sketch of what I mean is below.
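For reference, here is a minimal sketch of what per-block q8 quantization typically looks like (similar in spirit to ggml's Q8_0 format). The function names and block size are illustrative assumptions, not this project's API:

```python
# A minimal sketch of symmetric per-block q8 quantization; block size and
# names are illustrative assumptions, not part of any existing API.
import numpy as np

def quantize_q8(weights: np.ndarray, block_size: int = 32):
    """Quantize float32 weights to int8 with one scale per block."""
    w = weights.reshape(-1, block_size)
    # One scale per block: the largest magnitude maps to the int8 limit 127.
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize_q8(q: np.ndarray, scales: np.ndarray, shape):
    """Recover an approximate float32 tensor from int8 values and scales."""
    return (q.astype(np.float32) * scales).reshape(shape)

w = np.random.randn(4, 64).astype(np.float32)
q, s = quantize_q8(w)
w_hat = dequantize_q8(q, s, w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```

A q4 variant would work the same way, just packing two 4-bit values per byte and clipping to ±7, at the cost of higher reconstruction error.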
What is the performance like compared to ggml and llama.cpp?
Also, does it support quantization?
Thanks
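If it helps frame the comparison, here is a rough sketch of how one might measure decode throughput per backend; `generate` is a hypothetical stand-in for whichever inference call each library actually exposes:

```python
# A minimal timing sketch for comparing decode throughput across backends.
# `generate` is a hypothetical placeholder, not a real API of this project,
# ggml, or llama.cpp.
import time

def tokens_per_second(generate, prompt: str, n_tokens: int = 128) -> float:
    """Time a single generation call and report decode throughput."""
    start = time.perf_counter()
    generate(prompt, max_tokens=n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Dummy backend standing in for a real one, to show the harness runs:
def dummy_generate(prompt, max_tokens=128):
    time.sleep(0.01 * max_tokens)  # pretend each token takes 10 ms

print(f"{tokens_per_second(dummy_generate, 'hello'):.1f} tok/s")
```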
Is it possible to have the encoder and decoder separated for this model? I would like to use my RK3588 NPU for the decoder. Thanks!
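One way this could work, sketched under assumptions: export the encoder and decoder as separate ONNX graphs, then feed the decoder graph to the NPU toolchain (the RK3588 is typically targeted by converting ONNX). The toy modules and shapes below are illustrative, not this model's actual architecture:

```python
# A minimal sketch of exporting encoder and decoder as separate ONNX graphs.
# ToyEncoder/ToyDecoder are hypothetical stand-ins for the real model halves.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.proj(x))

class ToyDecoder(nn.Module):
    def __init__(self, dim: int = 64, vocab: int = 100):
        super().__init__()
        self.out = nn.Linear(dim, vocab)

    def forward(self, h):
        return self.out(h)

encoder, decoder = ToyEncoder(), ToyDecoder()
x = torch.randn(1, 16, 64)

# Export each half to its own file; the decoder graph can then be handed
# to a separate runtime or NPU converter while the encoder stays on CPU.
torch.onnx.export(encoder, (x,), "encoder.onnx")
torch.onnx.export(decoder, (encoder(x),), "decoder.onnx")
```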