Introducing AutoThink: Adaptive reasoning for LLMs that improves accuracy by up to 43% (relative) on reasoning benchmarks!
Instead of using fixed thinking budgets, AutoThink:
- Classifies query complexity (HIGH/LOW) using adaptive classification
- Dynamically allocates thinking tokens based on complexity
- Uses steering vectors derived from Pivotal Token Search to guide reasoning patterns
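The classify-then-budget idea above can be sketched in a few lines. This is a toy stand-in, not the actual implementation: the keyword heuristic replaces the learned adaptive classifier, and the budget numbers are illustrative, not the paper's values.

```python
# Toy sketch of AutoThink-style adaptive thinking-budget allocation.
# The classifier is a keyword heuristic standing in for the learned
# adaptive classifier; budget sizes are made up for illustration.

def classify_complexity(query: str) -> str:
    """Label a query HIGH or LOW complexity (toy stand-in)."""
    hard_markers = ("prove", "derive", "why", "explain", "optimize")
    return "HIGH" if any(m in query.lower() for m in hard_markers) else "LOW"

def thinking_budget(label: str, max_tokens: int = 4096) -> int:
    """Map the complexity label to a thinking-token budget."""
    # Illustrative split: spend the full budget on hard queries,
    # a small fraction on easy ones.
    return max_tokens if label == "HIGH" else max_tokens // 8

query = "Prove that the sum of two even integers is even."
label = classify_complexity(query)
print(label, thinking_budget(label))  # HIGH 4096
```

The budget would then cap the model's thinking tokens (e.g. via a `max_new_tokens`-style limit on the reasoning segment) before the final answer is generated.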
Results on DeepSeek-R1-Distill-Qwen-1.5B:
- GPQA-Diamond: 31.06% vs 21.72% baseline (+9.34 points)
- MMLU-Pro: 26.38% vs 25.58% baseline (+0.8 points)
- Uses fewer tokens than baseline approaches
Works with any local reasoning model, including DeepSeek, Qwen, Llama, and custom models. The technique combines our earlier work on Pivotal Token Search (PTS) with our adaptive classification framework.
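The steering piece can also be sketched numerically: a direction in activation space (in AutoThink, derived from PTS contrasts) is added to a layer's hidden state during generation to bias the model toward a reasoning pattern. The vectors and strength below are illustrative, not the actual PTS-derived values.

```python
import math

# Toy sketch of activation steering: add a scaled, unit-normalized
# steering direction to a hidden-state vector. In AutoThink the
# direction comes from Pivotal Token Search; here it is arbitrary.

def apply_steering(hidden, direction, strength=4.0):
    """Return hidden + strength * (direction / ||direction||)."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [h + strength * d / norm for h, d in zip(hidden, direction)]

hidden = [0.5, -1.2, 0.3, 0.9]      # one token's hidden state (toy)
direction = [1.0, 0.0, 0.0, 0.0]    # steering direction (toy)
steered = apply_steering(hidden, direction, strength=2.0)
print(steered)  # [2.5, -1.2, 0.3, 0.9]
```

In practice this hook would sit inside the model's forward pass at a chosen layer, applied only during the thinking phase.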
Paper: AutoThink: Efficient Inference for Reasoning LLMs
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5253327
Code and examples:
https://github.com/codelion/optillm/tree/main/optillm/autothink
PTS implementation and technical details:
https://github.com/codelion/pts
https://huggingface.co/blog/codelion/pts
Adaptive classifier framework:
https://github.com/codelion/adaptive-classifier
Would love to hear your thoughts on adaptive resource allocation for LLM reasoning! Have you experimented with similar approaches?