Representation Learning in Continuous-Time Dynamic Signed Networks Paper • 2207.03408 • Published Jul 7, 2022
Chain-of-Thought Reasoning is a Policy Improvement Operator Paper • 2309.08589 • Published Sep 15, 2023
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models Paper • 2402.14688 • Published Feb 22, 2024
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning Paper • 2406.04520 • Published Jun 6, 2024