Not functional; serves only as a base for building a leaderboard.
DABstep Reasoning Benchmark Leaderboard
Submit code models for evaluation on benchmarks