The latest pre-release version of terminal-bench core. Hand-crafted by undergraduate, graduate, and industry researchers.
117 tasks
Core task set for terminal-bench. Hand-crafted by undergraduate, graduate, and industry researchers.
80 tasks
Core task set for terminal-bench with recent patches. Hand-crafted by undergraduate, graduate, and industry researchers.
80 tasks
Adapter for DevEval (https://github.com/open-compass/DevEval). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/1
63 tasks
Adapter for EvoEval (https://github.com/evo-eval/evoeval). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/2
100 tasks
Adapter for AppWorld (https://github.com/StonyBrookNLP/appworld). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/3
57 tasks
Adapter for SWE-bench (https://github.com/SWE-bench/SWE-bench). Dataset PR: https://github.com/laude-institute/terminal-bench-datasets/pull/4
500 tasks