| Rank | Model | Organization | Agent | Version | Score | Tasks | Date |
|---|---|---|---|---|---|---|---|
| #1 | GPT-5 | OpenAI | Codex CLI | 5 | 92.4% | 24/24 | 2025-08-22 |
| #2 | GPT-4-Turbo | OpenAI | Codex CLI | 4 Turbo | 89.7% | 24/24 | 2024-12-14 |
| #3 | Gemini-Pro | Google | Goose | Pro | 87.2% | 24/24 | 2024-12-13 |
| #4 | Claude-3-Opus | Anthropic | Claude Code | 3 Opus | 85.9% | 24/24 | 2024-12-12 |
| #5 | GPT-4 | OpenAI | Codex CLI | 4 | 83.1% | 24/24 | 2024-12-11 |
| #6 | Claude-3-Sonnet | Anthropic | Claude Code | 3 Sonnet | 81.5% | 24/24 | 2024-12-10 |
| #7 | Gemini-Flash | Google | Goose | Flash | 79.8% | 24/24 | 2024-12-09 |
| #8 | GPT-3.5-Turbo | OpenAI | Codex CLI | 3.5 Turbo | 76.3% | 24/24 | 2024-12-08 |
| #9 | Claude-3-Haiku | Anthropic | Claude Code | 3 Haiku | 74.2% | 24/24 | 2024-12-07 |
| #10 | LLaMA-2-70B | Meta | Letta | 2 70B | 71.8% | 22/24 | 2024-12-06 |
| #11 | PaLM-2 | Google | Goose | 2 | 69.4% | 20/24 | 2024-12-05 |
| #12 | Mistral-7B | Mistral AI | Letta | 7B | 67.1% | 18/24 | 2024-12-04 |
| #13 | CodeLlama-34B | Meta | Letta | 34B | 65.7% | 24/24 | 2024-12-03 |
| #14 | WizardCoder-15B | WizardLM | Letta | 15B | 63.2% | 16/24 | 2024-12-02 |
| #15 | StarCoder-15B | Hugging Face | Letta | 15B | 61.8% | 14/24 | 2024-12-01 |
| #16 | Warp | Warp | Warp | Mixed | 52% | 80/24 | 2024-11-30 |
| #17 | Claude-4-Sonnet | Anthropic | Engine Labs | 4 Sonnet | 44.8% | 80/24 | 2024-11-29 |
| #18 | Claude-4-Opus | Anthropic | Claude Code | 4 Opus | 43.2% | 80/24 | 2024-11-28 |
| #19 | Claude-4-Sonnet | Anthropic | Letta | 4 Sonnet | 42.5% | 80/24 | 2024-11-27 |
| #20 | Claude-4-Opus | Anthropic | Goose | 4 Opus | 42% | 80/24 | 2024-11-26 |
| #21 | GPT-4-Mini | OpenAI | Codex CLI | 4 Mini | 20% | 80/24 | 2024-11-25 |