Accuracy Distribution
Success rate based on 390 technical trick definitions
Rank  Model                          Success rate
01    gemini-3.1-pro-preview         97%
02    gpt-5.4-high                   82%
03    gpt-5.4-xhigh                  81%
04    gpt-5.4-pro-thinking           79%
05    gemini-3-pro-preview           76%
06    gemini-3-flash-high            75%
07    glm-5                          67%
08    claude-4.6-opus-thinking-high  64%
09    grok-4                         61%
10    kimi-k2.5                      57%
11    deepseek-v3.2-thinking-high    47%
12    kimi-k2-thinking               44%
13    grok-4.1-fast                  35%
14    claude-4.6-sonnet              15%
15    minimax-m2.5                   14%