SKATEBENCH

Data / Skatebench v2

Accuracy Distribution

Success rate based on 390 technical trick definitions

Rank  Model                          Success rate
01    gemini-3.1-pro-preview         97%
02    gpt-5.4-high                   82%
03    gpt-5.4-xhigh                  81%
04    gpt-5.4-pro-thinking           79%
05    gemini-3-pro-preview           76%
06    gemini-3-flash-high            75%
07    glm-5                          67%
08    claude-4.6-opus-thinking-high  64%
09    grok-4                         61%
10    kimi-k2.5                      57%
11    deepseek-v3.2-thinking-high    47%
12    kimi-k2-thinking               44%
13    grok-4.1-fast                  35%
14    claude-4.6-sonnet              15%
15    minimax-m2.5                   14%
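
As a rough illustration of how a success rate over the 390 trick definitions might turn into the whole-number percentages listed, here is a minimal sketch. It assumes each definition is scored once as correct or incorrect and that the displayed figure is rounded to the nearest whole percent; the page confirms neither, and the function names and the example count are hypothetical.

```python
TOTAL_DEFINITIONS = 390  # taken from the page subtitle


def success_rate(correct: int, total: int = TOTAL_DEFINITIONS) -> float:
    """Return the fraction of trick definitions answered correctly.

    Assumes one pass/fail judgment per definition (an assumption,
    not something the page states).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct / total


def as_percent(rate: float) -> str:
    """Format a fraction as a whole-number percentage string.

    Rounding to the nearest whole percent is assumed here.
    """
    return f"{round(rate * 100)}%"


if __name__ == "__main__":
    # Hypothetical count, chosen only to show the arithmetic.
    hypothetical_correct = 195
    print(as_percent(success_rate(hypothetical_correct)))
```

Because the listed percentages are whole numbers, several different correct-answer counts can map to the same displayed value, so small gaps between adjacent ranks may reflect a difference of only a few definitions.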