Grok (latest)
6h ago
Over-refused a benign request and got snarky.
xAI
Daily drift snapshot against a 21-day baseline with auto + human signals.
Last run Jan 13, 2026 (2h ago)
7-day drift
AUTO DUMB INDEX
67
Sus
vs baseline +11
Why it moved
Refusal spikes
high
Safety overshoot
Delta +8
Instruction slips
med
Constraint misses
Delta +5
Latency up
med
TTFT slower
Delta +4
Baseline window: 21 days
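A minimal sketch of how a "vs baseline" delta could be computed, assuming the delta is today's value minus the mean of the previous 21 daily values. The page does not state its formula, so the function name `drift_vs_baseline`, the plain-mean baseline, and the flat history of 56 are all illustrative assumptions (56 is only what the displayed index of 67 and delta of +11 would imply under a simple difference).

```python
from statistics import mean


def drift_vs_baseline(history, today, window=21):
    """Return today's value minus the mean of the last `window` daily values.

    Hypothetical helper: the dashboard does not document its actual formula.
    """
    if not history:
        raise ValueError("history must contain at least one baseline value")
    baseline = mean(history[-window:])
    return today - baseline


# Assumed flat baseline of 56 over 21 days, for illustration only.
history = [56.0] * 21
print(drift_vs_baseline(history, 67.0))  # 11.0
```

A median or trimmed mean would be a reasonable alternative baseline if single-day outliers are a concern.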
Accuracy
Objective tasks solved correctly.
63%
+8 vs baseline
Reasoning robustness
Consistency across prompt variations.
58%
+6 vs baseline
Instruction following
Format and constraint compliance.
60%
+7 vs baseline
Hallucination risk
Confident wrong answers on known items.
61%
+6 vs baseline
Refusal anomaly
Unexpected refusals on safe prompts.
69%
+9 vs baseline
Latency
p50/p95 response time drift.
55%
+5 vs baseline
Variance
Run-to-run stability.
53%
+4 vs baseline
Eval suite
Tier 0
Sanity checks
60
-5 today
12 tasks
Tier 1
Factual QA
56
-6 today
20 tasks
Tier 2
Reasoning + math
53
-7 today
18 tasks
Tier 3
Coding
51
-8 today
12 tasks
Tier 4
Instruction stress
47
-9 today
10 tasks
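For readers who want a single number across tiers, one option is a task-weighted mean of the tier scores listed above. This is only a sketch: the page does not say whether or how it combines tiers, and `TIERS` and `weighted_score` are hypothetical names introduced here.

```python
# (tier label, score, task count) copied from the Eval suite cards above.
TIERS = [
    ("Tier 0: Sanity checks", 60, 12),
    ("Tier 1: Factual QA", 56, 20),
    ("Tier 2: Reasoning + math", 53, 18),
    ("Tier 3: Coding", 51, 12),
    ("Tier 4: Instruction stress", 47, 10),
]


def weighted_score(tiers):
    """Average the tier scores, weighting each tier by its task count."""
    total_tasks = sum(tasks for _, _, tasks in tiers)
    if total_tasks == 0:
        raise ValueError("no tasks to weight by")
    return sum(score * tasks for _, score, tasks in tiers) / total_tasks


print(round(weighted_score(TIERS), 1))  # 53.8
```

Weighting by task count lets the larger tiers (Factual QA, Reasoning + math) dominate; an unweighted mean would treat every tier equally.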
Community
Top categories today
Grok (latest)
6h ago
Over-refused a benign request and got snarky.