Anthropic, Claude Opus 4.5: 72 (7-day drift, vs baseline +12)
Daily LLM vibe check for Jan 13, 2026
A daily snapshot of how far popular models have drifted from their baseline. Automated evals and human reports are distilled into a loud, shareable signal.
Daily cadence: One run / 24h
Baseline window: 21 days rolling
Signal mix: Auto + Human
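The page does not say how the "vs baseline" figure is derived. As a rough sketch only, assuming the baseline is the plain mean of the preceding 21 daily scores and the delta is today's score minus that mean, the calculation could look like this. The function name `drift_vs_baseline` and the sample scores are hypothetical, chosen for illustration.

```python
from statistics import mean

# Taken from the page: "Baseline window: 21 days rolling".
BASELINE_WINDOW_DAYS = 21


def drift_vs_baseline(daily_scores, window=BASELINE_WINDOW_DAYS):
    """Return today's score minus the mean of the preceding `window` scores.

    `daily_scores` is ordered oldest to newest, one value per daily run,
    and its last element is today's score. Using a plain mean as the
    baseline is an assumption; the page does not state the formula.
    """
    if len(daily_scores) < 2:
        raise ValueError("need at least one baseline day plus today")
    *history, today = daily_scores
    # Only the most recent `window` days count toward the rolling baseline.
    baseline = mean(history[-window:])
    return today - baseline


# Made-up history: 21 days at 60, then a score of 72 today.
example_delta = drift_vs_baseline([60] * 21 + [72])
print(f"vs baseline {example_delta:+d}")
```

With fewer than 21 days of history, this sketch simply averages whatever days are available; a real pipeline might instead refuse to report a delta until the window is full.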
Overall weirdness: 59 (Sus)
Today feels: Medium
Featured
Anthropic: 72 (7-day drift, vs baseline +12)
OpenAI: 48 (7-day drift, vs baseline +3)
83 (7-day drift, vs baseline +20)
More
Open Source: 54 (7-day drift, vs baseline +5)
Open Source: 31 (7-day drift, vs baseline -4)
DeepSeek: 59 (7-day drift, vs baseline +9)
xAI: 67 (7-day drift, vs baseline +11)
Human signal
Claude Opus 4.5 (2h ago): Refused a safe request to summarize a public article.
Gemini (4h ago): Confidently gave wrong steps in a deterministic math task.
ChatGPT 5.2 Pro (1h ago): p95 latency jumped to ~14s for short prompts.
DeepSeek V3 (3h ago): Struggled with a simple algorithm refactor that it usually passes.
Grok (latest) (6h ago): Over-refused a benign request and got snarky.