Daily LLM vibe check for Jan 13, 2026

DUMB METER

A daily snapshot of when popular models drift from their baseline. Auto evals + human reports, distilled into a loud, shareable signal.

Daily cadence: One run / 24h

Baseline window: 21 days rolling

Signal mix: Auto + Human
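
The drift figures on this page are stated relative to a 21-day rolling baseline. A minimal sketch of that calculation follows, assuming the baseline is a plain mean of the previous 21 daily scores; the page does not say how the baseline is aggregated. The function name `drift_vs_baseline` and the sample history are hypothetical.

```python
from statistics import mean

BASELINE_DAYS = 21  # "21 days rolling", as stated on the page


def drift_vs_baseline(history, today):
    """Return today's score minus the rolling baseline.

    history: prior daily scores, oldest first, not including today.
    Assumption: the baseline is the mean of the last BASELINE_DAYS
    scores. The page does not specify the aggregation.
    """
    window = history[-BASELINE_DAYS:]
    if not window:
        raise ValueError("need at least one prior daily score")
    return today - mean(window)


# Made-up history for illustration only: 21 flat days at 60.
example_drift = drift_vs_baseline([60] * 21, 72)
```

With the made-up flat history above, a score of 72 yields a drift of +12. Scores older than 21 days fall outside the window and do not affect the result.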

Overall weirdness

OVERALL: 59 (Sus), change 0, on a 0 to 100 scale

Today feels: Medium
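
Each score on this page carries a label (Normal, Sus, or Emergency). A rough sketch of how a score could map to a label is below. The cutoffs 50 and 80 are hypothetical: the page only shows that 48 is Normal, 54 through 72 are Sus, and 83 is Emergency, so the real boundaries could lie anywhere in those gaps.

```python
# Hypothetical cutoffs, chosen only to be consistent with the
# score/label pairs visible on the page.
SUS_THRESHOLD = 50        # page shows 48 -> Normal, 54 -> Sus
EMERGENCY_THRESHOLD = 80  # page shows 72 -> Sus, 83 -> Emergency


def label_for(score):
    """Map a 0 to 100 dumb index to a label (assumed cutoffs)."""
    if score >= EMERGENCY_THRESHOLD:
        return "Emergency"
    if score >= SUS_THRESHOLD:
        return "Sus"
    return "Normal"


# Score/label pairs taken from the cards on this page.
page_scores = {31: "Normal", 48: "Normal", 54: "Sus", 59: "Sus",
               67: "Sus", 72: "Sus", 83: "Emergency"}
mismatches = {s: l for s, l in page_scores.items() if label_for(s) != l}
```

Under these assumed cutoffs, `mismatches` is empty for every score listed on the page.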

Featured

Models on the edge

Anthropic

Claude Opus 4.5

Status: SUS
Auto: 72 (+12)
Human: 63 (+8)

Dumb index: 72 (Sus), +12

7-day drift vs baseline: +12

Top issue: Refusals up

OpenAI

ChatGPT 5.2 Pro

Status: OK
Auto: 48 (+3)
Human: 35 (-2)

Dumb index: 48 (Normal), +3

7-day drift vs baseline: +3

Top issue: Latency up

Google

Gemini

Status: BROKEN
Auto: 83 (+20)
Human: 70 (+16)

Dumb index: 83 (Emergency), +20

7-day drift vs baseline: +20

Top issue: Hallucinations up

More

Full lineup

Open Source

Minimax M2

Status: SUS
Auto: 54 (+5)
Human: 22 (-1)

Dumb index: 54 (Sus), +5

7-day drift vs baseline: +5

Top issue: Instruction slips

Open Source

GLM 4.7

Status: OK
Auto: 31 (-4)
Human: 28 (0)

Dumb index: 31 (Normal), -4

7-day drift vs baseline: -4

Top issue: Format jitter

DeepSeek

DeepSeek V3

Status: SUS
Auto: 59 (+9)
Human: 41 (+6)

Dumb index: 59 (Sus), +9

7-day drift vs baseline: +9

Top issue: Reasoning drift

xAI

Grok (latest)

Status: SUS
Auto: 67 (+11)
Human: 57 (+10)

Dumb index: 67 (Sus), +11

7-day drift vs baseline: +11

Top issue: Refusal spikes

Human signal

Today's reports

Claude Opus 4.5

2h ago

Tags: Refusal, Instruction. Severity: 4

Refused a safe request to summarize a public article.

"Asked for a neutral summary and got a safety refusal."

Gemini

4h ago

Tags: Hallucination, Reasoning. Severity: 5

Confidently gave wrong steps in a deterministic math task.

"Got 2+2=5 with long reasoning."

ChatGPT 5.2 Pro

1h ago

Tags: Latency. Severity: 3

p95 latency jumped to ~14s for short prompts.

"Short prompts felt sluggish in the last hour."

DeepSeek V3

3h ago

Tags: Reasoning. Severity: 4

Struggled with a simple algorithm refactor that it usually passes.

"Failed a known unit test case."

Grok (latest)

6h ago

Tags: Refusal, Tone. Severity: 4

Over-refused a benign request and got snarky.

"Normal request flagged as unsafe."