GPT-5.2 changed behaviour on Feb 10, 2026 — did your prompts break?

Your LLM Just Changed.
Did You Notice?

GPT-5.2 Instant silently updated on Feb 10, 2026. OpenAI described it as "more measured and grounded in tone" — developers described it as "our prompts stopped working." DriftWatch catches these changes in minutes, not weeks.

Start Free — 3 prompts included ↗ Live Demo Dashboard

🔒 No card required · Free tier: 3 prompts · Upgrade to £99/mo for automated monitoring

12+
Developers monitoring
6+
Prompts watched
<5min
Alert latency
£0
Extra infra to manage
⚡ Trigger Event — 30 Days Ago
"GPT-5.2 Instant improves response style and quality... more measured and grounded in tone."

— OpenAI Model Release Notes, Feb 10, 2026 · source ↗ · full breakdown →

"We caught GPT-4o drifting this week... OpenAI changed GPT-4o in a way that significantly changed our prompt outputs. Zero advance notice."

— r/LLMDevs, February 2025

"In early 2025, developers reported that gpt-4o-2024-08-06 (a supposedly fixed, dated version) had changed behaviour."

— Agenta.ai Engineering Blog, 2025

Real Drift Detection — Live Data

These results were generated minutes ago against the Claude API. Same model, consecutive runs — watch the natural variance.

drift_check — claude-3-haiku-20240307
2026-03-12 18:51 UTC · 5 prompts · avg drift: 0.213
MEDIUM
Single word response instruction-following
⚠️ Regression: exact_match — baseline: "Neutral." (with period), current: "Neutral" (period dropped)
0.575 +period dropped
MEDIUM
JSON extraction — strict schema format
Different whitespace formatting — still valid JSON but different bytes
0.316 +whitespace
LOW
Numbered list format instruction-following
Different wording, same structure — all validators pass
0.173 rewording
NONE
JSON array extraction format
Identical response — stable
0.000 ✓ stable
NONE
Nested JSON schema format
Identical response — stable
0.000 ✓ stable

This is natural LLM variance. When OpenAI or Anthropic update their models, this drift can spike to 0.8+ — and break your product.

Open Full Dashboard →
How DriftWatch Works
Set up once. Get alerts forever.
1

Upload Your Test Prompts

Add the prompts your product depends on — JSON parsers, classifiers, extractors. We provide example prompts to get started in under 5 minutes.

2

We Run Them Hourly

DriftWatch runs every prompt against your LLM endpoint every hour. We track format compliance, semantic drift, and instruction following.

3

Get Instant Alerts

The moment we detect a regression, you get a Slack or email alert with exactly which prompts changed, what changed, and by how much.

4

Debug With Full History

Every run is stored. Compare any two runs to see exactly when and how the model changed — full history from day one.
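The core loop above fits in a few lines. This is an illustrative sketch, not DriftWatch's implementation: the function names are invented, the 0.3 threshold is the default mentioned in the FAQ below, and raw string similarity stands in for the real scorer (which also weighs validators and semantic similarity).

```python
import difflib

def drift_score(baseline: str, current: str) -> float:
    """Toy drift score: 0.0 = identical output, 1.0 = completely different.
    A production scorer would also fold in validator results, semantic
    similarity, and length drift; string similarity is a stand-in here."""
    similarity = difflib.SequenceMatcher(None, baseline, current).ratio()
    return round(1.0 - similarity, 3)

def check_prompt(baseline: str, current: str, threshold: float = 0.3) -> tuple:
    """Compare the latest response against the stored baseline."""
    score = drift_score(baseline, current)
    return ("ALERT" if score >= threshold else "OK", score)

# Identical responses score 0.0; a rewritten response trips the threshold.
print(check_prompt("Neutral", "Neutral"))  # ('OK', 0.0)
print(check_prompt("Neutral", "The sentiment is neutral overall."))
```

Storing every (baseline, current, score) triple per run is what makes the "compare any two runs" debugging view possible.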

Everything Your Team Needs

Hourly Monitoring

Run your full test suite every 60 minutes. Never be caught off guard by a silent model update again.

📊

Drift Score Metrics

Quantified behavioral change: validator regression, semantic similarity, format compliance, and length drift — all tracked over time.

🚨

Instant Alerts

Slack webhook, email, or API webhook. Alert within 5 minutes of detecting a regression above your threshold.
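An alert payload of this shape is all a Slack incoming webhook needs. A hedged sketch, not DriftWatch's actual alert format: `build_alert_payload` and the message text are invented for illustration, and the payload follows Slack's minimal incoming-webhook `{"text": ...}` shape.

```python
import json
import urllib.request

def build_alert_payload(prompt_name: str, score: float, threshold: float) -> dict:
    """Message body for a Slack incoming webhook; {"text": ...} is the
    minimal shape Slack accepts."""
    return {
        "text": (
            f":rotating_light: Drift detected on '{prompt_name}': "
            f"score {score:.3f} (threshold {threshold})"
        )
    }

def post_to_slack(webhook_url: str, payload: dict) -> None:
    """POST the JSON payload to the webhook URL configured in Slack."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

payload = build_alert_payload("Single word response", 0.575, 0.3)
```

The same payload builder can feed an email body or a generic API webhook; only the transport changes.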

🔀

Multi-Model Comparison

Track GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, and local Llama models side by side. See which model drifts least.

📅

Full Audit History

Every test run stored indefinitely. Export your drift history as CSV for compliance or model evaluation reports.

🧩

Example Prompt Library

Get started fast with our curated example prompts covering JSON compliance, instruction following, classification, and more. Or add your own in minutes.

What Drift Monitoring Catches
Real classes of failures DriftWatch is built to detect

JSON format regression. The model starts adding whitespace or dropping trailing punctuation from field values. json.loads() still succeeds — your downstream string comparison silently fails.

Category: format compliance drift
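A minimal reproduction of that failure mode (the "sentiment" field name is invented for illustration):

```python
import json

baseline = json.loads('{"sentiment": "Neutral."}')
current  = json.loads('{"sentiment": "Neutral"}')   # period silently dropped

# Both responses parse without error, so nothing raises...
# ...but any downstream exact string comparison now disagrees:
print(baseline["sentiment"] == current["sentiment"])   # False
```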

Instruction compliance regression. A single-word classifier returns "Neutral." instead of "Neutral" — the trailing period causes exact-match parsers to fall through to the wrong branch.

Category: instruction following drift — measured drift score: 0.575
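The fall-through looks like this in practice. The `route()` function and its handler names are hypothetical, but the bug is exactly the one measured above:

```python
def route(label: str) -> str:
    """Naive exact-match router over a single-word classifier's output."""
    if label == "Positive":
        return "positive_handler"
    if label == "Negative":
        return "negative_handler"
    if label == "Neutral":
        return "neutral_handler"
    return "fallback"   # any unexpected output lands here

print(route("Neutral"))    # neutral_handler
print(route("Neutral."))   # fallback -- the trailing period misroutes it
```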

Verbosity drift. Terse-answer prompts start returning paragraphs. No error, no alert from your stack — but your UI layout breaks and token costs spike.

Category: output length drift
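Verbosity drift is cheap to quantify. One plausible length-drift component (an assumption for illustration, not DriftWatch's exact formula):

```python
def length_drift(baseline: str, current: str) -> float:
    """Relative change in word count; 0.0 means no change in length."""
    b, c = len(baseline.split()), len(current.split())
    return abs(c - b) / b if b else 0.0

# A one-word answer ballooning into a sentence: 1 word -> 9 words.
print(length_drift("Neutral", "The overall sentiment of this passage is broadly neutral"))  # 8.0
```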
Common Questions
How do I know if OpenAI changed my model without telling me?

You can't know from OpenAI directly — they don't send notifications when model behaviour changes. DriftWatch detects it automatically by running your test prompts hourly and comparing outputs against a stored baseline. When the output shifts beyond a 0.3 drift score, you get an email or Slack alert within 60 minutes.

Does pinning gpt-4o-2024-08-06 prevent behaviour changes?

No — not reliably. In January 2025, gpt-4o-2024-08-06 silently changed behaviour despite being a dated snapshot. OpenAI reserves the right to update any model for safety or policy reasons without notice. Version pinning reduces surface area; it does not eliminate drift.

How often does GPT-4o or GPT-5 change behaviour?

Multiple times per year — plus undisclosed minor patches. In 2025–2026 alone: gpt-4o-2024-08-06 (Jan 2025), the base GPT-4o model (multiple undisclosed updates), and GPT-5.2 Instant (Feb 10, 2026) all had documented silent behaviour changes. Developers typically find out 2–7 days later, from user complaints.

What's the difference between LLM observability and LLM drift detection?

LLM observability (LangSmith, Langfuse, Helicone) monitors your pipeline — latency, token usage, errors. Drift detection monitors whether the model itself changed. Observability tells you your app is slow. Drift detection tells you your prompts stopped working because GPT updated silently. You need both; they solve different problems.

How do I get an alert when my LLM prompt stops working?

Sign up free (no card, 3 prompts included). Paste your prompt and add your API key. We run it hourly and alert you by email or Slack the moment output drifts. Setup takes under 5 minutes.

Simple, Transparent Pricing
Early access pricing — locked in for life when you sign up today
Starter
£99/month
For indie devs and small teams building LLM-powered products
  • 100 test prompts
  • Hourly monitoring
  • Email + Slack alerts
  • 3 LLM endpoints
  • 90-day history
  • Dashboard access
Get Started — £99/mo

14-day free trial on all plans · Cancel anytime · We'll help you migrate from any competitor

GPT-5.2 Changed on Feb 10. Are Your Prompts Still Working?

Most teams find out from user complaints — weeks later. Sign up free in 60 seconds.

Start Free — No Card Required 👀 See Live Demo (no signup)