Pricing

Simple pricing.
Know before your users do.

Start free. Upgrade when your prompts matter more than your monitoring budget.

Free
£0/month
Establish baselines, see drift in action. No card required.
Start free →
  • 3 prompts monitored
  • Manual drift checks
  • Drift score (0.0–1.0)
  • JSON format validator
  Not included: hourly automated monitoring, Slack alerts, email alerts, API access
Pro
£249/month
For teams with multiple LLM products or services, or high-stakes output quality requirements.
Start Pro plan →
  • Unlimited prompts
  • Hourly automated monitoring
  • Full drift history + export
  • JSON + structured format validators
  • Email + Slack alerts (configurable threshold)
  • API access + webhooks
  • Unlimited LLM endpoints
  • Priority support

What does a silent regression actually cost?

Scenario | Typical cost of late detection | DriftWatch detection time
JSON extraction preamble regression breaks parser | 2–4 days of corrupted data + dev time to debug | <1 hour (hourly monitoring)
Classification drift (98% → 91% accuracy) | Customer churn + 1–2 weeks to diagnose | <1 hour (drift score alert at 0.3)
Instruction-following regression (no capitalization → capitalized) | Regex parsers fail silently; 3–7 days to surface | <1 hour (format compliance check)
Verbosity change causes context window overflow | Downstream truncation errors; intermittent failures | <1 hour (length delta alert)

The Pro plan pays for itself the first time it catches a production regression before your users do.

Common questions

Do I need to change my application code?
No. DriftWatch monitors independently — you paste prompts into the dashboard and monitoring starts. Nothing routes through DriftWatch in production, and no SDK changes or proxy configuration are required.
Which LLM providers does DriftWatch support?
Currently OpenAI (all GPT models including gpt-4o, GPT-5.x) and Anthropic (Claude 3.x, Claude 3.5). Additional providers in progress. You bring your own API keys — DriftWatch uses them to run your test prompts.
What exactly is a "drift score"?
A composite 0.0–1.0 score computed across three dimensions: semantic similarity (did the meaning of the output change?), format compliance (is JSON/structure preserved?), and instruction-following delta (did specific behavioral constraints like "no preamble" stop being honored?). A score above 0.3 triggers an alert; a score above 0.5 indicates a likely breaking change.
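To make the scoring concrete, here is a minimal sketch of how a composite like this could be combined and classified. The `drift_score` and `classify` helpers, the equal weighting, and the per-dimension inputs are illustrative assumptions — only the 0.3 and 0.5 thresholds come from the description above, and this is not DriftWatch's actual implementation.

```python
ALERT_THRESHOLD = 0.3     # above this, an alert fires
BREAKING_THRESHOLD = 0.5  # above this, treat as a likely breaking change

def drift_score(semantic_delta: float,
                format_delta: float,
                instruction_delta: float) -> float:
    """Combine three per-dimension deltas into one composite score.

    Each input is 0.0-1.0, where 0.0 means "behavior unchanged" on that
    dimension. Equal weighting is an assumption for illustration.
    """
    deltas = (semantic_delta, format_delta, instruction_delta)
    return sum(deltas) / len(deltas)

def classify(score: float) -> str:
    """Map a composite score onto the two thresholds described above."""
    if score > BREAKING_THRESHOLD:
        return "likely breaking change"
    if score > ALERT_THRESHOLD:
        return "alert"
    return "ok"

# Example: meaning mostly preserved, but the JSON structure broke badly
# and a behavioral constraint partially regressed.
score = drift_score(semantic_delta=0.15, format_delta=0.9,
                    instruction_delta=0.4)
print(classify(score))  # composite ≈ 0.48 → "alert"
```

Note that averaging lets one badly broken dimension (here, format) pull the composite over the alert threshold even when the others are quiet — which is the behavior you want for regressions like the JSON preamble case above.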
I already use LangSmith / Langfuse / Helicone. Do I still need this?
Yes — they serve different purposes. LangSmith, Langfuse, and Helicone are observability tools: they record what your model does. DriftWatch is a monitoring tool: it proactively tests whether your model's behavior has shifted. They're complementary, not interchangeable. See the detailed comparison →
Does pinning a model version (like gpt-4o-2024-08-06) prevent drift?
No. OpenAI and Anthropic have updated the behavior of pinned model versions without changing the version identifier. Version pinning reduces the frequency of changes but does not eliminate them. DriftWatch monitors the actual output behavior regardless of version identifier.
Can I cancel at any time?
Yes. Monthly subscriptions via Stripe. Cancel any time from your account dashboard — no lock-in, no exit fees.
What happens to my prompts?
Your prompts are stored securely and used only to run drift checks against your configured LLM endpoints. We don't use your prompts to train models or share them with third parties.

Start with 3 prompts free — no card

Add your most critical prompts. DriftWatch establishes a behavioral baseline and alerts you the moment your model starts responding differently.