Why your status page shows green when the service is broken
You're staring at 503 errors. You open the vendor's status page. All systems operational. You aren't crazy, and the page isn't lying: it's lagging, and the lag is structural. Here's the anatomy of the gap, and what to look at instead.
The pattern repeats weekly on every developer Slack. Someone shares an error screenshot. Someone else asks "is it down?" Someone clicks the vendor's official status page. "All systems operational." The thread devolves into "the status page is lying."
The page isn't lying. It's behind. Understanding the structural reasons why is the difference between trusting it and rolling your own monitoring.
Why the page lags
There are three structural reasons a status page can be green while real users are seeing failures. Most outages involve at least two of them.
1. Manual updates are slow
The vendor on-call rotation has a playbook. The playbook says: investigate, mitigate, communicate. Communication is third in the list because it's the least urgent — fixing the problem matters more than telling the world about it.
In practice, the gap between "this is a real incident" and "the status page reflects it" runs somewhere between five minutes and an hour. The faster end belongs to well-staffed vendors with a dedicated comms on-call; the slower end to small companies where the on-call engineer is also expected to write the public update.
While that gap is open, you see errors and the page says green. That isn't dishonesty; it's prioritisation. The on-call engineer is doing the right thing.
2. Automated alarms have thresholds
Some status pages are auto-driven. A monitoring system (Prometheus, Datadog, etc.) watches health metrics and flips component status when they breach a threshold. Sounds great — until you read the threshold.
Typical alarm rules look like "5xx rate above 1% for 5 minutes." That means:
- A 0.9% error rate that's been running all day → no alarm, page green.
- A 50% error rate that lasted four minutes → no alarm, page green.
- A 100% error rate confined to one region carrying 0.5% of traffic → globally still well below 1%, page green.
The thresholds exist for a good reason — flapping alarms erode trust and drown the on-call engineer. But they create blind spots. If your traffic hits a failing edge POP, you see 100% errors and the page tells you everything's fine.
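To make those blind spots concrete, here's a rough sketch of how a window rule like that evaluates. It's a simplification, not any vendor's actual config; systems like Prometheus and Datadog express these rules declaratively, but the windowing behaves roughly this way:

```python
# Rough model of a "5xx rate above 1% for 5 minutes" alarm rule.
# One sample per minute; the alarm fires only if the rate stays
# above the threshold for `window` consecutive samples.

def alarm_fires(error_rates, threshold=0.01, window=5):
    consecutive = 0
    for rate in error_rates:
        consecutive = consecutive + 1 if rate > threshold else 0
        if consecutive >= window:
            return True
    return False

# 0.9% errors all day: never crosses the threshold -> page stays green.
print(alarm_fires([0.009] * 1440))                   # False

# 50% errors for four minutes: one sample short of the window -> green.
print(alarm_fires([0.0, 0.5, 0.5, 0.5, 0.5, 0.0]))   # False

# Region carrying 0.5% of traffic fully down: global rate 0.5% -> green.
print(alarm_fires([0.005] * 60))                     # False
```

All three runs print False, which is exactly the point: each failure is real, and none of them moves the page.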
3. The page can't see you
The vendor's view of "the service" is the aggregate signal: response codes their load balancers see, error counts their probes generate, percentiles from their telemetry. That aggregate hides per-customer pain.
Examples of failures that don't show up in vendor-side telemetry:
- Account-specific bugs. A schema migration broke records that belong to one customer tenant; everyone else is fine.
- Feature flags rolled out badly. A flag is on for 10% of users and breaks for half of them; vendor sees a 5% degradation and may or may not page.
- Geographic CDN failures. One edge POP is broken; the vendor's primary monitoring runs from a different region.
- Client-side errors. The API works fine when the vendor's monitoring calls it; their JavaScript SDK has a bug that only manifests in browsers.
- Authentication / authorisation edge cases. Your specific OAuth scope hits a bug; the vendor's smoke tests use a different scope.
In every case the failure is real but the vendor's metric is green.
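To see the aggregation problem in miniature, here's a toy example: a hypothetical request log where one tenant (called "acme" here, purely for illustration) is completely broken, yet the global error rate sits far below a 1% alarm threshold.

```python
from collections import defaultdict

# Hypothetical request log: (tenant, had_error). One tenant, "acme",
# is completely broken; every other request succeeds.
requests = [("acme", True)] * 200 + [("other", False)] * 99_800

by_tenant = defaultdict(lambda: [0, 0])   # tenant -> [errors, total]
for tenant, had_error in requests:
    by_tenant[tenant][0] += had_error
    by_tenant[tenant][1] += 1

errors = sum(e for e, _ in by_tenant.values())
total = sum(t for _, t in by_tenant.values())
print(f"global error rate: {errors / total:.2%}")   # 0.20% -> no alarm
for tenant, (e, t) in by_tenant.items():
    print(f"{tenant}: {e / t:.2%}")                 # acme: 100.00%
```

The global figure is the one the vendor's alarms watch; the per-tenant slice is the one you live in.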
What to look at instead
Three signals, ranked by usefulness when the status page says green and you suspect it's wrong:
- Independent probes. A check against the service from outside the vendor's network, free of the vendor's thresholds and aggregation.
- User-report volume. Whether other people are reporting the same failure right now.
- The vendor's page itself. Least useful while it's green, but authoritative once it flips.
The combination matters more than any single source. When third-party probes show failures, user-report volume spikes, and the vendor page is still green — the vendor is almost certainly behind the curve. Wait twenty minutes; the page usually updates.
When third-party probes are clean, user-report volume is flat, and the vendor page is green — the problem is local to you or your specific setup. Time to debug your network, your client code, or your auth.
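Those two paragraphs are really a small decision table. A minimal sketch, with hypothetical boolean inputs standing in for real probe results, report counts, and the vendor's feed:

```python
def diagnose(probe_failing: bool, reports_spiking: bool, vendor_green: bool) -> str:
    """Cross-reference the three signals described above."""
    if not vendor_green:
        return "acknowledged incident: trust the vendor's timeline"
    if probe_failing and reports_spiking:
        return "vendor is behind the curve: wait ~20 minutes for the page"
    if not probe_failing and not reports_spiking:
        return "problem is local: debug your network, client code, or auth"
    return "mixed signals: possible partial or regional outage; keep watching"

print(diagnose(probe_failing=True, reports_spiking=True, vendor_green=True))
# -> vendor is behind the curve: wait ~20 minutes for the page
```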
How we cross-reference
Every service page on StatusDetector pulls all three signals into one view:
- The vendor's current status indicator (from the official feed).
- Our own HTTP/DNS probe against the service's primary URL.
- User-submitted reports in the last 30 minutes.
When the three agree, we surface a single confidence-weighted summary. When they disagree, we say so — explicitly — and let the reader decide. The post Status page indicators decoded walks through how to read the vendor's indicator; the disagreement case is where the rest of the dashboard earns its keep.
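The probe in the second bullet boils down to two questions: does the name resolve, and what status code comes back? Here's a single-vantage-point sketch using only the standard library. It's an illustration, not our production probe, and the URL is a placeholder:

```python
import socket
import urllib.request
import urllib.error

def probe(url: str, host: str, timeout: float = 5.0) -> dict:
    """One-shot DNS + HTTP probe from a single vantage point."""
    result = {}
    try:
        result["dns"] = socket.gethostbyname(host)       # does the name resolve?
    except socket.gaierror as exc:
        result["dns_error"] = str(exc)
        return result                                    # no point probing HTTP
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            result["http"] = resp.status                 # e.g. 200
    except urllib.error.HTTPError as exc:
        result["http"] = exc.code                        # e.g. 503
    except (urllib.error.URLError, socket.timeout) as exc:
        result["http_error"] = str(exc)
    return result

print(probe("https://example.com/", "example.com"))
```

One vantage point can't distinguish a broken edge POP from a broken service, which is why a probe fleet worth trusting runs from more than one region.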
What "the status page is lying" actually means
Most of the time, the page isn't lying — it's slow. The information will be correct in 15-45 minutes; you just want to know now.
Occasionally the page is manipulated. Some vendors leave incidents unacknowledged because they're tracked under an SLA contract and acknowledging them publicly commits them to credits. Others fold a clearly degraded service into a "scheduled maintenance" window after the fact. These cases are rare but real, and they explain why some teams have stopped trusting vendor status pages entirely and run their own external monitoring.
The honest summary: a vendor status page is the floor of how bad things are, not the ceiling. If it says "critical", things are at least that bad. If it says "none", things might still be wrong: the vendor either hasn't noticed or hasn't told you yet.
Frequently asked
If the vendor's page is unreliable, why does StatusDetector still show it?
Because it's the authoritative voice on what the vendor admits. When it lights up, you have a concrete reference to point at when you escalate. We surface it alongside our own data so you can compare — never as the only source.
How fast do status pages typically update?
For incidents affecting all users: usually within 15 minutes. For partial outages: 30-60 minutes. For account-specific or feature-flag bugs: often never — the vendor handles them as support tickets, not public incidents.
What's the single most useful action when the status page disagrees with what I'm seeing?
Run the Website Down Checker against the affected endpoint. It probes from our infrastructure, so if it agrees with you and the vendor page is green, you have an objective third-party signal — useful for support tickets and for ruling out local issues.