Summary
On April 10, 2026, a planned dependency upgrade triggered a full authentication outage on Atmos Pro. Every authenticated request returned an unauthorized error, the dashboard failed to load, and sign-in was broken for all users. Production was rolled back within minutes, the root cause was identified later the same day, and a fix was deployed with regression test coverage. No customer data was lost or compromised.
Impact
- 100% authentication failure across Atmos Pro — all users unable to sign in
- Dashboard unable to load
- No data loss; no unauthorized access
Timeline
Planned dependency upgrade deployed to production
100% authentication failure detected across all endpoints
Production rolled back to last known good state
Investigation began
Root cause identified: latent defects in our authentication flow, previously masked by looser upstream validation
behavior
Fix deployed with regression test coverage
What Happened
A planned upgrade to an internal payload validation library produced stricter error behavior than the prior version. The stricter behavior exposed three pre-existing defects in adjacent systems:
- Authentication error handling. A defect in how our auth callbacks propagated errors caused validation failures to escape the controlled error path. When the new validation behavior produced errors, the authentication layer returned null sessions, causing every authenticated request to fail.
- Dashboard response completeness. A prior change had added required fields to an internal data schema without corresponding updates to the dashboard service. The stricter validation rejected the incomplete response.
- Identity provider configuration drift. An earlier fix introduced a configuration that caused our authentication library to attempt an unsupported discovery flow against our identity provider.
What We Did
- Hardened authentication error handling so validation errors are contained within the controlled error path and never produce null sessions.
- Restored missing fields in the dashboard service response.
- Corrected the identity provider configuration so discovery is handled natively.
- Tightened log redaction to ensure additional sensitive fields are never emitted in plain text.
- Added regression tests covering validation compatibility, session callback resilience, and dashboard response validation.
- Expanded exception tracking and observability across the authentication and data paths to accelerate detection and investigation of future issues.
Follow-Up
No additional follow-up is planned at this time. The regression tests added as part of the fix cover the failure modes encountered.