Atmos Pro Logo

Atmos Pro

ProductPricingDocsBlogChangelog
Create Workspace
← Back to Incidents

Workflow Dispatch Failures and Feature Access Errors Following Platform Update

Occurred: 2026-05-26 at 19:30 UTC
Mitigated: 2026-05-26 at 19:56 UTC
Resolved: 2026-05-26 at 22:04 UTC
Affected:DispatchesAI SummariesAuto-ReviewRepositories
Author: igor
Atmos Pro Logo

Atmos Pro

The fastest way to deploy your apps on AWS with Terraform and GitHub Actions.

GitHubTwitterLinkedInYouTubeSlack

For Developers

  • Quick Start
  • Example Workflows
  • Atmos Documentation

Community

  • Register for Office Hours
  • Join the Slack Community
  • Try our Newsletter

Company

  • About Cloud Posse
  • Security
  • Pricing
  • Blog
  • Media Kit

Legal

  • SaaS Agreement
  • Terms of Use
  • Privacy Policy
  • Disclaimer
  • Cookie Policy

© 2026 Cloud Posse, LLC. All rights reserved.

Checking status...

Summary

On May 26, 2026, a scheduled platform update introduced two separate bugs. The first prevented GitHub webhook events from being processed, causing workflows to not be triggered during a 26-minute window. The second caused access errors on several features — including AI summaries, auto-review, dispatch settings, badge token management, repository management, and codeowners validation — for approximately 2 hours. Most workflows that were not triggered during the outage window were subsequently reconciled by our scheduled reconciliation process. No customer data was lost or compromised.

Impact

Workflow dispatch (19:30–19:56 UTC):
  • GitHub events were not processed — workflows were not triggered for pull request and push events during this window
  • Most affected events were reconciled automatically by our scheduled reconciliation jobs after the fix was deployed
  • No events were permanently lost
Feature access errors (19:23–22:04 UTC):
  • Users received access-denied errors when attempting to use AI summaries, auto-review, dispatch settings, badge token management, repository management, and codeowners validation
  • Authentication and sign-in were not affected — users could log in and navigate the dashboard normally
  • Running workflows and previously queued jobs were not interrupted

Timeline

Scheduled platform update deployed to production
Errors in the workflow dispatch pipeline detected by our team
Root cause of the dispatch failures identified — a missing field in event parsing caused the pipeline to reject valid GitHub webhook payloads
Parsing fix deployed to production
Workflow dispatch pipeline confirmed operational — scheduled reconciliation began processing missed events
Investigation of access errors on AI summaries, auto-review, and related features began; root cause identified — an access control check was incorrectly denying all requests following the data model update
Access control fix deployed to production
All affected features confirmed operational

What Happened

We deployed a large, planned update to how Atmos Pro stores and organizes GitHub App installation data. The update itself was successful, but it introduced two bugs in adjacent parts of the system that were not caught before deployment.
Workflow dispatch failures. The update changed the internal structure of installation records. One part of the event processing pipeline was still expecting a field in the old location — when it wasn't there, the pipeline rejected the incoming event rather than processing it. GitHub webhook events received during this window were not dispatched, meaning workflows were not triggered. After we deployed the fix, our scheduled reconciliation process automatically re-evaluated the missed events and triggered the appropriate workflows. Most affected runs were recovered without customer action.
Feature access errors. A separate bug in our access control layer caused it to incorrectly conclude that every request lacked the necessary permissions — even for legitimate workspace owners. This caused access-denied errors on any feature that ran through this check. The check was designed to fail safely by denying access rather than allowing it, which was the correct behavior; the problem was that the condition triggering the denial was always evaluating to true after the update. This was a mock-reality drift issue: our automated tests populated the affected field from test fixtures, so the behavior under the updated data model went undetected until production.

What We Did

  1. 1
    Identified the cause of the dispatch pipeline failures within approximately 5 minutes of detecting the first errors
  2. 2
    Deployed a targeted fix for the event parsing issue and confirmed the dispatch pipeline was processing events normally
  3. 3
    Continued monitoring and identified the access control issue affecting AI summaries, auto-review, and related features
  4. 4
    Deployed a fix for the access control layer and confirmed all affected features were operational

What We're Doing to Prevent This

  • Contract testing for incoming webhooks and API. We a backlog item implement contract tests that verify our event processing pipeline correctly handles the full range of payloads sent by GitHub webhooks and our API — the parsing failure here would have been caught by this. These tests will run as part of our standard CI pipeline.
  • Reduced deployment downtime. We have added a backlog item to improve our deployment process to minimize potential service disruption during large schema updates, including staged rollouts and automated rollback triggers.