The GitHub Actions Problem Nobody Talks About

Erik Osterman

CEO & Founder of Cloud Posse

February 25, 2026

Erik is the founder of Cloud Posse and creator of Atmos. With over a decade of experience helping teams adopt Terraform at scale, he is passionate about open-source infrastructure tooling and developer experience.

We love GitHub Actions. We run everything on it — our CI/CD, our deployments, our drift checks, our release automation. It powers our open-source projects, our client engagements, and our own internal infrastructure. It is, by almost any measure, a remarkable platform. The ecosystem is vast, the runner infrastructure is solid, and the developer experience of defining workflows in YAML next to your code is something we genuinely appreciate.

And after years of operating Terraform at scale through GitHub Actions, we have hit every wall it has. We have worked around its limitations, built custom tooling on top of it, and watched dozens of teams do the same. This is not a criticism of GitHub Actions. It is a recognition that GitHub Actions was built as a general-purpose CI/CD platform, and infrastructure orchestration has requirements that general-purpose platforms do not address natively. Understanding where those gaps are is the first step toward closing them.

To be clear: we are not saying GitHub Actions is the wrong choice. We are saying it is the right foundation — and Atmos Pro is the orchestration layer it needs for infrastructure at scale.

No ordering guarantees

workflow_dispatch is fire-and-forget. When you dispatch a workflow, it runs. It does not know or care that another workflow needs to finish first. But infrastructure has dependencies. Your VPC must exist before your EKS cluster. Your EKS cluster must exist before your application services. Your DNS zones must exist before the certificates that reference them.

When a pull request touches six stacks across three accounts, someone — or something — needs to figure out the right sequence. Today, that someone is usually an engineer writing custom workflow logic with workflow_run triggers and conditional steps. They chain jobs together with needs keys, add if conditions that check outputs from previous steps, and build increasingly fragile dependency graphs in YAML. That logic becomes its own maintenance burden. It is tested manually, documented poorly, and understood fully by one or two people on the team. When it breaks — and it does break — debugging it means reading through workflow run logs across multiple dispatched jobs, trying to reconstruct what happened and in what order.
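The sequencing problem described above is, at its core, a topological sort. The sketch below is only an illustration of that idea, not how any particular tool implements it. The stack names and the DEPENDS_ON map are hypothetical, modeled on the VPC, EKS, DNS, and certificate example earlier in this section.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each stack lists the stacks it depends on.
DEPENDS_ON = {
    "vpc": [],
    "dns-zone": [],
    "eks": ["vpc"],
    "certificates": ["dns-zone"],
    "app-services": ["eks", "certificates"],
}


def deployment_waves(depends_on):
    """Group stacks into waves; stacks in the same wave have no mutual dependencies."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output stable
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished before continuing
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(deployment_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Run as a script, this prints three waves: dns-zone and vpc first, then certificates and eks, then app-services. In a real pipeline each wave would correspond to a set of dispatched workflows that must all finish before the next wave starts, which is exactly the waiting logic that workflow_dispatch does not provide on its own.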

No concurrency coordination

Two engineers merge pull requests that touch the same Terraform stack within minutes of each other. Two GitHub Actions workflows fire. Both run terraform apply against the same state. If your backend supports state locking, one of them fails with a state lock error, and that is the lucky case. Without locking, you risk state corruption. If you are really unlucky, you get a partial apply that leaves your infrastructure in a state that does not match any version of your code.

GitHub Actions has concurrency groups, but they are scoped to a single repository, and each group keeps at most one pending run: when a newer run queues, the older pending run is canceled rather than executed in order. There is no native mechanism to say "only one apply to prod/us-east-1/vpc at a time, across all PRs, workflows, and repositories." You can approximate this with external locking (DynamoDB tables, Redis locks, custom APIs), but now you are building and maintaining distributed coordination infrastructure just to safely run your infrastructure automation. The irony is not lost on us.
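To make the locking idea concrete, here is a minimal in-memory sketch of a per-stack advisory lock with expiry. The StackLocks class and its method names are hypothetical. A real implementation would need a shared store with atomic conditional writes so that workflows on different runners see the same lock; this version only illustrates the semantics.

```python
import time


class StackLocks:
    """In-memory advisory locks keyed by stack name, with a time-to-live."""

    def __init__(self, ttl_seconds=900, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._held = {}  # stack name -> (owner, expires_at)

    def acquire(self, stack, owner):
        """Return True if `owner` now holds the lock for `stack`."""
        now = self._clock()
        current = self._held.get(stack)
        if current is not None:
            holder, expires_at = current
            if holder != owner and expires_at > now:
                return False  # another owner holds a lock that has not expired
        # Free, expired, or re-acquired by the same owner: (re)start the TTL.
        self._held[stack] = (owner, now + self._ttl)
        return True

    def release(self, stack, owner):
        """Release the lock only if `owner` is the recorded holder."""
        current = self._held.get(stack)
        if current is not None and current[0] == owner:
            del self._held[stack]
            return True
        return False
```

With this shape, a workflow for one pull request that calls acquire("prod/us-east-1/vpc", "pr-101") succeeds, and a second workflow calling acquire with a different owner for the same stack is refused until the first releases the lock or its time-to-live runs out. The expiry is what keeps a crashed workflow from blocking the stack forever.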

Drift is invisible

Nothing in GitHub Actions tells you when your infrastructure diverges from your code. Drift happens constantly — through console changes, through other automation, through manual CLI commands during an incident, through a colleague who "just needed to fix one thing quickly." You discover it during your next terraform plan or, worse, during an incident at 2 AM when the resource you expected to exist has different properties than your code declares.

Detecting drift requires running plans on a schedule, comparing the output to the expected state, and surfacing the results somewhere useful. GitHub Actions can run scheduled workflows, but it gives you no built-in way to track, aggregate, or act on drift across your infrastructure. The output is a workflow run log buried in the Actions tab. There is no dashboard, no status page, no alert. You get a green check or a red X, and if you want anything more nuanced, you are building it yourself.
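A scheduled drift check can be sketched around the exit codes that terraform plan returns when run with -detailed-exitcode: 0 for no changes, 1 for an error, and 2 for a plan that contains changes. The function names and status labels below are hypothetical. Note the assumption that the checked-out code has already been applied, so a non-empty plan is treated as drift rather than as pending work.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`:
# 0 = no changes, 1 = error, 2 = changes present.
EXIT_STATUS = {0: "in_sync", 1: "error", 2: "drifted"}


def classify_plan(exit_code):
    """Translate a plan exit code into a drift status; unknown codes count as errors."""
    return EXIT_STATUS.get(exit_code, "error")


def check_stack(workdir):
    """Run a plan in `workdir` (assumed to be initialized) and return its status."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan(result.returncode)


def summarize(statuses):
    """Count stacks per status from a {stack name: status} mapping."""
    summary = {"in_sync": 0, "drifted": 0, "error": 0}
    for status in statuses.values():
        summary[status] += 1
    return summary
```

The summary is the part a workflow run log does not give you: a count of drifted and failing stacks that can be posted to a chat channel or stored for a dashboard instead of being read out of individual logs.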

No cross-repo visibility

When your infrastructure spans fifty repositories, the GitHub Actions UI is not designed to give you a unified view. Each repository has its own Actions tab, its own run history, its own logs. There is no native way to ask "what deployed across my organization in the last hour?" or "which stacks have failing workflows right now?" or "who triggered the last deployment to production?"

Teams end up building custom dashboards, scraping the GitHub API, piping workflow data into Datadog or Grafana, or just accepting that they do not have this visibility. The information exists — it is scattered across hundreds of workflow runs in dozens of repositories. What is missing is aggregation. A single place where a platform team can see the state of their infrastructure automation without clicking through fifty repos.
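The aggregation step itself is small once the run data is in one place. The sketch below assumes workflow runs have already been collected from each repository into plain dictionaries. The field names (repo, workflow, conclusion, created_at, actor) and the sample data are hypothetical and only loosely modeled on what a CI API returns.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical, already-collected run records. Timestamps use an explicit
# "+00:00" offset so datetime.fromisoformat parses them on older Pythons too.
SAMPLE_RUNS = [
    {"repo": "infra-network", "workflow": "apply", "conclusion": "success",
     "created_at": "2026-02-25T10:00:00+00:00", "actor": "alice"},
    {"repo": "infra-network", "workflow": "apply", "conclusion": "failure",
     "created_at": "2026-02-25T11:30:00+00:00", "actor": "bob"},
    {"repo": "infra-eks", "workflow": "apply", "conclusion": "failure",
     "created_at": "2026-02-25T09:00:00+00:00", "actor": "alice"},
    {"repo": "infra-eks", "workflow": "apply", "conclusion": "success",
     "created_at": "2026-02-25T11:45:00+00:00", "actor": "carol"},
]


def started_at(run):
    return datetime.fromisoformat(run["created_at"])


def recent_runs(runs, now, window=timedelta(hours=1)):
    """Answer 'what ran in the last hour?' across all repositories."""
    cutoff = now - window
    return [run for run in runs if started_at(run) >= cutoff]


def failing_by_repo(runs):
    """Map each repository to the workflows whose most recent run failed."""
    latest = {}
    for run in sorted(runs, key=started_at):
        latest[(run["repo"], run["workflow"])] = run  # later runs overwrite earlier
    failing = defaultdict(list)
    for (repo, workflow), run in latest.items():
        if run["conclusion"] == "failure":
            failing[repo].append(workflow)
    return dict(failing)
```

The hard part in practice is not these few functions but collecting, paginating, and refreshing the data across dozens of repositories, which is why teams end up maintaining a separate service for it.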

Monorepo blast radius

A single pull request in an infrastructure monorepo can touch dozens of stacks. Which ones are actually affected? Which ones need to be planned? Which ones need to be applied, and in what order? GitHub Actions does not know. It runs whatever workflows are triggered and hopes for the best. A path filter in your workflow definition is a blunt instrument — it tells you that something changed in a directory, not which specific stacks are affected or what their dependencies are.

Understanding the blast radius of a change — the affected stacks, their dependency relationships, the accounts and regions they touch — requires parsing your infrastructure configuration and building a dependency graph. That is application-level logic, not CI/CD logic. It requires understanding your stack configuration, your component relationships, and your environment topology. No general-purpose CI/CD platform is going to build that for you, because it is specific to how you organize your infrastructure.
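As a rough sketch of what computing blast radius involves, the code below maps changed files to the stacks that own them and then walks the dependency graph to find everything downstream. The directory layout, stack names, and dependency map are all hypothetical. Real stack configuration is usually richer than one directory per stack, which is the point of the paragraph above.

```python
from collections import deque

# Hypothetical layout: each stack owns one directory prefix.
STACK_PATHS = {
    "vpc": "stacks/vpc/",
    "dns-zone": "stacks/dns-zone/",
    "eks": "stacks/eks/",
    "certificates": "stacks/certificates/",
    "app-services": "stacks/app-services/",
}

# Hypothetical dependency map: each stack lists the stacks it depends on.
DEPENDS_ON = {
    "vpc": [],
    "dns-zone": [],
    "eks": ["vpc"],
    "certificates": ["dns-zone"],
    "app-services": ["eks", "certificates"],
}


def directly_changed(changed_files):
    """Stacks with at least one changed file under their directory."""
    return {
        stack
        for stack, prefix in STACK_PATHS.items()
        if any(path.startswith(prefix) for path in changed_files)
    }


def blast_radius(changed_files):
    """Directly changed stacks plus every stack that transitively depends on them."""
    # Invert the map: for each stack, which stacks depend on it?
    dependents = {}
    for stack, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(stack)

    affected = directly_changed(changed_files)
    queue = deque(affected)
    while queue:  # breadth-first walk over dependents
        for child in dependents.get(queue.popleft(), ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected
```

A path filter would report only that something under stacks/vpc/ changed. Here, a change to stacks/vpc/main.tf yields vpc, eks, and app-services, because eks depends on vpc and app-services depends on eks.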

How Atmos Pro solves this

Ordered deployments: reads your stack dependency graph and dispatches workflows in the correct sequence

Deployment locking: advisory locks prevent concurrent applies to the same stack, with automatic expiry and force unlock

Drift detection: scheduled and manual checks with color-coded status and one-click remediation

Dashboard: unified view of deployments, events, and drift across all repositories in your organization

These are not edge cases

Every problem described above is the daily reality for any team running Terraform through GitHub Actions at scale. They are not theoretical. They are the things that wake people up at night, that cause deployment freezes before long weekends, that make senior engineers reluctant to approve pull requests that touch shared infrastructure.

We built Atmos Pro because we lived these problems for years before deciding to solve them properly. Not by replacing GitHub Actions — it remains the execution engine, the thing it is excellent at — but by adding the coordination layer that infrastructure automation demands. If you are dealing with any of these challenges, take a look at the full feature overview or see what it looks like inside the dashboard.

Keep GitHub Actions. Fix the gaps.

Atmos Pro adds deployment ordering, locking, drift detection, and cross-repo visibility to your existing GitHub Actions workflows.
