Tagging debt compounds. Start with the policy, not the tags.
Why retroactive tagging is slower than it looks
When we audit a mid-market estate for the first time, tagging coverage sits between 40 and 60 percent. The missing tags are not missing because engineers are careless. They are missing because the resource was created before the tagging policy existed, before the engineer knew the policy existed, or in an automation script that predates the tag schema by two years.
Tagging 500 resources retroactively means: find the resource, identify the owner (often unclear), decide on the correct tag values, apply them without breaking existing automation, and verify that cost allocation picks up the change. We time this. It averages 2.5 to 4 hours of combined engineering and FinOps time per resource class. For a 500-resource estate, that is three to four weeks of interrupted work, after which the bill still has gaps because new resources were created during the cleanup.
The schema question people skip
Most tag debates start with "which tags should we require?" The more important question is how to make the right answer the obvious path.
The schema that works in production has three layers.
Account or subscription level: environment (prod, staging, dev), cost centre, business unit. These are static and set once at account creation.
Service level: service name, team name, tier (critical, standard, batch). These map to the team ownership model and change only when a team is restructured.
Workload level: application name, component, version. These are optional, but precise enough for chargeback when a single team runs multiple products.
Five mandatory tags per resource. Optional tags per team. Everything validated in IaC, not documented in a wiki that no one opens before provisioning.
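A minimal sketch of what a machine-checkable schema could look like, in Python. The five keys and the allowed values below are assumptions for illustration; the layers above name candidate tags but do not fix which five are mandatory, and `TagRule` and `validate_tags` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagRule:
    key: str
    # None means "any non-empty value is accepted".
    allowed: Optional[frozenset] = None


# Hypothetical mandatory set: account-level and service-level tags.
MANDATORY_TAGS = (
    TagRule("environment", frozenset({"prod", "staging", "dev"})),
    TagRule("cost-centre"),
    TagRule("business-unit"),
    TagRule("service"),
    TagRule("team"),
)


def validate_tags(tags: dict) -> list:
    """Return human-readable violations; an empty list means compliant."""
    problems = []
    for rule in MANDATORY_TAGS:
        value = (tags.get(rule.key) or "").strip()
        if not value:
            problems.append(f"missing mandatory tag: {rule.key}")
        elif rule.allowed is not None and value not in rule.allowed:
            problems.append(f"invalid value for {rule.key}: {value!r}")
    return problems
```

Keeping the rules as data rather than prose means the same definition can feed IaC validation, reporting, and documentation without drifting apart.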
Policy before tags: the controls that actually work
AWS Organizations tag policies (typically paired with service control policies, since tag policies on their own govern tag values rather than tag presence), Azure Policy tag rules, and GCP Organization Policy all provide a mechanism to prevent resource creation without required tags. None of them are on by default. None of them are hard to enable. We have yet to see a team enable them proactively without an external prompt.
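As one illustration on the AWS side, a service control policy can deny a create call when a required tag is absent from the request. The sketch below builds such a policy document in Python. It covers only `ec2:RunInstances`, and the tag key `cost-centre` and the function name are assumptions; a real rollout would need a statement per resource type and testing in a non-production organisational unit first.

```python
import json


def deny_untagged_instances(tag_key: str) -> dict:
    """Build an SCP that denies EC2 instance launches lacking `tag_key`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRunInstancesWithoutTag",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # "Null": "true" matches when the tag key is absent
                # from the request, so the Deny applies.
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


# Hypothetical tag key; print the JSON ready to attach to an OU.
print(json.dumps(deny_untagged_instances("cost-centre"), indent=2))
```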
The enforcement path we run has four steps. Define the tag schema in the platform repo as a machine-readable spec. Enforce it in Terraform via a shared validation module that fails the plan if mandatory tags are absent. Enforce it at the cloud organisation level with a policy that blocks API calls for non-compliant resources. Surface violations in a Slack channel weekly, with a link to the remediation runbook. Three enforcement layers mean that a missing tag fails before a PR merges, fails before Terraform apply, and fails before the cloud API accepts the request.
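The plan-time check can be sketched as a small script over Terraform's JSON plan output. This is not the shared validation module itself, only an illustration of the idea. It assumes the plan was exported with `terraform show -json`, that tags live under `tags` (or `tags_all`, as the AWS provider exposes them), and that the mandatory keys are the hypothetical five below.

```python
# Hypothetical mandatory keys; substitute the real schema.
MANDATORY = ("environment", "cost-centre", "business-unit", "service", "team")


def untagged_resources(plan: dict) -> dict:
    """Map resource address -> list of missing mandatory tag keys."""
    failures = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        # Only resources being created or updated are checked.
        if "create" not in actions and "update" not in actions:
            continue
        after = change.get("after") or {}
        # Resource types with no tags attribute are skipped.
        if "tags" not in after and "tags_all" not in after:
            continue
        tags = after.get("tags_all") or after.get("tags") or {}
        missing = [key for key in MANDATORY if not tags.get(key)]
        if missing:
            failures[rc["address"]] = missing
    return failures
```

In CI, the plan JSON could be loaded with `json.load` and the job failed whenever the returned mapping is non-empty, so the violation surfaces on the pull request rather than at apply time.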
Tag coverage goes from 55 percent to 95 percent in the first sprint. The remaining 5 percent are resources that predate enforcement. They get cleaned up over the following quarter as teams cycle through normal infra changes.
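The coverage figure depends on how it is defined. One plausible definition, assumed here, is the share of resources that carry a non-empty value for every mandatory tag; a cost-weighted variant would give different numbers. The function name and inputs are illustrative.

```python
def tag_coverage(resources: list, mandatory: tuple) -> float:
    """Percentage of resources with a non-empty value for every mandatory tag.

    `resources` is a list of tag dictionaries, one per resource.
    """
    if not resources:
        return 0.0
    compliant = sum(
        1 for tags in resources if all(tags.get(key) for key in mandatory)
    )
    return 100.0 * compliant / len(resources)
```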
"Tag coverage from 55 to 95 percent is a sprint. The remaining gap is not a tagging problem. It is a cost model problem."
Danny Zak / FinOps Lead
The allocation gap that survives good tagging
Even with 95 percent tag coverage and accurate values, 8 to 12 percent of cloud cost stays unallocated. This is the irreducible overhead of shared infrastructure: NAT gateways, shared VPCs, cross-account networking, support tier costs, and data transfer charges that carry no resource ID. None of these tag to a workload because none of them belong to one.
The allocation gap is not a tagging problem. It is a cost model problem. The fix is a shared-services budget line that is owned, forecasted, and reviewed quarterly, separate from workload cost. Teams stop treating the unallocated bucket as someone else's problem when the budget becomes explicit.
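A minimal sketch of that split, assuming billing line items arrive as dictionaries with a `cost` and an optional `tags` mapping. The field names, the `team` tag key, and the `shared-services` label are assumptions for illustration, not a billing export format.

```python
from collections import defaultdict

SHARED = "shared-services"  # hypothetical name for the explicit budget line


def allocate(line_items: list, tag_key: str = "team") -> dict:
    """Sum cost per owner; untagged items go to the shared-services line."""
    totals = defaultdict(float)
    for item in line_items:
        owner = (item.get("tags") or {}).get(tag_key) or SHARED
        totals[owner] += item["cost"]
    return dict(totals)


def shared_share(totals: dict) -> float:
    """Shared-services cost as a percentage of total spend."""
    total = sum(totals.values())
    return 0.0 if total == 0 else 100.0 * totals.get(SHARED, 0.0) / total
```

With the shared line reported as its own owner, it can be forecast and reviewed like any team's spend instead of appearing as a residual.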
Getting from 55 percent tag coverage to 95 percent is a sprint. Getting from 95 percent to accurate financial reporting is a cost model design. Both are required. Teams that skip the second step have precise tags on 95 percent of resources and still cannot answer "what did team A spend last month?" The tags are correct. The model was never built.