AWS outage credits: SLA vs overspend and when to claim them
How to claim AWS credits after an outage
SLA credits, overspend credits, and knowing when it’s not worth the effort
When an AWS outage happens, there is no dramatic pause to assess the situation. There is no discussion about credits. There is only one priority: get things back online.
Phones light up. Dashboards turn red. Slack fills with short messages and half-finished sentences. Teams move fast, not because it is elegant, but because customers are waiting.
Credits are an afterthought. And that is exactly how it should be.
Only later, once services are stable and everyone has caught their breath, does a more pragmatic question surface. What did this outage actually cost us, and is there any way to recover part of it?
At Unicorne, we have supported clients through enough outages to know that the answer is rarely straightforward. Sometimes credits are worth the effort. Sometimes they are not. It all depends on how AWS measures service unavailability, the actual cost impact, and the time required to build a solid case.
Manage the outage and then understand what AWS actually measures
Once you have restored service, validated data, and confirmed that downstream systems are working normally again, it is time to look at your AWS Service Level Agreement.
AWS Service Level Agreements are service specific and metric driven. They do not measure how long your customers were impacted. They measure how long AWS considers a given service unavailable based on its own internal metrics.
That distinction matters more than most people expect.
During the October AWS outage, several client applications experienced disruptions that lasted for hours. However, based on AWS metrics, the services we observed, namely DynamoDB and Lambda, were officially considered unavailable for approximately two and a half hours, between 6:50 and 9:20 UTC, even though the overall outage lasted close to 14 hours.
When the monthly calculation was done, availability landed at roughly 99.66 percent. Below the contractual SLA threshold, yes. But far less dramatic than the lived experience of engineering and operations teams.
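The arithmetic behind that figure can be sketched in a few lines. This is a simplified, time-based view that assumes a 31-day month and treats the 6:50 to 9:20 UTC window as fully unavailable. Actual SLA definitions are service specific and may be computed differently, so the function name and approach here are illustrative only.

```python
from datetime import timedelta


def monthly_availability_percent(unavailable: timedelta, days_in_month: int) -> float:
    """Percent of the month during which the service counted as available."""
    month = timedelta(days=days_in_month)
    return 100.0 * (1 - unavailable / month)


# Window AWS considered unavailable, from the text: 6:50 to 9:20 UTC.
unavailable = timedelta(hours=9, minutes=20) - timedelta(hours=6, minutes=50)

# What the contract sees: about two and a half hours in a 31-day October.
availability = monthly_availability_percent(unavailable, days_in_month=31)
print(f"Measured by AWS: {availability:.2f}%")  # 99.66%

# What the teams lived through: close to 14 hours (shown for comparison only).
lived = monthly_availability_percent(timedelta(hours=14), days_in_month=31)
print(f"Lived experience: {lived:.2f}%")  # 98.12%
```

The difference between the two numbers is the gap between operational reality and contractual measurement.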
This gap between operational reality and contractual measurement is where expectations need to be reset.
SLA credits vs overspend credits
Once the numbers are clear, the next step is understanding what type of credits may apply.
SLA credits are tied to specific services failing to meet contractual availability targets, such as AWS Lambda or Amazon DynamoDB. Each service has its own SLA definition and its own calculation logic. Credits only apply to the service that AWS officially recognizes as unavailable.
If Service A is down and Service B fails because it depends on Service A, only Service A is eligible for SLA credits. This limitation surprises many teams, especially when the business impact cascades far beyond the original failure.
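To make the eligibility rule concrete, here is a minimal sketch of a credit estimate. The tier thresholds, service charges, and names below are hypothetical placeholders, not values taken from any AWS agreement. Always read the current SLA page for each service before relying on a number.

```python
# Hypothetical tiers: (availability below this %, credit as % of the service charge).
# Placeholder values for illustration only; each AWS service publishes its own table.
ILLUSTRATIVE_TIERS = [(95.0, 100.0), (99.0, 25.0), (99.95, 10.0)]


def credit_percent(availability: float, tiers: list[tuple[float, float]]) -> float:
    """Return the credit percentage for the lowest tier the availability falls under."""
    for threshold, percent in sorted(tiers):
        if availability < threshold:
            return percent
    return 0.0


def estimate_sla_credits(
    monthly_charges: dict[str, float],
    availability_by_service: dict[str, float],
    tiers: list[tuple[float, float]],
) -> dict[str, float]:
    """Estimate credits only for services with a recognized availability figure."""
    credits = {}
    for service, charge in monthly_charges.items():
        if service not in availability_by_service:
            # Downstream services that failed as a side effect are not eligible.
            continue
        percent = credit_percent(availability_by_service[service], tiers)
        credits[service] = charge * percent / 100.0
    return credits


# Hypothetical monthly charges. Service B depends on Service A but was not
# itself recognized as unavailable, so it gets no entry in the result.
charges = {"service-a": 2000.0, "service-b": 5000.0}
recognized = {"service-a": 99.66}
credits = estimate_sla_credits(charges, recognized, ILLUSTRATIVE_TIERS)
print(credits)  # {'service-a': 200.0}
```

Passing the tiers in as a parameter keeps the calculation separate from the contractual values, which differ by service and can change over time.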
Overspend credits operate very differently. They are not contractual and they are never automatic. They exist to address abnormal costs generated by an outage, such as retry storms, emergency scaling, runaway logging, or inflated monitoring usage.
We have seen Lambda retries multiply rapidly, infrastructure scale unexpectedly under error conditions, and CloudWatch logs grow at a pace no one planned for. In these cases, the outage does not just disrupt service. It quietly inflates the bill.
Requesting overspend credits means proving that inflation clearly and convincingly.
The reality check: Is it worth it?
This is the most important step, and the one most often skipped.
Not every outage deserves a credit request. Sometimes the potential credit is modest. Sometimes the documentation effort is significant. Sometimes both are true.
We have worked through situations where the savings were real, but small enough that the time spent gathering metrics, building reports, and engaging AWS support simply did not make sense.
Credits are not free money. They are a tradeoff. The question is not whether credits exist. The question is whether pursuing them is the best use of time and focus after an already stressful incident.
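One rough way to frame that tradeoff is to put a cost on the time a request would consume and compare it with the credit you expect. The sketch below is only that. The effort hours and hourly cost are hypothetical assumptions, and the credit amounts are example inputs rather than a prediction of what AWS will grant.

```python
def net_value_of_claim(expected_credit: float, effort_hours: float, hourly_cost: float) -> float:
    """Expected credit minus the cost of the time spent building the case."""
    return expected_credit - effort_hours * hourly_cost


def worth_pursuing(expected_credit: float, effort_hours: float, hourly_cost: float) -> bool:
    """A claim is worth it, on paper, only if it recovers more than it costs."""
    return net_value_of_claim(expected_credit, effort_hours, hourly_cost) > 0


# Hypothetical effort estimates and hourly cost, for illustration only.
HOURLY_COST = 100.0

small_claim = net_value_of_claim(50.0, effort_hours=6, hourly_cost=HOURLY_COST)
larger_claim = net_value_of_claim(500.0, effort_hours=3, hourly_cost=HOURLY_COST)

print(small_claim)   # -550.0
print(larger_claim)  # 200.0
print(worth_pursuing(50.0, effort_hours=6, hourly_cost=HOURLY_COST))  # False
```

The number will never capture everything, such as the fatigue of a team coming out of an incident, but it forces the question to be asked before the work starts.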
How we handled it in practice
Once the decision is made to move forward, timing becomes critical. While AWS typically allows up to two billing periods to submit a request, acting quickly makes the process much smoother. Supporting evidence is required, and the longer you wait, the harder it becomes to dig through logs and identify the relevant information. Staying organized is key to submitting an effective credit request.
The process starts by reviewing AWS requirements for credit requests. From there, we gather application-level and infrastructure-level metrics, clearly separating what we already collect from what must be pulled directly from AWS.
Rather than reinventing the wheel, we reuse monitoring scripts we already trust to build a focused, readable report. The goal is not to overwhelm support with data. It is to tell a clear story supported by the right metrics.
When submitting the request, clarity and concision matter. During large-scale outages, response times from AWS support can be longer than usual. In our experience, grouping support requests together is more effective, as it streamlines the process and reduces the number of stakeholders involved.
How we analyzed overspend credits
Overspend analysis relies heavily on AWS Cost Explorer. We compare costs before, during, and after the outage. When possible, we also establish a baseline using the previous month to account for normal variability.
This approach makes it possible to isolate abnormal costs directly linked to the incident and support the request with solid data and factual evidence, demonstrating that the cost increase did not originate from our operations.
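The comparison itself can be sketched as follows, assuming daily costs per service have already been exported from Cost Explorer into plain lists. The service names and dollar figures are invented for illustration, and using a simple mean of the baseline days is our assumption for this sketch, not a method AWS prescribes.

```python
from statistics import mean


def excess_cost_by_service(
    baseline_daily: dict[str, list[float]],
    incident_daily: dict[str, list[float]],
) -> dict[str, float]:
    """Sum, per service, the daily cost above the average baseline day."""
    excess = {}
    for service, costs in incident_daily.items():
        # A service with no baseline history is compared against zero.
        baseline = mean(baseline_daily.get(service, [0.0]))
        excess[service] = round(sum(max(0.0, cost - baseline) for cost in costs), 2)
    return excess


# Hypothetical daily costs in USD: normal days before the outage...
baseline = {
    "lambda": [10.0, 12.0, 11.0, 11.0],
    "cloudwatch": [4.0, 4.0, 4.0, 4.0],
}
# ...and the days affected by the outage and its aftermath.
incident = {
    "lambda": [30.0, 15.0],
    "cloudwatch": [9.5, 4.0],
}

excess = excess_cost_by_service(baseline, incident)
print(excess)  # {'lambda': 23.0, 'cloudwatch': 5.5}
```

Days at or below the baseline contribute nothing, so the result reflects only the inflation, which is the figure a request has to justify.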
Final takeaway
AWS credits are neither guaranteed nor simple. The biggest mistake is assuming that application downtime automatically leads to meaningful compensation.
A rigorous post-incident analysis, a realistic assessment of the effort versus benefit, and a disciplined approach to metrics make all the difference. Sometimes it is worth pursuing. Sometimes it is not.
In one real-world case, we recovered only $50 in overspend credits for a client, while the SLA-related request yielded between $500 and $1,000. The time required to extract and validate the data needed to justify the overspend request was worth far more than the $50 recovered.
Knowing how to make that distinction is what turns an outage into a lasting learning opportunity, rather than a costly and low-return exercise.