Service Level Agreement (SLA) Checklist

Steps an engineering or platform lead runs to draft, negotiate, and sign off on a customer-facing SLA — covering service scope, SLOs, error budgets, security controls, service credits, and exit terms.

8 sections, 24 steps

Service Definition

Inventory the services in scope
- List the named services, APIs, and customer-facing endpoints covered. Reference the Backstage service catalog or internal service registry. Out-of-scope items (sandbox, beta endpoints, third-party SaaS pass-throughs) should be called out explicitly — ambiguity here is the most common SLA dispute trigger.
Classify the service tier and customer segment
- Tier drives uptime targets, support response, and credit structure. Enterprise customers typically get higher SLOs (99.95%+) and named TAMs; Standard customers get the public SLA with shared on-call.
Document provider and customer responsibilities
- RACI-style split: who patches the runtime, who owns the customer's IAM config, who is responsible for client-side SDK upgrades. Customer-caused incidents (misconfigured webhooks, expired API tokens) should not consume the provider's error budget.

Performance Monitoring and Reporting

Define the SLIs and measurement windows
- SLIs typically pull from the four golden signals — latency, traffic, errors, saturation. Specify the metric source (Datadog, Prometheus, New Relic), the aggregation window (rolling 30-day vs. calendar month), and how planned maintenance is excluded. Vague language like "reasonable uptime" is unenforceable.
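To make the exclusion rules concrete, here is a minimal sketch of a time-based availability SLI. It assumes one synthetic-probe result per minute of the window, labelled "up", "down", "maintenance" (announced planned maintenance) or "customer_caused" (the customer-caused incidents from the responsibilities step). The labels and the function name are hypothetical, and real backends such as Datadog or Prometheus compute this from their own data.

```python
from collections import Counter


def availability(samples: list[str]) -> float:
    """Time-based availability from one probe result per minute of the window.

    Only "up" and "down" minutes count. Anything else ("maintenance",
    "customer_caused", ...) is excluded from both numerator and denominator.
    """
    counts = Counter(samples)  # missing labels count as 0
    eligible = counts["up"] + counts["down"]
    if eligible == 0:
        # Hypothetical convention: a window with no eligible minutes counts as met.
        return 1.0
    return counts["up"] / eligible


# 30-day window = 43,200 minutes: 120 min planned maintenance, 30 min outage.
window = ["up"] * 43_050 + ["down"] * 30 + ["maintenance"] * 120
print(f"{availability(window):.4%}")
```

Whichever rule the SLA picks, writing it down at this level of precision (what counts, what is excluded, over which window) is what removes the "reasonable uptime" ambiguity.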
Configure the customer status page
- Wire Statuspage or Better Stack to the relevant Datadog monitors so component status reflects reality without manual updates. Define posting cadence for incidents — initial post within 15 minutes of SEV1 detection, updates every 30 minutes until resolution.
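The posting cadence can be audited after each SEV1. The sketch below assumes you can export the timestamps of public status-page posts for an incident; the function name and messages are hypothetical, and posts made after resolution (for example a postmortem link) are ignored.

```python
from datetime import datetime, timedelta

INITIAL_POST_WITHIN = timedelta(minutes=15)  # first post after SEV1 detection
UPDATE_EVERY = timedelta(minutes=30)         # maximum gap between updates until resolution


def cadence_violations(detected: datetime, posts: list[datetime], resolved: datetime) -> list[str]:
    """List status-page cadence violations for one SEV1 incident."""
    during = sorted(p for p in posts if p <= resolved)
    if not during:
        return ["no status page post before resolution"]
    violations = []
    if during[0] - detected > INITIAL_POST_WITHIN:
        violations.append("initial post later than 15 minutes after detection")
    # Each stretch between posts, and from the last post to resolution, must fit the cadence.
    checkpoints = during + [resolved]
    for prev, nxt in zip(checkpoints, checkpoints[1:]):
        if nxt - prev > UPDATE_EVERY:
            violations.append(f"no update between {prev:%H:%M} and {nxt:%H:%M}")
    return violations
```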
Set the monthly performance reporting cadence
- Specify the report contents: SLO attainment, incident summary with root causes, change-management activity, and any service credits owed. Also fix the delivery channel (PDF to a designated contact, customer portal, or shared dashboard) and the delivery deadline (e.g., by the 10th of the following month).

Service Management and Escalation

Name the service owner and on-call rotation
- Identify the named service owner (typically the engineering manager) and the PagerDuty rotation backing first response. Avoid single points of failure — every primary needs a documented secondary so vacations don't break the response chain.
Define the SEV1, SEV2, and SEV3 escalation matrix
- Concrete examples per severity, not just "high impact." SEV1 = customer-facing outage or data loss, page within 5 minutes, IC assigned within 15. SEV2 = degraded performance, business-hours response, 1-hour acknowledgement. SEV3 = single-customer issues routed via support tier-2.
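Encoding the matrix as data keeps the SLA text, the paging configuration and post-incident review in sync. This is a minimal sketch using the timings above; the class and function names are hypothetical, SEV3 carries no clock because the matrix gives none, and the business-hours clock for SEV2 is not modelled.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    examples: str
    first_response_within: Optional[timedelta]  # page (SEV1) or acknowledgement (SEV2)
    ic_within: Optional[timedelta]              # incident commander assignment


# Targets taken from the checklist; None means the matrix sets no clock for it.
MATRIX = {
    "SEV1": SeverityPolicy("customer-facing outage or data loss", timedelta(minutes=5), timedelta(minutes=15)),
    "SEV2": SeverityPolicy("degraded performance", timedelta(hours=1), None),
    "SEV3": SeverityPolicy("single-customer issue, routed via support tier-2", None, None),
}


def missed_targets(sev: str, first_response_after: timedelta,
                   ic_after: Optional[timedelta] = None) -> list[str]:
    """Return which response targets an incident of severity `sev` missed."""
    policy = MATRIX[sev]
    missed = []
    if policy.first_response_within is not None and first_response_after > policy.first_response_within:
        missed.append("first response")
    if policy.ic_within is not None and (ic_after is None or ic_after > policy.ic_within):
        missed.append("incident commander")
    return missed
```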
Document the change-control process
- Specify customer-notification windows for breaking changes (typically 90 days for API deprecations), how planned maintenance is announced, and the freeze windows during which the customer can request no deploys (peak retail, fiscal close). Reference SOC 2 CC8.1 if the customer is audit-driven.
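The notice window and freeze windows interact: a breaking change can only ship once notice has been served and the date is outside every freeze. A minimal sketch, assuming a 90-day notice period, freeze windows given as inclusive (start, end) dates, and hypothetical function names:

```python
from datetime import date, timedelta

DEPRECATION_NOTICE = timedelta(days=90)  # typical API-deprecation notice window


def in_freeze(day: date, freezes: list[tuple[date, date]]) -> bool:
    """True if `day` falls inside any customer freeze window (inclusive bounds)."""
    return any(start <= day <= end for start, end in freezes)


def earliest_removal_day(announced: date, freezes: list[tuple[date, date]]) -> date:
    """First day a breaking change may ship: notice served and not inside a freeze."""
    day = announced + DEPRECATION_NOTICE
    while in_freeze(day, freezes):
        day += timedelta(days=1)
    return day


peak_retail = [(date(2024, 11, 25), date(2024, 12, 2))]
print(earliest_removal_day(date(2024, 9, 1), peak_retail))
```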

Security and Compliance

Identify regulated data categories handled
- The categories handled (e.g., PHI, EU personal data, cardholder data) determine which compliance addenda the SLA needs. Even "None" should be confirmed in writing; customers sometimes assume a BAA is in place when one was never executed.
Document encryption, access, and audit controls
- Encryption at rest (KMS, with customer-managed keys for the Enterprise tier) and in transit (TLS 1.2+). Access to production data is break-glass only, with an audit trail to CloudTrail or equivalent. Audit-log retention is typically 1 year minimum, or 7 years for SOX-relevant systems.
Attach the regulatory addendum and breach notification terms
- Attach the BAA (HIPAA), DPA with SCCs (GDPR), or PCI responsibility matrix as appropriate. Breach notification timelines vary: GDPR requires notifying the supervisory authority within 72 hours of becoming aware of the breach, HIPAA requires notifying affected individuals no later than 60 days after discovery, and customer contracts often require notice within 24 hours of confirmation.
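Because the clocks start at different events and run for different lengths, it helps to compute all deadlines the moment an incident is declared. This is a simplified sketch: it treats "becoming aware" (GDPR) and "discovery" (HIPAA) as the same timestamp, uses the 24-hour contract term mentioned above, and the function name is hypothetical. Counsel, not code, decides when each clock actually starts.

```python
from datetime import datetime, timedelta


def notification_deadlines(discovered: datetime, confirmed: datetime) -> list[tuple[str, datetime]]:
    """Outer notification deadlines, earliest first."""
    deadlines = [
        ("GDPR: supervisory authority", discovered + timedelta(hours=72)),
        # HIPAA: outer limit only; "without unreasonable delay" may require sooner.
        ("HIPAA: affected individuals", discovered + timedelta(days=60)),
        # Common contract term cited in the checklist: 24 hours after confirmation.
        ("Customer contract", confirmed + timedelta(hours=24)),
    ]
    return sorted(deadlines, key=lambda item: item[1])


for party, due in notification_deadlines(datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 2, 9, 0)):
    print(f"{due:%Y-%m-%d %H:%M}  {party}")
```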

Pricing and Service Credits

Define the pricing tiers and overage rates
- Document the metered units (API calls, seats, GB stored), the included quota at each tier, and overage pricing. Specify how spikes are handled — hard cutoff with 429s, soft cap with overage billing, or burst allowance with monthly true-up.
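The difference between a hard cutoff and a soft cap shows up directly in the invoice arithmetic. A minimal sketch, assuming a flat base fee plus per-unit overage; the tier fields and prices are hypothetical, and a burst allowance with monthly true-up is not modelled.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Tier:
    base_fee: Decimal           # flat monthly fee
    included_units: int         # e.g. API calls included in the base fee
    overage_per_unit: Decimal   # price per unit above the quota
    hard_cap: bool              # True: excess requests get HTTP 429 instead of being billed


def monthly_charge(tier: Tier, used_units: int) -> Decimal:
    # Under a hard cap, traffic above quota is rejected, so nothing is billable.
    billable = 0 if tier.hard_cap else max(0, used_units - tier.included_units)
    return tier.base_fee + billable * tier.overage_per_unit


standard = Tier(Decimal("500"), 1_000_000, Decimal("0.0004"), hard_cap=False)
print(monthly_charge(standard, 1_250_000))
```

Using `Decimal` rather than floats avoids rounding drift in invoice amounts.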
Specify service credit calculation for SLO breaches
- Tiered credits are standard, e.g., 10% credit for monthly uptime from 99.0% up to (but not including) 99.9%, 25% from 95.0% up to 99.0%, and 50% below 95.0%. Specify the claim mechanism (e.g., the customer must request credits within 30 days), the cap (typically one month's fees), and that credits are the sole and exclusive remedy.
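Credit bands are where boundary disputes happen, so it is worth pinning them down as a lookup. This sketch uses the example bands from the credit step and treats each lower bound as inclusive (99.0% exactly earns 10%, 95.0% exactly earns 25%). It assumes the 30-day claim window is counted from the end of the affected month; the names are hypothetical. With a 50% top band, a single month's credit never reaches a one-month-fee cap, so the cap is not applied here.

```python
from decimal import Decimal

# (inclusive lower bound of monthly uptime %, credit as % of the monthly fee),
# checked from the highest band down.
CREDIT_BANDS = [
    (Decimal("99.9"), Decimal("0")),
    (Decimal("99.0"), Decimal("10")),
    (Decimal("95.0"), Decimal("25")),
    (Decimal("0"), Decimal("50")),
]
CLAIM_WINDOW_DAYS = 30  # assumption: counted from the end of the affected month


def service_credit(uptime_pct: Decimal, monthly_fee: Decimal, days_since_month_end: int) -> Decimal:
    """Credit owed for one month, or 0 if the claim was filed too late."""
    if days_since_month_end > CLAIM_WINDOW_DAYS:
        return Decimal("0")
    for floor, credit_pct in CREDIT_BANDS:
        if uptime_pct >= floor:
            return monthly_fee * credit_pct / 100
    raise ValueError("uptime percentage must be between 0 and 100")
```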
Set invoicing and payment terms
- Net-30 is the default; Net-60 for enterprise procurement is common. Specify accepted payment methods (ACH, wire, credit card with surcharge), late fee terms, and the price-revision notice window (typically 60 days before renewal).

Service Level Objectives and Error Budgets

Set the uptime SLO target
- Pick the SLO honestly based on historical data, not aspirationally. Over a 30-day month, 99.9% allows ~43 min of downtime; 99.95% allows ~22 min; 99.99% allows ~4 min and generally requires a multi-region active-active architecture. Don't promise four nines on a single-region deployment.
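The downtime figures follow from (1 - SLO) multiplied by the minutes in the window. A small helper makes it easy to check a proposed target against historical incident minutes; it assumes a window measured in whole days, and the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(slo_pct: float, days: int = 30) -> float:
    """Downtime budget for an uptime SLO over a window of `days` days."""
    return (1 - slo_pct / 100) * days * MINUTES_PER_DAY


for slo in (99.9, 99.95, 99.99):
    print(f"{slo}%: {allowed_downtime_minutes(slo):.1f} min/30 days, "
          f"{allowed_downtime_minutes(slo, 31):.1f} min/31 days")
```

A calendar-month window gives a slightly different budget in 31-day months than a rolling 30-day window, which is one more reason to name the window explicitly in the SLA.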
Set latency targets for p95 and p99
- Specify per-endpoint or per-endpoint-class targets — read APIs typically 200ms p95 / 500ms p99, write APIs 500ms / 1000ms. Include the measurement boundary (server-side, excluding network from client) so disputes don't hinge on client-side variance.
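Percentile definitions differ between tools, which is itself a source of disputes. The sketch below uses the nearest-rank method on raw server-side samples and the typical targets above; your metrics backend may compute percentiles differently (for example, by interpolating within histogram buckets), so the SLA should name the source of truth. Function and constant names are hypothetical.

```python
import math

# (p95, p99) targets in ms per endpoint class, using the checklist's typical values.
TARGETS_MS = {"read": (200, 500), "write": (500, 1000)}


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def meets_latency_targets(endpoint_class: str, samples_ms: list[float]) -> bool:
    p95_target, p99_target = TARGETS_MS[endpoint_class]
    return (percentile(samples_ms, 95) <= p95_target
            and percentile(samples_ms, 99) <= p99_target)
```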
Define the error budget policy
- Decide what happens when the budget is exhausted: feature freeze until the burn rate recovers, mandatory reliability work in the next sprint, exec escalation. Configure burn-rate alerts (Datadog SLO monitors, Sloth) at 2% of the budget consumed in 1 hour and 5% in 6 hours, the standard fast-burn / slow-burn pair (burn rates of 14.4x and 6x against a 30-day budget).
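The alert thresholds are burn rates: the observed error ratio divided by the error ratio the SLO allows. Spending a fraction f of a 30-day budget within a window of w hours corresponds to a burn rate of f x 720 / w. A minimal sketch, assuming a 30-day rolling SLO window and hypothetical names; production setups often also require a shorter confirmation window so alerts reset quickly, which is not modelled here.

```python
SLO_PERIOD_HOURS = 30 * 24  # assumption: 30-day rolling SLO window


def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the budget is being spent; 1.0 exhausts it exactly at period end."""
    return error_ratio / (1 - slo)


def burn_threshold(budget_fraction: float, window_hours: float) -> float:
    """Burn rate at which `budget_fraction` of the budget is spent within `window_hours`."""
    return budget_fraction * SLO_PERIOD_HOURS / window_hours


FAST_BURN = burn_threshold(0.02, 1)  # 2% of budget in 1 h (about 14.4x)
SLOW_BURN = burn_threshold(0.05, 6)  # 5% of budget in 6 h (about 6x)


def should_page(error_ratio_1h: float, error_ratio_6h: float, slo: float = 0.999) -> bool:
    return (burn_rate(error_ratio_1h, slo) >= FAST_BURN
            or burn_rate(error_ratio_6h, slo) >= SLOW_BURN)
```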

Dispute Resolution

Document the dispute escalation workflow
- Tier 1: account manager + customer contact (5 business days). Tier 2: VP-level both sides (10 business days). Tier 3: formal mediation. Most disputes resolve at Tier 1 if the path is documented; ambiguity is what pushes things to legal.
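Turning the tiers into dated deadlines when a dispute is opened keeps both sides honest about when escalation is due. This sketch assumes each tier's clock starts when the previous tier's window ends, counts Monday to Friday as business days, and ignores public holidays; the names are hypothetical.

```python
from datetime import date, timedelta

TIERS = [
    ("Tier 1: account manager + customer contact", 5),
    ("Tier 2: VP-level", 10),
]


def add_business_days(start: date, n: int) -> date:
    """Date `n` business days after `start` (weekends skipped, holidays not modelled)."""
    day = start
    while n > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            n -= 1
    return day


def escalation_schedule(opened: date) -> list[tuple[str, date]]:
    """Deadline for each negotiated tier, then the date mediation may begin."""
    schedule, start = [], opened
    for name, days in TIERS:
        deadline = add_business_days(start, days)
        schedule.append((name, deadline))
        start = deadline
    schedule.append(("Tier 3: formal mediation", start))
    return schedule
```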
Specify mediation and arbitration venue
- Governing law and venue (e.g., Delaware law with AAA arbitration in San Francisco). For international customers, specify ICC arbitration to avoid jurisdictional fights. Have legal review the clause: boilerplate from a US contract may be unenforceable in the EU.
Set response and resolution timelines
- Acknowledgement within 5 business days, initial response within 15, resolution targeted within 60. Without timelines, disputes drift indefinitely while goodwill erodes.

Termination and Exit

Outline termination conditions and notice period
- Termination for convenience (typically 60–90 days' notice), termination for cause (material breach uncured after 30 days), and termination for repeated SLO failure (e.g., three consecutive months below target). Specify whether termination triggers a refund of prepaid fees.
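The "repeated SLO failure" trigger should be mechanically checkable from the monthly reports. A minimal sketch, assuming monthly uptime percentages in chronological order and the three-consecutive-months example above; the function name is hypothetical.

```python
def repeated_slo_failure(monthly_uptime: list[float], target: float, consecutive: int = 3) -> bool:
    """True if uptime was below `target` for `consecutive` months in a row."""
    streak = 0
    for uptime in monthly_uptime:
        streak = streak + 1 if uptime < target else 0  # a compliant month resets the streak
        if streak >= consecutive:
            return True
    return False
```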
Define the data export format and migration support
- Specify the export format (JSON dump, Parquet, CSV per schema), the delivery mechanism (signed S3 URL, customer-supplied bucket), and the support hours included for migration assistance. Without a documented exit path, customers are locked in by default, so present the export commitment as a feature rather than burying it.
Confirm post-termination data destruction obligations
- Destruction timeline (typically 30–90 days after termination) covering primary stores, replicas, backups, and any analytics warehouse copies. Provide a written certificate of destruction signed by the security lead. GDPR Article 28(3)(g) requires processors to delete or return all personal data, at the controller's choice, once the service ends (unless EU or member-state law requires retention); align the language with the customer's DPA.
