Customer Support Ticket Workflow

Initial Triage

    Send a first-touch reply that confirms receipt and sets expectations. First-response SLA is typically 1 hour for paid plans, 4 hours for trial/free. Don't promise a fix ETA yet — that comes after reproduction.
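
    The first-response targets above can be turned into a deadline check. This is a minimal sketch: the plan labels ("paid", "trial", "free") and the function names are assumptions for illustration, and the SLA values are the typical ones quoted above, not a contract.

```python
from datetime import datetime, timedelta, timezone

# Typical first-response targets from the text above.
# The plan labels are assumed names, not taken from any real billing system.
FIRST_RESPONSE_SLA = {
    "paid": timedelta(hours=1),
    "trial": timedelta(hours=4),
    "free": timedelta(hours=4),
}


def first_response_due(created_at: datetime, plan: str) -> datetime:
    """Return the time by which the first-touch reply should go out."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    return created_at + FIRST_RESPONSE_SLA[plan]


def is_breached(created_at: datetime, plan: str, now: datetime) -> bool:
    """True once the first-response deadline has passed without a reply."""
    return now > first_response_due(created_at, plan)
```

    Requiring timezone-aware timestamps keeps the deadline comparable with ticket times recorded in other timezones.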

    Confirm the requester is on the customer's authorized contact list before sharing account details. For HIPAA/SOC 2 customers, never confirm or deny account existence to unverified senders. Note the plan tier — it drives SLA and escalation paths.

    SEV1 = production down or data loss for one or more customers. SEV2 = major feature broken, no workaround. SEV3 = minor feature broken or workaround exists. SEV4 = cosmetic or how-to. SEV1 immediately pages on-call via PagerDuty; do not wait on reproduction.
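
    The severity definitions above map onto a small decision function. This is a sketch only: the Impact fields and the classify and pages_immediately names are hypothetical, chosen to mirror the wording of the definitions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


@dataclass
class Impact:
    # Hypothetical triage inputs; field names are illustrative.
    production_down: bool = False
    data_loss: bool = False
    feature_broken: str = "none"  # "none", "minor", or "major"
    workaround_exists: bool = False


def classify(impact: Impact) -> Severity:
    """Apply the SEV1-SEV4 definitions in order of urgency."""
    if impact.production_down or impact.data_loss:
        return Severity.SEV1
    if impact.feature_broken == "major" and not impact.workaround_exists:
        return Severity.SEV2
    if impact.feature_broken in ("major", "minor"):
        # Minor breakage, or major breakage with a workaround.
        return Severity.SEV3
    return Severity.SEV4  # cosmetic or how-to


def pages_immediately(severity: Severity) -> bool:
    """Only SEV1 pages on-call without waiting on reproduction."""
    return severity == Severity.SEV1
```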

    For a SEV1, trigger the PagerDuty incident on the customer-impacting service rotation, open a #incident-<ticket> Slack channel, and post the status-page incident as 'Investigating'. Do not skip the status page; customers will check it before opening their own tickets.

Reproduction and Diagnosis

    Ask for browser + version, OS, account/workspace ID, the exact URL or API endpoint, the timestamp (including the customer's timezone), and what the customer expected vs. what happened. 'It's broken' tickets without a timestamp eat hours in log search.

    For UI bugs, ask for a screenshot or Loom. For network errors, ask for a HAR export from DevTools. For API issues, ask for the request ID from the response headers; it links to your Datadog/Sentry trace in one click. Strip auth tokens and session cookies from HAR files before attaching them to the ticket.
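
    Stripping credentials from a HAR export can be scripted instead of done by hand. The sketch below assumes the standard HAR layout (log.entries, each with request/response headers and cookies); the header list, including x-api-key, is an assumed starting point, and the redact_har name is illustrative. It does not catch tokens embedded in query strings or request bodies, so still review the file before attaching.

```python
import copy

# Header names treated as sensitive; extend for your own auth scheme (assumed list).
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}
REDACTED = "[REDACTED]"


def redact_har(har: dict) -> dict:
    """Return a copy of a parsed HAR with auth headers and cookie values masked."""
    clean = copy.deepcopy(har)  # leave the caller's original untouched
    for entry in clean.get("log", {}).get("entries", []):
        for side in ("request", "response"):
            message = entry.get(side, {})
            for header in message.get("headers", []):
                if header.get("name", "").lower() in SENSITIVE_HEADERS:
                    header["value"] = REDACTED
            for cookie in message.get("cookies", []):
                cookie["value"] = REDACTED
    return clean
```

    Load the export with json.load, pass the result through redact_har, and write it back with json.dump before uploading.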

    Filter your log and trace searches by the customer's workspace ID and the timestamp window from the report. Check Sentry for unhandled exceptions, Datadog APM for the trace, and the application log for the request ID. If the error is already grouped in Sentry with prior tickets, link them.
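
    A timestamp reported in the customer's timezone has to be converted before it is useful as a search window. A minimal sketch, assuming logs are indexed in UTC and that a 15-minute margin either side is wide enough; both are assumptions, and search_window is an illustrative name.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def search_window(local_time: str, tz: tzinfo, margin_minutes: int = 15):
    """Convert a customer-reported local time into a UTC (start, end) window.

    local_time is an ISO 8601 string such as "2024-03-05T14:30:00".
    tz is the customer's timezone, used only if the string carries no offset.
    """
    reported = datetime.fromisoformat(local_time)
    if reported.tzinfo is None:
        reported = reported.replace(tzinfo=tz)
    reported_utc = reported.astimezone(timezone.utc)
    margin = timedelta(minutes=margin_minutes)
    return reported_utc - margin, reported_utc + margin
```

    For named zones, pass zoneinfo.ZoneInfo("America/New_York") or similar as tz so daylight-saving offsets are applied for the reported date.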

    Try to reproduce in staging first. If state-dependent, use the customer-impersonation tool (audit-logged). Avoid logging into the customer's account directly; impersonation leaves a clean SOC 2 audit trail. Capture the result — reproducible, intermittent, or cannot reproduce.

Tier 1 Troubleshooting

    Search the support runbook in Notion/Confluence for the symptom keyword. Common hits: cache invalidation after a plan change, SSO config drift after IdP rotation, webhook retry exhaustion. If the runbook resolves it, follow the documented script.

    If a workaround exists (clear browser cache, re-issue API key, toggle a feature flag), share the exact steps with screenshots. Don't say 'try clearing your cache' as a guess — only when the runbook documents it as a known fix for this symptom.

    Check workspace settings, role/RBAC assignments, integration credentials, and webhook destinations. Misconfigured SAML metadata and expired OAuth refresh tokens account for a large share of 'broken login' tickets.

Engineering Escalation

    Include: customer workspace ID, request ID, Sentry link, timestamp, repro steps, expected vs. actual, plan tier, and a link back to the Zendesk ticket. Tag the owning team via CODEOWNERS conventions. A bug filed without a request ID gets bounced back.
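
    The required fields above can be checked before the bug is filed, so it is not bounced back. This is a sketch with hypothetical names (Escalation, missing_fields); it is not a Jira or Zendesk schema.

```python
from dataclasses import dataclass, fields


@dataclass
class Escalation:
    # One attribute per required item in the list above; names are illustrative.
    workspace_id: str = ""
    request_id: str = ""
    sentry_link: str = ""
    timestamp: str = ""
    repro_steps: str = ""
    expected: str = ""
    actual: str = ""
    plan_tier: str = ""
    zendesk_link: str = ""


def missing_fields(escalation: Escalation) -> list[str]:
    """Return the names of required fields that are empty or whitespace-only."""
    return [
        f.name
        for f in fields(escalation)
        if not getattr(escalation, f.name).strip()
    ]
```

    An empty result means the escalation is complete; anything else is the list of details to collect before filing.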

    Use PagerDuty for SEV1/SEV2; Slack-mention the on-call for SEV3. Don't DM individual engineers — it bypasses the rotation and breaks the audit trail. Include the Jira link and a one-sentence customer impact summary in the page.

    Do not mark the ticket as 'pending engineering' and walk away. Watch the Jira ticket until an engineer comments or moves it to In Progress; if there is no acknowledgement within the SLA, re-page or escalate to the engineering manager.
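
    The follow-up rule above reduces to a small decision helper. The sketch assumes one re-page before escalating to the engineering manager and takes the acknowledgement SLA as a parameter, since the text does not fix either; next_step is an illustrative name.

```python
def next_step(acknowledged: bool, minutes_since_page: int,
              ack_sla_minutes: int, repages_sent: int,
              max_repages: int = 1) -> str:
    """Decide what the support agent does next for an escalated ticket."""
    if acknowledged:
        return "keep watching"  # engineer commented or moved it to In Progress
    if minutes_since_page <= ack_sla_minutes:
        return "wait"  # still inside the acknowledgement SLA
    if repages_sent < max_repages:
        return "re-page"
    return "escalate to engineering manager"
```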

Customer Communication and Follow-Up

    Translate engineering's response into customer-facing language. 'We've identified the cause in our payment-webhook retry logic and a fix is in code review' beats 'engineering is investigating'. Avoid committing to an exact deploy time unless the release is already tagged.

    Move the status-page incident through Investigating → Identified → Monitoring → Resolved as the work progresses. Customers compare your status page cadence to your competitors'; gaps longer than 30 minutes between updates during a SEV1 generate inbound 'is the status page down too?' tickets.
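
    The status sequence and the 30-minute gap can be sketched as two checks. The forward-only transition rule is an assumption (some teams move an incident back from Monitoring to Identified), and the function names are illustrative.

```python
from datetime import datetime, timedelta

# Lifecycle order from the text above.
STATES = ["Investigating", "Identified", "Monitoring", "Resolved"]
# Longest gap between updates before customers start asking, per the text.
MAX_GAP = timedelta(minutes=30)


def can_advance(current: str, new: str) -> bool:
    """Allow only forward moves through the lifecycle (assumed policy)."""
    return STATES.index(new) > STATES.index(current)


def update_overdue(last_update: datetime, now: datetime) -> bool:
    """True when the last status-page update is more than 30 minutes old."""
    return now - last_update > MAX_GAP
```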

    Reply when the deploy reaches production (not when the PR merges — gradual rollouts mean merged ≠ shipped to this customer). Reference the release tag and the time the customer can expect to see the fix.

Resolution and Closure

    Ask the customer to retry the original repro path and confirm. Don't close on the engineer's say-so alone; canary deploys and CDN caching mean the fix may not have reached this customer's region yet.

    Loop in the account's CSM and schedule a 15-minute call within one business day. For enterprise tier, a written incident summary (impact, root cause, remediation) is part of the contract; draft it before the call.

    If the troubleshooting path was novel, add it to the Notion runbook with the symptom keywords the next agent will search for. Stale runbooks are why the same issue takes 4 hours the second time too.

    Tag the root-cause category (auth, billing, integration, performance, UI) so the monthly support trends report is meaningful. Link the Jira and Sentry IDs in the closing note for future ticket-search hits.