On January 22, 2026, Lettermint experienced a service disruption that resulted in 48 minutes of degraded email delivery. The incident was caused by a regression in a merged code change. Although our error tracking caught the issue immediately, a delay in escalation postponed the resolution. All queued emails were successfully processed by 12:49 PM CET.
10:16 AM: Code change merged to main.
12:01 PM: Change deployed to production (delayed by unrelated GitHub Actions pipeline issues).
12:05 PM: First Sentry alert received. Initial assessment incorrectly identified it as a data-related edge case rather than a regression.
12:28 PM: Internal incident declared after identifying the systemic nature of the errors.
12:31 PM: Public incident declared on the status page.
12:34 PM: Commit reverted and rollback deployment initiated.
12:41 PM: Rollback completed; service restored.
12:49 PM: Queued emails successfully retried and delivered.
The root cause was a type mismatch introduced in a refactor of our recipient querying logic. Although the change involved a standard SELECT query and passed our existing unit tests, the parsed query result changed from an Array to an Object.
When this data was passed to our "suppressed recipients" logic, the parser failed to handle the object, causing the mail-sending process to crash. The feedback loop was further complicated by a one-off failure in our GitHub Actions pipeline, which meant the code went live nearly two hours after the merge, making the initial Sentry alerts appear disconnected from the latest changes.
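To make the failure mode concrete, here is a minimal TypeScript sketch of this class of bug. The names, types, and shapes are illustrative assumptions, not Lettermint's actual code; the point is how a query layer that starts returning a keyed Object instead of an Array breaks downstream logic that assumes Array methods exist.

```typescript
// Illustrative sketch only: identifiers and shapes are hypothetical, not Lettermint's code.
type Recipient = { email: string };

// Before the refactor: the query layer returned an Array of rows.
function fetchRecipientsOld(rows: Recipient[]): Recipient[] {
  return rows; // e.g. [{ email: "a@example.com" }, { email: "b@example.com" }]
}

// After the refactor: the parsed result became an Object keyed by index.
function fetchRecipientsNew(rows: Recipient[]): Record<string, Recipient> {
  const byIndex: Record<string, Recipient> = {};
  rows.forEach((r, i) => { byIndex[String(i)] = r; });
  return byIndex;
}

// Downstream "suppressed recipients" logic that assumes an Array.
function filterSuppressed(recipients: Recipient[], suppressed: Set<string>): Recipient[] {
  // .filter exists on Arrays but not on plain Objects, so the Object shape
  // throws a TypeError at runtime even though both shapes hold the same data.
  return recipients.filter((r) => !suppressed.has(r.email));
}

const rows = [{ email: "a@example.com" }, { email: "b@example.com" }];
const suppressed = new Set(["b@example.com"]);

filterSuppressed(fetchRecipientsOld(rows), suppressed);           // works
// filterSuppressed(fetchRecipientsNew(rows) as any, suppressed); // TypeError: recipients.filter is not a function
```

In a setup like this, unit tests that mock the query layer with the old Array shape can keep passing, and raw query results typed loosely (for example as `any`) compile without complaint, so the mismatch only surfaces at runtime.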
The incident was resolved by reverting the offending commit to the last stable version. Once the rollback was deployed at 12:41 PM, Lettermint’s automatic retry logic began processing the backlog. By 12:49 PM, all queued messages had reached their recipients.
Improved E2E Testing: We are updating our integration and end-to-end tests to specifically validate the return types of our query layer and prevent similar Array/Object mismatches (see the sketch after this list).
Automated Alert Escalation: We have reconfigured our error tracking to automatically escalate high-frequency errors in the mail-sending pipeline to our on-call rotation, ensuring severe issues are paged immediately rather than relying on manual triage.
Pipeline Resilience: While the GitHub Actions issue was a one-off event unrelated to our configuration or code changes, the resulting delay between merge and deployment made the initial alerts harder to trace back to the change. We are adding visibility into delayed deployments so that the gap between a merge and its release to production is obvious during triage.
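The sketch below illustrates the kind of return-type assertion referenced in the testing item above. The test runner (Vitest), module path, function name, and filter argument are assumptions for illustration; the idea is to assert the runtime shape of the query layer's result rather than only its contents.

```typescript
// Hypothetical integration-test sketch (Vitest-style); names and paths are assumptions.
import { describe, expect, it } from "vitest";
import { fetchRecipients } from "../src/recipients"; // hypothetical query-layer module

describe("recipient query layer", () => {
  it("returns an Array of recipients, not a keyed Object", async () => {
    const result = await fetchRecipients({ domain: "example.com" }); // hypothetical filter

    // Guard the runtime shape explicitly so a parser change to an Object fails CI.
    expect(Array.isArray(result)).toBe(true);

    // Spot-check the element shape that the suppression logic depends on.
    for (const recipient of result) {
      expect(typeof recipient.email).toBe("string");
    }
  });
});
```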
We sincerely apologize for the disruption this caused to your operations. Reliability is our top priority, and we are using this incident to strengthen both our automated testing and our response protocols.