Lettermint

Write-up
Email submissions not being processed
Full outage

SMTP Relay Processing Incident on June 17, 2026

On June 17, 2026, Lettermint experienced an incident affecting SMTP message processing. Some messages submitted through our outbound SMTP relay were accepted at the SMTP protocol level but were not handed off to our backend processing pipeline. Our inbound SMTP relay was affected by the same underlying issue at lower volume.

When an SMTP server returns success, senders reasonably assume the message has entered the delivery pipeline. During this incident, that contract was not upheld for a subset of SMTP traffic. We are deeply sorry for the disruption this caused.

Impact

Outbound SMTP submissions were the primary affected path. Some messages submitted during the incident window may not have been processed or delivered.

If a message was submitted during the affected period but does not appear in the Lettermint UI, we do not have that message in our system. In those cases, the message needs to be resent from the sender side.

We are identifying impacted customers and will contact them directly with the number of affected messages we observed for their account.

Inbound SMTP traffic saw similar errors, but with lower observed volume.

What Happened

A routine dependency update upgraded the SMTP framework used by our relay services. That update included a breaking change in a minor release: envelope sender and recipient address objects changed from method-style access to property-style access, and a compatibility wrapper was removed.

Our SMTP relay plugins still used the previous API. As a result, the relay services remained healthy from a Kubernetes perspective, but message-processing hooks failed while reading envelope addresses.
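To make the failure concrete, here is a minimal Python sketch. The write-up does not name the SMTP framework, so `OldAddress`, `NewAddress`, and `plugin_read_sender` are hypothetical. Only the shape of the change (method-style access becoming property-style access) comes from the text above. The process stays up, but every hook call that reads an address raises.

```python
# Hypothetical stand-ins for the framework's envelope address objects.
# The real framework and its attribute names are not named in the write-up.

class OldAddress:
    """Pre-upgrade shape: the address is read by calling a method."""
    def __init__(self, value: str) -> None:
        self._value = value

    def address(self) -> str:
        return self._value


class NewAddress:
    """Post-upgrade shape: the address is a plain string property."""
    def __init__(self, value: str) -> None:
        self.address = value


def plugin_read_sender(mail_from) -> str:
    # Plugin code written against the old API: calls address() as a method.
    return mail_from.address()


print(plugin_read_sender(OldAddress("alice@example.com")))  # alice@example.com
try:
    plugin_read_sender(NewAddress("alice@example.com"))
except TypeError as exc:
    # Calling a str raises TypeError; the service itself keeps running.
    print(f"hook failed: {exc}")  # hook failed: 'str' object is not callable
```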

The most important failure mode was in the outbound relay: after the processing plugin failed, the SMTP server could still proceed to the discard queue and return a successful SMTP response. As a result, some messages were accepted at the SMTP level but never submitted downstream.
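The fail-open path can be modelled in a few lines. This is a hypothetical simplification, not the framework's actual hook API: `run_fail_open`, `submit_hook`, the `submitted` list, and the reply text are illustrative. It shows how a swallowed plugin exception followed by a discard-style final queue produces a success reply while nothing reaches the backend.

```python
from types import SimpleNamespace
from typing import Callable, List

Hook = Callable[[dict], None]
submitted: List[dict] = []  # stands in for the downstream backend pipeline


def run_fail_open(envelope: dict, hooks: List[Hook]) -> str:
    """Bug pattern: hook errors are logged and skipped, then a discard-style
    final queue replies with success without forwarding anything."""
    for hook in hooks:
        try:
            hook(envelope)
        except Exception as exc:  # swallowed: this is what makes the path fail open
            print(f"plugin error ignored: {exc!r}")
    return "250 2.0.0 OK"


def submit_hook(envelope: dict) -> None:
    # Written against the old API (address() as a method). After the upgrade,
    # `address` is a plain string, so this raises TypeError before the hand-off.
    sender = envelope["mail_from"].address()
    submitted.append({"from": sender, "rcpt": envelope["rcpt_to"]})


envelope = {"mail_from": SimpleNamespace(address="alice@example.com"),
            "rcpt_to": ["bob@example.net"]}
reply = run_fail_open(envelope, [submit_hook])
print(reply)           # 250 2.0.0 OK
print(len(submitted))  # 0: accepted at the SMTP level, never submitted downstream
```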

Root Cause

The root cause was an incompatibility between our SMTP relay plugin code and a breaking dependency change shipped in a minor version release.

The contributing factors were:

  • Our tests and health checks verified service availability, but did not continuously verify the full SMTP acceptance contract: accepted messages must be durably handed off to backend processing.

  • Our monitoring focused on pod health and delivery pipeline health, but did not alert on a mismatch between SMTP acceptance and backend submission.

  • The relay could fail open in a critical processing path, returning SMTP success after backend submission had not completed.

Resolution

We identified the issue through production logs and metrics, patched both outbound and inbound SMTP services to support the old and new address object formats, and verified the fix.
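A patch that accepts both formats can be as small as a single accessor. The sketch below is a hypothetical Python version; the write-up does not show the actual fix, and `envelope_address` and the `address` attribute name are assumptions. It checks at call time whether `address` is a method (old shape) or a plain value (new shape).

```python
from types import SimpleNamespace


def envelope_address(addr) -> str:
    """Return the address string from either the old or the new object shape.

    Hypothetical shim: assumes the old API exposed `address()` as a method
    and the new API exposes `address` as a string property.
    """
    value = addr.address
    # A bound method is callable; the new-style string value is not.
    return value() if callable(value) else value


class OldStyle:
    def __init__(self, value: str) -> None:
        self._value = value

    def address(self) -> str:  # method-style access (pre-upgrade)
        return self._value


new_style = SimpleNamespace(address="bob@example.net")  # property-style access

print(envelope_address(OldStyle("alice@example.com")))  # alice@example.com
print(envelope_address(new_style))                       # bob@example.net
```

Detecting the shape per call keeps the relay working on either side of the upgrade, for example while old and new builds coexist during a rollout. The trade-off is that a shim like this can hide the next API change, which is why it pairs naturally with SMTP-level regression tests.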

We also added immediate alerting for this class of failure. New Grafana/Loki alerts detect:

  • Lettermint SMTP plugin crashes

  • SMTP messages accepted but not submitted downstream

  • Backend SMTP submission failures

These alerts are routed through our incident response path.

What We're Improving

We are improving alerting around the SMTP relay boundary. We now alert when SMTP acceptance and backend submission diverge. This catches the specific failure mode where infrastructure appears healthy but message processing is broken.
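The divergence check reduces to comparing two counters per time window. Here is a minimal Python sketch, assuming per-window counts of SMTP acceptances and confirmed backend submissions are available (for example, derived from logs). The `WindowCounts` type, the function name, and the thresholds are hypothetical and are not the actual Grafana/Loki rules.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Counts observed over one evaluation window (e.g. 5 minutes)."""
    accepted: int   # SMTP transactions that received a 2xx reply to DATA
    submitted: int  # messages confirmed as handed off to the backend


def divergence_alert(c: WindowCounts, min_accepted: int = 20,
                     max_gap_ratio: float = 0.01) -> bool:
    """Fire when accepted messages are not matched by backend submissions.

    Thresholds are illustrative assumptions, not production values.
    """
    if c.accepted == 0:
        return False
    if c.submitted == 0:
        return True   # total hand-off failure alerts regardless of volume
    if c.accepted < min_accepted:
        return False  # too little traffic for a stable ratio
    gap = c.accepted - c.submitted
    return gap / c.accepted > max_gap_ratio


print(divergence_alert(WindowCounts(accepted=1200, submitted=1199)))  # False
print(divergence_alert(WindowCounts(accepted=1200, submitted=900)))   # True
print(divergence_alert(WindowCounts(accepted=5, submitted=0)))        # True
print(divergence_alert(WindowCounts(accepted=5, submitted=4)))        # False
```

The explicit zero-submission case matters for low-volume paths such as inbound, where a ratio threshold with a minimum volume could otherwise stay silent. In practice, submission lags acceptance slightly, so the window should be long enough that in-flight messages do not look like a gap.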

We are changing relay failure behaviour. Critical backend submission failures should return a retryable SMTP error, not a success response. If we cannot safely accept responsibility for processing a message, the sender should be told to retry.
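The intended behaviour is that the relay replies with success only after the backend has confirmed the hand-off. A minimal Python sketch of that rule follows; `handle_data` and the `submit` callable are hypothetical. The reply codes follow SMTP conventions: a 4xx reply is transient, so a conforming client keeps the message and retries.

```python
from typing import Callable


def handle_data(envelope: dict, submit: Callable[[dict], str]) -> str:
    """Reply 250 only after the backend accepts responsibility for the message.

    `submit` is a hypothetical hand-off function returning a backend ID.
    Any failure before a confirmed hand-off yields a transient 4xx reply.
    """
    try:
        message_id = submit(envelope)
    except Exception:
        # Deliberately broad: plugin bugs (such as a TypeError on address
        # access) and backend outages both mean we have not taken the message.
        return "451 4.3.0 Temporary failure, please retry"
    return f"250 2.0.0 Queued as {message_id}"


def ok_submit(envelope: dict) -> str:
    return "msg-123"  # hypothetical backend-assigned ID


def crashing_submit(envelope: dict) -> str:
    raise TypeError("'str' object is not callable")


print(handle_data({}, ok_submit))        # 250 2.0.0 Queued as msg-123
print(handle_data({}, crashing_submit))  # 451 4.3.0 Temporary failure, please retry
```

The trade-off is that a retry can occasionally produce a duplicate if the backend did accept the message before an error surfaced; for an email relay, a rare duplicate is generally preferable to a silently dropped message.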

We are adding SMTP-level regression tests. These tests will exercise real SMTP sessions and verify that accepted messages are submitted downstream, including envelope sender and recipient handling.
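A test of this kind can drive a real SMTP session over loopback with Python's standard `smtplib`. In the sketch below, a toy relay (`ToySMTPHandler`, hypothetical) stands in for the real service so the example runs on its own. A real test would open the session against the relay under test and query the actual backend instead of the `SUBMITTED` list. The toy relay records the hand-off before replying 250, which is the contract the test asserts.

```python
import smtplib
import socketserver
import threading
import unittest

# Stand-in for the backend pipeline (hypothetical; a real test would query
# the actual downstream system instead).
SUBMITTED: list = []


class ToySMTPHandler(socketserver.StreamRequestHandler):
    """Minimal stand-in relay: just enough SMTP for smtplib.sendmail()."""

    def reply(self, line: str) -> None:
        self.wfile.write((line + "\r\n").encode("ascii"))

    def handle(self) -> None:
        self.reply("220 toy-relay ready")
        sender, rcpts = None, []
        while True:
            raw = self.rfile.readline()
            if not raw:
                return  # client disconnected
            cmd = raw.decode("ascii").strip()
            verb = cmd.upper()
            if verb.startswith(("EHLO", "HELO")):
                self.reply("250 toy-relay")
            elif verb.startswith("MAIL FROM:"):
                sender, rcpts = cmd[len("MAIL FROM:"):].strip().strip("<>"), []
                self.reply("250 OK")
            elif verb.startswith("RCPT TO:"):
                rcpts.append(cmd[len("RCPT TO:"):].strip().strip("<>"))
                self.reply("250 OK")
            elif verb == "DATA":
                self.reply("354 End data with <CR><LF>.<CR><LF>")
                while self.rfile.readline() not in (b".\r\n", b""):
                    pass  # discard body lines until the terminating dot
                # The contract under test: hand off BEFORE replying 250.
                SUBMITTED.append((sender, list(rcpts)))
                self.reply("250 OK")
            elif verb == "QUIT":
                self.reply("221 Bye")
                return
            else:
                self.reply("502 Command not implemented")


class RelayAcceptanceContractTest(unittest.TestCase):
    def setUp(self) -> None:
        SUBMITTED.clear()
        self.server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), ToySMTPHandler)
        self.server.daemon_threads = True
        threading.Thread(target=self.server.serve_forever, daemon=True).start()
        self.addCleanup(self.server.server_close)
        self.addCleanup(self.server.shutdown)  # cleanups run LIFO: shutdown first

    def test_accepted_message_is_submitted_with_envelope(self) -> None:
        host, port = self.server.server_address
        rcpts = ["bob@example.net", "carol@example.org"]
        with smtplib.SMTP(host, port, local_hostname="localhost", timeout=5) as client:
            refused = client.sendmail("alice@example.com", rcpts,
                                      "Subject: regression\r\n\r\nhello\r\n")
        self.assertEqual(refused, {})  # every recipient accepted
        self.assertEqual(SUBMITTED, [("alice@example.com", rcpts)])
```

Run it with `python -m unittest` against the file. Asserting on both the envelope sender and every recipient covers exactly the fields whose access pattern changed in this incident.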

We are adding synthetic SMTP canaries, similar to what we already run for our API path. These canaries will continuously submit test messages through the same public SMTP paths customers use and verify they reach the backend pipeline.
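One way to structure such a canary is to send a uniquely tagged message and then poll until the backend reports it. This is a minimal Python sketch under stated assumptions: `send_canary`, `run_canary`, and the `found_in_backend` lookup are hypothetical, and host, port, credentials, and STARTTLS support are placeholders rather than Lettermint's actual endpoint details. Keeping `send` and `found_in_backend` injectable lets the polling logic be tested without network access.

```python
import smtplib
import time
import uuid
from email.message import EmailMessage
from typing import Callable


def send_canary(host: str, port: int, user: str, password: str,
                sender: str, rcpt: str, token: str) -> None:
    """Submit one canary through a public SMTP endpoint (all values are placeholders)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = rcpt
    msg["Subject"] = f"smtp-canary {token}"
    msg.set_content(f"canary token: {token}\n")
    with smtplib.SMTP(host, port, timeout=10) as client:
        client.starttls()  # assumes the endpoint offers STARTTLS
        client.login(user, password)
        client.send_message(msg)


def run_canary(send: Callable[[str], None],
               found_in_backend: Callable[[str], bool],
               timeout_s: float = 60.0,
               poll_s: float = 2.0) -> bool:
    """Send one uniquely tagged message and wait until the backend reports it.

    Returns False (which should alert) if `send` completed, meaning SMTP
    accepted the message, but it never appeared downstream within `timeout_s`.
    An exception from `send` propagates and should also count as a failure.
    """
    token = uuid.uuid4().hex  # unique per run, so an older canary cannot satisfy the check
    send(token)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if found_in_backend(token):
            return True
        time.sleep(poll_s)
    return False
```

In production, `send` would wrap `send_canary` with real endpoint settings, and `found_in_backend` would query whatever internal view of the pipeline the canary is meant to verify; both are left abstract here.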

We are tightening our dependency update process for SMTP-critical components. Minor version updates for SMTP framework dependencies will be treated as compatibility-sensitive changes and require explicit changelog review and relay-path testing.
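The gating rule can be expressed as a small check run against each proposed update. This Python sketch is hypothetical: the package names in `SMTP_CRITICAL` are placeholders, and version parsing assumes plain numeric `MAJOR.MINOR[.PATCH]` strings without pre-release tags.

```python
from typing import Tuple

# Hypothetical names: the write-up does not list which packages count as SMTP-critical.
SMTP_CRITICAL = {"smtp-framework", "smtp-address-parser"}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR[.PATCH]'; assumes plain numeric versions."""
    parts = [int(p) for p in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def needs_relay_review(package: str, old: str, new: str) -> bool:
    """True if the update needs changelog review plus relay-path tests.

    Policy sketch: for SMTP-critical packages, any change to MAJOR or MINOR
    is treated as compatibility-sensitive, since the breaking change in this
    incident shipped in a minor release.
    """
    if package not in SMTP_CRITICAL:
        return False
    return parse_version(new)[:2] != parse_version(old)[:2]


print(needs_relay_review("smtp-framework", "3.0.4", "3.1.0"))  # True: minor bump
print(needs_relay_review("smtp-framework", "3.0.4", "3.0.5"))  # False: patch only
print(needs_relay_review("unrelated-lib", "1.4.0", "2.0.0"))   # False: not SMTP-critical
```

Whether patch releases should also be gated is a judgment call; this sketch leaves them on the normal update path.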

We are also continuing work on our new in-house SMTP relay. It is designed to give us tighter control over this critical path and improve both stability and performance. This incident reinforces the priority of that work.

Closing

This incident exposed a gap between service health and message-processing correctness. The pods were running, but the SMTP acceptance contract was broken.

We are sorry to customers who had to investigate missing messages or resend traffic. We know that trust in an email platform is built on predictable correctness, especially at the moment a message is accepted. We fell short of that here, and we are making concrete changes so this class of failure is detected faster and handled more safely.