At-least-once delivery
We commit to at-least-once delivery. That means:
- If your endpoint responds 2xx, we mark the delivery successful and never re-send that event ID.
- If your endpoint times out, returns a 5xx, or we hit a transient infrastructure issue mid-delivery, we retry, so you may see the same event ID more than once.
Idempotency on eventId
Every event carries a unique eventId in the envelope:
Verify the signature
HMAC-SHA256 check against the raw body. Reject any request that fails verification.
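A minimal sketch of the signature check, assuming the signature arrives as a hex-encoded HMAC-SHA256 digest and that you hold the shared secret as bytes. The function name, the hex encoding, and the way the signature reaches the function are assumptions for illustration, not confirmed details of the service.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 of raw_body under secret.

    Assumption: the sender hex-encodes the digest. Adjust if it uses base64.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Compute the digest over the exact bytes received, before any JSON parsing or re-serialization, since re-encoding can change whitespace or key order and break the match.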
Check whether you've seen this eventId before
Look it up in your dedupe store (Postgres table, Redis SET, DynamoDB item — anything atomic with a unique constraint on the ID).
- Seen → return 200 immediately, skip processing.
- New → continue.
Insert the eventId BEFORE processing
Do an atomic insert-if-not-exists. If the insert fails because of a unique-constraint violation, treat it as a duplicate (another worker is processing it, or it’s a retry that’s racing with the first attempt) and return 200.
Minimal Postgres example
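A minimal sketch of the insert-if-not-exists step. The SQL is written to be valid in Postgres; the table name `webhook_events`, its columns, and the `claim_event` helper are hypothetical. To keep the sketch self-contained it runs against Python's built-in `sqlite3`, which accepts the same `ON CONFLICT ... DO NOTHING` clause; with a Postgres driver such as psycopg the placeholder would be `%s` instead of `?`.

```python
import sqlite3

# Hypothetical dedupe table; the primary key provides the unique constraint.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS webhook_events (
    event_id    TEXT PRIMARY KEY,
    received_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""

# Insert and duplicate check happen in one atomic statement.
CLAIM_EVENT = """
INSERT INTO webhook_events (event_id) VALUES (?)
ON CONFLICT (event_id) DO NOTHING
"""


def claim_event(conn, event_id: str) -> bool:
    """Record event_id. True means this caller is the first to see it."""
    cur = conn.execute(CLAIM_EVENT, (event_id,))
    conn.commit()
    # rowcount is 1 if the row was inserted, 0 if the conflict clause skipped it.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_TABLE)
    print(claim_event(conn, "evt_1"))  # first delivery
    print(claim_event(conn, "evt_1"))  # retry of the same event
```

When `claim_event` returns False, respond 200 and skip processing. Because the check and the insert are a single statement, two workers racing on the same eventId cannot both get True.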
How long to retain dedupe entries? A month is plenty. The full retry schedule spans roughly 35 hours from the first attempt (see backoff below), so an event older than a few days won’t recur. Periodically purge old rows so the table doesn’t grow unbounded.
Retries and backoff
If a delivery fails transiently (5xx response, timeout > 30s, connection refused), we retry on an exponential schedule:
| Attempt | Approximate delay since previous attempt |
|---|---|
| #2 | 1 minute |
| #3 | 5 minutes |
| #4 | 30 minutes |
| #5 | 2 hours |
| #6 | 8 hours |
| #7 | 24 hours |
| then | Subscription marked unhealthy; deliveries paused; admin email sent |
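Because each delay is measured from the previous attempt, the total retry window is the sum of the rows. The sketch below adds them up; the delays are copied from the table, while the names `RETRY_DELAYS` and `cumulative_offsets` are illustrative. It ignores the time each attempt itself takes, and the table's delays are approximate.

```python
from datetime import timedelta

# Delays before attempts #2 through #7, copied from the table above.
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=8),
    timedelta(hours=24),
]


def cumulative_offsets(delays):
    """Return the elapsed time from the first attempt to each retry."""
    offsets, elapsed = [], timedelta()
    for delay in delays:
        elapsed += delay
        offsets.append(elapsed)
    return offsets


if __name__ == "__main__":
    for attempt, offset in enumerate(cumulative_offsets(RETRY_DELAYS), start=2):
        print(f"attempt #{attempt}: about {offset} after the first attempt")
```

The last retry lands about 34 hours 36 minutes after the first attempt, which is a useful lower bound when choosing a dedupe retention period.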
What counts as success vs failure
| Response | Meaning |
|---|---|
| 2xx (200–299) | Success. We won’t send this event again. |
| 4xx (400–499) | Permanent failure. We don’t retry — your endpoint indicated the request itself is unprocessable. Logs only. |
| 5xx (500–599) | Transient failure. Retry on the schedule above. |
| Timeout (>30s) | Transient failure. Retry on the schedule above. |
| Connection refused / DNS failure | Transient failure. Retry. |
| TLS handshake failure | Permanent failure. We don’t retry — typically misconfigured certificates. We email the subscription owner. |
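On the receiving side, the table implies you should return a 4xx only when a retry cannot help, and a 5xx for anything that might succeed later. A minimal sketch of that mapping follows; the exception classes and the `status_for` helper are hypothetical names, not part of any SDK.

```python
class PermanentError(Exception):
    """Hypothetical: the payload can never be processed (e.g. malformed)."""


class TransientError(Exception):
    """Hypothetical: a dependency is temporarily unavailable (e.g. DB down)."""


def status_for(process, event) -> int:
    """Run process(event) and pick the status code to return to the sender."""
    try:
        process(event)
    except PermanentError:
        # 4xx is not retried, so use it only when retrying cannot help.
        return 400
    except Exception:
        # TransientError or anything unexpected: a 5xx asks for a retry.
        return 503
    return 200
```

Defaulting unknown exceptions to 503 errs on the side of receiving the event again, which is safe as long as the handler dedupes by eventId.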
Ordering
We don’t guarantee ordering across events. If a contact’s status flips Lead → Member → VIP within 200 ms, you might receive the two contact.status_changed events in either order. Plan for this:
- Sort by eventTimestamp when order matters
- Coalesce at your end if you only care about the latest state (look up the contact’s current state at receipt time via your own data, not the event)
- Don’t make decisions from a single event’s previousX → newX chain if multiple events of that type can fire close together, because the previous values you see may be stale
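One way to apply the first two points is to remember the newest eventTimestamp applied per contact and skip anything older. The sketch below assumes eventTimestamp is an ISO 8601 string with an explicit UTC offset and that events can be keyed by a contact identifier; both are assumptions, and the in-memory dict stands in for your own database.

```python
from datetime import datetime

# Hypothetical in-memory record of the newest event applied per contact.
last_applied: dict[str, datetime] = {}


def should_apply(contact_id: str, event_timestamp: str) -> bool:
    """Return True if this event is newer than the last one applied."""
    # Assumes an offset like +00:00; a trailing "Z" needs Python 3.11+.
    ts = datetime.fromisoformat(event_timestamp)
    previous = last_applied.get(contact_id)
    if previous is not None and ts <= previous:
        return False  # stale or duplicate: a newer event was already applied
    last_applied[contact_id] = ts
    return True
```

An event that arrives late is then acknowledged with 200 but does not overwrite newer state. In a multi-worker deployment the compare-and-update would need to be atomic in the database rather than in process memory.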
Why these rules exist
Each one prevents a specific support ticket we’ve seen with other webhook integrations:
| Rule | Ticket it prevents |
|---|---|
| At-least-once + dedupe by eventId | “Our customer was charged twice / had two contacts created” |
| 30-second timeout, return 2xx fast | “All our retries are firing because our endpoint is slow” |
| 4xx vs 5xx distinction | “Our endpoint returned 400 for a transient DB error and we never got the event again” |
| Exponential backoff | “We had a 5-minute outage and our subscription got disabled” |
| No ordering guarantee | “Events arrived out of order and corrupted our state” |
| HMAC signature verification | “How do we know it’s really FitProTracker?” |