Idempotency Key Guide for APIs, Webhooks, and Retries

What Is an Idempotency Key? How to Implement Idempotent APIs and Webhooks

You ship a payment flow. The provider sends a webhook. Your app does the work, but the network drops before the 200 gets back. The provider retries. Your handler runs again. Now the customer gets two credits, two emails, or two paid seats. That is the bug idempotency is supposed to kill.

An idempotency key is not abstract HTTP trivia. It’s a concrete way to say, “If this exact operation shows up again, treat it as the same request, not a new one.” If you build APIs, webhooks, or payment integrations, you need that guarantee early, not after the first duplicate incident.

Key Takeaways

  • An idempotency key is a stable identifier for one logical operation, not one transport attempt.
  • The pattern needs three pieces: a stable key, a dedup store with TTL, and an atomic claim step before side effects.
  • Verify webhook signatures first, then claim the key transactionally, then run the handler, then return 200.
  • Stripe’s Idempotency-Key docs are the clearest industry reference point.

If you want the broader request-hardening context around signature checks and origin trust, start with API security layers including request authenticity and backend hardening.


What Is an Idempotency Key, Really?

Stripe’s API docs define the pattern cleanly: clients send a unique key with a POST, Stripe stores the first result for that key, and later retries return the same outcome instead of doing the work again (Stripe, 2026). That is the practical definition you should keep in your head.

An idempotency key is a stable identifier for one logical mutation. The request may arrive one time or five times. The server should still apply the state change once.

That’s why “same payload” is not always enough. Two identical-looking POST /charges requests might be accidental retries, or they might be two real purchases. The key tells the server which interpretation is correct.

For client-driven APIs, the key usually comes from the caller. For provider-driven webhooks, the key usually comes from the event ID or a deterministic identifier derived from the event payload. Either way, the job is the same: collapse retried delivery attempts into one side effect.
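
As a minimal sketch of the client-driven case, the key is generated once per logical operation and reused on every retry. The names `send_with_retries` and `flaky_send` are hypothetical, and the transport is simulated rather than a real HTTP client.

```python
import uuid

def send_with_retries(send, payload, max_attempts=3):
    """Call send(payload, idempotency_key) until it succeeds, reusing one key."""
    # Generate the key once per logical operation, outside the retry loop.
    idempotency_key = str(uuid.uuid4())
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload, idempotency_key)
        except ConnectionError as exc:
            last_error = exc  # transport failure: retry with the SAME key
    raise last_error

# Hypothetical transport that drops the first two attempts.
seen_keys = []

def flaky_send(payload, idempotency_key):
    seen_keys.append(idempotency_key)
    if len(seen_keys) < 3:
        raise ConnectionError("network dropped")
    return {"status": "ok", "key": idempotency_key}

result = send_with_retries(flaky_send, {"amount": 500})
```

All three attempts carry the same key, so a server that deduplicates on it can treat them as one operation. Generating the key inside the loop would turn every retry into a new request.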


Why Does Idempotency Matter in Production?

Lemon Squeezy retries failed webhook deliveries up to three more times with exponential backoff, using intervals such as 5 seconds, 25 seconds, and 125 seconds, until your endpoint returns 200 (Lemon Squeezy, 2026). That behavior is correct. Your system has to be correct too.
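
To see how long duplicates can keep arriving, here is the arithmetic for the example intervals quoted above. Treating the schedule as a geometric series with base 5 and factor 5 is an assumption read off those example numbers, not a documented formula.

```python
BASE_SECONDS = 5   # first retry delay from the example schedule
FACTOR = 5         # assumed growth factor between retries
RETRIES = 3        # "up to three more times"

# Delay before each retry, then the total time retries can span.
delays = [BASE_SECONDS * FACTOR ** attempt for attempt in range(RETRIES)]
total_retry_window = sum(delays)
```

Under that assumption the delays are 5, 25, and 125 seconds, 155 seconds in total. Any dedup window has to outlive at least that span plus your handler's own latency.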

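Here are the failures idempotency prevents:

  • Duplicate credits or charges when a payment event is processed twice.
  • Duplicate emails or notifications sent for a single event.
  • Duplicate provisioning, such as two paid seats for one purchase.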

What catches teams is that these bugs are intermittent. You won’t see them in a happy-path local demo. You’ll see them when latency spikes, a load balancer retries, or the provider redelivers events under pressure. That’s why the pattern matters.

Most duplicate-processing bugs are not caused by “bad providers.” They come from an application that treats delivery attempts as business events. Those are different things.

For the database angle behind atomic claim logic, see ACID transactions and atomic check-then-write behavior.


Where Do You Actually Need Idempotency?

RFC 7231 defines GET, HEAD, OPTIONS, and TRACE as safe methods, and it defines safe methods plus PUT and DELETE as idempotent methods by HTTP semantics (RFC 7231). POST is not idempotent by default. That’s where most real bugs live.

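You usually need an explicit idempotency pattern in these places:

  • POST endpoints that create or mutate state, such as POST /charges.
  • Webhook handlers, because providers redeliver events when they do not see a 200.
  • Payment integrations, where a replayed request can charge or credit a customer twice.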

You usually do not need a custom idempotency key for ordinary GET requests. They are already idempotent by HTTP semantics. But “idempotent” and “safe” are not the same thing. DELETE is idempotent because deleting the same resource twice leaves the server in the same end state, even though it is definitely not read-only.

That distinction matters in API design. When developers say “make this endpoint idempotent,” they usually mean “make retries safe for state changes,” not “turn it into a safe method.”
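
A toy in-memory model makes the three categories concrete. The store and function names are hypothetical and stand in for real HTTP handlers.

```python
store = {"42": {"name": "seat"}}
orders = []

def get_item(item_id):
    # Safe and idempotent: reads only, changes nothing.
    return store.get(item_id)

def delete_item(item_id):
    # Idempotent but not safe: changes state, yet repeating it
    # leaves the store in the same end state.
    store.pop(item_id, None)

def create_order(item):
    # Neither safe nor idempotent: every call adds another order.
    orders.append(item)

delete_item("42")
delete_item("42")      # second call is a no-op
create_order("seat")
create_order("seat")   # second call creates a duplicate
```

After these calls the item is gone exactly once, but there are two orders. The second `create_order` is the case an idempotency key exists to catch.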


What Is the Core Pattern?

Firestore transactions run all reads before writes and automatically retry if a concurrently modified document invalidates the transaction’s read set (Google Cloud, 2026). That is exactly the kind of atomic guard you want around duplicate suppression.

The pattern has three parts:

  1. A stable unique key
    For client APIs, that might be a caller-generated UUID. For webhooks, it should be a provider event ID if one exists. If the provider doesn’t give you one, derive a deterministic key from fields that identify the logical event, not the raw transport attempt.

  2. A dedup store with TTL
    Store seen keys in a durable place such as Redis, Postgres, DynamoDB, or Firestore. Add a TTL so the store doesn’t grow forever. Stripe notes that idempotency keys can be pruned after they are at least 24 hours old (Stripe, 2026). Your own window should match the provider’s retry behavior plus your operational replay needs.

  3. A transactional claim wrapper
    Do an atomic “if key does not exist, create it and continue” step before the handler performs side effects. If the key already exists, you treat the request as a retry and short-circuit safely.

If you skip any one of those pieces, the pattern falls apart. A key without durable storage is forgotten on the next restart. Storage without TTL becomes a forever-growing audit table. A dedup check without a transaction is a race condition waiting to happen.
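
The three pieces can be sketched in a few lines with an in-memory store. This is a teaching stand-in, not a durable implementation: `InMemoryClaimStore` is a hypothetical name, the lock plays the role of the database transaction, and a real system would use one of the durable stores listed above.

```python
import threading
import time

class InMemoryClaimStore:
    """Toy dedup store: stable key + TTL + atomic claim. Not durable."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires = {}             # key -> expiry time
        self._lock = threading.Lock()  # makes check-and-set one atomic step

    def claim(self, key):
        """Return True if this caller owns the key, False if it is a retry."""
        now = self._clock()
        with self._lock:
            expiry = self._expires.get(key)
            if expiry is not None and expiry > now:
                return False           # live claim exists: treat as duplicate
            self._expires[key] = now + self._ttl
            return True

# Usage with a fake clock so expiry is easy to observe.
fake_now = [0.0]
claims = InMemoryClaimStore(ttl_seconds=10, clock=lambda: fake_now[0])
first = claims.claim("evt_1")      # new key: claimed
retry = claims.claim("evt_1")      # same key inside the TTL: duplicate
fake_now[0] = 11.0
after_ttl = claims.claim("evt_1")  # TTL elapsed: claimable again
```

The check and the write happen under one lock, which is the property the transactional claim wrapper provides in a real database.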


What Does the Webhook Flow Look Like?

Lemon Squeezy signs webhook payloads with an HMAC-SHA256 digest in the X-Signature header, calculated from the raw body and your signing secret (Lemon Squeezy, 2026). That means signature verification is not optional ceremony. It is the first gate.

The sequence should look like this:

sequenceDiagram
    participant LS as Lemon Squeezy
    participant API as Your webhook endpoint
    participant FS as Firestore
    participant H as Business handler

    LS->>API: POST webhook + raw body + X-Signature
    API->>API: Verify HMAC signature
    API->>API: Build stable webhook_id
    API->>FS: Transactional claim(webhook_id)
    alt Already claimed
        FS-->>API: duplicate
        API-->>LS: 200 OK
    else New claim
        FS-->>API: claimed
        API->>H: Process event once
        H-->>API: success
        API->>FS: Mark processed
        API-->>LS: 200 OK
    end

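The ordering matters more than the syntax:

  1. Verify the signature against the raw body before trusting anything in the payload.
  2. Build the stable webhook_id.
  3. Claim the webhook_id transactionally. If it is already claimed, return 200 and stop.
  4. Run the business handler once.
  5. Mark the claim processed, then return 200.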

If you need a bigger payments context around retry-heavy flows, see payment gateway tradeoffs and integration concerns for developers.


How Do You Implement It With Python and Firestore?

Firestore’s Python client exposes a @firestore.transactional decorator, and the docs call out an important detail: transaction functions may run more than once when there is contention (Google Cloud Python docs, 2026). That means the transaction function should claim metadata only. Do not put external side effects inside it.
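
To see why side effects must stay out of the transaction function, here is a simulation of a body that gets rerun. `run_with_retries` is a hypothetical stand-in for a retrying transaction runner, not part of the Firestore API.

```python
def run_with_retries(txn_fn, conflicts):
    """Hypothetical runner: reruns txn_fn once per simulated conflict."""
    for _ in range(conflicts):
        txn_fn()        # attempt aborted by contention, result discarded
    return txn_fn()     # final attempt commits

# Wrong: the side effect lives inside the transaction body.
emails_sent_inside = []

def claim_and_send():
    emails_sent_inside.append("welcome")
    return True

run_with_retries(claim_and_send, conflicts=2)

# Better: the body only claims; the side effect runs after it returns.
emails_sent_outside = []

def claim_only():
    return True

if run_with_retries(claim_only, conflicts=2):
    emails_sent_outside.append("welcome")
```

With two simulated conflicts, the first version sends three emails and the second sends one.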

Here’s a production-friendly pattern for Lemon Squeezy webhooks:

import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

from google.cloud import firestore
from flask import Request

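db = firestore.Client()
WEBHOOK_SECRET = "replace-me"  # placeholder: load the real secret from configuration
TTL_DAYS = 7  # dedup window; keep it longer than the provider's retry window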

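def verify_lemon_squeezy_signature(raw_body: bytes, signature: str, secret: str) -> None:
    """Raise ValueError unless signature is the hex HMAC-SHA256 of raw_body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input.
    if not hmac.compare_digest(expected.encode("utf-8"), (signature or "").encode("utf-8")):
        raise ValueError("invalid Lemon Squeezy signature")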

def build_webhook_id(payload: dict) -> str:
    """
    Prefer a provider event id if one exists.
    Lemon Squeezy's basic webhook docs emphasize event name + resource payload,
    so this example derives a stable key from fields that identify the mutation.
    """
    event_name = payload["meta"]["event_name"]
    resource_id = payload["data"]["id"]
    updated_at = payload["data"]["attributes"].get("updated_at", "")
    return f"ls:{event_name}:{resource_id}:{updated_at}"

@firestore.transactional
def claim_webhook(
    transaction: firestore.Transaction,
    claim_ref,
    *,
    webhook_id: str,
    event_name: str,
    received_at: datetime,
    expires_at: datetime,
) -> bool:
    snapshot = claim_ref.get(transaction=transaction)
    if snapshot.exists:
        return False

    transaction.set(
        claim_ref,
        {
            "webhook_id": webhook_id,
            "event_name": event_name,
            "status": "processing",
            "received_at": received_at,
            "expires_at": expires_at,  # Firestore native TTL field
        },
    )
    return True

def mark_processed(claim_ref) -> None:
    claim_ref.update(
        {
            "status": "processed",
            "processed_at": datetime.now(timezone.utc),
        }
    )

def rollback_claim(claim_ref) -> None:
    # Simple recovery path: let provider retries try again on handler failure.
    # If your handler triggers irreversible external side effects, replace this
    # with a state machine or outbox pattern instead of deleting the claim.
    claim_ref.delete()

def handle_lemonsqueezy_event(payload: dict) -> None:
    event_name = payload["meta"]["event_name"]

    if event_name == "order_created":
        # Put your real business logic here:
        # - provision account access
        # - write billing records
        # - enqueue downstream jobs
        pass

def lemonsqueezy_webhook(request: Request):
    raw_body = request.get_data()
    signature = request.headers.get("X-Signature", "")

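    # 1. Verify authenticity first. Reject bad signatures with a 4xx instead of
    #    letting the exception surface as a 500.
    try:
        verify_lemon_squeezy_signature(raw_body, signature, WEBHOOK_SECRET)
    except ValueError:
        return {"status": "invalid_signature"}, 401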

    payload = json.loads(raw_body)
    event_name = payload["meta"]["event_name"]
    webhook_id = build_webhook_id(payload)

    # 2. Transactionally claim the event.
    now = datetime.now(timezone.utc)
    expires_at = now + timedelta(days=TTL_DAYS)
    claim_ref = db.collection("webhook_claims").document(webhook_id)

    claimed = claim_webhook(
        db.transaction(),
        claim_ref,
        webhook_id=webhook_id,
        event_name=event_name,
        received_at=now,
        expires_at=expires_at,
    )

    if not claimed:
        return {"status": "duplicate_ignored"}, 200

    # 3. Process once.
    try:
        handle_lemonsqueezy_event(payload)
    except Exception:
        rollback_claim(claim_ref)
        raise

    # 4. Mark success and acknowledge delivery.
    mark_processed(claim_ref)
    return {"status": "ok"}, 200

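What should you notice here?

  • The signature is checked against the raw request body before the JSON is parsed.
  • The transactional function only reads and writes the claim document, so Firestore can rerun it under contention without harm.
  • The business handler runs outside the transaction, after the claim succeeds.
  • A handler failure deletes the claim so a provider retry can try again.
  • expires_at gives Firestore a field to expire old claims on, once a TTL policy is configured for it.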

In a real webhook system, the hardest part is rarely “how do I hash a key?” It’s deciding what counts as the same business event when the provider doesn’t hand you a perfect event ID.
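
One hedged way to make that decision explicit is to list the identifying fields in one place and hash them. The field choice below mirrors build_webhook_id and is an assumption about what identifies a logical event; `derive_event_key` is a hypothetical helper.

```python
import hashlib
import json

def derive_event_key(payload, prefix="ls"):
    """Hash only the fields that identify the logical event."""
    identity = {
        "event_name": payload["meta"]["event_name"],
        "resource_id": payload["data"]["id"],
        "updated_at": payload["data"]["attributes"].get("updated_at", ""),
    }
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest}"

delivery_1 = {
    "meta": {"event_name": "order_created"},
    "data": {"id": "1001", "attributes": {"updated_at": "2026-01-01T00:00:00Z", "total": 500}},
}
# Same logical event, redelivered with an unrelated field changed.
delivery_2 = {
    "meta": {"event_name": "order_created"},
    "data": {"id": "1001", "attributes": {"updated_at": "2026-01-01T00:00:00Z", "total": 999}},
}
key_1 = derive_event_key(delivery_1)
key_2 = derive_event_key(delivery_2)
```

Both deliveries map to the same key because only the identity fields feed the hash. A hex digest also avoids characters such as "/" that Firestore does not allow in document IDs.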


What Mistakes Break the Pattern?

Stripe saves the first result associated with an idempotency key and replays that result for later retries of the same request (Stripe, 2026). That only works because the idempotency decision happens before the mutation, not after it.

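The most common failures are predictable:

  • Checking for the key after the side effect has already run.
  • Doing the existence check and the write as two separate steps instead of one atomic claim.
  • Keeping keys only in process memory, so a restart or a second instance forgets them.
  • Deriving the key from the delivery attempt instead of the logical event.
  • Putting external side effects inside a transaction function that may run more than once.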

If you only remember one rule, remember this: idempotency is an ordering problem first and a storage problem second.


Why Is Stripe the Canonical Reference?

Stripe’s docs are still the best public explanation of the pattern because they make the contract explicit: the client provides an Idempotency-Key, Stripe saves the first result, and identical retries get the same result back (Stripe, 2026). That is the industry-standard mental model.

Your implementation does not need to copy Stripe feature-for-feature. You probably won’t store full response bodies for every internal webhook. But Stripe gets the core idea exactly right: the first result for a key wins, and every retry with that key replays it.

For external provider webhooks, you often cannot demand a header like Stripe’s. So you adapt the same idea to the provider’s event model instead.
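
The save-and-replay contract can be sketched in memory. This is a rough model in the spirit of that contract, not Stripe's implementation: `IdempotentExecutor` is a hypothetical name, the parameter fingerprint check is an added guard against key misuse, and the class is neither durable nor thread-safe.

```python
import hashlib
import json

class IdempotentExecutor:
    """Run an operation once per key and replay the saved result on retries."""

    def __init__(self):
        self._results = {}  # key -> (params fingerprint, saved result)

    def execute(self, key, params, operation):
        fingerprint = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key in self._results:
            saved_fingerprint, saved_result = self._results[key]
            if saved_fingerprint != fingerprint:
                # Same key, different request: refuse rather than guess.
                raise ValueError("idempotency key reused with different parameters")
            return saved_result  # replay, do not run the operation again
        result = operation(params)
        self._results[key] = (fingerprint, result)
        return result

charges = []

def create_charge(params):
    charges.append(params["amount"])
    return {"charge_number": len(charges), "amount": params["amount"]}

executor = IdempotentExecutor()
first = executor.execute("key-1", {"amount": 500}, create_charge)
replay = executor.execute("key-1", {"amount": 500}, create_charge)
```

The second call returns the saved result and `create_charge` runs only once.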


What Should You Think About in Production?

Google’s Firestore TTL docs note that expired documents are usually deleted within 24 hours after expiration, not at the exact expiration timestamp (Google Cloud, 2026). That small implementation detail affects real operating decisions.
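
One consequence is that an expired claim may still be present when you read it. If your logic needs exact expiry, a read-time check is one option. This is a sketch on plain dictionaries: `claim_is_live` is a hypothetical helper, and whether an expired claim should be reclaimable is a design choice this example does not settle.

```python
from datetime import datetime, timedelta, timezone

def claim_is_live(claim, now=None):
    """Compare expires_at with the clock instead of trusting TTL deletion."""
    if now is None:
        now = datetime.now(timezone.utc)
    return claim["expires_at"] > now

now = datetime(2026, 1, 10, tzinfo=timezone.utc)
# Past its expiry but not yet removed by the TTL sweeper.
expired_but_present = {"expires_at": now - timedelta(hours=3)}
fresh = {"expires_at": now + timedelta(days=7)}

expired_is_live = claim_is_live(expired_but_present, now)
fresh_is_live = claim_is_live(fresh, now)
```

For pure duplicate suppression, a claim that lingers past its TTL usually just extends the dedup window, so this check matters mainly when expiry carries business meaning.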

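Here is the production checklist I care about most:

  • Size the TTL to cover the provider’s retry window plus however far back you replay events.
  • Do not depend on TTL deletion for exact expiry, because an expired claim can linger for hours.
  • Decide what happens to claims left in "processing" when a worker crashes mid-handler.
  • Replace delete-on-failure rollback with a state machine or outbox when side effects are irreversible.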


Frequently Asked Questions

What is idempotency?

Idempotency means that repeating the same request leaves the server in the same end state as running it once. RFC 7231 defines idempotent HTTP methods as methods whose intended effect is unchanged by multiple identical requests, even if response details differ (RFC 7231).

Are GET requests idempotent?

Yes. GET is both safe and idempotent under RFC 7231 because it is defined as read-only from the client’s perspective (RFC 7231). You normally do not need a custom idempotency key for ordinary GET requests.

What is the difference between idempotent and safe methods?

Safe methods are read-only by intent. Idempotent methods can change state, but doing them multiple times has the same intended effect as doing them once. DELETE is the classic example: it changes state, so it is not safe, but deleting the same resource twice is still idempotent by HTTP semantics (RFC 7231).


The Practical Rule to Keep

If a request or event can be retried, it must have a stable identity. If it changes state, that identity must be claimed atomically before the side effect runs. Everything else is implementation detail.

That is the whole idempotency pattern in three sentences.

When you implement it, keep the order brutally simple: verify authenticity, derive the stable key, claim it transactionally, run the handler once, then acknowledge success. If you do that consistently, duplicate charges, duplicate provisioning, and webhook replay bugs get much harder to ship.

Written by Nishil Bhave

Builder, maker, and tech writer at MakeToCreate.
