
Edge Idempotency Roadmap

End-to-end view of how safe retry becomes a first-class platform primitive — wired at the edge in KrakenD, delegating to Restate's native idempotency mechanism, scoped by (tenant, project, subject) so cached responses can never cross those boundaries. This page consolidates the Edge Idempotency Key PRD (v1.0) and its dependencies into a single reference. The design is complete; the 8-phase rollout has not started.

Source PRDs

This page is derived from the Edge Idempotency Key PRD and its closely related dependencies:

Primary (docs/prd/):

  • Edge Idempotency Key — the v1.0 design, three-case framing, namespace rewrite, retention classes, per-endpoint policy

Dependencies (docs/prd/multi-tenancy/):

  • 02 · Projects — resolves project_id from Unkey key metadata; Projects PRD ships before the namespace rewrite
  • 05 · Authorization — defines the typed-subject format (user:*, service:*, tenant_member:*, client:*) used in the namespace
  • 06 · API Keys — Unkey externalId + meta.project_id surface the tenant/project dimensions


Architectural Direction — Restate-Native, Edge-Enforced

The whole design delegates to primitives already shipped by Restate. No application-level dedup table, no Redis dedup middleware, no custom caching layer:

  • Dedup mechanism — Restate's partition processor atomically records each Idempotency-Key at ingress and caches the committed response (success and terminal error) for a configurable retention window. Duplicate calls with the same key return the cached response without re-invoking the handler.
  • Multi-tenant scoping — closed at the edge. KrakenD rewrites the client-supplied Idempotency-Key into t-{tenant}:{project}:{subject}:{raw-key} before forwarding to Restate, so Restate's native (service, handler, key) scope is effectively (tenant, project, subject, service, handler, key) from the platform's perspective.
  • IETF-aligned semantics: per the IETF draft, Idempotency-Key is a sender-generated value. The design preserves that contract. For third-party sources without a canonical header, we either translate from their transport slot (Case B) or dedupe internally in the handler (Case C); we never synthesize keys at the edge.

Canonical reference: Edge Idempotency Key PRD.

Glossary

Term | Definition
Idempotency Key | A sender-generated value, unique per logical action. Safe to reuse across retries of the same logical action; must change for a new action. Opaque string matching ^[A-Za-z0-9._~\-]{1,255}$.
Namespace rewrite | KrakenD's transformation of the client-supplied key into t-{tenant}:{project}:{subject}:{raw-key} before the request reaches Restate. Closes cross-tenant, cross-project, and intra-tenant user-to-user cache leaks.
Case A | Canonical case: the caller sends a real Idempotency-Key header. Covers first-party clients, first-party services, the Asynq scheduler worker, and third-party IETF-conforming consumers.
Case B | Transport translation: a third-party webhook provider sends a per-delivery unique identifier in a non-canonical slot (JSON body, custom header, form field). KrakenD lifts it into the canonical slot.
Case C | Handler-internal dedup: the source ships no per-delivery identifier. The handler derives an internal dedup token and passes it via restate.WithIdempotencyKey(...) on downstream calls. No active callers as of v1.0; retained as a template.
Retention class | Per-service cache retention window: 24h (interactive default), 48h – 7d (webhooks), 7d (financial/critical). Configurable at runtime via Restate CLI.
Terminal error | An error explicitly marked restate.TerminalError(...). Cached identically to a successful response, so a retry with the same key correctly returns the same error body.
Recovery route | Per-endpoint GET …/by-idempotency-key/{key} route that lets a caller retrieve the cached response of an earlier write whose response they lost. KrakenD applies the same namespace rewrite to the path segment.
_tenant / _platform sentinel | Reserved project-segment values for tenant-level admin routes (no project scope) and platform-operator routes (cross-tenant). The underscore prefix cannot collide with real project slugs.

The Three Caller Cases

Every edge endpoint classifies each of its expected callers into exactly one of these three cases, declared in the per-endpoint policy table in the PRD.

Case | Status | Who it covers | What KrakenD does
A: Canonical header | Active | Mobile clients, first-party services, Asynq scheduler worker, IETF-conforming third-party consumers | Validates the client-supplied Idempotency-Key, applies the namespace rewrite, forwards to Restate
B: Transport translation | Active | Stripe (event.id in JSON body), GitHub (X-GitHub-Delivery header), Twilio Event Streams (I-Twilio-Idempotency-Token), Twilio messaging (MessageSid+MessageStatus) | After source-specific signature validation, extracts the sender's identifier from its transport slot, constructs Idempotency-Key: <provider>-<identifier>, then namespaces
C: Handler-internal | No active callers | Reserved template for future external sources with no per-delivery identifier | No ingress-level Idempotency-Key. The handler derives an internal dedup token and passes it via restate.WithIdempotencyKey(...) on downstream calls

The three cases are mutually exclusive per-endpoint. An endpoint declared REQUIRED_A rejects requests without an Idempotency-Key; REQUIRED_B rejects requests where transport translation can't extract the identifier; ACCEPTED routes proceed without dedup when the header is absent; HANDLER_INTERNAL_C routes do not look at the ingress header; PROHIBITED routes (reads, streaming, workflow /run) reject any Idempotency-Key.
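The five policy classes above can be read as a small decision function. The following is a minimal sketch, not the KrakenD plugin itself: the `Policy`, `Decision`, and `Decide` names are hypothetical, and it assumes that a HANDLER_INTERNAL_C route simply forwards without an ingress-level key.

```go
package main

import (
	"errors"
	"fmt"
)

// Policy mirrors the five per-endpoint classes named in the text.
type Policy int

const (
	RequiredA Policy = iota
	RequiredB
	Accepted
	HandlerInternalC
	Prohibited
)

// Decision is a hypothetical summary of what the edge does next.
type Decision int

const (
	Reject Decision = iota
	ForwardWithKey    // namespace the key, then forward
	ForwardWithoutKey // forward with no ingress-level dedup
)

var ErrPolicy = errors.New("request rejected by idempotency policy")

// Decide applies a policy. hasHeader reports whether the caller sent
// Idempotency-Key; translated reports whether Case B extraction succeeded.
func Decide(p Policy, hasHeader, translated bool) (Decision, error) {
	switch p {
	case RequiredA:
		if hasHeader {
			return ForwardWithKey, nil
		}
	case RequiredB:
		if translated {
			return ForwardWithKey, nil
		}
	case Accepted:
		if hasHeader {
			return ForwardWithKey, nil
		}
		return ForwardWithoutKey, nil
	case HandlerInternalC:
		// Assumption: the ingress header is ignored on these routes.
		return ForwardWithoutKey, nil
	case Prohibited:
		if !hasHeader {
			return ForwardWithoutKey, nil
		}
	}
	return Reject, ErrPolicy
}

func main() {
	d, err := Decide(Accepted, false, false)
	fmt.Println(d == ForwardWithoutKey, err)
	_, err = Decide(Prohibited, true, false)
	fmt.Println(err)
}
```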

Cache Isolation: Tenant, Project, Subject

KrakenD namespace rewrite: four-segment key structure (tenant with reserved t- prefix, project, subject, raw client key) with the three threat scenarios the added segments close

The SVG is editable — open it in draw.io to modify.

The load-bearing security argument. Restate's native idempotency scope is (service, handler, key), with no tenant, project, or subject dimension. Without mitigation, two principals could collide on the same key in three different ways, and one could read the other's cached response:

Threat | Scenario | Closed by
Cross-tenant | Tenant A sends key K; Tenant B sends the same key to the same endpoint → Restate returns A's cached response to B | Tenant prefix t-{tenant}: in the namespace rewrite
Cross-project | Acme's Health-Prod and Acme's Fitness-App share the same raw key → one project reads the other's cached response | Project prefix :{project}: added in v0.8 of the PRD
Intra-tenant, intra-project | Alice and Bob in the same project share a raw key (leak via log, shared device, HAR file, crash report) → Bob reads Alice's cached response | Subject prefix :{subject}: added in v0.6 of the PRD

All three scopes are closed by the same KrakenD rewrite mechanism. The rewrite happens in integrations/krakend/auth.go after authentication, using fields the auth plugin already extracts. The client contract is preserved — callers send opaque IETF-standard values and never see the prefixes.

Original client header | KrakenD-rewritten value
Idempotency-Key: 8e03978e-40d5-43e8-bc93-6894a57f9324 | Idempotency-Key: t-acme:health-prod:user:firebase_uid_123:8e03978e-40d5-43e8-bc93-6894a57f9324
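As an illustration of the rewrite shown above, here is a minimal sketch of the string construction. `NamespaceKey` is a hypothetical helper name, not the function in integrations/krakend/auth.go, and the tenant, project, and subject values are assumed to come from the already-authenticated request.

```go
package main

import (
	"fmt"
	"strings"
)

// NamespaceKey builds t-{tenant}:{project}:{subject}:{raw-key}.
// It refuses raw keys that contain a colon or start with the reserved
// prefix, since either would let a caller forge another scope's prefix.
func NamespaceKey(tenant, project, subject, rawKey string) (string, error) {
	if tenant == "" || project == "" || subject == "" || rawKey == "" {
		return "", fmt.Errorf("tenant, project, subject and raw key are all required")
	}
	if strings.Contains(rawKey, ":") || strings.HasPrefix(rawKey, "t-") {
		return "", fmt.Errorf("raw key must not contain ':' or start with 't-'")
	}
	return "t-" + tenant + ":" + project + ":" + subject + ":" + rawKey, nil
}

func main() {
	key, err := NamespaceKey("acme", "health-prod", "user:firebase_uid_123",
		"8e03978e-40d5-43e8-bc93-6894a57f9324")
	if err != nil {
		panic(err)
	}
	fmt.Println("Idempotency-Key:", key)
}
```

Because the subject is part of the string, two users who somehow share a raw key still produce different namespaced keys.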

Why at KrakenD, not in the handler

Restate's cached-response path never invokes the handler on a cache hit — the ingress layer short-circuits before the handler's tenant/authz check runs. A handler-level check cannot close these leaks. The rewrite must happen at the edge, before the partition processor keys the request.

Tenant-level routes — the _tenant sentinel

Not every authenticated route has a project dimension. TenantAdminGateway.ListProjects() operates on the tenant itself, above the project layer. These routes use the reserved sentinel _tenant as the project segment: t-acme:_tenant:tenant_member:alice:.... Platform-operator routes (cross-tenant) use _platform similarly. Per the PRD, tenant-level admin routes default to PROHIBITED for Idempotency-Key unless the specific endpoint has a real retry story — admin ops benefit more from explicit confirmation than transparent retry.
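The choice of project segment can be sketched as a small lookup. This is an illustrative sketch only: `RouteScope` and `ProjectSegment` are hypothetical names, and the rule that real project slugs never start with an underscore is inferred from the collision argument in the glossary.

```go
package main

import "fmt"

// RouteScope says which layer a route operates on (hypothetical type).
type RouteScope int

const (
	ProjectScope  RouteScope = iota // ordinary project-scoped route
	TenantScope                     // tenant-level admin route
	PlatformScope                   // platform-operator, cross-tenant route
)

// ProjectSegment returns the value placed in the {project} position
// of the namespaced key.
func ProjectSegment(scope RouteScope, projectSlug string) (string, error) {
	switch scope {
	case TenantScope:
		return "_tenant", nil
	case PlatformScope:
		return "_platform", nil
	case ProjectScope:
		// Assumption: real slugs never start with '_', so they cannot
		// collide with the reserved sentinels.
		if projectSlug == "" || projectSlug[0] == '_' {
			return "", fmt.Errorf("invalid project slug %q", projectSlug)
		}
		return projectSlug, nil
	}
	return "", fmt.Errorf("unknown route scope %d", scope)
}

func main() {
	seg, _ := ProjectSegment(TenantScope, "")
	fmt.Println("t-acme:" + seg + ":tenant_member:alice:...")
}
```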

Service & Component Inventory

New / Extended Components

Component | Purpose | PRD section
KrakenD auth plugin (integrations/krakend/auth.go) | Extended to rewrite Idempotency-Key after authentication; validates format, length, reserved prefix | §KrakenD Responsibilities
Per-endpoint policy classification | Every write route declares one of five policies (REQUIRED_A, REQUIRED_B, ACCEPTED, HANDLER_INTERNAL_C, PROHIBITED) in the KrakenD config | §Per-Endpoint Policy
Case B transport-translation rules | Per-provider extraction functions for Stripe, GitHub, Twilio Event Streams, Twilio messaging | §Case B: Transport Translation
Per-service retention configuration | Explicit WithIdempotencyRetention(...) in each gateway's v1/cmd/main.go | §Restate-Side Behavior
Per-endpoint recovery routes | GET …/by-idempotency-key/{key} for each write endpoint that accepts a key | §Phase 5
x-operation-id response header | Renamed from upstream x-restate-id via KrakenD output_headers rule; avoids leaking internal infrastructure in client-visible headers | §Phase 2

Integrated Components (No Changes Required)

Component | Role
Restate ingress | Native Idempotency-Key handling: hashing to partition, atomic record at the processor leader, response caching, attach-by-key, peek-by-key
Restate Go SDK | restate.WithIdempotencyKey(...) option on downstream invocations; used by Asynq worker and the Case C template
Asynq worker | Generates deterministic Idempotency-Key: sched:{schedule_id}:{ms} on every HTTP dispatch; HMAC-signs via X-Schedule-Signature
Unkey | externalId = tenant_id, meta.project_id = project_id; source of the tenant + project dimensions for API-key routes and Case B webhook routes
OpenBao | Stores per-project webhook signing secrets at {tenant_id}:{project_id}:{provider}

Observability Additions

Metric | Labels | Purpose
idempotency_key_present_total | tenant, project, endpoint, case | Requests that carried a key (Case A) / had one translated (Case B)
idempotency_key_missing_total | tenant, project, endpoint, class | 400s on REQUIRED_* routes; bypass on ACCEPTED routes
idempotency_key_invalid_total | tenant, project, endpoint, reason | 400s due to validation (length, charset, reserved prefix)
transport_translation_failed_total | tenant, project, endpoint, provider | 400s where Case B extraction failed
idempotency_cache_hit_total | tenant, project, endpoint, restate_status | Headline metric: same-key returns
handler_internal_dedup_hit_total | tenant, project, endpoint | Case C downstream dedup hits

Structured log fields added at KrakenD: idempotency_case, idempotency_key_namespaced, idempotency_source, idempotency_policy, restate_invocation_id, restate_cache_hit. All logs and metrics label (tenant, project, endpoint) so a per-project regression in a multi-project tenant isn't hidden by tenant-grain aggregation.

Flow 1: Case A — Canonical Header

The core of the design. Applies to mobile clients, first-party services, the Asynq scheduler worker, and any third-party IETF-conforming consumer.

Case A sequence diagram — mobile client POSTs with Idempotency-Key; KrakenD validates via Unkey, rewrites the key with tenant/project/subject prefix, forwards to Restate; on NEW KEY the handler runs once and Restate caches the response for retries

Client: POST /api/v1/conversations/send-message
Idempotency-Key: 8e03978e-40d5-43e8-bc93-6894a57f9324
Authorization: Bearer <token>



KrakenD auth plugin
1. Validate JWT/API key → tenant = acme, project = health-prod, subject = user:firebase_uid_123
2. Reject if Idempotency-Key starts with reserved prefix t-
3. Rewrite: Idempotency-Key: t-acme:health-prod:user:firebase_uid_123:8e03978e-...
4. Forward to Restate ingress



Restate ingress + partition processor
- NEW KEY: atomically record, transition invocation to RUNNING, invoke handler
- KEY EXISTS + COMPLETE: return cached response, do not invoke handler
- KEY EXISTS + RUNNING: attach second caller to the in-flight invocation



Gateway handler (on NEW KEY only)
- Execute, return response. Restate caches the response for the service's retention window.
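The three ingress outcomes above (NEW KEY, KEY EXISTS + COMPLETE, KEY EXISTS + RUNNING) can be modelled with a toy in-memory structure. This is only a sketch for intuition under simplifying assumptions: it is not how Restate's partition processor is implemented, it ignores retention and durability, and `Deduper` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is one idempotency slot. done is closed once the handler has
// finished and resp has been written.
type entry struct {
	done chan struct{}
	resp string
}

// Deduper is a toy stand-in for the ingress dedup behaviour.
type Deduper struct {
	mu    sync.Mutex
	slots map[string]*entry
}

func NewDeduper() *Deduper {
	return &Deduper{slots: make(map[string]*entry)}
}

// Invoke runs handler at most once per key and reports whether the
// response came from an existing slot.
func (d *Deduper) Invoke(key string, handler func() string) (resp string, cached bool) {
	d.mu.Lock()
	if e, ok := d.slots[key]; ok {
		d.mu.Unlock()
		// KEY EXISTS: returns at once if COMPLETE, otherwise the caller
		// is attached to the in-flight invocation until it resolves.
		<-e.done
		return e.resp, true
	}
	// NEW KEY: record the slot under the lock so no second caller can
	// start the handler for the same key.
	e := &entry{done: make(chan struct{})}
	d.slots[key] = e
	d.mu.Unlock()

	e.resp = handler()
	close(e.done)
	return e.resp, false
}

func main() {
	d := NewDeduper()
	calls := 0
	handler := func() string { calls++; return "message sent" }
	key := "t-acme:health-prod:user:firebase_uid_123:8e03978e"

	first, hit1 := d.Invoke(key, handler)
	retry, hit2 := d.Invoke(key, handler)
	fmt.Println(first, hit1)
	fmt.Println(retry, hit2)
	fmt.Println("handler calls:", calls)
}
```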

Key format rules (enforced at KrakenD): regex ^[A-Za-z0-9._~\-]{1,255}$; MUST NOT start with t-; colons not permitted in the client portion (prevents prefix forgery). Max namespaced length 640 characters — well within Restate's 1 KiB ingress limit.
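The format rules above translate directly into a validator. The sketch below is illustrative; `ValidateKey` and the error names are hypothetical, and the three error values are chosen to line up with the length, charset, and reserved-prefix reasons on idempotency_key_invalid_total.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// keyPattern is the regex quoted in the key format rules. It already
// excludes ':' so clients cannot forge a namespace prefix.
var keyPattern = regexp.MustCompile(`^[A-Za-z0-9._~\-]{1,255}$`)

var (
	ErrLength         = errors.New("idempotency key must be 1 to 255 characters")
	ErrCharset        = errors.New("idempotency key contains a disallowed character")
	ErrReservedPrefix = errors.New("idempotency key must not start with t-")
)

// ValidateKey checks a client-supplied key before the namespace rewrite.
func ValidateKey(raw string) error {
	switch {
	case len(raw) == 0 || len(raw) > 255:
		return ErrLength
	case !keyPattern.MatchString(raw):
		return ErrCharset
	case strings.HasPrefix(raw, "t-"):
		return ErrReservedPrefix
	}
	return nil
}

func main() {
	for _, k := range []string{
		"8e03978e-40d5-43e8-bc93-6894a57f9324",
		"t-acme",
		"user:alice:123",
	} {
		fmt.Printf("%q -> %v\n", k, ValidateKey(k))
	}
}
```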

Flow 2: Case B — Transport Translation

Applies to third-party webhook providers that ship a per-delivery unique identifier in a non-canonical slot.

Case B sequence diagram: a Stripe webhook hits an opaque per-project URL; KrakenD resolves (tenant, project) via Unkey, fetches the project's signing secret from OpenBao, verifies the Stripe signature, extracts event.id from the body, translates it to Idempotency-Key, applies the namespace rewrite, and forwards to Restate

Provider: POST /api/v1/webhooks/stripe/{webhook_id}
Stripe-Signature: t=...,v1=...
Content-Type: application/json
{ "id": "evt_1MtB6y2eZvKYlo2CrwACPpHB", ... }



KrakenD
1. Look up {webhook_id} in Unkey → tenant = acme, project = health-prod, provider = stripe
2. Fetch signing secret from OpenBao at acme:health-prod:stripe
3. Verify Stripe-Signature with the project's signing secret (reject on mismatch)
4. Extract event.id from body: "evt_1MtB6y2eZvKYlo2CrwACPpHB"
5. Construct Idempotency-Key: stripe-evt_1MtB6y2eZvKYlo2CrwACPpHB
6. Rewrite (as in Case A): t-acme:health-prod:service:webhook-ingest:stripe-evt_...
7. Forward to Restate ingress



(Same as Case A from here — Restate's native mechanism handles dedup.)

The subject for Case B is always the webhook-ingest service principal (service:webhook-ingest) — Stripe's delivery isn't on behalf of any specific tenant user.
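Steps 3 to 5 of the Stripe flow can be sketched with the standard library alone. This is a simplified illustration, not the production extraction function: it follows Stripe's documented scheme (HMAC-SHA256 over `<timestamp>.<body>`, hex-encoded in the `v1` field) but omits the timestamp tolerance check, and the function names and the `whsec_test` secret are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// signHex computes the hex HMAC-SHA256 of "<ts>.<body>".
func signHex(ts string, body []byte, secret string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(ts + "."))
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// VerifyStripeSignature checks a header of the form "t=<unix>,v1=<hex>".
// Simplification: no replay-window check on the timestamp.
func VerifyStripeSignature(header string, body []byte, secret string) error {
	var ts string
	var sigs []string
	for _, part := range strings.Split(header, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue
		}
		switch k {
		case "t":
			ts = v
		case "v1":
			sigs = append(sigs, v)
		}
	}
	if ts == "" || len(sigs) == 0 {
		return errors.New("malformed Stripe-Signature header")
	}
	want := []byte(signHex(ts, body, secret))
	for _, s := range sigs {
		if hmac.Equal([]byte(s), want) { // constant-time comparison
			return nil
		}
	}
	return errors.New("signature mismatch")
}

// TranslateStripe lifts event.id into the canonical key form
// <provider>-<identifier>; the namespace rewrite happens afterwards.
func TranslateStripe(body []byte) (string, error) {
	var evt struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &evt); err != nil {
		return "", fmt.Errorf("parse event: %w", err)
	}
	if evt.ID == "" {
		return "", errors.New("event has no id")
	}
	return "stripe-" + evt.ID, nil
}

func main() {
	body := []byte(`{"id":"evt_1MtB6y2eZvKYlo2CrwACPpHB"}`)
	header := "t=1700000000,v1=" + signHex("1700000000", body, "whsec_test")
	if err := VerifyStripeSignature(header, body, "whsec_test"); err != nil {
		panic(err)
	}
	fmt.Println(TranslateStripe(body))
}
```

Verification runs before extraction so that an unauthenticated sender can never choose the key that reaches Restate.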

Case B providers — sender-designated dedup primitives

Provider | Location | Source | Provider retry window
Stripe | JSON body | event.id | Up to 3 days (live mode)
GitHub | HTTP header | X-GitHub-Delivery | ~8 hours, 3 attempts
Twilio Event Streams | HTTP header | I-Twilio-Idempotency-Token | Up to 4 hours
Twilio messaging/voice | Form body | MessageSid + MessageStatus composite | Variable
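For the header and form-body providers in the table, extraction is a lookup in a different transport slot. The sketch below is illustrative and assumes signature validation has already passed. The function names, the `github-` and `twilio-` provider labels, and the `-` separator in the Twilio composite are assumptions; the source only fixes the general shape `<provider>-<identifier>`.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
)

// ErrNoIdentifier signals that transport translation could not find
// the sender's identifier (a 400 on REQUIRED_B routes).
var ErrNoIdentifier = errors.New("transport translation failed: identifier missing")

// TranslateGitHub reads the per-delivery GUID from the request headers.
func TranslateGitHub(h http.Header) (string, error) {
	id := h.Get("X-GitHub-Delivery")
	if id == "" {
		return "", ErrNoIdentifier
	}
	return "github-" + id, nil
}

// TranslateTwilioEventStreams reads Twilio's idempotency token header.
func TranslateTwilioEventStreams(h http.Header) (string, error) {
	tok := h.Get("I-Twilio-Idempotency-Token")
	if tok == "" {
		return "", ErrNoIdentifier
	}
	return "twilio-" + tok, nil
}

// TranslateTwilioMessaging combines MessageSid and MessageStatus so each
// status callback for the same message gets its own key.
func TranslateTwilioMessaging(form url.Values) (string, error) {
	sid, status := form.Get("MessageSid"), form.Get("MessageStatus")
	if sid == "" || status == "" {
		return "", ErrNoIdentifier
	}
	// Assumption: '-' as separator keeps the key within the allowed charset.
	return "twilio-" + sid + "-" + status, nil
}

func main() {
	h := http.Header{}
	h.Set("X-GitHub-Delivery", "72d3162e-cc78-11e3-81ab-4c9367dc0958")
	fmt.Println(TranslateGitHub(h))

	form := url.Values{"MessageSid": {"SM0123"}, "MessageStatus": {"delivered"}}
	fmt.Println(TranslateTwilioMessaging(form))
}
```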

Case B webhook URL design

Each project gets an opaque per-project webhook URL when its onboarding provisions the integration:

https://api.travila.ai/api/v1/webhooks/{provider}/{webhook_id}

{webhook_id} is an opaque, non-enumerable token (e.g. wh_01JABC…) stored in Unkey with metadata {tenant_id, project_id, provider}. The customer configures that URL in their Stripe/GitHub/Twilio dashboard alongside the signing secret generated by the Console. No project ID appears in the URL — an attacker harvesting URLs cannot infer org structure.

Flow 3: Case C — Handler-Internal Dedup

No active callers as of v1.0. Reserved template for future external integrations that ship no per-delivery dedup primitive. The handler runs on every delivery (including retries), derives an internal dedup token from trusted signals, and passes it via restate.WithIdempotencyKey(...) on downstream Restate calls. The expensive work deduplicates at the downstream layer; the pre-work (parse, validation, token derivation) re-runs cheaply on each retry.
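One way a Case C handler might derive its internal token is to hash the trusted signals into a stable string. The sketch below is a hypothetical illustration: `DeriveDedupToken` and the example signals are assumptions, and in a real handler the resulting token would be the value passed to restate.WithIdempotencyKey(...) on the downstream call.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// DeriveDedupToken hashes a source name and its trusted signals into a
// deterministic token. The same signals always give the same token, so
// a redelivery dedupes at the downstream layer.
func DeriveDedupToken(source string, signals ...string) string {
	h := sha256.New()
	h.Write([]byte(source))
	for _, s := range signals {
		// Length-prefix each signal so ("ab","c") and ("a","bc")
		// cannot hash to the same value.
		fmt.Fprintf(h, "|%d|%s", len(s), s)
	}
	return source + "-" + hex.EncodeToString(h.Sum(nil))[:32]
}

func main() {
	// Hypothetical signals: a job identifier and its scheduled time.
	first := DeriveDedupToken("scheduler", "job-42", "2024-05-01T00:00:00Z")
	retry := DeriveDedupToken("scheduler", "job-42", "2024-05-01T00:00:00Z")
	fmt.Println(first == retry)
}
```

The signals must come from data the handler trusts (verified payload fields, not caller-controlled headers), otherwise a sender could steer two distinct deliveries onto the same token.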

Originally the canonical Case C source was GCP Cloud Scheduler. The Asynq + Firestore scheduler rewrite promoted scheduled dispatches to Case A by having the worker set Idempotency-Key directly on every HTTP dispatch.

Flow 4: Recovery Route

A client that dropped the response to an earlier write — network blip, process crash, app kill — can recover it without resubmitting the request body:

Client: GET /api/v1/llm/gateway/send-message/by-idempotency-key/8e03978e-...
Authorization: Bearer <token>



KrakenD
1. Validate JWT → tenant = acme, project = health-prod, subject = user:firebase_uid_123
2. Apply the same namespace rewrite to the path segment:
/restate/invocation/LLMGatewayService/SendMessage/t-acme:health-prod:user:firebase_uid_123:8e03978e-.../attach



Restate ingress
- Attach to the already-completed invocation → return the cached response
- Or block on the in-flight invocation until it resolves, then return

Recovery routes are subject-scoped by construction — the namespace rewrite happens with the current caller's subject, so Alice's recovery call cannot retrieve Bob's cached response even if Alice obtained Bob's raw key.
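The subject-scoping argument can be seen in a sketch of the path rewrite. `AttachPath` is a hypothetical helper; the internal path shape is copied from the flow above, and the mapping from the public route to the service and handler names is assumed to come from KrakenD route config.

```go
package main

import (
	"fmt"
	"strings"
)

// AttachPath maps a raw key from the public recovery URL to the internal
// attach path, namespacing it with the current caller's identity.
func AttachPath(service, handler, tenant, project, subject, rawKey string) (string, error) {
	if rawKey == "" || strings.ContainsAny(rawKey, ":/") || strings.HasPrefix(rawKey, "t-") {
		return "", fmt.Errorf("invalid idempotency key in path")
	}
	ns := fmt.Sprintf("t-%s:%s:%s:%s", tenant, project, subject, rawKey)
	return fmt.Sprintf("/restate/invocation/%s/%s/%s/attach", service, handler, ns), nil
}

func main() {
	raw := "8e03978e-40d5-43e8-bc93-6894a57f9324"
	alice, _ := AttachPath("LLMGatewayService", "SendMessage", "acme", "health-prod", "user:alice", raw)
	bob, _ := AttachPath("LLMGatewayService", "SendMessage", "acme", "health-prod", "user:bob", raw)
	// Same raw key, different callers: the paths differ, so neither
	// caller can attach to the other's invocation.
	fmt.Println(alice)
	fmt.Println(bob)
}
```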

Client-facing URL brand hygiene: the public URL is /api/v1/{service}/{endpoint}/by-idempotency-key/{key}; "restate" never appears in client-visible URLs or headers. The response header x-restate-id is renamed to x-operation-id at KrakenD via output_headers rules.

Retention Classes

Per-service retention is declared in each gateway's v1/cmd/main.go. Runtime-tunable via Restate CLI without a redeploy.

Class | Services | Retention | Rationale
Interactive (default) | LLM gateway, Storage gateway, Notification gateway | 24 hours | Balances memory usage against reasonable client retry windows
Webhook ingestion | Webhook gateway (Stripe, GitHub, Twilio) | 7 days | Outlasts Stripe's 3-day retry window with slack
Financial / critical | API key rotation, future billing endpoints | 7 days | High-stakes operations need longer post-hoc recovery
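The "outlasts the retry window" rationale is simple arithmetic: retention must exceed the longest period over which the sender may still redeliver. A minimal sketch, with hypothetical class names as map keys and durations taken from the table:

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// retention mirrors the three classes in the table; the map keys are
// hypothetical labels, not configuration names from the PRD.
var retention = map[string]time.Duration{
	"interactive": 24 * time.Hour,
	"webhook":     7 * day,
	"financial":   7 * day,
}

// Slack returns how much longer the cached response lives than the
// provider keeps retrying. A negative value means a late retry could
// arrive after the cache entry expired and re-run the handler.
func Slack(class string, providerRetryWindow time.Duration) (time.Duration, bool) {
	r, ok := retention[class]
	if !ok {
		return 0, false
	}
	return r - providerRetryWindow, true
}

func main() {
	s, _ := Slack("webhook", 3*day) // Stripe live mode: up to 3 days
	fmt.Printf("webhook slack over Stripe retries: %.0f days\n", s.Hours()/24)
}
```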

Retry policy — pause vs kill on exhaustion

Restate's default retry policy is max-attempts = 70, on-max-attempts = "pause". A paused invocation holds its idempotency slot indefinitely; client retries attach to the paused slot and wait. Gateway services whose handlers can legitimately get stuck (LLM timeouts, third-party outages) should declare KillOnMaxAttempts so stuck invocations terminate and free their slot. Trade-off: kill does not run compensation logic, so handlers that accumulate partial side effects (multi-step external API calls that can half-succeed) should stay on pause and rely on operator runbooks.

Rollout Phases

The 8-phase plan from the PRD. Each phase is mostly mechanical — KrakenD config, service main.go edits, metric emission. No new infrastructure to provision.

Phase | Scope | Status
1. KrakenD auth plugin rewrite | Extend integrations/krakend/auth.go to rewrite Idempotency-Key into t-{tenant}:{project}:{subject}:{raw} after authentication; validate format, length, reserved prefix; integration-test cross-tenant and cross-project independence | Not started
2. KrakenD route classification | Classify every write route into one of five policies (REQUIRED_A, REQUIRED_B, ACCEPTED, HANDLER_INTERNAL_C, PROHIBITED); add x-operation-id output_headers rename rule | Not started
3. Case B transport translation | Per-provider extraction functions (Stripe, GitHub, Twilio Event Streams, Twilio messaging); per-project webhook URL provisioning during project onboarding | Not started
4. Per-service retention | Explicit WithIdempotencyRetention(...) declarations in each gateway's v1/cmd/main.go; KillOnMaxAttempts for services where stuck handlers shouldn't block retries | Not started
5. Per-endpoint recovery routes | GET …/by-idempotency-key/{key} and /output variants paired with every write endpoint that accepts a key | Not started
6. Scheduled dispatch verification | Verify the Asynq worker's Idempotency-Key flows correctly through KrakenD namespacing; confirm scheduled-trigger handlers don't need handler-internal Case C dedup code | Not started (Asynq itself: shipped in PR #1509)
7. Observability | Emit the metrics listed above; Grafana dashboard at (tenant, project, endpoint) grain; alerts on cache hit drop, invalid keys, translation failures | Not started
8. Client SDK + public docs | Mobile SDK HTTP-client wrapper generates UUID4 keys at intent-capture time; OpenAPI specs carry the policy class per endpoint; public integration guide | Not started
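The Phase 8 idea of generating the key at intent-capture time, rather than per HTTP attempt, can be sketched as follows. This is an illustration in Go rather than the mobile SDK's language; `Intent`, `NewIntent`, `SendWithRetry`, and `ErrTransient` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
)

// ErrTransient stands in for a dropped connection or timeout.
var ErrTransient = errors.New("transient network failure")

// NewUUIDv4 returns a random version-4 UUID string.
func NewUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// Intent is one logical user action. The key is fixed when the intent
// is captured and never regenerated for retries.
type Intent struct {
	Key  string
	Body string
}

func NewIntent(body string) (Intent, error) {
	k, err := NewUUIDv4()
	if err != nil {
		return Intent{}, err
	}
	return Intent{Key: k, Body: body}, nil
}

// SendWithRetry calls send up to attempts times with the same key, so
// the server can dedupe if an earlier attempt actually succeeded.
func SendWithRetry(in Intent, attempts int, send func(key, body string) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = send(in.Key, in.Body); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	in, err := NewIntent(`{"text":"hello"}`)
	if err != nil {
		panic(err)
	}
	n := 0
	err = SendWithRetry(in, 3, func(key, body string) error {
		n++
		fmt.Println("attempt", n, "Idempotency-Key:", key)
		if n < 2 {
			return ErrTransient
		}
		return nil
	})
	fmt.Println("result:", err)
}
```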

Dependency ordering

Phase depends on | Reason
Phase 1 depends on Projects PRD | Needs x-project-id header injection from Unkey metadata
Phase 1 depends on Authorization PRD | Needs the typed-subject format (user:*, service:*, etc.)
Phase 3 depends on Phase 1 | Case B rewrite reuses Phase 1's namespace mechanism
Phase 5 depends on Phase 1 | Recovery route rewrite reuses Phase 1's rewrite on the path segment
Phases 4, 7, 8 are independent | Can run in parallel with Phase 1–3 work

Operational Patterns

Design detail that lives in the PRD and applies to platform operators and gateway handler authors, not to client callers:

  • Cached terminal errors + manual resend. A TerminalError is cached for the full retention window. Manual resend from a provider dashboard (Stripe "Resend", GitHub "Redeliver") reuses the same event identifier, so the cached error is returned and the handler isn't re-invoked until the retention window expires. V1 escape hatch: restate invocations purge <id> via CLI, then trigger the provider-side resend. First-class admin tooling (purge proxy route under platform_member auth, cached-terminal-error metric label, audit feed) is deferred pending real-incident signal.
  • DLQ interaction. Handlers that catch their own terminal error and forward to a dead-letter queue must rethrow the terminal error after forwarding, or the cached response becomes the DLQ handler's success rather than the original business error.
  • Pause gotcha. Described under Retention Classes above.
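The DLQ interaction above is easiest to see in code. The sketch below uses a local `TerminalError` type as a stand-in for the SDK's terminal error so that it stays self-contained; `HandleWithDLQ` and its callback parameters are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// TerminalError is a local stand-in for an error the runtime would
// treat as terminal (cached, not retried).
type TerminalError struct{ Msg string }

func (e *TerminalError) Error() string { return e.Msg }

// HandleWithDLQ runs the business logic. On a terminal error it forwards
// to the dead-letter queue and then returns the SAME error, so the cached
// response stays the original business error and not a DLQ "success".
func HandleWithDLQ(process func() (string, error), forwardToDLQ func(error)) (string, error) {
	resp, err := process()
	var term *TerminalError
	if errors.As(err, &term) {
		forwardToDLQ(err)
		return "", err // rethrow after forwarding
	}
	return resp, err
}

func main() {
	_, err := HandleWithDLQ(
		func() (string, error) { return "", &TerminalError{Msg: "card declined"} },
		func(e error) { fmt.Println("forwarded to DLQ:", e) },
	)
	fmt.Println("handler result:", err)
}
```

Returning nil after the forward would be the bug described above: a retry with the same key would then see a cached success.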

See the PRD's Security Considerations and Background sections for the full treatment — enumeration resistance, cache poisoning defenses, replay protection, and the full list of open questions (body-hash conflict detection, outbound idempotency to external APIs, Kafka-consumer dedup, operation-ID-based catch-all recovery, admin tooling).

Out of Scope for v1

  • Body-hash conflict detection — the IETF draft says clients MUST NOT reuse a key with a different body; Restate treats same-key-different-body as "return cached response." Deferred until the idempotency_key_collision_total metric actually fires
  • Outbound idempotency from handlers to external APIs (Stripe, OpenAI, Sendgrid) — separate follow-up PRD
  • Kafka-consumer dedup — belongs in the Events Pipeline PRD's scope, not edge
  • Client-side key persistence across mobile app restarts — SDK concern, decided in Phase 8
  • Operation-ID-based catch-all recovery route (/api/v1/operations/{op_id}) — dropped in v0.6 of the PRD because Restate's attach-by-invocation-ID endpoint doesn't consult the idempotency-key namespace; captured as a future-phase Open Question in the PRD

Cross-References