Edge Idempotency Roadmap
End-to-end view of how safe retry becomes a first-class platform primitive — wired at the edge in KrakenD, delegating to Restate's native idempotency mechanism, scoped by (tenant, project, subject) so cached responses can never cross those boundaries. This page consolidates the Edge Idempotency Key PRD (v1.0) and its dependencies into a single reference. The design is complete; the 8-phase rollout has not started.
This page is derived from the Edge Idempotency Key PRD and its closely related dependencies:
Primary (docs/prd/):
- Edge Idempotency Key — the v1.0 design, three-case framing, namespace rewrite, retention classes, per-endpoint policy
Dependencies (docs/prd/multi-tenancy/):
- 02 · Projects: resolves project_id from Unkey key metadata; the Projects PRD ships before the namespace rewrite
- 05 · Authorization: defines the typed-subject format (user:*, service:*, tenant_member:*, client:*) used in the namespace
- 06 · API Keys: Unkey externalId + meta.project_id surface the tenant/project dimensions
Related:
- Scheduled Jobs v2 (PR #1509) — Asynq + Firestore scheduler; scheduled dispatches became Case A callers, retiring the Cloud Scheduler Case C branch
- Tenant & Project Lifecycle Roadmap — sibling roadmap for the multi-tenant substrate this design builds on
The whole design delegates to primitives already shipped by Restate. No application-level dedup table, no Redis dedup middleware, no custom caching layer:
- Dedup mechanism: Restate's partition processor atomically records each Idempotency-Key at ingress and caches the committed response (success and terminal error) for a configurable retention window. Duplicate calls with the same key return the cached response without re-invoking the handler.
- Multi-tenant scoping: closed at the edge. KrakenD rewrites the client-supplied Idempotency-Key into t-{tenant}:{project}:{subject}:{raw-key} before forwarding to Restate, so Restate's native (service, handler, key) scope is effectively (tenant, project, subject, service, handler, key) from the platform's perspective.
- IETF-aligned semantics: Idempotency-Key per the IETF draft is a sender-generated value. The design preserves that contract: for third-party sources without a canonical header, we either translate from their transport slot (Case B) or dedupe internally in the handler (Case C), and never synthesize keys at the edge.
Canonical reference: Edge Idempotency Key PRD.
Glossary
| Term | Definition |
|---|---|
| Idempotency Key | A sender-generated value that identifies one logical action: reused unchanged across retries of that action, and changed for each new action. Opaque string matching ^[A-Za-z0-9._~\-]{1,255}$. |
| Namespace rewrite | KrakenD's transformation of the client-supplied key into t-{tenant}:{project}:{subject}:{raw-key} before the request reaches Restate. Closes cross-tenant, cross-project, and intra-tenant user-to-user cache leaks. |
| Case A | Canonical case — caller sends a real Idempotency-Key header. Covers first-party clients, first-party services, the Asynq scheduler worker, and third-party IETF-conforming consumers. |
| Case B | Transport translation — third-party webhook provider sends a per-delivery unique identifier in a non-canonical slot (JSON body, custom header, form field). KrakenD lifts it into the canonical slot. |
| Case C | Handler-internal dedup — the source ships no per-delivery identifier. The handler derives an internal dedup token and passes it via restate.WithIdempotencyKey(...) on downstream calls. No active callers as of v1.0; retained as a template. |
| Retention class | Per-service cache retention window: 24h (interactive default), 48h – 7d (webhooks), 7d (financial/critical). Configurable at runtime via Restate CLI. |
| Terminal error | An error explicitly marked restate.TerminalError(...). Cached identically to a successful response — a retry with the same key returns the same error body, correctly. |
| Recovery route | Per-endpoint GET …/by-idempotency-key/{key} route that lets a caller retrieve the cached response of an earlier write whose response they lost. KrakenD applies the same namespace rewrite to the path segment. |
| _tenant / _platform sentinel | Reserved project-segment values for tenant-level admin routes (no project scope) and platform-operator routes (cross-tenant). Underscore prefix cannot collide with real project slugs. |
The Three Caller Cases
Every edge endpoint classifies each of its expected callers into exactly one of these three cases, declared in the per-endpoint policy table in the PRD.
| Case | Status | Who it covers | What KrakenD does |
|---|---|---|---|
| A — Canonical header | Active | Mobile clients, first-party services, Asynq scheduler worker, IETF-conforming third-party consumers | Validates the client-supplied Idempotency-Key, applies the namespace rewrite, forwards to Restate |
| B — Transport translation | Active | Stripe (event.id in JSON body), GitHub (X-GitHub-Delivery header), Twilio Event Streams (I-Twilio-Idempotency-Token), Twilio messaging (MessageSid+MessageStatus) | After source-specific signature validation, extracts the sender's identifier from its transport slot, constructs Idempotency-Key: <provider>-<identifier>, then namespaces |
| C — Handler-internal | No active callers | Reserved template for future external sources with no per-delivery identifier | No ingress-level Idempotency-Key. The handler derives an internal dedup token and passes it via restate.WithIdempotencyKey(...) on downstream calls |
The three cases are mutually exclusive per-endpoint. An endpoint declared REQUIRED_A rejects requests without an Idempotency-Key; REQUIRED_B rejects requests where transport translation can't extract the identifier; ACCEPTED routes proceed without dedup when the header is absent; HANDLER_INTERNAL_C routes do not look at the ingress header; PROHIBITED routes (reads, streaming, workflow /run) reject any Idempotency-Key.
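The policy rules above can be sketched as a single decision function. This is a minimal illustration, not the KrakenD plugin's code: the `Policy`, `Action`, and `Decide` names and the error values are hypothetical, and only the accept/reject behavior is taken from the paragraph above. Whether a HANDLER_INTERNAL_C route strips or passes through an incoming header is not specified on this page, so the sketch only records that the header is not consulted.

```go
package main

import "errors"

// Policy mirrors the five per-endpoint classes named in the PRD.
type Policy int

const (
	RequiredA        Policy = iota // REQUIRED_A
	RequiredB                      // REQUIRED_B
	Accepted                       // ACCEPTED
	HandlerInternalC               // HANDLER_INTERNAL_C
	Prohibited                     // PROHIBITED
)

// Action is what the edge does with the request (hypothetical type).
type Action int

const (
	Reject         Action = iota // respond with a client error
	ForwardWithKey               // namespace the key, then forward
	ForwardNoDedup               // forward with no ingress-level dedup
)

var (
	ErrKeyRequired       = errors.New("idempotency key required")
	ErrTranslationFailed = errors.New("transport translation failed")
	ErrKeyProhibited     = errors.New("idempotency key not allowed on this route")
	ErrUnknownPolicy     = errors.New("unknown policy")
)

// Decide applies the policy. clientKey is the raw Idempotency-Key header
// ("" when absent); translatedKey is the Case B extraction result
// ("" when extraction failed or does not apply).
func Decide(p Policy, clientKey, translatedKey string) (Action, error) {
	switch p {
	case RequiredA:
		if clientKey == "" {
			return Reject, ErrKeyRequired
		}
		return ForwardWithKey, nil
	case RequiredB:
		if translatedKey == "" {
			return Reject, ErrTranslationFailed
		}
		return ForwardWithKey, nil
	case Accepted:
		if clientKey == "" {
			return ForwardNoDedup, nil
		}
		return ForwardWithKey, nil
	case HandlerInternalC:
		// The ingress header is not consulted on these routes.
		return ForwardNoDedup, nil
	case Prohibited:
		if clientKey != "" {
			return Reject, ErrKeyProhibited
		}
		return ForwardNoDedup, nil
	}
	return Reject, ErrUnknownPolicy
}
```

For example, `Decide(Accepted, "", "")` forwards without dedup, while `Decide(Prohibited, "k", "")` rejects the request.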
Cache Isolation: Tenant, Project, Subject
The SVG is editable — open it in draw.io to modify.
The load-bearing security argument. Restate's native idempotency scope is (service, handler, key) — no tenant, project, or subject dimension. Without mitigation, three different principals could collide on the same key and one could read another's cached response:
| Threat | Scenario | Closed by |
|---|---|---|
| Cross-tenant | Tenant A sends key K; Tenant B sends the same key to the same endpoint → Restate returns A's cached response to B | Tenant prefix t-{tenant}: in the namespace rewrite |
| Cross-project | Acme's Health-Prod and Acme's Fitness-App share the same raw key → one project reads the other's cached response | Project prefix :{project}: added in v0.8 of the PRD |
| Intra-tenant, intra-project | Alice and Bob in the same project share a raw key (leak via log, shared device, HAR file, crash report) → Bob reads Alice's cached response | Subject prefix :{subject}: added in v0.6 of the PRD |
All three scopes are closed by the same KrakenD rewrite mechanism. The rewrite happens in integrations/krakend/auth.go after authentication, using fields the auth plugin already extracts. The client contract is preserved — callers send opaque IETF-standard values and never see the prefixes.
| Original client header | KrakenD-rewritten value |
|---|---|
| Idempotency-Key: 8e03978e-40d5-43e8-bc93-6894a57f9324 | Idempotency-Key: t-acme:health-prod:user:firebase_uid_123:8e03978e-40d5-43e8-bc93-6894a57f9324 |
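The rewrite in the table can be expressed as a small pure function. This is a sketch under assumptions: the `Principal` struct and `NamespaceKey` name are hypothetical stand-ins for whatever the auth plugin exposes, and the raw key is assumed to have passed format validation already.

```go
package main

import "fmt"

// Principal holds the fields the auth plugin already extracts
// (hypothetical struct for illustration).
type Principal struct {
	Tenant  string // e.g. "acme"
	Project string // e.g. "health-prod", or a sentinel such as "_tenant"
	Subject string // typed subject, e.g. "user:firebase_uid_123"
}

// NamespaceKey builds t-{tenant}:{project}:{subject}:{raw-key}.
// rawKey is assumed to be validated (no colons, no reserved prefix).
func NamespaceKey(p Principal, rawKey string) string {
	return fmt.Sprintf("t-%s:%s:%s:%s", p.Tenant, p.Project, p.Subject, rawKey)
}
```

Because every scoping dimension is part of the value Restate sees, two principals that send the same raw key produce different namespaced keys and therefore different cache entries.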
Why at KrakenD, not in the handler
Restate's cached-response path never invokes the handler on a cache hit — the ingress layer short-circuits before the handler's tenant/authz check runs. A handler-level check cannot close these leaks. The rewrite must happen at the edge, before the partition processor keys the request.
Tenant-level routes — the _tenant sentinel
Not every authenticated route has a project dimension. TenantAdminGateway.ListProjects() operates on the tenant itself, above the project layer. These routes use the reserved sentinel _tenant as the project segment: t-acme:_tenant:tenant_member:alice:.... Platform-operator routes (cross-tenant) use _platform similarly. Per the PRD, tenant-level admin routes default to PROHIBITED for Idempotency-Key unless the specific endpoint has a real retry story — admin ops benefit more from explicit confirmation than transparent retry.
Service & Component Inventory
New / Extended Components
| Component | Purpose | PRD section |
|---|---|---|
| KrakenD auth plugin (integrations/krakend/auth.go) | Extended to rewrite Idempotency-Key after authentication; validates format, length, reserved prefix | §KrakenD Responsibilities |
| Per-endpoint policy classification | Every write route declares one of five policies (REQUIRED_A, REQUIRED_B, ACCEPTED, HANDLER_INTERNAL_C, PROHIBITED) in the KrakenD config | §Per-Endpoint Policy |
| Case B transport-translation rules | Per-provider extraction functions for Stripe, GitHub, Twilio Event Streams, Twilio messaging | §Case B: Transport Translation |
| Per-service retention configuration | Explicit WithIdempotencyRetention(...) in each gateway's v1/cmd/main.go | §Restate-Side Behavior |
| Per-endpoint recovery routes | GET …/by-idempotency-key/{key} for each write endpoint that accepts a key | §Phase 5 |
| x-operation-id response header | Renamed from upstream x-restate-id via KrakenD output_headers rule; avoids leaking internal infrastructure in client-visible headers | §Phase 2 |
Integrated Components (No Changes Required)
| Component | Role |
|---|---|
| Restate ingress | Native Idempotency-Key handling — hashing to partition, atomic record at the processor leader, response caching, attach-by-key, peek-by-key |
| Restate Go SDK | restate.WithIdempotencyKey(...) option on downstream invocations; used by Asynq worker and the Case C template |
| Asynq worker | Generates deterministic Idempotency-Key: sched:{schedule_id}:{ms} on every HTTP dispatch; HMAC-signs via X-Schedule-Signature |
| Unkey | externalId = tenant_id, meta.project_id = project_id — source of the tenant + project dimensions for API-key routes and Case B webhook routes |
| OpenBao | Stores per-project webhook signing secrets at {tenant_id}:{project_id}:{provider} |
Observability Additions
| Metric | Labels | Purpose |
|---|---|---|
| idempotency_key_present_total | tenant, project, endpoint, case | Requests that carried a key (Case A) / had one translated (Case B) |
| idempotency_key_missing_total | tenant, project, endpoint, class | 400s on REQUIRED_* routes; bypass on ACCEPTED routes |
| idempotency_key_invalid_total | tenant, project, endpoint, reason | 400s due to validation (length, charset, reserved prefix) |
| transport_translation_failed_total | tenant, project, endpoint, provider | 400s where Case B extraction failed |
| idempotency_cache_hit_total | tenant, project, endpoint, restate_status | Headline metric: same-key returns |
| handler_internal_dedup_hit_total | tenant, project, endpoint | Case C downstream dedup hits |
Structured log fields added at KrakenD: idempotency_case, idempotency_key_namespaced, idempotency_source, idempotency_policy, restate_invocation_id, restate_cache_hit. All logs and metrics are labeled by (tenant, project, endpoint), so a per-project regression in a multi-project tenant isn't hidden by tenant-grain aggregation.
Flow 1: Case A — Canonical Header
The core of the design. Applies to mobile clients, first-party services, the Asynq scheduler worker, and any third-party IETF-conforming consumer.
Client POST /api/v1/conversations/send-message
Idempotency-Key: 8e03978e-40d5-43e8-bc93-6894a57f9324
Authorization: Bearer <token>
▼
KrakenD auth plugin
1. Validate JWT/API key → tenant = acme, project = health-prod, subject = user:firebase_uid_123
2. Reject if Idempotency-Key starts with reserved prefix t-
3. Rewrite: Idempotency-Key: t-acme:health-prod:user:firebase_uid_123:8e03978e-...
4. Forward to Restate ingress
▼
Restate ingress + partition processor
- NEW KEY: atomically record, transition invocation to RUNNING, invoke handler
- KEY EXISTS + COMPLETE: return cached response, do not invoke handler
- KEY EXISTS + RUNNING: attach second caller to the in-flight invocation
▼
Gateway handler (on NEW KEY only)
- Execute, return response. Restate caches the response for the service's retention window.
Key format rules (enforced at KrakenD): regex ^[A-Za-z0-9._~\-]{1,255}$; MUST NOT start with t-; colons not permitted in the client portion (prevents prefix forgery). Max namespaced length 640 characters — well within Restate's 1 KiB ingress limit.
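The format rules above translate directly into a validator. The following is a minimal sketch, assuming hypothetical names (`ValidateClientKey`, `FitsIngress`, and the error values); only the regex, the reserved `t-` prefix, and the 640-character bound come from this page.

```go
package main

import (
	"errors"
	"regexp"
	"strings"
)

// keyPattern is the client-key regex quoted in this page. It excludes
// colons, which is what prevents a client from forging a namespace prefix.
var keyPattern = regexp.MustCompile(`^[A-Za-z0-9._~\-]{1,255}$`)

const (
	reservedPrefix   = "t-" // reserved for KrakenD-rewritten keys
	maxNamespacedLen = 640  // bound on the rewritten key stated in this page
)

var (
	ErrInvalidFormat  = errors.New("idempotency key: invalid charset or length")
	ErrReservedPrefix = errors.New("idempotency key: reserved prefix")
)

// ValidateClientKey checks the raw, client-supplied key before rewriting.
func ValidateClientKey(raw string) error {
	if !keyPattern.MatchString(raw) {
		return ErrInvalidFormat
	}
	if strings.HasPrefix(raw, reservedPrefix) {
		return ErrReservedPrefix
	}
	return nil
}

// FitsIngress reports whether a namespaced key is within the length bound.
func FitsIngress(namespaced string) bool {
	return len(namespaced) <= maxNamespacedLen
}
```

A UUID such as `8e03978e-40d5-43e8-bc93-6894a57f9324` passes, while `t-acme` fails on the reserved prefix and `a:b` fails on the character set.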
Flow 2: Case B — Transport Translation
Applies to third-party webhook providers that ship a per-delivery unique identifier in a non-canonical slot.
Provider POST /api/v1/webhooks/stripe/{webhook_id}
Stripe-Signature: t=...,v1=...
Content-Type: application/json
{ "id": "evt_1MtB6y2eZvKYlo2CrwACPpHB", ... }
▼
KrakenD
1. Look up {webhook_id} in Unkey → tenant = acme, project = health-prod, provider = stripe
2. Fetch signing secret from OpenBao at acme:health-prod:stripe
3. Verify Stripe-Signature with the project's signing secret (reject on mismatch)
4. Extract event.id from body: "evt_1MtB6y2eZvKYlo2CrwACPpHB"
5. Construct Idempotency-Key: stripe-evt_1MtB6y2eZvKYlo2CrwACPpHB
6. Rewrite (as in Case A): t-acme:health-prod:service:webhook-ingest:stripe-evt_...
7. Forward to Restate ingress
▼
(Same as Case A from here — Restate's native mechanism handles dedup.)
The subject for Case B is always the webhook-ingest service principal (service:webhook-ingest) — Stripe's delivery isn't on behalf of any specific tenant user.
Case B providers — sender-designated dedup primitives
| Provider | Location | Source | Provider retry window |
|---|---|---|---|
| Stripe | JSON body | event.id (docs) | Up to 3 days (live mode) |
| GitHub | HTTP header | X-GitHub-Delivery (docs) | ~8 hours, 3 attempts |
| Twilio Event Streams | HTTP header | I-Twilio-Idempotency-Token (docs) | Up to 4 hours |
| Twilio messaging/voice | Form body | MessageSid + MessageStatus composite | Variable |
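Two of the extraction rules in the table can be sketched with the standard library alone. This is an illustration under assumptions: the function names and error value are hypothetical, signature verification is assumed to have already succeeded, and the `github-` prefix follows the `<provider>-<identifier>` pattern rather than a value confirmed on this page. The Twilio variants are omitted because the exact composite format is not given here.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
)

// ErrNoIdentifier signals that transport translation could not find the
// sender's per-delivery identifier (hypothetical error value).
var ErrNoIdentifier = errors.New("transport translation: identifier not found")

// StripeKey lifts event.id out of a signature-verified Stripe webhook body
// and returns the pre-namespace key "stripe-<event.id>".
func StripeKey(body []byte) (string, error) {
	var evt struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &evt); err != nil || evt.ID == "" {
		return "", ErrNoIdentifier
	}
	return "stripe-" + evt.ID, nil
}

// GitHubKey lifts the X-GitHub-Delivery header value and returns
// "github-<delivery-id>".
func GitHubKey(h http.Header) (string, error) {
	id := h.Get("X-GitHub-Delivery")
	if id == "" {
		return "", ErrNoIdentifier
	}
	return "github-" + id, nil
}
```

The returned value is then namespaced exactly like a Case A key, with the webhook-ingest service principal as the subject.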
Case B webhook URL design
Each project gets an opaque per-project webhook URL when its onboarding provisions the integration:
https://api.travila.ai/api/v1/webhooks/{provider}/{webhook_id}
{webhook_id} is an opaque, non-enumerable token (e.g. wh_01JABC…) stored in Unkey with metadata {tenant_id, project_id, provider}. The customer configures that URL in their Stripe/GitHub/Twilio dashboard alongside the signing secret generated by the Console. No project ID appears in the URL — an attacker harvesting URLs cannot infer org structure.
Flow 3: Case C — Handler-Internal Dedup
No active callers as of v1.0. Reserved template for future external integrations that ship no per-delivery dedup primitive. The handler runs on every delivery (including retries), derives an internal dedup token from trusted signals, and passes it via restate.WithIdempotencyKey(...) on downstream Restate calls. The expensive work deduplicates at the downstream layer; the pre-work (parse, validation, token derivation) re-runs cheaply on each retry.
Originally the canonical Case C source was GCP Cloud Scheduler. The Asynq + Firestore scheduler rewrite promoted scheduled dispatches to Case A by having the worker set Idempotency-Key directly on every HTTP dispatch.
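Since Case C has no active callers, the page does not say which trusted signals a handler would hash. The sketch below is therefore purely hypothetical: it derives a stable token from a source name and the raw payload with SHA-256, which is one plausible scheme and not the PRD's. The resulting string is what a handler would pass to restate.WithIdempotencyKey(...) on its downstream call; that call is left out so the example stays standard-library only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// DeriveDedupToken returns a deterministic token for one delivery.
// Hypothetical scheme: SHA-256 over the source name and payload bytes.
// Retries of the same delivery yield the same token, so the downstream
// call deduplicates even though the handler itself re-runs.
func DeriveDedupToken(source string, payload []byte) string {
	h := sha256.New()
	h.Write([]byte(source))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
	h.Write(payload)
	return hex.EncodeToString(h.Sum(nil))
}
```

The hex output uses only characters allowed by the key format rules. A payload hash is only safe when distinct deliveries never carry byte-identical payloads, which would need to be confirmed per source.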
Flow 4: Recovery Route
A client that dropped the response to an earlier write — network blip, process crash, app kill — can recover it without resubmitting the request body:
Client GET /api/v1/llm/gateway/send-message/by-idempotency-key/8e03978e-...
Authorization: Bearer <token>
▼
KrakenD
1. Validate JWT → tenant = acme, project = health-prod, subject = user:firebase_uid_123
2. Apply the same namespace rewrite to the path segment:
/restate/invocation/LLMGatewayService/SendMessage/t-acme:health-prod:user:firebase_uid_123:8e03978e-.../attach
▼
Restate ingress
- Attach to the already-completed invocation → return the cached response
- Or block on the in-flight invocation until it resolves, then return
Recovery routes are subject-scoped by construction — the namespace rewrite happens with the current caller's subject, so Alice's recovery call cannot retrieve Bob's cached response even if Alice obtained Bob's raw key.
Client-facing URL brand hygiene: the public URL is /api/v1/{service}/{endpoint}/by-idempotency-key/{key}; "restate" never appears in client-visible URLs or headers. The response header x-restate-id is renamed to x-operation-id at KrakenD via output_headers rules.
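The path rewrite in the flow above can be sketched as follows. The `AttachPath` name and its parameter list are hypothetical; the upstream path shape is copied from the flow, and all components are assumed to be validated before they are concatenated.

```go
package main

// AttachPath maps a public recovery request onto the upstream attach path
// shown in the flow above. The namespace is built from the *current*
// caller's tenant, project, and subject, never from anything in the URL.
func AttachPath(service, handler, tenant, project, subject, rawKey string) string {
	namespaced := "t-" + tenant + ":" + project + ":" + subject + ":" + rawKey
	return "/restate/invocation/" + service + "/" + handler + "/" + namespaced + "/attach"
}
```

If Alice calls the recovery route with Bob's raw key, the path is built with Alice's subject, so it addresses a different key than the one Bob's write was recorded under.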
Retention Classes
Per-service retention is declared in each gateway's v1/cmd/main.go. Runtime-tunable via Restate CLI without a redeploy.
| Class | Services | Retention | Rationale |
|---|---|---|---|
| Interactive (default) | LLM gateway, Storage gateway, Notification gateway | 24 hours | Balances memory usage against reasonable client retry windows |
| Webhook ingestion | Webhook gateway (Stripe, GitHub, Twilio) | 7 days | Outlasts Stripe's 3-day retry window with slack |
| Financial / critical | API key rotation, future billing endpoints | 7 days | High-stakes operations need longer post-hoc recovery |
Retry policy — pause vs kill on exhaustion
Restate's default retry policy is max-attempts = 70, on-max-attempts = "pause". Paused invocations indefinitely hold their idempotency slot; client retries attach to the paused slot and wait. Gateway services whose handlers can legitimately get stuck (LLM timeouts, third-party outages) should declare KillOnMaxAttempts so stuck invocations terminate and free their slot. Trade-off: kill does not run compensation logic, so handlers that accumulate partial side effects (multi-step external API calls that can half-succeed) should stay on pause and add operator runbooks.
Rollout Phases
The 8-phase plan from the PRD. Each phase is mostly mechanical — KrakenD config, service main.go edits, metric emission. No new infrastructure to provision.
| Phase | Scope | Status |
|---|---|---|
| 1. KrakenD auth plugin rewrite | Extend integrations/krakend/auth.go to rewrite Idempotency-Key into t-{tenant}:{project}:{subject}:{raw} after authentication; validate format, length, reserved prefix; integration-test cross-tenant and cross-project independence | Not started |
| 2. KrakenD route classification | Classify every write route into one of five policies (REQUIRED_A, REQUIRED_B, ACCEPTED, HANDLER_INTERNAL_C, PROHIBITED); add x-operation-id output_headers rename rule | Not started |
| 3. Case B transport translation | Per-provider extraction functions (Stripe, GitHub, Twilio Event Streams, Twilio messaging); per-project webhook URL provisioning during project onboarding | Not started |
| 4. Per-service retention | Explicit WithIdempotencyRetention(...) declarations in each gateway's v1/cmd/main.go; KillOnMaxAttempts for services where stuck handlers shouldn't block retries | Not started |
| 5. Per-endpoint recovery routes | GET …/by-idempotency-key/{key} and /output variants paired with every write endpoint that accepts a key | Not started |
| 6. Scheduled dispatch verification | Verify the Asynq worker's Idempotency-Key flows correctly through KrakenD namespacing; confirm scheduled-trigger handlers don't need handler-internal Case C dedup code | Not started (Asynq itself: shipped in PR #1509) |
| 7. Observability | Emit the metrics listed above; Grafana dashboard at (tenant, project, endpoint) grain; alerts on cache hit drop, invalid keys, translation failures | Not started |
| 8. Client SDK + public docs | Mobile SDK HTTP-client wrapper generates UUID4 keys at intent-capture time; OpenAPI specs carry the policy class per endpoint; public integration guide | Not started |
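Phase 8's "generate the key at intent-capture time" rule is the part clients most often get wrong, so a sketch may help. It is written in Go for consistency with the rest of this page rather than in a mobile SDK's language, and the `Intent` type and function names are hypothetical. The point it illustrates is that the key is created once per logical action and then reused for every retry of that action.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// NewIdempotencyKey returns a random (version 4) UUID string.
func NewIdempotencyKey() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// Intent is one logical user action. The key is fixed when the intent is
// captured, so every retry of the same intent sends the same header.
type Intent struct {
	Key  string
	Body []byte
}

// NewIntent captures an action and assigns its key exactly once.
func NewIntent(body []byte) (Intent, error) {
	key, err := NewIdempotencyKey()
	if err != nil {
		return Intent{}, err
	}
	return Intent{Key: key, Body: body}, nil
}

// Headers returns the headers to attach to each attempt of this intent.
func (i Intent) Headers() map[string]string {
	return map[string]string{"Idempotency-Key": i.Key}
}
```

A retry loop would call `Headers()` on the same `Intent` value for each attempt; generating a fresh key inside the loop would defeat deduplication.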
Dependency ordering
| Dependency | Reason |
|---|---|
| Phase 1 depends on Projects PRD | Needs x-project-id header injection from Unkey metadata |
| Phase 1 depends on Authorization PRD | Needs the typed-subject format (user:*, service:*, etc.) |
| Phase 3 depends on Phase 1 | Case B rewrite reuses Phase 1's namespace mechanism |
| Phase 5 depends on Phase 1 | Recovery route rewrite reuses Phase 1's rewrite on the path segment |
| Phases 4, 7, 8 are independent | Can run in parallel with Phase 1–3 work |
Operational Patterns
Design detail that lives in the PRD and applies to platform operators and gateway handler authors, not to client callers:
- Cached terminal errors + manual resend. A TerminalError is cached for the full retention window. Manual resend from a provider dashboard (Stripe "Resend", GitHub "Redeliver") reuses the same event identifier, so the cached error is returned and the handler isn't re-invoked until the retention window expires. V1 escape hatch: restate invocations purge <id> via CLI, then trigger the provider-side resend. First-class admin tooling (purge proxy route under platform_member auth, cached-terminal-error metric label, audit feed) is deferred pending real-incident signal.
- DLQ interaction. Handlers that catch their own terminal error and forward to a dead-letter queue must rethrow the terminal error after forwarding, or the cached response becomes the DLQ handler's success rather than the original business error.
- Pause gotcha. Described under Retention Classes above.
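The DLQ rule above is easy to violate by accident, so here is a minimal sketch of the intended control flow. It uses a local `TerminalError` type as a stand-in for the SDK's terminal error, and `HandleWithDLQ` is a hypothetical helper. How a failed DLQ forward should be handled is not covered on this page; returning the forwarding error is an assumption.

```go
package main

import "errors"

// TerminalError is a local stand-in for the SDK's terminal error type
// (hypothetical; used only to keep this sketch self-contained).
type TerminalError struct{ Msg string }

func (e TerminalError) Error() string { return e.Msg }

// HandleWithDLQ runs process. On a terminal error it forwards the error to
// the dead-letter queue and then returns the ORIGINAL error, so the cached
// response is the business error and not the DLQ forward's success.
func HandleWithDLQ(process func() error, forwardToDLQ func(error) error) error {
	err := process()
	var te TerminalError
	if !errors.As(err, &te) {
		return err // nil or a retryable error: nothing to forward
	}
	if fwdErr := forwardToDLQ(err); fwdErr != nil {
		// Assumption: surface the forwarding failure as a non-terminal
		// error so the delivery is retried.
		return fwdErr
	}
	return err // rethrow the terminal error after forwarding
}
```

The bug this guards against is ending the terminal branch with `return nil`, which would cache a success for a request that actually failed.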
See the PRD's Security Considerations and Background sections for the full treatment — enumeration resistance, cache poisoning defenses, replay protection, and the full list of open questions (body-hash conflict detection, outbound idempotency to external APIs, Kafka-consumer dedup, operation-ID-based catch-all recovery, admin tooling).
Out of Scope for v1
- Body-hash conflict detection: the IETF draft says clients MUST NOT reuse a key with a different body; Restate treats same-key-different-body as "return cached response." Deferred until the idempotency_key_collision_total metric actually fires
- Outbound idempotency from handlers to external APIs (Stripe, OpenAI, Sendgrid): separate follow-up PRD
- Kafka-consumer dedup: belongs in the Events Pipeline PRD's scope, not edge
- Client-side key persistence across mobile app restarts: SDK concern, decided in Phase 8
- Operation-ID-based catch-all recovery route (/api/v1/operations/{op_id}): dropped in v0.6 of the PRD because Restate's attach-by-invocation-ID endpoint doesn't consult the idempotency-key namespace; captured as a future-phase Open Question in the PRD
Cross-References
- Edge Idempotency Key PRD — the v1.0 design in full
- Tenant & Project Lifecycle Roadmap — multi-tenant substrate this design builds on
- Error Handling guide: the RpcError shape and is_terminal flag that determines what Restate caches
- Restate / Invocations / Idempotency: upstream reference
- IETF draft-ietf-httpapi-idempotency-key-header-07: the standard the edge design conforms to