RequestContext Refactor Roadmap

End-to-end view of how identity, scope, and routing become three separate concerns instead of one overloaded caller_key string. This page consolidates the RequestContext Refactor PRD (v0.5, Draft) and its dependencies into a single reference. The design is complete; the 9-phase rollout has not started. The work is sequenced before the Tenant & Project Lifecycle and Edge Idempotency rollouts because every gateway, actor, workflow, and Kafka consumer downstream of those plans relies on the typed-subject and protovalidate guarantees this refactor establishes.

Source PRDs

This page is derived from the RequestContext Refactor PRD and its closely related dependencies:

Primary (docs/prd/):

  • RequestContext Refactor — the v0.5 design, three-concern split, EventContext identity gap, protovalidate enforcement, 9-phase migration

Dependencies (docs/prd/multi-tenancy/):

  • 02 · Projects — adds RequestContext.project (field 8) and EventContext.project_id (field 12); routing keys move to 3-part tenant:project:resource constructed at SDK call sites, never carried as RequestContext fields
  • 04 · Identity — Zitadel issues the tenant_member:* typed subjects this refactor expects
  • 05 · Authorization — defines the typed-subject format (user:*, agent:*, tenant_member:*) and is the blocker for Phase 4 (storage cannot drop the CallerKey-override hack until it can authorize agent:conv-abc via OpenFGA)
  • 06 · API Keys — OQ #3 adds meta.tenant_member_id to every Unkey key so admin RPCs always carry a tenant_member:* subject

Related:

  • FILE_ID Resolution — runs in parallel; storage authorization changes align with the typed-subject model
  • MCP Tool Annotations — the MCP service is the only service that today validates tenant context manually (becomes obsolete after Phase 0)
  • Tenant & Project Lifecycle Roadmap — ships on top of this refactor's typed-subject and protovalidate guarantees
  • Edge Idempotency Roadmap — sibling roadmap; the namespace rewrite uses the same typed-subject format this PRD establishes
Architectural Direction — Validated Context, Propagated Once

The refactor implements a pattern that major platforms operating at scale have independently converged on: construct a canonical, structured identity context once at the edge, propagate it through every RPC, validate it at every hop, and never let context-less requests reach a handler.

  • Netflix Passport — protobuf-encoded identity built at the Zuul gateway, integrity-protected, propagated to every backend. Replaced O(n) per-service token parsing with O(1) edge construction. Our RequestContext proto serves the same role.
  • Google BeyondProd — short-lived End-User Context (EUC) tickets validated at every hop alongside service-to-service mTLS. Missing or invalid context is a hard error — no fallback. Our [(buf.validate.field).required = true] on every ctx field is the same posture.
  • Google Zanzibar / SpiceDB / OpenFGA — typed subjects (user:alice not bare alice) prevent ID collisions across entity types and let the authorization layer validate at tuple-write time, not check time. Our subject = "type:id" convention maps directly.
  • CloudEvents subject attribute — the spec explicitly recommends: "If identity attributes happen to be part of the event data, the event producer SHOULD also add them to context attributes" — so routing layers can inspect identity without deserializing the payload. Our EventContext.subject (NEW field 7) closes the same async audit gap.
  • OWASP Multi-Tenant Cheat Sheet / NIST SP 800-207 — both mandate "establish tenant context early, derive from verified claims, reject requests lacking authenticated context." Today our protos have zero buf.validate annotations on context messages — that gap is what Phase 0 closes.

Canonical reference: RequestContext Refactor PRD. The PRD's Production Validation section catalogs 30+ citations spanning Netflix, Google, Uber, Shopify, Confluent, Segment, DoorDash, Lyft, Temporal, Restate, Azure, AWS, OWASP, NIST, and W3C.

Glossary

Term | Definition
RequestContext | The synchronous-RPC envelope. Proto field ctx on every request message. Carries identity, scope, tracing. Defined in apis/common/v1/context.proto.
EventContext | The asynchronous-Kafka envelope. Wraps every domain event with identity, correlation, and tenant scope. Built via pkg/restatex/event_context.go.
subject | Typed Zanzibar principal in type:id form. Three permitted types: user:* (end user), tenant_member:* (Zitadel-authenticated team member), agent:* (AI agent, typically agent:{conversation_id}). Validated min_len = 1.
on_behalf_of | NEW field. Delegation chain. The original principal when subject is acting transitively (e.g., agent acting on a user's request → subject = "agent:conv-abc", on_behalf_of = "user:alice"). Empty for direct actions.
session_id | NEW, optional. Client session identifier for cost attribution and analytics. From x-session-id header or derived from Firebase auth_time. Empty for server-to-server calls.
tenant | Hard tenant boundary. Every request has exactly one. Validated min_len = 1. (Field name is tenant_id on EventContext.)
trace_id | OTel trace ID for infrastructure observability, extracted from W3C traceparent at the gateway. NOT the business correlation ID. Lifetime: ms–seconds. Sampled. New trace on Kafka consume.
correlation_id | Business correlation, on EventContext only. MUST always be conversation_id, set explicitly via WithCorrelationID(). Lifetime: days–weeks. Survives Kafka. Never sampled.
caller_key | DEPRECATED. The single overloaded string this refactor replaces. It meant different things at different layers (actor-routing key / conversation ID / user ID / Kafka partition key / tenant ID).
Typed Zanzibar subject | type:id convention from Google Zanzibar. The namespace prefix prevents ID collisions across entity types and is what OpenFGA's [type, type#relation] syntax validates at tuple-write time.
Protovalidate | The buf.validate plugin enforced inside every auto-generated *_restate_wrappers.go via w.Validator.Validate(req). The runtime is wired today; the rules don't exist yet.

The CallerKey Overload

RequestContext.caller_key is a single string field that means five different things depending on which layer reads it:

Layer | caller_key value | What it actually means
Gateway → Actor | tenant:userId | Restate actor routing key + storage partitioning
Conversation → Workflow | conversationId | Which conversation triggered the workflow (for event correlation)
Workflow → Storage | userId (re-overridden) | Who owns the files being accessed
Workflow → Kafka | conversationId | Kafka partition key for event ordering
Admin operations | tenantId alone | Tenant-level scope, no user dimension

The override chain in production today (actors/conversation/v1/impl.go:413-425 → workflows/generation/v1/impl.go:1032-1036):

CallerKey override chain — six layers (Gateway, ConvManager, ConvActor, GenerationWorkflow, StorageGateway, Kafka) with caller_key value annotated at each hop and two override points highlighted in red where the field's meaning changes mid-flight

Each override is a symptom of one field carrying multiple concerns. Downstream services cannot tell what caller_key is without knowing which layer set it. The authorization PRD's typed-subject model has nowhere clean to live. And protovalidate has no annotations, so requests with an empty tenant or no context at all pass silently — the only manual check is if tenant == "" inside the MCP service.

Target State: One Field, One Concern

The refactor splits caller_key into purpose-specific fields. Each field carries exactly one concern at every layer.

message RequestContext {
  // ─── Identity: WHO is acting ───────────────────────────────────
  string subject = 1 [(buf.validate.field).string.min_len = 1]; // typed: "user:alice"
  string on_behalf_of = 6; // NEW: delegation chain

  // ─── Scope: WHERE the action is authorized ─────────────────────
  string tenant = 2 [(buf.validate.field).string.min_len = 1];

  // ─── Tracing: HOW to correlate logs/traces ─────────────────────
  string trace_id = 3; // OTel, infra-only

  // ─── Session: for cost attribution ─────────────────────────────
  string session_id = 7; // NEW: optional

  // ─── Existing ──────────────────────────────────────────────────
  map<string, string> metadata = 4;

  // ─── Deprecated ────────────────────────────────────────────────
  string caller_key = 5 [deprecated = true];
}

message EventContext {
  // ─── Event identity ────────────────────────────────────────────
  string event_name = 1 [(buf.validate.field).string.min_len = 1];
  string version = 2 [(buf.validate.field).string.min_len = 1];
  string event_id = 3 [(buf.validate.field).string.min_len = 1];

  // ─── Correlation: ALWAYS conversation_id ───────────────────────
  string correlation_id = 4 [(buf.validate.field).string.min_len = 1];
  string emitted_at = 5 [(buf.validate.field).string.min_len = 1];

  // ─── Identity (NEW) ────────────────────────────────────────────
  string subject = 7; // NEW: flows from RequestContext
  string on_behalf_of = 8; // NEW: delegation flows
  string session_id = 9; // NEW: for metering

  // ─── Scope ─────────────────────────────────────────────────────
  string tenant_id = 11 [(buf.validate.field).string.min_len = 1];

  // ─── Existing / Deprecated ─────────────────────────────────────
  map<string, string> metadata = 10;
  string caller_key = 6 [deprecated = true];
}

Routing keys move to SDK call sites. Restate routing was already passed at client construction — the refactor stops smuggling it through RequestContext:

// Routing key belongs in the SDK call, NOT in RequestContext
conv := convpb.NewConversationActorServiceClient(ctx, conversationKey)
storage := storagepb.NewStorageManagerActorServiceClient(ctx, storageKey)

The ID Hierarchy

The system has seven distinct identifiers serving different purposes at different scopes. Conflating any two of them was the root cause of the override chain.

Tenant ("socayo")                              ← tenant: hard boundary, every request
└─ User ("alice")                              ← subject: typed identity (user:alice)
   └─ Session (app open → close)               ← session_id: NEW, optional
      └─ Conversation ("conv-abc")             ← correlation_id: primary business correlation
         └─ Run ("run-xyz")                    ← run_id: one workflow execution (top-level event field)
            └─ Tool Call                       ← tool_call_id: one MCP invocation
               └─ Request                      ← trace_id: OTel, one HTTP request fan-out

trace_id vs correlation_id — They Are Not the Same

Property | trace_id (OTel) | correlation_id (Business)
Scope | One HTTP request fan-out | Entire conversation (days/weeks)
Lifetime | Milliseconds to seconds | Days to weeks
Survives Kafka? | No; new trace on consume | Yes; carried in event payload
Sampled? | Yes (head/tail sampling) | Never
Standard | W3C traceparent header | App-defined
Purpose | "Why was this request slow?" | "Show me everything for conversation X"

Current bug — correlation_id inconsistency. Nobody passes WithCorrelationID() to restatex.Publish() today. The fallback in pkg/restatex/kafka.go uses restate.Key(ctx), which returns run_id from a workflow and conversation_id from an actor — events from the same conversation end up with different correlation_id values. Phase 5 makes WithCorrelationID(conversationId) explicit at every call site and considers removing the fallback entirely (OQ #9).
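The explicit-correlation rule can be illustrated with a small functional-options sketch. This is a minimal illustration, not the actual pkg/restatex code: publishOptions, PublishOption, resolveCorrelationID, and ErrMissingCorrelationID are hypothetical names, and the sketch assumes the "remove the fallback" answer to OQ #9, which is still open.

```go
package main

import (
	"errors"
	"fmt"
)

// publishOptions collects per-call settings. Hypothetical type, not the
// real pkg/restatex structure.
type publishOptions struct {
	correlationID string
}

// PublishOption mutates publishOptions (functional-options pattern).
type PublishOption func(*publishOptions)

// WithCorrelationID sets the business correlation ID, which must be the
// conversation ID.
func WithCorrelationID(conversationID string) PublishOption {
	return func(o *publishOptions) { o.correlationID = conversationID }
}

// ErrMissingCorrelationID is returned instead of falling back to the
// routing key (assumes OQ #9 resolves to "remove the fallback").
var ErrMissingCorrelationID = errors.New("publish: correlation_id is required; pass WithCorrelationID(conversationID)")

// resolveCorrelationID applies the options and rejects calls that omit
// the correlation ID.
func resolveCorrelationID(opts ...PublishOption) (string, error) {
	var o publishOptions
	for _, opt := range opts {
		opt(&o)
	}
	if o.correlationID == "" {
		return "", ErrMissingCorrelationID
	}
	return o.correlationID, nil
}

func main() {
	id, err := resolveCorrelationID(WithCorrelationID("conv-abc"))
	fmt.Println(id, err)

	_, err = resolveCorrelationID()
	fmt.Println(err)
}
```

Failing loudly when the option is missing means a workflow can no longer publish an event whose correlation_id is silently its run_id.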

Component Inventory

Touched by the Refactor

Component | What changes | PRD section
apis/common/v1/context.proto | Adds on_behalf_of (RC field 6), session_id (RC field 7); adds subject, on_behalf_of, session_id to EventContext (fields 7–9); adds buf.validate rules; marks caller_key deprecated | §Target State, §Phase 0–1
All *Request messages | Add [(buf.validate.field).required = true] on ctx; standardize on ctx = 1 (some gateways carry ctx = 100 today) | §Phase 0, §Phase 7
pkg/tenancy / new pkg/requestcontext | Centralizes deriveRequestContext(), currently copy-pasted across 6+ gateways, into a single FromHeaders(headers) factory | §Phase 2
6 gateways (gateway, storage-gateway, notification-gateway, webhook-gateway, integrations-gateway, apikey-gateway) | Set typed subjects (subject = "user:{id}" or "tenant_member:{id}"); call the centralized factory; ctx = 100 → ctx = 1 | §Phase 2, §Phase 7
actors/conversation/v1/impl.go | Replaces the CallerKey = conversationId override with subject = "agent:{conversationId}" + on_behalf_of = original subject | §Phase 3
workflows/generation/v1/impl.go | Removes the storage CallerKey-override hack (line 1032); derives Kafka partition key from subject instead of caller_key; reads identity from subject for event correlation | §Phase 4, §Phase 5
pkg/restatex/event_context.go | BuildEventContext() propagates subject + on_behalf_of + session_id from RequestContext; drops callerKey parameter | §Phase 5
actors/firebasebridge/v1/impl.go | Extracts conversation ID from EventContext.subject instead of caller_key | §Phase 6

Existing Surfaces with Missing or Inconsistent Context

Proto file | Gap | Severity
apis/auth/v1/services/openfga/openfga.proto | All 13 RPCs missing ctx | Critical: authorization service has no caller scope
apis/notification/v1/services/gateway/gateway.proto | All 55+ RPCs missing ctx | Critical: notification ops have no tenant scope
apis/github/v1/github.proto | All RPCs missing ctx | High
apis/rstt/v1/services/admin/admin.proto, introspection.proto, deployment.proto | Missing ctx on Restate-admin paths | High
apis/llm/v1/workflows/generation/generation.proto | GetStateRequest missing ctx | Medium
apis/pipecat/v1/services/cerebrium/cerebrium.proto | InvokeRequest missing ctx | Medium
All gateway protos | ctx = 100 instead of ctx = 1 (notification inner services use field name context, not ctx) | Pre-user; mechanical fix
services/gateway/v1/impl.go:387 | Real bug: ListPendingApprovals derives reqCtx but doesn't pass it on the downstream call | Independent of refactor; fix in passing

Integrated (No Changes Required)

Component | Role
buf.build/bufbuild/protovalidate | Already imported across 20+ services. Wrappers already call Validator.Validate(req); Phase 0 just adds the rules to enforce.
Zitadel | Issues the tenant_member:* typed subjects this refactor consumes (no changes; Identity PRD already accepted).
buf.build/bufbuild/protovalidate-go | Runtime validator, no changes.
Restate Go SDK | Routing keys flow through SDK constructors, where they belong.

Flow 1: External Request — Typed Subject from the Edge

The canonical happy path. KrakenD authenticates, derives the typed subject, and the same RequestContext flows unmodified through every hop.

Client  POST /api/v1/llm/gateway/send-message
Authorization: Bearer <Zitadel JWT> (or pk_live_xxx)
x-session-id: <client UUID> (optional)



KrakenD auth plugin
Validates token → injects x-tenant-id, x-user-id, x-session-id
(For sk_/pk_ paths: also resolves meta.tenant_member_id from Unkey)



LLMGateway.SendMessage
pkg/tenancy.FromHeaders(headers) builds RequestContext:
subject = "user:firebase_uid_123" ← typed
tenant = "socayo" ← validated min_len=1
on_behalf_of = ""
session_id = "<client UUID or auth_time>"
trace_id = <from W3C traceparent>
Routes to ConversationActor via SDK:
convpb.NewConversationActorServiceClient(ctx, "socayo:conv-abc")



ConversationActor → MCPService → … → Workflow → Storage
Same RequestContext at every hop. Each wrapper calls
Validator.Validate(req) before invoking the handler — empty
tenant or empty subject = TerminalError(400, "ctx.tenant must be at least 1 character").
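The gateway step of this flow could look roughly like the sketch below. It is an illustration under stated assumptions, not the real pkg/tenancy.FromHeaders: RequestContext here is a plain-struct stand-in for the generated proto type, the subjectType parameter is hypothetical (the source does not say how a gateway picks between user and tenant_member), and the auth_time fallback for session_id is omitted. The traceparent parsing follows the W3C version-00 layout (version, trace-id, parent-id, flags).

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// RequestContext is a plain-struct stand-in for the generated proto type.
type RequestContext struct {
	Subject, Tenant, OnBehalfOf, SessionID, TraceID string
}

// traceIDFromTraceparent returns the 32-hex-character trace-id segment of a
// W3C traceparent header, or "" if the header does not have four segments.
func traceIDFromTraceparent(header string) string {
	parts := strings.Split(header, "-")
	if len(parts) != 4 || len(parts[1]) != 32 {
		return ""
	}
	return parts[1]
}

// FromHeaders builds a RequestContext from the headers the auth plugin
// injects. subjectType ("user" or "tenant_member") is an assumed parameter.
func FromHeaders(h http.Header, subjectType string) (*RequestContext, error) {
	tenant := h.Get("x-tenant-id")
	userID := h.Get("x-user-id")
	if tenant == "" || userID == "" {
		return nil, errors.New("missing x-tenant-id or x-user-id")
	}
	return &RequestContext{
		Subject:   subjectType + ":" + userID, // typed subject, e.g. "user:firebase_uid_123"
		Tenant:    tenant,
		SessionID: h.Get("x-session-id"), // optional; empty for server-to-server calls
		TraceID:   traceIDFromTraceparent(h.Get("traceparent")),
	}, nil
}

// exampleHeaders mimics what the KrakenD auth plugin injects in Flow 1.
func exampleHeaders() http.Header {
	h := http.Header{}
	h.Set("x-tenant-id", "socayo")
	h.Set("x-user-id", "firebase_uid_123")
	h.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	return h
}

func main() {
	rc, err := FromHeaders(exampleHeaders(), "user")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rc.Subject, rc.Tenant, rc.TraceID)
}
```

Keeping this logic in one factory means each of the gateways produces the same typed subject format instead of maintaining its own copy of deriveRequestContext().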

Why at every wrapper, not in one handler. The MCP service is the only service today that manually enforces tenant != "". Every other service trusts whatever it gets, a gap that runs against OWASP's "establish tenant context early" mandate. Phase 0 makes the check automatic at every wrapper, not optional in one service.

Flow 2: Conversation Acting on Behalf of User

The override-elimination flow. The conversation is the principal for the workflow's actions; the original user is preserved in on_behalf_of.

Typed subject flow — Gateway, ConvManager, ConvActor, Workflow, Storage in a horizontal chain with subject/on_behalf_of/tenant snapshots below each. ConvActor sets subject=agent and on_behalf_of=user on the way out (single transformation). Workflow publishes to Kafka with the same identity intact

ConversationActor (handles incoming message from user:alice)
Receives RequestContext { subject: "user:alice", tenant: "socayo", … }



ConversationActor → GenerationWorkflow.Run
Constructs new RequestContext for the workflow:
subject = "agent:conv-abc" ← conversation IS the principal
tenant = "socayo" ← preserved
on_behalf_of = "user:alice" ← user preserved in delegation chain
(no caller_key override — that field is dead)
Routes to workflow via SDK: wfpb.NewGenerationWorkflowServiceClient(ctx, runId)



GenerationWorkflow → StorageGateway (e.g., fetch frame)
Passes the SAME RequestContext through. No re-override.
Storage authorizes via OpenFGA:
Check(agent:conv-abc, can_view, file:xyz)
(OpenFGA enforcement is the Phase 4 blocker — until storage can
authorize "agent:*" subjects, the override hack stays.)



GenerationWorkflow → Kafka publish
EventContext built from RequestContext via BuildEventContext():
subject = "agent:conv-abc"
on_behalf_of = "user:alice"
tenant_id = "socayo"
correlation_id = "conv-abc" ← ALWAYS conversation_id, explicit
Kafka partition key:
tenancy.ResourceID(subject) → "conv-abc"
(Derived from subject — no longer reads caller_key.)
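The single transformation ConversationActor performs in this flow can be sketched as follows. RequestContext is again a plain-struct stand-in for the generated proto type, and forWorkflow is a hypothetical helper name; the real change lives in actors/conversation/v1/impl.go. The sketch assumes single-level delegation, matching the v0.5 scope.

```go
package main

import "fmt"

// RequestContext is a plain-struct stand-in for the generated proto type.
type RequestContext struct {
	Subject, Tenant, OnBehalfOf, SessionID, TraceID string
}

// forWorkflow derives the context the workflow runs under: the conversation
// becomes the principal and the incoming subject moves to on_behalf_of.
// Tenant, session, and trace are copied unchanged. Single-level only: an
// incoming on_behalf_of is overwritten, not chained.
func forWorkflow(in RequestContext, conversationID string) RequestContext {
	out := in
	out.Subject = "agent:" + conversationID
	out.OnBehalfOf = in.Subject
	return out
}

func main() {
	in := RequestContext{Subject: "user:alice", Tenant: "socayo"}
	out := forWorkflow(in, "conv-abc")
	fmt.Printf("subject=%s on_behalf_of=%s tenant=%s\n", out.Subject, out.OnBehalfOf, out.Tenant)
}
```

Because the function returns a copy, the actor's own incoming context is left untouched, and every hop after the workflow can pass the derived value through without further edits.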

Where audit comes from. With both subject and on_behalf_of on the event envelope, "show every action conv-abc took for user:alice in the last 24h" is a single SQL filter on consumed events. Today, that question is unanswerable — only caller_key = "conv-abc" survives, and the user identity is gone.

Flow 3: Async Identity in Events

BuildEventContext() after the refactor:

func BuildEventContext(ctx sdkgo.Context, eventName, correlationID string,
	requestCtx *commonv1.RequestContext) *commonv1.EventContext {
	// NOTE: Version is not populated here, although context.proto validates
	// version with min_len = 1; it has to be set before the event is validated.
	ec := &commonv1.EventContext{
		EventName:     eventName,
		CorrelationId: correlationID, // MUST be conversation_id, always explicit
		EventId:       uuid.NewString(),
		EmittedAt:     time.Now().UTC().Format(time.RFC3339),
	}
	if requestCtx != nil {
		ec.TenantId = requestCtx.GetTenant()
		ec.Subject = requestCtx.GetSubject()       // identity flows
		ec.OnBehalfOf = requestCtx.GetOnBehalfOf() // delegation flows
		ec.SessionId = requestCtx.GetSessionId()   // session flows
	}
	return ec
}

Three things change for downstream consumers:

  1. FirebaseBridge extracts the conversation ID from EventContext.subject ("agent:conv-abc" → "conv-abc") instead of from the overloaded caller_key. Same data, unambiguous source.
  2. Webhook routing (Convoy) can authorize event delivery against the subject — today it sees only the tenant.
  3. Audit pipelines can trace an event back to a typed principal without joining against any other store.

This is the CloudEvents subject pattern: "If identity attributes happen to be part of the event data, the event producer SHOULD also add them to context attributes" — so routing layers can inspect them without deserializing the payload.
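On the consumer side, extracting the resource ID from a typed subject is a prefix strip. The sketch below is one possible shape for it, not the actual tenancy.ResourceID (OQ #4 is only "likely yes"); ParseSubject is a hypothetical helper, and passing untyped values through unchanged is an assumption made here for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// permittedTypes lists the three subject types named in the glossary.
var permittedTypes = map[string]bool{"user": true, "tenant_member": true, "agent": true}

// ParseSubject splits a typed subject ("type:id") and rejects unknown types
// or empty IDs.
func ParseSubject(subject string) (string, string, error) {
	typ, id, found := strings.Cut(subject, ":")
	if !found || id == "" || !permittedTypes[typ] {
		return "", "", fmt.Errorf("invalid typed subject %q", subject)
	}
	return typ, id, nil
}

// ResourceID returns the ID part of a typed subject. Assumption: values that
// are not valid typed subjects are returned unchanged.
func ResourceID(subject string) string {
	_, id, err := ParseSubject(subject)
	if err != nil {
		return subject
	}
	return id
}

func main() {
	fmt.Println(ResourceID("agent:conv-abc"))
	_, _, err := ParseSubject("alice")
	fmt.Println(err)
}
```

The same helper would serve both the FirebaseBridge extraction and the Phase 5 Kafka partition-key derivation.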

Protovalidate Enforcement

Today, zero buf.validate annotations exist on RequestContext or EventContext. The runtime is fully wired — every *_restate_wrappers.go calls w.Validator.Validate(req) before the handler — but there are no rules to enforce. A request with an empty tenant or no ctx at all passes silently.

Enforcement is two layers:

Layer 1 — Field rules on context messages (apis/common/v1/context.proto):

import "buf/validate/validate.proto";

message RequestContext {
  string subject = 1 [(buf.validate.field).string.min_len = 1];
  string tenant = 2 [(buf.validate.field).string.min_len = 1];
  // ...
}

Layer 2 — Required ctx on every request message:

message SendMessageRequest {
  .common.v1.RequestContext ctx = 1 [(buf.validate.field).required = true];
  // domain fields start at 2
}

Both layers in place, the wrapper rejects:

Bad input | Wrapper response
Missing ctx entirely | 400: "ctx is required"
Empty tenant | 400: "ctx.tenant must be at least 1 character"
Empty subject | 400: "ctx.subject must be at least 1 character"
EventContext with empty tenant_id | 400: "tenant_id must be at least 1 character"

No Go code changes — the existing Validator.Validate(req) call handles it.
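To make the two layers concrete, here is a hand-rolled stand-in that mirrors what the rules check. It deliberately does not use the protovalidate-go API: in the real system the generated wrapper calls Validator.Validate(req) and the rules come from the proto annotations. The struct types and the validate function are hypothetical, and unlike this sketch the real validator may report several violations at once.

```go
package main

import (
	"errors"
	"fmt"
)

// RequestContext and SendMessageRequest are plain-struct stand-ins for the
// generated proto types.
type RequestContext struct {
	Subject string
	Tenant  string
}

type SendMessageRequest struct {
	Ctx *RequestContext
}

// validate mirrors the two layers: Layer 2 (ctx is required) is checked
// first, then the Layer 1 field rules (min_len = 1 on tenant and subject).
func validate(req *SendMessageRequest) error {
	if req.Ctx == nil {
		return errors.New("ctx is required")
	}
	if req.Ctx.Tenant == "" {
		return errors.New("ctx.tenant must be at least 1 character")
	}
	if req.Ctx.Subject == "" {
		return errors.New("ctx.subject must be at least 1 character")
	}
	return nil
}

func main() {
	fmt.Println(validate(&SendMessageRequest{}))
	fmt.Println(validate(&SendMessageRequest{Ctx: &RequestContext{Subject: "user:alice"}}))
	fmt.Println(validate(&SendMessageRequest{Ctx: &RequestContext{Subject: "user:alice", Tenant: "socayo"}}))
}
```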

Risk. Phase 0 may surface latent bugs where today's services send empty context. Run the full integration suite before merging. OQ #6 is open on whether to roll Phase 0 out gradually (one domain at a time) or all at once; the PRD recommends all at once with integration test coverage.

Proto Audit & ctx = 1 Standardization

Two conventions exist today: gateways use ctx = 100, inner services use ctx = 1. The refactor standardizes on ctx = 1 everywhere (OQ #5 resolved). Context is the primary input every RPC validates first; it belongs at field 1, not in the system-fields gutter at 100.

Domain | Today | After
services/gateway | ctx = 100 | ctx = 1 (renumber)
services/storage-gateway | ctx = 100 | ctx = 1 (renumber)
services/notification-gateway | ctx = 100 | ctx = 1 (renumber)
services/webhook-gateway | ctx = 100 | ctx = 1 (renumber)
services/integrations-gateway | ctx = 100 | ctx = 1 (renumber)
services/apikey-gateway | ctx = 100 | ctx = 1 (renumber)
notification (inner) | context = 1 | ctx = 1 (rename to match convention)
notification/platform_service | context = 1 | ctx = 1 (rename)
storage, webhook, schema, conversation actor | ctx = 1 | unchanged

Single proto-only PR per service, or one combined PR — no runtime risk because there are no users yet. Renumbering happens in Phase 7, bundled with adding ctx = 1 to RPCs that lack it entirely.

Independent bug. services/gateway/v1/impl.go:387 derives reqCtx on line 377 but does not pass Ctx: reqCtx into ListPendingApprovals. Every other call in the same handler does. Fix in passing — it's a missing-call bug, unrelated to the field-number work.

Rollout Phases

The 9-phase plan from the PRD. Each phase is mostly mechanical — proto edits, one-shot Go renames, regenerate. No new infrastructure to provision.

Phase | Scope | Status
0. Protovalidate enforcement | Add buf.validate annotations to RequestContext + EventContext; add [(buf.validate.field).required = true] on every *Request.ctx. Purely additive: wrappers already call Validate(req). | Not started
1. Add new fields | RequestContext.on_behalf_of (6) and session_id (7); EventContext.subject (7), on_behalf_of (8), and session_id (9). No consumers yet, so zero risk. The session_id generation strategy is OQ #8 (deferred to product); the proto field lands in Phase 1 either way and passes through whatever KrakenD injects. | Not started
2. Gateways set typed subjects | All 6 gateways set subject = "user:{id}" or "tenant_member:{id}"; centralize copy-pasted deriveRequestContext() into pkg/tenancy.FromHeaders(). | Not started
3. ConversationActor sets agent subject | Replace CallerKey = conversationId override with subject = "agent:{convId}" + on_behalf_of = original subject. | Not started
4. Eliminate workflow CallerKey overrides | Remove the storage-gateway CallerKey hack (workflows/generation/v1/impl.go:1032). Blocker: OpenFGA enforcement (tenant isolation Layer 4) must accept agent:* subjects first. | Not started; blocked on Authorization PRD
5. Update event publishing + BuildEventContext | subject, on_behalf_of, session_id flow into EventContext; drop callerKey param; derive Kafka partition key from subject via tenancy.ResourceID(). | Not started
6. FirebaseBridge | Extract conversation ID from EventContext.subject instead of caller_key. | Not started
7. Standardize ctx = 1 everywhere | Renumber gateway protos (ctx = 100 → ctx = 1); rename notification's context to ctx; add ctx = 1 to RPCs missing it (openfga, github, rstt admin/introspection/deployment, cerebrium, generation GetStateRequest). Proto-only PRs. | Not started
8. Mark caller_key deprecated, remove | Mark deprecated in both protos; remove from pkg/tenancy/keys.go (BuildActorKey stays; it is used for SDK client construction); remove gateway references; remove from MCP generated schemas; remove callerKey parameter from BuildEventContext(). | Not started

Dependency ordering

Phase dependency graph — 9 phases color-coded by category. Phases 0/1/2 are independent roots. Phase 4 is blocked on the external Authorization PRD (red dashed arrow). Phase 8 (caller_key removal) converges from all other phases. Phase 7 depends on Phase 0; Phase 6 depends on Phase 5; Phases 3/4/5 depend on Phase 1

Dependency | Reason
Phase 4 depends on Authorization PRD (OpenFGA) | Storage cannot drop the CallerKey-override hack until OpenFGA can authorize agent:* subjects via relationship checks
Phase 3 depends on Phase 1 | ConversationActor needs on_behalf_of to exist before it can populate it
Phase 5 depends on Phase 1 | EventContext needs the new identity fields before publishing wires them up
Phase 6 depends on Phase 5 | FirebaseBridge consumes what Phase 5 starts publishing
Phase 7 depends on Phase 0 | Renumber after enforcement is in place, so there are fewer moving parts
Phase 8 depends on Phases 2–7 | Cannot remove caller_key until every consumer has migrated off it
Phases 0–2 can run in parallel | Phase 0 is additive; Phase 1 adds unused fields; Phase 2 is gateway-by-gateway
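The ordering rules above can be encoded as a small lookup that answers "which phases can start now?". This is a planning aid sketched for illustration; dependsOn and startable are hypothetical names. The Phase 4 → Phase 1 edge comes from the dependency-graph description rather than the table, and the external Authorization PRD blocker is modeled as a boolean.

```go
package main

import "fmt"

const phaseCount = 9 // phases 0 through 8

// dependsOn maps a phase to the phases that must finish first.
var dependsOn = map[int][]int{
	3: {1},
	4: {1}, // per the dependency graph; also gated on the Authorization PRD
	5: {1},
	6: {5},
	7: {0},
	8: {2, 3, 4, 5, 6, 7},
}

// startable lists, in ascending order, the unfinished phases whose
// dependencies are all done. Phase 4 is held back until authzReady is true.
func startable(done map[int]bool, authzReady bool) []int {
	var out []int
	for p := 0; p < phaseCount; p++ {
		if done[p] || (p == 4 && !authzReady) {
			continue
		}
		ready := true
		for _, d := range dependsOn[p] {
			if !done[d] {
				ready = false
				break
			}
		}
		if ready {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Println(startable(map[int]bool{}, false))                 // [0 1 2]
	fmt.Println(startable(map[int]bool{0: true, 1: true}, false)) // [2 3 5 7]
}
```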

Open Questions

# | Question | Status
1 | Should on_behalf_of support multi-level delegation chains (agent → agent → user)? | Open. Single-level sufficient for now; revisit if Travila staff start acting on a tenant's agent: subjects
2 | Should EventContext.caller_key follow the same refactor? | Resolved: yes, subject + on_behalf_of (fields 7–8) added; caller_key deprecated alongside RequestContext
3 | How do notification-gateway admin operations work with typed subjects? Today admin mode sets CallerKey = tenantId with no user. | Proposed (revisit at implementation): every Unkey key gains meta.tenant_member_id; KrakenD resolves it into x-user-id; admin RPCs always carry subject = "tenant_member:{x-user-id}"; no service_account: type. Cross-PRD dependency on API Keys PRD.
4 | Should tenancy.ResourceID() parse typed subjects ("agent:conv-abc" → "conv-abc")? | Likely yes: simple prefix strip; needed by Phase 5 Kafka-key derivation
5 | Should inner services standardize on ctx = 100, or is ctx = 1 acceptable for non-gateway services? | Resolved: standardize on ctx = 1 everywhere; aligns with the Zero Trust thesis (context is the primary input, belongs at field 1)
6 | Phase 0 may surface latent bugs in services that send empty context. Roll out gradually or all at once? | Open. Recommend all at once with integration test coverage
7 | Should EventContext.subject be validated as required (min_len = 1), or optional during migration? | Open. Recommend optional initially, required after Phase 5 ships
8 | How should session_id be generated? | Deferred to product; not a merge blocker. Proto contract supports any of three implementations (client-generated header, server-derived from auth_time, server-generated + Redis); engineering proceeds with session_id as an optional unvalidated field. Final strategy is a product call driven by Travila's billing model and SDK story.
9 | Should the Publish() fallback for correlation_id (using restate.Key(ctx)) be removed entirely, or kept as a safety net? | Open. Recommend removing to force explicit WithCorrelationID() and surface bugs early

Out of Scope for v0.5

  • Multi-level delegation chains (agent → agent → user). Single-level only for now.
  • service_account:* typed subject. OQ #3 deliberately does not introduce one — every action is attributable to a real tenant_member:* via the API Keys PRD's meta.tenant_member_id requirement.
  • Backfill plan for legacy Unkey keys without meta.tenant_member_id. Captured in OQ #3 sub-questions; depends on production key inventory at refactor time.
  • RequestContext.project and EventContext.project_id. Owned by the Projects PRD — fields 8 and 12 respectively. Can ship in parallel from Phase 0 once this refactor's protovalidate posture is in place.
  • OTel baggage egress stripping. Not a refactor blocker, but pkg/restatex should strip non-traceparent OTel headers at network egress per W3C Baggage's "visible to anyone inspecting network traffic" warning.

Cross-References