RequestContext Refactor Roadmap

End-to-end view of how identity, scope, and routing become three separate concerns instead of one overloaded caller_key string. This page consolidates the RequestContext Refactor PRD (v0.5, Draft) and its dependencies into a single reference. The design is complete; the 9-phase rollout has not started. This work is sequenced before the Tenant & Project Lifecycle and Edge Idempotency rollouts because every gateway, actor, workflow, and Kafka consumer downstream of those plans relies on the typed-subject and protovalidate guarantees this refactor establishes.

Source PRDs

This page is derived from the RequestContext Refactor PRD and its closely related dependencies:

Primary (docs/prd/):

RequestContext Refactor — the v0.5 design, three-concern split, EventContext identity gap, protovalidate enforcement, 9-phase migration

Dependencies (docs/prd/multi-tenancy/):

02 · Projects — adds RequestContext.project (field 8) and EventContext.project_id (field 12); routing keys move to 3-part tenant:project:resource constructed at SDK call sites, never carried as RequestContext fields
04 · Identity — Zitadel issues the tenant_member:* typed subjects this refactor expects
05 · Authorization — defines the typed-subject format (user:*, agent:*, tenant_member:*) and is the blocker for Phase 4 (storage cannot drop the CallerKey-override hack until it can authorize agent:conv-abc via OpenFGA)
06 · API Keys — OQ #3 adds meta.tenant_member_id to every Unkey key so admin RPCs always carry a tenant_member:* subject

Related:

FILE_ID Resolution — runs in parallel; storage authorization changes align with the typed-subject model
MCP Tool Annotations — the MCP service is the only service that today validates tenant context manually (becomes obsolete after Phase 0)
Tenant & Project Lifecycle Roadmap — ships on top of this refactor's typed-subject and protovalidate guarantees
Edge Idempotency Roadmap — sibling roadmap; the namespace rewrite uses the same typed-subject format this PRD establishes

Architectural Direction — Validated Context, Propagated Once

The refactor implements a pattern that every major platform operating at scale has independently converged on: construct a canonical, structured identity context once at the edge, propagate it through every RPC, validate it at every hop, and never let context-less requests reach a handler.

Netflix Passport — protobuf-encoded identity built at the Zuul gateway, integrity-protected, propagated to every backend. Replaced O(n) per-service token parsing with O(1) edge construction. Our RequestContext proto serves the same role.
Google BeyondProd — short-lived End-User Context (EUC) tickets validated at every hop alongside service-to-service mTLS. Missing or invalid context is a hard error — no fallback. Our [(buf.validate.field).required = true] on every ctx field is the same posture.
Google Zanzibar / SpiceDB / OpenFGA — typed subjects (user:alice not bare alice) prevent ID collisions across entity types and let the authorization layer validate at tuple-write time, not check time. Our subject = "type:id" convention maps directly.
CloudEvents subject attribute — the spec explicitly recommends: "If identity attributes happen to be part of the event data, the event producer SHOULD also add them to context attributes" — so routing layers can inspect identity without deserializing the payload. Our EventContext.subject (NEW field 7) closes the same async audit gap.
OWASP Multi-Tenant Cheat Sheet / NIST SP 800-207 — both mandate "establish tenant context early, derive from verified claims, reject requests lacking authenticated context." Today our protos have zero buf.validate annotations on context messages — that gap is what Phase 0 closes.

Canonical reference: RequestContext Refactor PRD. The PRD's Production Validation section catalogs 30+ citations spanning Netflix, Google, Uber, Shopify, Confluent, Segment, DoorDash, Lyft, Temporal, Restate, Azure, AWS, OWASP, NIST, and W3C.

Glossary

Term	Definition
`RequestContext`	The synchronous-RPC envelope. Proto field `ctx` on every request message. Carries identity, scope, tracing. Defined in `apis/common/v1/context.proto`.
`EventContext`	The asynchronous-Kafka envelope. Wraps every domain event with identity, correlation, and tenant scope. Built via `pkg/restatex/event_context.go`.
`subject`	Typed Zanzibar principal in `type:id` form. Three permitted types: `user:*` (end user), `tenant_member:*` (Zitadel-authenticated team member), `agent:*` (AI agent, typically `agent:{conversation_id}`). Validated `min_len = 1`.
`on_behalf_of`	NEW field. Delegation chain. The original principal when `subject` is acting transitively (e.g., agent acting on a user's request → `subject = "agent:conv-abc"`, `on_behalf_of = "user:alice"`). Empty for direct actions.
`session_id`	NEW, optional. Client session identifier for cost attribution and analytics. From `x-session-id` header or derived from Firebase `auth_time`. Empty for server-to-server calls.
`tenant`	Hard tenant boundary. Every request has exactly one. Validated `min_len = 1`. (Field name is `tenant_id` on `EventContext`.)
`trace_id`	OTel trace ID for infrastructure observability, extracted from W3C `traceparent` at the gateway. NOT the business correlation ID. Lifetime: ms–seconds. Sampled. New trace on Kafka consume.
`correlation_id`	Business correlation, on `EventContext` only. MUST always be `conversation_id`, set explicitly via `WithCorrelationID()`. Lifetime: days–weeks. Survives Kafka. Never sampled.
`caller_key`	DEPRECATED. The single overloaded string this refactor replaces — meant different things at different layers (actor-routing key / conversation ID / user ID / Kafka partition key / tenant ID).
Typed Zanzibar subject	`type:id` convention from Google Zanzibar. The namespace prefix prevents ID collisions across entity types and is what OpenFGA's `[type, type#relation]` syntax validates at tuple-write time.
Protovalidate	The `buf.validate` plugin enforced inside every auto-generated `*_restate_wrappers.go` via `w.Validator.Validate(req)`. The runtime is wired today; the rules don't exist yet.
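
The `type:id` convention in the glossary can be illustrated with a small parser. This is a minimal sketch, not code from the PRD: `Subject`, `ParseSubject`, and `ErrInvalidSubject` are hypothetical names, and the rule that unknown types are rejected outright is an assumption drawn from the "three permitted types" wording above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Subject is a parsed typed principal of the form "type:id".
// Hypothetical type for illustration; not part of the PRD.
type Subject struct {
	Type string // "user", "tenant_member", or "agent"
	ID   string
}

// permittedTypes mirrors the three subject types listed in the glossary.
var permittedTypes = map[string]bool{
	"user":          true,
	"tenant_member": true,
	"agent":         true,
}

// ErrInvalidSubject is returned for bare IDs, unknown types, and empty IDs.
var ErrInvalidSubject = errors.New("invalid typed subject")

// ParseSubject splits s at the first colon and rejects anything that is not
// "<permitted type>:<non-empty id>".
func ParseSubject(s string) (Subject, error) {
	typ, id, ok := strings.Cut(s, ":")
	if !ok || id == "" || !permittedTypes[typ] {
		return Subject{}, fmt.Errorf("%w: %q", ErrInvalidSubject, s)
	}
	return Subject{Type: typ, ID: id}, nil
}

func main() {
	for _, raw := range []string{"user:alice", "agent:conv-abc", "alice", "robot:r2"} {
		sub, err := ParseSubject(raw)
		fmt.Println(raw, "->", sub, err)
	}
}
```

Splitting at the first colon keeps any later colons inside the ID, so the type prefix is always unambiguous.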

The CallerKey Overload

RequestContext.caller_key is a single string field that means five different things depending on which layer reads it:

Layer	`caller_key` value	What it actually means
Gateway → Actor	`tenant:userId`	Restate actor routing key + storage partitioning
Conversation → Workflow	`conversationId`	Which conversation triggered the workflow (for event correlation)
Workflow → Storage	`userId` (re-overridden)	Who owns the files being accessed
Workflow → Kafka	`conversationId`	Kafka partition key for event ordering
Admin operations	`tenantId` alone	Tenant-level scope, no user dimension

The override chain in production today (actors/conversation/v1/impl.go:413-425 → workflows/generation/v1/impl.go:1032-1036):

CallerKey override chain — six layers (Gateway, ConvManager, ConvActor, GenerationWorkflow, StorageGateway, Kafka) with caller_key value annotated at each hop and two override points highlighted in red where the field's meaning changes mid-flight

Each override is a symptom of one field carrying multiple concerns. Downstream services cannot tell what caller_key is without knowing which layer set it. The authorization PRD's typed-subject model has nowhere clean to live. And protovalidate has no annotations, so requests with an empty tenant or no context at all pass silently — the only manual check is if tenant == "" inside the MCP service.

Target State: One Field, One Concern

The refactor splits caller_key into purpose-specific fields. Each field carries exactly one concern at every layer.

message RequestContext {
    // ─── Identity — WHO is acting ──────────────────────────────────
    string subject = 1 [(buf.validate.field).string.min_len = 1];     // typed: "user:alice"
    string on_behalf_of = 6;                                          // NEW — delegation chain

    // ─── Scope — WHERE the action is authorized ────────────────────
    string tenant = 2 [(buf.validate.field).string.min_len = 1];

    // ─── Tracing — HOW to correlate logs/traces ────────────────────
    string trace_id = 3;                                              // OTel, infra-only

    // ─── Session — for cost attribution ────────────────────────────
    string session_id = 7;                                            // NEW — optional

    // ─── Existing ──────────────────────────────────────────────────
    map<string, string> metadata = 4;

    // ─── Deprecated ────────────────────────────────────────────────
    string caller_key = 5 [deprecated = true];
}

message EventContext {
    // ─── Event identity ────────────────────────────────────────────
    string event_name = 1     [(buf.validate.field).string.min_len = 1];
    string version = 2        [(buf.validate.field).string.min_len = 1];
    string event_id = 3       [(buf.validate.field).string.min_len = 1];

    // ─── Correlation — ALWAYS conversation_id ──────────────────────
    string correlation_id = 4 [(buf.validate.field).string.min_len = 1];
    string emitted_at = 5     [(buf.validate.field).string.min_len = 1];

    // ─── Identity (NEW) ────────────────────────────────────────────
    string subject = 7;          // NEW — flows from RequestContext
    string on_behalf_of = 8;     // NEW — delegation flows
    string session_id = 9;       // NEW — for metering

    // ─── Scope ─────────────────────────────────────────────────────
    string tenant_id = 11     [(buf.validate.field).string.min_len = 1];

    // ─── Existing / Deprecated ─────────────────────────────────────
    map<string, string> metadata = 10;
    string caller_key = 6 [deprecated = true];
}

Routing keys move to SDK call sites. Restate routing was already passed at client construction — the refactor stops smuggling it through RequestContext:

// Routing key belongs in the SDK call, NOT in RequestContext
conv := convpb.NewConversationActorServiceClient(ctx, conversationKey)
storage := storagepb.NewStorageManagerActorServiceClient(ctx, storageKey)

The ID Hierarchy

The system has seven distinct identifiers serving different purposes at different scopes. Conflating any two of them was the root cause of the override chain.

Tenant ("socayo")                          ← tenant: hard boundary, every request
  └─ User ("alice")                        ← subject: typed identity (user:alice)
       └─ Session (app open → close)       ← session_id: NEW, optional
            └─ Conversation ("conv-abc")   ← correlation_id: primary business correlation
                 └─ Run ("run-xyz")        ← run_id: one workflow execution (top-level event field)
                      └─ Tool Call         ← tool_call_id: one MCP invocation
                           └─ Request      ← trace_id: OTel, one HTTP request fan-out

`trace_id` vs `correlation_id` — They Are Not the Same

	`trace_id` (OTel)	`correlation_id` (Business)
Scope	One HTTP request fan-out	Entire conversation (days/weeks)
Lifetime	Milliseconds to seconds	Days to weeks
Survives Kafka?	No — new trace on consume	Yes — carried in event payload
Sampled?	Yes (head/tail sampling)	Never
Standard	W3C `traceparent` header	App-defined
Purpose	"Why was this request slow?"	"Show me everything for conversation X"

Current bug — correlation_id inconsistency. Nobody passes WithCorrelationID() to restatex.Publish() today. The fallback in pkg/restatex/kafka.go uses restate.Key(ctx), which returns run_id from a workflow and conversation_id from an actor — events from the same conversation end up with different correlation_id values. Phase 5 makes WithCorrelationID(conversationId) explicit at every call site and considers removing the fallback entirely (OQ #9).
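
The direction OQ #9 leans toward (no fallback, explicit correlation ID) can be sketched with the functional-options pattern. Only the name `WithCorrelationID` comes from this page; the real `restatex.Publish()` signature is not shown here, so `publishConfig`, `PublishOption`, `resolveCorrelationID`, and the error value are all hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// publishConfig collects per-call publish settings (hypothetical type).
type publishConfig struct {
	correlationID string
}

// PublishOption mutates a publishConfig (functional-options pattern).
type PublishOption func(*publishConfig)

// WithCorrelationID sets the business correlation ID explicitly.
func WithCorrelationID(conversationID string) PublishOption {
	return func(c *publishConfig) { c.correlationID = conversationID }
}

// ErrMissingCorrelationID is returned when no explicit ID was supplied.
var ErrMissingCorrelationID = errors.New("publish: correlation_id not set; pass WithCorrelationID(conversationID)")

// resolveCorrelationID applies the options and returns the explicit ID.
// There is deliberately no fallback to a runtime key: a missing option is
// an error, so inconsistent IDs surface at the call site.
func resolveCorrelationID(opts ...PublishOption) (string, error) {
	var cfg publishConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	if cfg.correlationID == "" {
		return "", ErrMissingCorrelationID
	}
	return cfg.correlationID, nil
}

func main() {
	id, err := resolveCorrelationID(WithCorrelationID("conv-abc"))
	fmt.Println(id, err)

	_, err = resolveCorrelationID() // no option supplied: fails instead of guessing
	fmt.Println(err)
}
```

Failing loudly trades a short-term increase in publish errors for the guarantee that every event in a conversation carries the same correlation ID.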

Component Inventory

Touched by the Refactor

Component	What changes	PRD section
`apis/common/v1/context.proto`	Adds `on_behalf_of` (RC field 6), `session_id` (RC field 7); adds `subject`, `on_behalf_of`, `session_id` to EventContext (fields 7–9); adds `buf.validate` rules; marks `caller_key` deprecated	§Target State, §Phase 0–1
All `*Request` messages	Add `[(buf.validate.field).required = true]` on `ctx`; standardize on `ctx = 1` (some gateways carry `ctx = 100` today)	§Phase 0, §Phase 7
`pkg/tenancy` / new `pkg/requestcontext`	Centralizes `deriveRequestContext()` — currently copy-pasted across 6+ gateways — into a single `FromHeaders(headers)` factory	§Phase 2
6 gateways (`gateway`, `storage-gateway`, `notification-gateway`, `webhook-gateway`, `integrations-gateway`, `apikey-gateway`)	Set typed subjects (`subject = "user:{id}"` or `"tenant_member:{id}"`); call the centralized factory; `ctx = 100` → `ctx = 1`	§Phase 2, §Phase 7
`actors/conversation/v1/impl.go`	Replaces the `CallerKey = conversationId` override with `subject = "agent:{conversationId}"` + `on_behalf_of = original subject`	§Phase 3
`workflows/generation/v1/impl.go`	Removes the storage CallerKey-override hack (line 1032); derives Kafka partition key from `subject` instead of `caller_key`; reads identity from `subject` for event correlation	§Phase 4, §Phase 5
`pkg/restatex/event_context.go`	`BuildEventContext()` propagates `subject` + `on_behalf_of` + `session_id` from RequestContext; drops `callerKey` parameter	§Phase 5
`actors/firebasebridge/v1/impl.go`	Extracts conversation ID from `EventContext.subject` instead of `caller_key`	§Phase 6

Existing Surfaces with Missing or Inconsistent Context

Proto file	Gap	Severity
`apis/auth/v1/services/openfga/openfga.proto`	All 13 RPCs missing `ctx`	Critical — authorization service has no caller scope
`apis/notification/v1/services/gateway/gateway.proto`	All 55+ RPCs missing `ctx`	Critical — notification ops have no tenant scope
`apis/github/v1/github.proto`	All RPCs missing `ctx`	High
`apis/rstt/v1/services/admin/admin.proto`, `introspection.proto`, `deployment.proto`	Missing `ctx` on Restate-admin paths	High
`apis/llm/v1/workflows/generation/generation.proto`	`GetStateRequest` missing `ctx`	Medium
`apis/pipecat/v1/services/cerebrium/cerebrium.proto`	`InvokeRequest` missing `ctx`	Medium
All gateway protos	`ctx = 100` instead of `ctx = 1` (notification inner services use field name `context`, not `ctx`)	Pre-user; mechanical fix
`services/gateway/v1/impl.go:387`	Real bug: `ListPendingApprovals` derives `reqCtx` but doesn't pass it on the downstream call	Independent of refactor; fix in passing

Integrated (No Changes Required)

Component	Role
`buf.build/bufbuild/protovalidate`	Already imported across 20+ services. Wrappers already call `Validator.Validate(req)` — Phase 0 just adds the rules to enforce.
Zitadel	Issues the `tenant_member:*` typed subjects this refactor consumes (no changes — Identity PRD already accepted).
`buf.build/bufbuild/protovalidate-go`	Runtime validator, no changes.
Restate Go SDK	Routing keys flow through SDK constructors, where they belong.

Flow 1: External Request — Typed Subject from the Edge

The canonical happy path. KrakenD authenticates and injects identity headers, the gateway derives the typed subject from them, and the same RequestContext flows unmodified through every hop.

Client  POST /api/v1/llm/gateway/send-message
        Authorization: Bearer <Zitadel JWT>     (or pk_live_xxx)
        x-session-id: <client UUID>              (optional)

  ▼

KrakenD auth plugin
  Validates token → injects x-tenant-id, x-user-id, x-session-id
  (For sk_/pk_ paths: also resolves meta.tenant_member_id from Unkey)

  ▼

LLMGateway.SendMessage
  pkg/tenancy.FromHeaders(headers) builds RequestContext:
    subject      = "user:firebase_uid_123"        ← typed
    tenant       = "socayo"                        ← validated min_len=1
    on_behalf_of = ""
    session_id   = "<client UUID or auth_time>"
    trace_id     = <from W3C traceparent>
  Routes to ConversationActor via SDK:
    convpb.NewConversationActorServiceClient(ctx, "socayo:conv-abc")

  ▼

ConversationActor → MCPService → … → Workflow → Storage
  Same RequestContext at every hop. Each wrapper calls
  Validator.Validate(req) before invoking the handler — empty
  tenant or empty subject = TerminalError(400, "ctx.tenant must be at least 1 character").

Why in the wrapper, not the handler. The MCP service is the only service today that manually enforces tenant != "". Every other service trusts whatever it receives, a gap that violates OWASP's "establish tenant context early" mandate. Phase 0 makes the check automatic at every wrapper, not optional in one service.
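
A centralized header-to-context factory along the lines of Flow 1 might look like the sketch below. The header names and the W3C `traceparent` layout come from this page and the Trace Context spec; everything else is an assumption. `RequestContext` is a local stand-in for the generated proto type, and the extra `subjectType` parameter is hypothetical, since the page does not say how a gateway chooses between `user:` and `tenant_member:`.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// RequestContext is a local stand-in for the generated proto message.
type RequestContext struct {
	Subject, Tenant, OnBehalfOf, SessionID, TraceID string
}

// traceIDFromTraceparent extracts the trace-id field from a W3C traceparent
// header ("version-traceid-parentid-flags"). It returns "" when the header
// is absent or malformed.
func traceIDFromTraceparent(h string) string {
	parts := strings.Split(h, "-")
	if len(parts) < 4 || len(parts[1]) != 32 {
		return ""
	}
	return parts[1]
}

// FromHeaders builds a RequestContext from the headers the auth plugin
// injects. subjectType ("user" or "tenant_member") is an assumption of this
// sketch; the real factory may decide it differently.
func FromHeaders(h http.Header, subjectType string) (*RequestContext, error) {
	tenant := h.Get("x-tenant-id")
	userID := h.Get("x-user-id")
	if tenant == "" || userID == "" {
		return nil, errors.New("missing x-tenant-id or x-user-id")
	}
	return &RequestContext{
		Subject:   subjectType + ":" + userID,
		Tenant:    tenant,
		SessionID: h.Get("x-session-id"), // optional; empty for server-to-server
		TraceID:   traceIDFromTraceparent(h.Get("traceparent")),
	}, nil
}

func main() {
	h := http.Header{}
	h.Set("x-tenant-id", "socayo")
	h.Set("x-user-id", "firebase_uid_123")
	h.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")

	rc, err := FromHeaders(h, "user")
	fmt.Printf("%+v %v\n", rc, err)
}
```

Rejecting the request in the factory is a first line of defense only; the wrapper-level validation described above still applies at every later hop.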

Flow 2: Conversation Acting on Behalf of User

The override-elimination flow. The conversation is the principal for the workflow's actions; the original user is preserved in on_behalf_of.

Typed subject flow — Gateway, ConvManager, ConvActor, Workflow, Storage in a horizontal chain with subject/on_behalf_of/tenant snapshots below each. ConvActor sets subject=agent and on_behalf_of=user on the way out (single transformation). Workflow publishes to Kafka with the same identity intact

ConversationActor (handles incoming message from user:alice)
  Receives RequestContext { subject: "user:alice", tenant: "socayo", … }

  ▼

ConversationActor → GenerationWorkflow.Run
  Constructs new RequestContext for the workflow:
    subject      = "agent:conv-abc"      ← conversation IS the principal
    tenant       = "socayo"               ← preserved
    on_behalf_of = "user:alice"           ← user preserved in delegation chain
    (no caller_key override — that field is dead)
  Routes to workflow via SDK: wfpb.NewGenerationWorkflowServiceClient(ctx, runId)

  ▼

GenerationWorkflow → StorageGateway (e.g., fetch frame)
  Passes the SAME RequestContext through. No re-override.
  Storage authorizes via OpenFGA:
    Check(agent:conv-abc, can_view, file:xyz)
  (OpenFGA enforcement is the Phase 4 blocker — until storage can
   authorize "agent:*" subjects, the override hack stays.)

  ▼

GenerationWorkflow → Kafka publish
  EventContext built from RequestContext via BuildEventContext():
    subject      = "agent:conv-abc"
    on_behalf_of = "user:alice"
    tenant_id    = "socayo"
    correlation_id = "conv-abc"           ← ALWAYS conversation_id, explicit
  Kafka partition key:
    tenancy.ResourceID(subject) → "conv-abc"
  (Derived from subject — no longer reads caller_key.)

Where audit comes from. With both subject and on_behalf_of on the event envelope, "show every action conv-abc took for user:alice in the last 24h" is a single SQL filter on consumed events. Today, that question is unanswerable — only caller_key = "conv-abc" survives, and the user identity is gone.
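
The single transformation in Flow 2 can be sketched as one pure function. `RequestContext` here is a local stand-in for the generated proto type and `delegateToAgent` is a hypothetical helper name; the sketch also assumes single-level delegation, matching the v0.5 scope.

```go
package main

import "fmt"

// RequestContext is a local stand-in for the generated proto message.
type RequestContext struct {
	Subject, Tenant, OnBehalfOf, SessionID, TraceID string
}

// delegateToAgent returns a copy of the incoming context in which the
// conversation is the acting principal and the original caller is preserved
// in OnBehalfOf. Single-level only: any existing OnBehalfOf is overwritten.
func delegateToAgent(in RequestContext, conversationID string) RequestContext {
	out := in // value copy: tenant, session, and trace are carried over
	out.Subject = "agent:" + conversationID
	out.OnBehalfOf = in.Subject
	return out
}

func main() {
	incoming := RequestContext{Subject: "user:alice", Tenant: "socayo"}
	outgoing := delegateToAgent(incoming, "conv-abc")
	fmt.Printf("%+v\n", outgoing)
}
```

Because the function copies rather than mutates, the actor's own view of the incoming request stays intact while the workflow receives the delegated identity.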

Flow 3: Async Identity in Events

BuildEventContext() after the refactor:

func BuildEventContext(ctx sdkgo.Context, eventName, correlationID string,
    requestCtx *commonv1.RequestContext) *commonv1.EventContext {
    ec := &commonv1.EventContext{
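        EventName:     eventName,
        // Version is not set in this excerpt. The target proto validates it with
        // min_len = 1, so a real implementation must populate it.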
        CorrelationId: correlationID,    // MUST be conversation_id, always explicit
        EventId:       uuid.NewString(),
        EmittedAt:     time.Now().UTC().Format(time.RFC3339),
    }
    if requestCtx != nil {
        ec.TenantId    = requestCtx.GetTenant()
        ec.Subject     = requestCtx.GetSubject()      // identity flows
        ec.OnBehalfOf  = requestCtx.GetOnBehalfOf()   // delegation flows
        ec.SessionId   = requestCtx.GetSessionId()    // session flows
    }
    return ec
}

Three things change for downstream consumers:

FirebaseBridge extracts the conversation ID from EventContext.subject ("agent:conv-abc" → "conv-abc") instead of from the overloaded caller_key. Same data, unambiguous source.
Webhook routing (Convoy) can authorize event delivery against the subject — today it sees only the tenant.
Audit pipelines can trace an event back to a typed principal without joining against any other store.

This is the CloudEvents subject pattern: "If identity attributes happen to be part of the event data, the event producer SHOULD also add them to context attributes" — so routing layers can inspect them without deserializing the payload.
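
On the consumer side, the prefix strip that OQ #4 discusses could be as small as the sketch below. `ResourceID` borrows its name from `tenancy.ResourceID()`, but the body is an assumption, as is the choice to return input without a colon unchanged; `conversationIDFromSubject` is a hypothetical helper for a consumer such as FirebaseBridge that should only act on `agent:` subjects.

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceID strips the type prefix from a typed subject, so that
// "agent:conv-abc" yields "conv-abc". Input without a colon is returned
// unchanged (an assumption of this sketch).
func ResourceID(subject string) string {
	if _, id, ok := strings.Cut(subject, ":"); ok {
		return id
	}
	return subject
}

// conversationIDFromSubject returns the conversation ID only when the
// subject is an agent principal; other subject types report false.
func conversationIDFromSubject(subject string) (string, bool) {
	const prefix = "agent:"
	if !strings.HasPrefix(subject, prefix) {
		return "", false
	}
	id := strings.TrimPrefix(subject, prefix)
	return id, id != ""
}

func main() {
	fmt.Println(ResourceID("agent:conv-abc"))

	id, ok := conversationIDFromSubject("agent:conv-abc")
	fmt.Println(id, ok)

	_, ok = conversationIDFromSubject("user:alice") // not an agent subject
	fmt.Println(ok)
}
```

Keeping the type check separate from the generic strip lets the Kafka-key derivation accept any subject while consumers that need a conversation ID refuse anything else.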

Protovalidate Enforcement

Today, zero buf.validate annotations exist on RequestContext or EventContext. The runtime is fully wired — every *_restate_wrappers.go calls w.Validator.Validate(req) before the handler — but there are no rules to enforce. A request with an empty tenant or no ctx at all passes silently.

Enforcement is two layers:

Layer 1 — Field rules on context messages (apis/common/v1/context.proto):

import "buf/validate/validate.proto";

message RequestContext {
    string subject = 1 [(buf.validate.field).string.min_len = 1];
    string tenant  = 2 [(buf.validate.field).string.min_len = 1];
    // ...
}

Layer 2 — Required ctx on every request message:

message SendMessageRequest {
    .common.v1.RequestContext ctx = 1 [(buf.validate.field).required = true];
    // domain fields start at 2
}

Both layers in place, the wrapper rejects:

Bad input	Wrapper response
Missing `ctx` entirely	`400 — "ctx is required"`
Empty `tenant`	`400 — "ctx.tenant must be at least 1 character"`
Empty `subject`	`400 — "ctx.subject must be at least 1 character"`
`EventContext` with empty `tenant_id`	`400 — "tenant_id must be at least 1 character"`

No Go code changes — the existing Validator.Validate(req) call handles it.

Risk. Phase 0 may surface latent bugs where today's services send empty context. Run the full integration suite before merging. OQ #6 is open on whether to roll Phase 0 out gradually (one domain at a time) or all at once; the PRD recommends all at once with integration test coverage.
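
To make the two layers concrete without pulling in generated code, the sketch below hand-writes the equivalent checks and wraps a handler with them. It is an illustration only: the real wrappers call the protovalidate runtime, which derives these checks from the proto annotations, and the types, `validate`, and `withValidation` here are hypothetical. Unlike protovalidate, this sketch stops at the first violation.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the generated proto messages.
type RequestContext struct{ Subject, Tenant string }

type SendMessageRequest struct {
	Ctx  *RequestContext
	Text string
}

// validate mirrors the two layers by hand.
func validate(req *SendMessageRequest) error {
	if req.Ctx == nil { // Layer 2: ctx is required on the request
		return errors.New("ctx is required")
	}
	if len(req.Ctx.Tenant) < 1 { // Layer 1: min_len = 1 on tenant
		return errors.New("ctx.tenant must be at least 1 character")
	}
	if len(req.Ctx.Subject) < 1 { // Layer 1: min_len = 1 on subject
		return errors.New("ctx.subject must be at least 1 character")
	}
	return nil
}

type handler func(*SendMessageRequest) (string, error)

// withValidation wraps a handler so it never sees an invalid request.
func withValidation(h handler) handler {
	return func(req *SendMessageRequest) (string, error) {
		if err := validate(req); err != nil {
			return "", fmt.Errorf("400: %w", err)
		}
		return h(req)
	}
}

func main() {
	send := withValidation(func(r *SendMessageRequest) (string, error) {
		return "accepted for " + r.Ctx.Tenant, nil
	})

	_, err := send(&SendMessageRequest{Text: "hi"}) // no ctx at all
	fmt.Println(err)

	out, err := send(&SendMessageRequest{
		Ctx: &RequestContext{Subject: "user:alice", Tenant: "socayo"},
	})
	fmt.Println(out, err)
}
```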

Proto Audit & `ctx = 1` Standardization

Two field-number conventions exist today: gateways use ctx = 100, inner services use ctx = 1. (The notification inner services also name the field context rather than ctx.) The refactor standardizes on ctx = 1 everywhere (OQ #5 resolved). Context is the primary input every RPC validates first; it belongs at field 1, not in the system-fields gutter at 100.

Domain	Today	After
`services/gateway`	`ctx = 100`	`ctx = 1` — renumber
`services/storage-gateway`	`ctx = 100`	`ctx = 1` — renumber
`services/notification-gateway`	`ctx = 100`	`ctx = 1` — renumber
`services/webhook-gateway`	`ctx = 100`	`ctx = 1` — renumber
`services/integrations-gateway`	`ctx = 100`	`ctx = 1` — renumber
`services/apikey-gateway`	`ctx = 100`	`ctx = 1` — renumber
`notification` (inner)	`context = 1`	`ctx = 1` — rename to match convention
`notification/platform_service`	`context = 1`	`ctx = 1` — rename
`storage`, `webhook`, `schema`, `conversation` actor	`ctx = 1` ✓	unchanged

Single proto-only PR per service, or one combined PR — no runtime risk because there are no users yet. Renumbering happens in Phase 7, bundled with adding ctx = 1 to RPCs that lack it entirely.

Independent bug. In services/gateway/v1/impl.go, reqCtx is derived on line 377, but the ListPendingApprovals call on line 387 does not pass Ctx: reqCtx. Every other call in the same handler does. Fix in passing; it is a missing-argument bug, unrelated to the field-number work.

Rollout Phases

The 9-phase plan from the PRD. Each phase is mostly mechanical — proto edits, one-shot Go renames, regenerate. No new infrastructure to provision.

Phase	Scope	Status
0. Protovalidate enforcement	Add `buf.validate` annotations to `RequestContext` + `EventContext`; add `[(buf.validate.field).required = true]` on every `*Request.ctx`. Purely additive — wrappers already call `Validate(req)`.	Not started
1. Add new fields	`RequestContext.on_behalf_of` (6) and `session_id` (7); `EventContext.subject` (7), `on_behalf_of` (8), and `session_id` (9). No consumers yet, so zero risk. The `session_id` generation strategy is OQ #8 (deferred to product); the proto field lands in Phase 1 either way and passes through whatever KrakenD injects.	Not started
2. Gateways set typed subjects	All 6 gateways set `subject = "user:{id}"` or `"tenant_member:{id}"`; centralize copy-pasted `deriveRequestContext()` into `pkg/tenancy.FromHeaders()`.	Not started
3. ConversationActor sets agent subject	Replace `CallerKey = conversationId` override with `subject = "agent:{convId}"` + `on_behalf_of = original subject`.	Not started
4. Eliminate workflow CallerKey overrides	Remove the storage-gateway CallerKey hack (`workflows/generation/v1/impl.go:1032`). Blocker: OpenFGA enforcement (tenant isolation Layer 4) must accept `agent:*` subjects first.	Not started — blocked on Authorization PRD
5. Update event publishing + `BuildEventContext`	`subject`, `on_behalf_of`, `session_id` flow into `EventContext`; drop `callerKey` param; derive Kafka partition key from `subject` via `tenancy.ResourceID()`.	Not started
6. FirebaseBridge	Extract conversation ID from `EventContext.subject` instead of `caller_key`.	Not started
7. Standardize `ctx = 1` everywhere	Renumber gateway protos (`ctx = 100` → `ctx = 1`); rename notification's `context` to `ctx`; add `ctx = 1` to RPCs missing it (openfga, github, rstt admin/introspection/deployment, cerebrium, generation `GetStateRequest`). Proto-only PRs.	Not started
8. Mark `caller_key` deprecated, remove	Mark deprecated in both protos; remove from `pkg/tenancy/keys.go` (BuildActorKey stays — used for SDK client construction); remove gateway references; remove from MCP generated schemas; remove `callerKey` parameter from `BuildEventContext()`.	Not started

Dependency ordering

Phase dependency graph — 9 phases color-coded by category. Phases 0/1/2 are independent roots. Phase 4 is blocked on the external Authorization PRD (red dashed arrow). Phase 8 (caller_key removal) converges from all other phases. Phase 7 depends on Phase 0; Phase 6 depends on Phase 5; Phases 3/4/5 depend on Phase 1

Dependency	Reason
Phase 4 depends on Authorization PRD (OpenFGA)	Storage cannot drop the CallerKey-override hack until OpenFGA can authorize `agent:*` subjects via relationship checks
Phase 3 depends on Phase 1	ConversationActor needs `on_behalf_of` to exist before it can populate it
Phase 5 depends on Phase 1	EventContext needs the new identity fields before publishing wires them up
Phase 6 depends on Phase 5	FirebaseBridge consumes what Phase 5 starts publishing
Phase 7 depends on Phase 0	Renumber after enforcement is in place — fewer moving parts
Phase 8 depends on Phases 2–7	Cannot remove `caller_key` until every consumer has migrated off it
Phases 0–2 can run in parallel	Phase 0 is additive; Phase 1 adds unused fields; Phase 2 is gateway-by-gateway
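
The dependency table can be checked mechanically by encoding it as a graph and computing one valid execution order. The sketch below uses Kahn's algorithm; it encodes only the rows of the table (the external Authorization PRD is modeled as its own node), and the names `phaseDeps` and `order` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// phaseDeps maps each node to the nodes it depends on, per the table rows.
var phaseDeps = map[string][]string{
	"P0": nil,
	"P1": nil,
	"P2": nil,
	"P3": {"P1"},
	"P4": {"Authorization PRD"},
	"P5": {"P1"},
	"P6": {"P5"},
	"P7": {"P0"},
	"P8": {"P2", "P3", "P4", "P5", "P6", "P7"},
}

// order returns one valid topological order (Kahn's algorithm), picking the
// alphabetically first ready node so the result is deterministic.
func order(deps map[string][]string) ([]string, error) {
	indeg := map[string]int{}
	dependents := map[string][]string{}
	for node, ds := range deps {
		if _, ok := indeg[node]; !ok {
			indeg[node] = 0
		}
		for _, d := range ds {
			if _, ok := indeg[d]; !ok {
				indeg[d] = 0
			}
			indeg[node]++
			dependents[d] = append(dependents[d], node)
		}
	}
	var ready []string
	for n, deg := range indeg {
		if deg == 0 {
			ready = append(ready, n)
		}
	}
	var out []string
	for len(ready) > 0 {
		sort.Strings(ready)
		n := ready[0]
		ready = ready[1:]
		out = append(out, n)
		for _, m := range dependents[n] {
			indeg[m]--
			if indeg[m] == 0 {
				ready = append(ready, m)
			}
		}
	}
	if len(out) != len(indeg) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return out, nil
}

func main() {
	seq, err := order(phaseDeps)
	fmt.Println(seq, err)
}
```

Any order the function returns is only one of several valid sequences; phases with no path between them can still run in parallel.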

Open Questions

#	Question	Status
1	Should `on_behalf_of` support multi-level delegation chains (agent → agent → user)?	Open — single-level sufficient for now; revisit if Travila staff start acting on a tenant's `agent:` subjects
2	~~Should `EventContext.caller_key` follow the same refactor?~~	Resolved — yes, `subject` + `on_behalf_of` (fields 7–8) added; `caller_key` deprecated alongside RequestContext
3	~~How do notification-gateway admin operations work with typed subjects? Today admin mode sets `CallerKey = tenantId` with no user.~~	Proposed (revisit at implementation) — every Unkey key gains `meta.tenant_member_id`; KrakenD resolves it into `x-user-id`; admin RPCs always carry `subject = "tenant_member:{x-user-id}"`; no `service_account:` type. Cross-PRD dependency on API Keys PRD.
4	Should `tenancy.ResourceID()` parse typed subjects (`"agent:conv-abc"` → `"conv-abc"`)?	Likely yes — simple prefix strip; needed by Phase 5 Kafka-key derivation
5	~~Should inner services standardize on `ctx = 100`, or is `ctx = 1` acceptable for non-gateway services?~~	Resolved — standardize on `ctx = 1` everywhere; aligns with the Zero Trust thesis (context is the primary input, belongs at field 1)
6	Phase 0 may surface latent bugs in services that send empty context. Roll out gradually or all at once?	Open — recommend all at once with integration test coverage
7	Should `EventContext.subject` be validated as required (`min_len = 1`), or optional during migration?	Open — recommend optional initially, required after Phase 5 ships
8	~~How should `session_id` be generated?~~	Deferred to product — not a merge blocker. Proto contract supports any of three implementations (client-generated header, server-derived from `auth_time`, server-generated + Redis); engineering proceeds with `session_id` as an optional unvalidated field. Final strategy is a product call driven by Travila's billing model and SDK story.
9	Should the `Publish()` fallback for `correlation_id` (using `restate.Key(ctx)`) be removed entirely, or kept as a safety net?	Open — recommend removing to force explicit `WithCorrelationID()` and surface bugs early

Out of Scope for v0.5

Multi-level delegation chains (agent → agent → user). Single-level only for now.
service_account:* typed subject. OQ #3 deliberately does not introduce one — every action is attributable to a real tenant_member:* via the API Keys PRD's meta.tenant_member_id requirement.
Backfill plan for legacy Unkey keys without meta.tenant_member_id. Captured in OQ #3 sub-questions; depends on production key inventory at refactor time.
RequestContext.project and EventContext.project_id. Owned by the Projects PRD — fields 8 and 12 respectively. Can ship in parallel from Phase 0 once this refactor's protovalidate posture is in place.
OTel baggage egress stripping. Not a refactor blocker, but pkg/restatex should strip non-traceparent OTel headers at network egress per W3C Baggage's "visible to anyone inspecting network traffic" warning.

Cross-References

RequestContext Refactor PRD — the v0.5 design in full, including 30+ industry citations
Tenant & Project Lifecycle Roadmap — multi-tenant substrate that ships on top of this refactor
Edge Idempotency Roadmap — sibling roadmap; the namespace rewrite consumes the typed-subject format
Authorization PRD — typed-subject model and the Phase 4 OpenFGA blocker
Identity PRD (Zitadel) — issuer of tenant_member:* subjects
API Keys PRD — meta.tenant_member_id required for OQ #3
Google Zanzibar paper — typed-subject foundation
Netflix Edge Authentication & Token-Agnostic Identity Propagation — Passport precedent
Google BeyondProd — EUC tickets, hard-reject-on-missing-context posture
CloudEvents specification — the subject attribute pattern
W3C Trace Context — traceparent propagation, the source of trace_id
OWASP Multi-Tenant Security Cheat Sheet — the "establish tenant context early, derive from verified claims, reject otherwise" mandate
NIST SP 800-207 — Zero Trust Architecture — every-service-validates-independently posture
