GCP Eventarc → Kafka Pipeline Roadmap

End-to-end view of how GCP-emitted events become a first-class part of the platform's async substrate — one ingestion pipe (Eventarc → Pub/Sub → Kafka Connect → Kafka), one shared envelope, multi-tenant routing built into the headers, idempotency built into the consumer. This page consolidates the GCP Eventarc → Kafka Pipeline PRD (Draft, 2026-04-24) and its dependencies into a single reference. The design is complete; the 3-phase rollout has not started.

Source PRDs

This page is derived from the GCP Eventarc → Kafka Pipeline PRD and its closely related dependencies:

Primary (docs/prd/):

  • GCP Eventarc → Kafka Pipeline — the rewritten draft (supersedes the 2026-01-27 gcp-events-pipeline.md); single-topic ingest, multi-tenant envelope, Handler-Internal idempotency, two-surface DLQ

Dependencies:

  • Remove Manager Actor Pattern — replaces StorageManagerActor with stateless StorageService; the day-1 consumer of this pipeline. NFR-5 (consumer-side tenant isolation) is reused here verbatim.
  • Projects — establishes the tenants/{tid}/projects/{pid}/... path convention every event must carry as tenant_id + project_id headers
  • EndUserService (Identity Capture) — explains why Firestore user_settings events are out of scope (no consumer; EndUserService owns user state directly)
  • Identity Platform — Zitadel — explains why Firebase Auth events are out of scope (Zitadel emits its own lifecycle events)

Related:

  • Edge Idempotency Key — the consumer side reuses the Handler-Internal pattern. This pipeline is the first active caller of that pattern (see Idempotency Roadmap)
  • Scheduled Jobs — explains why Cloud Scheduler events are out of scope (scheduler dispatches are synchronous HTTP, not async Kafka)
  • Notification Gateway — why device_tokens Firestore events are not plumbed (NotificationService registers tokens directly with Novu)
  • Firestore ADR — projection-store context behind the source-of-truth split

Architectural Direction — One Pipe, Many Sources

The design deliberately avoids per-source infrastructure. Every GCP source the platform ever cares about lands on the same Pub/Sub topic, the same Kafka Connect connector, the same Kafka topic — and consumers discriminate on the CloudEvents ce-type header in-process:

  • Single shared topic. gcp.v1.events.event carries every GCP event. Topic name is auto-derived by protoc-gen-kafka from the FQN of the envelope message (gcp.v1.events.Event). 12 partitions, 7-day retention — sized for ≤ 50 events/sec at day 1 with ~6× headroom for the future-sources catalog.
  • CloudEvents-native filtering. ce-type (google.cloud.storage.object.v1.finalized, google.cloud.cloudbuild.build.v1.statusCompleted, etc.) is preserved as a Kafka header by the Pub/Sub Source connector. Consumers subscribe to the topic and switch on the type. No ksqlDB, no Kafka Streams, no enrichment job.
  • Multi-tenant routing in the envelope, not the body. A small Single Message Transform on the connector parses the tenants/{tid}/projects/{pid}/... prefix from ce-subject and emits x-tenant-id + x-project-id Kafka headers. Consumers MUST treat these headers as authoritative — never re-derive tenancy from the payload.
  • Adding a source is a fixed recipe. New Eventarc trigger + new typed payload proto + new case in the consumer's switch. No new Pub/Sub topic, no new Kafka topic, no new connector instance, no new partitions. The recipe fits in a single PR with no infrastructure approval.

Canonical reference: GCP Eventarc → Kafka Pipeline PRD.

Glossary

  • Eventarc trigger — One per GCP source instance (one bucket, one Firestore database, one IAM scope). Subscribes to a CloudEvent stream and publishes to the shared gcp-events Pub/Sub topic.
  • Pub/Sub topic gcp-events — Single shared durable buffer for all GCP events. One topic, one subscription, one DLQ — provisioned once, never grows with the source count.
  • Kafka topic gcp.v1.events.event — Single shared Kafka topic carrying every GCP event. Name derived by protoc-gen-kafka from the envelope FQN. Partition key = ce-subject.
  • Envelope — The gcp.v1.events.Event proto: CloudEvents 1.0 attributes plus the platform's required tenant_id / project_id and a google.protobuf.Any payload. The only message in the package with the (kafka.options.v1.topic) annotation.
  • Typed payload — A per-source proto (e.g., GCSObjectFinalized, GCSObjectDeleted, future CloudBuildCompleted) carried inside Event.payload as Any. No oneof; Any keeps the envelope stable as new sources are added.
  • ce-type header — CloudEvents type attribute (google.cloud.storage.object.v1.finalized). The canonical discriminator — consumers switch on it.
  • x-tenant-id / x-project-id headers — Platform-required Kafka headers populated by the connector's SMT from the tenants/{tid}/projects/{pid}/... prefix in ce-subject. Authoritative for routing — handlers MUST NOT re-derive from payload.
  • _platform sentinel — Reserved value of tenant_id / project_id for organization-wide events (IAM audit logs, billing notifications, GKE control-plane). The underscore prefix cannot collide with real tenant/project IDs.
  • Two-surface DLQ — Independent failure modes with separate dead-letter topics: Surface A = Pub/Sub-side delivery failure (gcp-events-dlq Pub/Sub topic), Surface B = connector/consumer-side processing failure (gcp.v1.events.event-dlq Kafka topic). Not automatically mirrored — both must be monitored.
  • Handler-Internal dedup — The Edge Idempotency PRD pattern where the consumer derives a dedup token from a trusted server-side signal. Here that signal is ce.id — sender-generated by GCP, unique per delivery, stable across retries. Passed to restate.WithIdempotencyKey("gcp-event:" + event.Id) on every effecting downstream call.
  • last_event_id — Belt-and-braces dedup field on the file Firestore document. The upsert is a transactional read-then-write that no-ops if last_event_id == event.id. Catches Pub/Sub redeliveries that arrive after Restate's idempotency cache has expired.

What This Replaces

The pipeline rewrite is driven by three platform changes since the original 2026-01-27 PRD landed.

  • Manager actors are being removed (Remove Manager Actor PRD, 2026-04-22): StorageManagerActor → stateless StorageService backed by Firestore. Effect: the original storage-event consumer, written against the actor's EXCLUSIVE handlers and Restate K/V state, no longer applies; the new consumer is a stateless service handler — concurrent uploads to the same user run in parallel.
  • Projects exist (Projects PRD, April 2026): tenant data is project-scoped under tenants/{tid}/projects/{pid}/.... Effect: every event must carry both tenant_id and project_id; the path-prefix convention is canonical for path-based subjects (Cloud Storage object names, Firestore document paths, BigQuery table IDs).
  • End-user identity is owned by EndUserService (EndUserService PRD, April 2026): SocayoUserActor is gone. Effect: Firestore user_settings events flowing through Kafka would land nowhere; removed from scope — EndUserService owns user state, Firestore is the source of truth, no event round-trip needed.

Three categories of events appeared in the 2026-04-10 draft and are out of scope here:

  • Firebase Auth events — Identity Platform — Zitadel: Zitadel emits its own lifecycle events; tenant member auth no longer routes through Firebase.
  • Cloud Scheduler events — Scheduled Jobs PRD: scheduler dispatches are synchronous HTTP through KrakenD, not async Kafka.
  • Firestore document events — no active consumer. device_tokens is owned by NotificationService (registered with Novu directly). user_settings is owned by EndUserService (Firestore is source of truth, written from the gateway upsert). shared_plans has no event-driven consumer planned. Eventarc supports Firestore natively if a future need appears — one-line addition, no need to plumb in advance.

The Pipeline

(Diagram) Pipeline architecture: GCP sources (Cloud Storage day 1; Cloud Build and IAM in the future) fan into Eventarc triggers; triggers publish to the single Pub/Sub topic gcp-events; the Kafka Connect Pub/Sub Source connector's SMT parses tenants/{tid}/projects/{pid}/... from ce-subject into x-tenant-id and x-project-id headers; events land on the single shared Kafka topic gcp.v1.events.event (12 partitions, 7-day retention); consumers (StorageService day 1; DevOpsAgent and SecurityAgent in the future) filter by the ce-type header. The Pub/Sub DLQ (Surface A) and Kafka DLQ (Surface B) sit to the side.

The narrow waist is the single shared topic. Adding a new GCP source is a fixed recipe: declare an Eventarc trigger, add a typed payload proto, add a case to the consumer's switch. No new Pub/Sub topic, no new Kafka topic, no new connector instance, no new partitions.

Why Pub/Sub Between Eventarc and Kafka Connect

Eventarc supports Pub/Sub destinations natively, and Pub/Sub gives the pipeline three properties for free:

  • Backpressure handling. Ack-deadline (60s), retry policy, DLQ — provided by Pub/Sub without writing code.
  • Connector outage tolerance. Kafka Connect can be redeployed, upgraded, or sit behind a broker outage without losing events. They accumulate in the subscription.
  • Replay. If the Kafka topic is rebuilt or repartitioned, Kafka Connect resumes from a fresh subscription cursor without involving the GCP source.

Why a Single Kafka Topic

  • Provisioning cost: single shared topic — 1 topic, 1 subscription, 1 connector, provisioned once; per-type topics — N topics, N connectors, growing with the source count.
  • Per-source PR scope: single shared topic — 1 trigger + 1 proto + 1 case; per-type topics — plus topic config, connector config, and a Terraform module.
  • Filtering: single shared topic — in-process switch on the ce-type header; per-type topics — native, per-topic.
  • Stream processing required: none in either model.
  • Splittable later: single shared topic — yes, any one source can be promoted to its own topic without touching the others, envelope unchanged; per-type topics — N/A.

The single-topic decision survives the rewrite. CloudEvents ce-type filtering is the canonical discriminator pattern for the spec; the cost of the in-process switch is negligible at platform-event volumes (storage finalize/delete dominates day 1; future agent sources are even lower volume).

Multi-Tenant Addressing

Every event carries both tenant_id and project_id. This is a hard requirement: the consumer side has no fallback path, and an event missing these fields is routed to the DLQ.

Path-Prefix Convention

For sources whose subject is a path (Cloud Storage object names, Firestore document paths, BigQuery table IDs), the prefix is canonical:

tenants/{tenant_id}/projects/{project_id}/{source-specific-suffix}
  • Cloud Storage — tenants/acme/projects/healthcoach-prod/users/u_42/profile.jpg
  • Cloud Storage — tenants/acme/projects/fitness-prod/conversations/c_99/audio.webm
  • Firestore (future) — tenants/acme/projects/healthcoach-prod/end_users/u_42/sessions/s_1

pkg/tenancy Path Helpers

Two helpers are added to pkg/tenancy alongside the existing actor-key helpers (BuildActorKey / ParseActorKey):

// PathFor returns "tenants/{tenant_id}/projects/{project_id}/{suffix}".
// Validates that tenant_id and project_id are non-empty and not the
// reserved "_platform" sentinel.
func PathFor(tenantID, projectID, suffix string) string

// ParsePath parses a path produced by PathFor (or written by uploaders
// that follow the convention) and returns its components, or an error
// if the prefix is malformed.
func ParsePath(path string) (tenantID, projectID, suffix string, err error)

Centralizing prefix construction keeps consumers from drifting and makes a future refactor (e.g., adding a region segment) a one-diff change.
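A minimal sketch of what the two helpers could look like, assuming the signatures above. The real validation rules (and the exact _platform handling) live in pkg/tenancy; the constant name and the panic on invalid input are simplifications for illustration.

package tenancy

import (
    "fmt"
    "strings"
)

const platformSentinel = "_platform" // assumed constant name, for illustration only

// PathFor returns "tenants/{tenant_id}/projects/{project_id}/{suffix}".
// In this sketch an invalid tenant/project panics; the real helper may
// report validation failures differently.
func PathFor(tenantID, projectID, suffix string) string {
    if tenantID == "" || projectID == "" ||
        tenantID == platformSentinel || projectID == platformSentinel {
        panic(fmt.Sprintf("tenancy: invalid tenant %q / project %q", tenantID, projectID))
    }
    return fmt.Sprintf("tenants/%s/projects/%s/%s", tenantID, projectID, suffix)
}

// ParsePath splits "tenants/{tid}/projects/{pid}/{suffix}" back into its
// components, returning an error when the prefix is malformed.
func ParsePath(path string) (tenantID, projectID, suffix string, err error) {
    parts := strings.SplitN(path, "/", 5)
    if len(parts) < 5 || parts[0] != "tenants" || parts[2] != "projects" ||
        parts[1] == "" || parts[3] == "" || parts[4] == "" {
        return "", "", "", fmt.Errorf("tenancy: malformed path prefix %q", path)
    }
    return parts[1], parts[3], parts[4], nil
}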

The _platform Sentinel

Some GCP events are organization-wide and don't belong to a tenant — IAM audit logs, billing notifications, project-level GKE control-plane events. These flow through the pipeline with tenant_id = "_platform", project_id = "_platform". Reserved value, rejected as a real ID elsewhere in the platform. Tenant-scoped consumers ignore them naturally.

Follow-up scope

Today, _platform is recognized only inside the Edge Idempotency Key PRD's key-namespacing layer. Treating it as a valid tenant_id / project_id value end-to-end requires updates to pkg/tenancy validators, Firestore security rules, and the OpenFGA model — none of which currently whitelist a _-prefixed ID. Out of scope for this PRD; will land with the first _platform-scoped consumer (DevOps or Security agent) under its own PRD. No day-1 source emits _platform-scoped events, so this caveat does not block the day-1 rollout.

Header Propagation

Eventarc → Pub/Sub uses CloudEvents natively. The Pub/Sub Source connector preserves CloudEvent attributes as Kafka record headers (ce-id, ce-type, ce-source, ce-subject, ce-time). On top of those, a small Single Message Transform — living in integrations/kafka-connect/transforms/, unit-tested independently — extracts tenant_id / project_id from the subject's path prefix and emits them as x-tenant-id / x-project-id headers.

For sources whose subject is not a path (e.g., Eventarc resources keyed by project number or bucket name), the trigger's filter restricts it to one (tenant, project) at trigger creation time, and the headers are populated from trigger metadata via a small Cloud Function adapter sitting between Eventarc and Pub/Sub. This is the rare case, not the default.

Tenant Isolation at the Consumer

Consumers MUST treat the x-tenant-id and x-project-id headers as authoritative. They MUST NOT re-derive tenancy from event payload fields. This matches Remove Manager Actor Pattern PRD's NFR-5 on consumer-side tenant isolation: the source of truth for routing is the envelope, not the body.
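A sketch of what the consumer-side cross-check can look like, assuming the ParsePath helper above and the Restate Go SDK's TerminalError wrapper (the exact error-wrapping call is an assumption):

// checkEnvelopeMatchesPath keeps the envelope authoritative: the path inside
// the payload is parsed only to detect a discrepancy, never to reroute.
func checkEnvelopeMatchesPath(event *gcpeventsv1.Event, objectPath string) error {
    tenant, project, _, err := tenancy.ParsePath(objectPath)
    if err != nil {
        return restate.TerminalError(fmt.Errorf("unparseable subject path: %w", err))
    }
    if tenant != event.GetTenantId() || project != event.GetProjectId() {
        return restate.TerminalError(fmt.Errorf(
            "tenancy mismatch: envelope %s/%s vs path %s/%s",
            event.GetTenantId(), event.GetProjectId(), tenant, project))
    }
    return nil
}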

Event Schema

Envelope

syntax = "proto3";

package gcp.v1.events;

import "google/protobuf/any.proto";
import "google/protobuf/timestamp.proto";
import "kafka/options/v1/options.proto";

option go_package = "github.com/bo-socayo/workflows-v7/apis/gcp/v1/events;gcpeventsv1";

// Event is the Kafka envelope for every event ingested through the
// GCP Eventarc pipeline. CloudEvents 1.0 plus tenant_id and project_id.
message Event {
  option (kafka.options.v1.topic) = true;

  // CloudEvents 1.0 attributes
  string spec_version = 1;              // "1.0"
  string id = 2;                        // Sender-generated unique ID; used for handler-internal dedup
  string type = 3;                      // "google.cloud.storage.object.v1.finalized"
  string source = 4;
  string subject = 5;
  google.protobuf.Timestamp time = 6;

  // Platform-required routing fields, populated by the connector
  string tenant_id = 10;                // From subject path or trigger metadata; "_platform" for org-wide events
  string project_id = 11;               // Same provenance as tenant_id

  // Source-specific typed payload — see Payloads
  google.protobuf.Any payload = 20;
}

Why Any and not oneof. Platform convention avoids oneof in new protos (same precedent as ContentPart). Any keeps the envelope stable as new sources are added — adding a new payload type does not change Event. The trade-off (consumers must payload.UnmarshalTo(&typedMsg)) is acceptable because consumers already discriminate on type first.
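For illustration, the decode step that trade-off refers to is roughly one call on the Any field. A sketch, assuming gcpeventsv1 is the generated package named in the go_package option above and that a mismatch should be terminal:

// decodeFinalized unwraps Event.payload into the matching typed message.
// A type mismatch is surfaced as an error so the record can go to the DLQ.
func decodeFinalized(event *gcpeventsv1.Event) (*gcpeventsv1.GCSObjectFinalized, error) {
    var payload gcpeventsv1.GCSObjectFinalized
    if err := event.GetPayload().UnmarshalTo(&payload); err != nil {
        return nil, fmt.Errorf("payload does not match ce-type %q: %w", event.GetType(), err)
    }
    return &payload, nil
}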

Day-1 Payloads (Cloud Storage)

// apis/gcp/v1/events/storage.proto

message GCSObjectFinalized {
  string bucket = 1;
  string object_path = 2;                    // Full path inside the bucket
  string content_type = 3;
  int64 size_bytes = 4;
  string md5_hash = 5;
  google.protobuf.Timestamp created_at = 6;
  map<string, string> object_metadata = 7;   // GCS object metadata (custom-headers)
}

message GCSObjectDeleted {
  string bucket = 1;
  string object_path = 2;
  google.protobuf.Timestamp deleted_at = 3;
}

No Per-Source Kafka Topics

A previous draft suggested per-typed-payload topic annotations (storage.v1.object-created, etc.). Removed. All payloads ride on the single gcp.v1.events.event topic; typed protos exist only as Any-unwrapped structures consumers decode after dispatch.

Component Inventory

New / Extended Components

  • Pub/Sub topic gcp-events — Single shared durable buffer for all GCP events; one subscription gcp-events-kafka-connect consumed by the connector (§3.1)
  • Pub/Sub topic gcp-events-dlq — Surface-A DLQ: Kafka Connect can't pull or ack (§8.4)
  • Kafka Connect Pub/Sub Source connector — Pulls from Pub/Sub, writes to gcp.v1.events.event. Preserves CloudEvent attributes as Kafka headers; SMT extracts tenant/project (§3.1, §11.2)
  • Kafka topic gcp.v1.events.event — Single shared topic, 12 partitions, 7-day retention. Name auto-derived by protoc-gen-kafka from the envelope FQN (§11.1)
  • Kafka topic gcp.v1.events.event-dlq — Surface-B DLQ: connector or consumer can't process (§8.4, §11.1)
  • apis/gcp/v1/events/event.proto — The shared Event envelope; the (kafka.options.v1.topic) annotation is here, on the envelope only (§5.1)
  • apis/gcp/v1/events/storage.proto — Day-1 typed payloads (GCSObjectFinalized, GCSObjectDeleted) (§5.2)
  • pkg/tenancy.PathFor / ParsePath — Centralized path-prefix construction and parsing for tenants/{tid}/projects/{pid}/... (§4.1)
  • integrations/kafka-connect/transforms/ — Single Message Transform that parses tenants/{}/projects/{} from ce-subject into x-tenant-id / x-project-id headers (§4.3, §11.2)
  • StorageService.ReceiveEvent (subscribed handler) — Day-1 consumer; filters for storage event types and dispatches to typed handlers (§7)
  • Eventarc trigger Terraform module (infrastructure/terraform/modules/gcp-events-trigger-gcs/) — Reusable per-bucket trigger module (§12)
  • Pipeline Terraform module (infrastructure/terraform/modules/gcp-events-pipeline/) — Pipe-level resources (Pub/Sub, subscription, DLQ, connector, IAM), instantiated once per env (§12)

Integrated Components (No Changes Required)

  • Restate Go SDK — restate.WithIdempotencyKey(...) is the dedup primitive for Handler-Internal calls. No SDK changes.
  • pkg/tenancy (existing) — Already carries BuildActorKey / ParseActorKey; the path helpers fit alongside them.
  • protoc-gen-kafka — Auto-derives the Kafka topic name from the envelope's lowercased proto FQN. The envelope is the only message in gcp.v1.events with the topic annotation — no changes to the plugin.
  • pkg/telemetry — Existing audit/observability path. The original draft's bespoke AuditService consumer is removed — telemetry uses the standard integration.
  • Lago metering — storage_gb usage events flow through the existing Lago path. The envelope's tenant_id is the dimension key.

Observability Additions

Metrics map to the two failure surfaces in §8.4. Each surface has its own alert.

  • pubsub_subscription_num_undelivered_messages (subscription = gcp-events-kafka-connect) — Pub/Sub, Surface A; alert when > 1000 for 5 min
  • pubsub_subscription_oldest_unacked_message_age — Pub/Sub, Surface A; alert when > 300s
  • pubsub_dlq_message_count (custom, on the gcp-events-dlq Pub/Sub topic) — Pub/Sub, Surface A; alert when > 0 for 15 min
  • kafka_connect_source_record_poll_total — Kafka Connect; observed only, no alert
  • kafka_consumer_group_lag (group = storage-service) — Kafka, Surface B; alert when > 1000 for 5 min
  • gcp_events_dlq_message_count (custom, on the gcp.v1.events.event-dlq Kafka topic) — Kafka exporter, Surface B; alert when > 0 for 15 min

Dashboards: GCP Events Pipeline (Pub/Sub backlog, connector record rate, consumer-group lag), GCP Events by Type (record count + DLQ rate per ce-type), GCP Events by Tenant/Project (record count by x-tenant-id × x-project-id — useful when a tenant onboards or a project's traffic spikes).

Day-1 Source: Cloud Storage

Multi-Bucket From Day One

The pipeline is bucket-agnostic. Day 1 uses one bucket (travila-uploads), but the architecture supports any number without code changes:

  • The Eventarc trigger is a parameterized Terraform module — adding bucket #2 is a one-call instantiation.
  • StorageService reads the bucket name from GCSObjectFinalized.bucket, never from a constant.
  • Path conventions inside a bucket follow §6.3 by default; a bucket can opt out (e.g., a future travila-public-assets bucket holding tenant-agnostic platform assets).

Bucket Registry

  • travila-uploads (day 1) — All end-user file uploads (profile photos, audio, documents, exports); path convention tenants/{tid}/projects/{pid}/users/{uid}/{file_id}.{ext}
  • travila-generated-content (future) — AI-generated images and reports; same path convention as travila-uploads
  • travila-public-assets (future) — Platform-wide static assets (no tenant scope); path convention assets/{asset_id}.{ext}

Adding a new bucket: Terraform module instantiation + an entry in this table + (if path convention differs) consumer-side handling.

Event Types

  • Object finalized — google.cloud.storage.object.v1.finalized — upload complete (initial PUT or compose finished)
  • Object deleted — google.cloud.storage.object.v1.deleted — object removed via API or lifecycle rule

object.metadataUpdated is excluded — the platform does not mutate object metadata after upload.

Sample Event (After Connector Processing)

Kafka record on gcp.v1.events.event
Headers:
ce-id: 13284820103245192
ce-type: google.cloud.storage.object.v1.finalized
ce-source: //storage.googleapis.com/projects/_/buckets/travila-uploads
ce-subject: objects/tenants/acme/projects/hcprod/users/u_42/file_8a91.jpg
ce-time: 2026-04-24T12:00:00.000Z
x-tenant-id: acme
x-project-id: hcprod
Body:
Event {
  id: "13284820103245192"
  type: "google.cloud.storage.object.v1.finalized"
  tenant_id: "acme"
  project_id: "hcprod"
  payload: Any { GCSObjectFinalized { bucket, object_path, … } }
}

Day-1 Consumer: StorageService

StorageService (the stateless Restate service from Remove Manager Actor Pattern PRD §FR-1.2) gets one new Kafka-subscribed handler.

service StorageService {
  // Existing CRUD handlers unchanged.

  // Reconciles a GCS event against Firestore + Lago metering.
  // Subscribed to gcp.v1.events.event; filters for storage event types.
  rpc ReceiveEvent(gcp.v1.events.Event) returns (google.protobuf.Empty) {
    option (kafka.options.v1.subscribe) = true;
  }
}

No EXCLUSIVE handler. StorageService is stateless; concurrent uploads to the same user run in parallel, bounded only by GCS itself.

ReceiveEvent(event)
1. Validate envelope: event.tenant_id and event.project_id must be set
(else TerminalError → DLQ via Restate's default policy)
2. Switch on event.type:
google.cloud.storage.object.v1.finalized → handleObjectFinalized
google.cloud.storage.object.v1.deleted → handleObjectDeleted
(default — not a storage event) → return nil (ignore)

handleObjectFinalized(event, payload)
1. ParsePath(payload.object_path) → (tenant, project, suffix)
Reject (DLQ) on parse failure.
2. Cross-check parsed tenant/project against envelope tenant_id/project_id.
Discrepancy → TerminalError → DLQ.
3. Upsert Firestore file doc at tenants/{tid}/projects/{pid}/users/{uid}/files/{file_id}
with size_bytes, content_type, md5_hash, storage_ref, last_event_id = event.id.
Pass restate.WithIdempotencyKey("gcp-event:" + event.Id) on the call.
4. Emit storage_gb usage event to Kafka via the existing Lago metering path.
Same idempotency key.

handleObjectDeleted(event, payload)
1. ParsePath as above.
2. Delete Firestore file doc (idempotent — already-absent is fine).
3. Emit storage_gb decrement usage event.
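In Go, the dispatch half of that pseudocode is envelope validation plus a switch on event.type. A minimal sketch, assuming the Restate Go SDK handler shape (restate.Context, restate.TerminalError) and the decodeFinalized helper shown in the schema section:

// ReceiveEvent is the Kafka-subscribed handler. Sketch only; wiring and
// error handling follow the Restate Go SDK conventions assumed here.
func (s *StorageService) ReceiveEvent(ctx restate.Context, event *gcpeventsv1.Event) error {
    // 1. Envelope validation: tenancy fields are mandatory. A missing field is
    //    terminal, so Restate's default policy routes the record to the DLQ.
    if event.GetTenantId() == "" || event.GetProjectId() == "" {
        return restate.TerminalError(fmt.Errorf("event %s missing tenant_id/project_id", event.GetId()))
    }

    // 2. Discriminate on the CloudEvents type; non-storage events are ignored.
    switch event.GetType() {
    case "google.cloud.storage.object.v1.finalized":
        payload, err := decodeFinalized(event)
        if err != nil {
            return restate.TerminalError(err)
        }
        return s.handleObjectFinalized(ctx, event, payload)

    case "google.cloud.storage.object.v1.deleted":
        var payload gcpeventsv1.GCSObjectDeleted
        if err := event.GetPayload().UnmarshalTo(&payload); err != nil {
            return restate.TerminalError(err)
        }
        return s.handleObjectDeleted(ctx, event, &payload)

    default:
        return nil // not a storage event; another consumer may care
    }
}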

What the Old Manager Actor Did That This Doesn't

  • Wrote into the actor's K/V file tree → writes a Firestore document; the tree is queried via Firestore subcollection reads.
  • EXCLUSIVE lock per user serialized concurrent uploads → no lock; concurrent uploads run in parallel.
  • Quota enforcement via in-actor counter → quota enforcement via Lago + Redis (Remove Manager Actor Pattern §FR-1.3).
  • Thumbnail generation inline → out of scope here; if added later, a separate handler subscribes to the same event and is observed independently.

Idempotency: Handler-Internal via ce.id

Pub/Sub is at-least-once. The Pub/Sub Source connector inherits that. Consumers will see the same event more than once during retries, broker rebalances, or connector restarts. The platform's Edge Idempotency Key PRD addresses this for HTTP edges; Kafka consumers need the equivalent.

The Pattern

Per the Edge Idempotency PRD's Handler-Internal pattern, the consumer derives a dedup token from a trusted server-side signal. CloudEvents ce.id is exactly that — sender-generated by GCP, unique per delivery, stable across retries. The consumer passes it to restate.WithIdempotencyKey(...) on every downstream invocation that has effects we care about deduping:

// Inside StorageService.handleObjectFinalized:
fileDoc := buildFileDoc(event, payload)

_, err := s.firestoreClient.UpsertFile(
    ctx,
    fileDoc,
    restate.WithIdempotencyKey("gcp-event:"+event.Id),
)
if err != nil {
    return err
}

return s.lagoMetering.RecordStorageUsage(
    ctx,
    event.TenantId,
    payload.SizeBytes,
    restate.WithIdempotencyKey("gcp-event:"+event.Id),
)

The gcp-event: prefix namespaces the key against unrelated callers (per Edge Idempotency PRD §Cache Isolation). Restate's native dedup mechanism handles the rest.

Cross-PRD State

The Edge Idempotency Key PRD v1.1 currently states that Handler-Internal has no active callers — Asynq scheduler became Client-Generated, leaving the pattern dormant as a template. The StorageService event consumer specified here is the first active caller. When this PRD ships, the Edge Idempotency PRD's "no active callers" caveat needs to be updated to point at this consumer.

Belt and Braces: last_event_id on the File Document

In addition to per-call WithIdempotencyKey, the file Firestore document carries last_event_id. The upsert is a transactional read-then-write that no-ops if last_event_id == event.id already. This catches the case where Restate's idempotency cache has expired (default 24h) but a Pub/Sub redelivery still arrives.
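A sketch of that transactional read-then-write using the cloud.google.com/go/firestore client and gRPC status codes. The document path, field names, and function name are illustrative; the real upsert lives behind StorageService's Firestore client.

// upsertFileDoc writes the file document only when this event has not been
// applied yet; a redelivery after Restate's idempotency cache expired no-ops.
func upsertFileDoc(ctx context.Context, fs *firestore.Client, docPath string,
    eventID string, fields map[string]interface{}) error {
    ref := fs.Doc(docPath)
    return fs.RunTransaction(ctx, func(ctx context.Context, tx *firestore.Transaction) error {
        snap, err := tx.Get(ref)
        if err != nil && status.Code(err) != codes.NotFound {
            return err
        }
        if snap != nil && snap.Exists() {
            if last, _ := snap.DataAt("last_event_id"); last == eventID {
                return nil // already applied; no-op
            }
        }
        fields["last_event_id"] = eventID
        return tx.Set(ref, fields, firestore.MergeAll)
    })
}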

Out-of-Order Events

Pub/Sub does not guarantee order, but the Storage event volume per object is low (one finalized + at most one deleted). The consumer treats both as commutative: deleted-then-finalized leaves the document present; finalized-then-deleted leaves it absent. If a future source needs ordering (e.g., Cloud Build with multiple status updates), it adopts a per-resource sequence number in its typed payload and the consumer compares against last_seq on the target document.
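If such a source arrives, the guard is a few lines layered on the same transactional read. A sketch; last_seq and the incoming sequence value are hypothetical names for a future ordered source.

// shouldApply reports whether an incoming event is newer than what the
// target document already recorded. Older or duplicate sequences are dropped.
func shouldApply(snap *firestore.DocumentSnapshot, incomingSeq int64) bool {
    if snap == nil || !snap.Exists() {
        return true // nothing recorded yet
    }
    raw, err := snap.DataAt("last_seq")
    if err != nil {
        return true // field not present yet
    }
    last, ok := raw.(int64)
    return !ok || incomingSeq > last
}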

Two-Surface DLQ

The pipeline has two independent failure surfaces, each with its own DLQ. They are not the same flow and are not automatically mirrored — oncall must monitor both.

(Diagram) Two-surface DLQ: the happy-path row runs Eventarc → Pub/Sub topic → Kafka Connect (with SMT) → Kafka topic → consumer. Surface A (Pub/Sub DLQ) branches off after the Pub/Sub topic when Kafka Connect can't pull or ack; Surface B (Kafka DLQ) branches off after Kafka Connect when the SMT or encoding fails before the message reaches Kafka. Detection metrics are listed beneath each DLQ, with a footer warning that both surfaces must be monitored independently with different tooling.

Surface A — Pub/Sub-side delivery
  • Where the failure happens: Kafka Connect is unreachable, stuck, or its credentials are revoked. Pub/Sub retries up to 5 times, then routes to a Pub/Sub DLQ.
  • DLQ topic: gcp-events-dlq (Pub/Sub, not Kafka)
  • Detection metrics: pubsub_subscription_num_undelivered_messages, pubsub_subscription_oldest_unacked_message_age, pubsub_dlq_message_count
  • Typical causes: connector outage, IAM credential rotation, Kafka broker partition

Surface B — Connector / consumer-side processing
  • Where the failure happens: Kafka Connect successfully pulled the message but failed to publish to Kafka — bad payload encoding, SMT failure parsing the prefix, or transient broker write error.
  • DLQ topic: gcp.v1.events.event-dlq (Kafka)
  • Detection metrics: gcp_events_dlq_message_count, kafka_consumer_group_lag
  • Typical causes: malformed subject path, missing tenant/project headers, payload type mismatch with ce-type

No automatic mirroring. A failure on Surface A is not visible on Surface B and vice versa. If a future operational need calls for one unified DLQ, the right move is a separate, narrowly-scoped Kafka Connect connector that reads gcp-events-dlq (Pub/Sub) and writes to gcp.v1.events.event-dlq (Kafka). Out of scope for day 1; the two-surface model is sufficient.

Adding a New GCP Source — The Recipe

This recipe is the whole point of the design. In order:

  1. Pick the CloudEvent type(s) the source emits. Look it up in the Eventarc event reference.
  2. Define a typed payload proto at apis/gcp/v1/events/{source_name}.proto. Mirror only the fields the consumer needs; don't replicate the entire CloudEvent body.
  3. Instantiate the Eventarc trigger module in Terraform with the source-specific filter (bucket, document path pattern, project, etc.). Triggers always target google_pubsub_topic.gcp_events.
  4. Add a case to the target consumer's ReceiveEvent switch matching the new ce.type. Decode event.Payload into the typed message; do the work.
  5. Document the source in the Future Sources Catalog, promoting it from "future" to "active" once the consumer ships.

There is no infrastructure step. No new Pub/Sub topic. No new Kafka topic. No new connector instance. No new partitions.
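For illustration, the consumer-side half of the recipe for a Cloud Build source (the Phase 3 skeleton) might look like this. DevOpsAgentService and CloudBuildCompleted are future names from the catalog below, shown only as a sketch of the shape of the change, not an implementation:

// Sketch of the future DevOpsAgentService consumer: same envelope, one new
// case in the switch, no new infrastructure.
func (s *DevOpsAgentService) ReceiveEvent(ctx restate.Context, event *gcpeventsv1.Event) error {
    switch event.GetType() {
    case "google.cloud.cloudbuild.build.v1.statusCompleted":
        var build gcpeventsv1.CloudBuildCompleted // hypothetical typed payload
        if err := event.GetPayload().UnmarshalTo(&build); err != nil {
            return restate.TerminalError(err)
        }
        slog.Info("cloud build completed",
            "tenant", event.GetTenantId(), "build", build.String())
        return nil
    default:
        return nil
    }
}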

Future Sources Catalog (DevOps Agents)

These are the agent-relevant sources we expect to add. None are in this PRD's implementation scope; they're listed so reviewers can sanity-check that the pipeline shape supports them. For each, the recipe above applies unchanged.

  • Cloud Build — google.cloud.cloudbuild.build.v1.{statusChanged,statusFailed,statusCompleted}; likely consumer DevOpsAgentService (future); tenant scope _platform. Build outcomes per repo / per branch; the trigger ID identifies the workflow.
  • Cloud Run — google.cloud.run.revision.v1.{ready,failed}; likely consumer DevOpsAgentService; tenant scope _platform. Revision rollout health — useful for "did the deploy succeed?" agent queries.
  • GKE control plane — google.cloud.audit.log.v1.written (filtered to container.googleapis.com); likely consumer DevOpsAgentService; tenant scope _platform. Cluster-level changes (node pool resize, version upgrades).
  • IAM audit log — google.cloud.audit.log.v1.written (filtered to IAM methods); likely consumer SecurityAgentService (future); tenant scope _platform. High-signal security trail — grants, key creates, policy changes.
  • Cloud Logging severity sink — google.cloud.audit.log.v1.written filtered to severity >= ERROR; likely consumer OpsAlertConsumer (future); tenant scope _platform, or per-tenant if log labels carry tenancy. Catch-all for unhandled errors that should reach an agent.

Security & Compliance

  • Encryption in transit — TLS for Pub/Sub; mTLS between Kafka Connect ↔ Kafka.
  • Encryption at rest — GCP default for Pub/Sub; Kafka volumes encrypted at the disk level.
  • Retention — 7 days on gcp.v1.events.event; 14 days on gcp.v1.events.event-dlq. Aligns with the platform privacy retention window.
  • Audit trail — Cloud Audit Logs on the Pub/Sub topic; Restate / Kafka logging via pkg/telemetry. No bespoke AuditService consumer — removed in favor of the standard telemetry path.
  • Tenant isolation — x-tenant-id / x-project-id headers are authoritative; consumers MUST NOT re-derive tenancy from the body. Protects against payload spoofing in the rare case where a misconfigured trigger would otherwise emit a path the consumer would interpret as a different tenant.

IAM

  • eventarc-gcp-events — roles/pubsub.publisher on gcp-events (Eventarc → Pub/Sub)
  • kafka-connect-gcp — roles/pubsub.subscriber on the gcp-events-kafka-connect subscription (Kafka Connect → Pub/Sub)

Rollout Phases

The 3-phase plan from the PRD. Each phase is mostly mechanical — Terraform module instantiation, proto + handler edits, observability config.

  1. Pipeline infrastructure (not started) — Deploy the gcp-events-pipeline Terraform module (Pub/Sub topic, subscription, DLQ, IAM, Kafka Connect connector). Verify a synthetic Pub/Sub publish appears in gcp.v1.events.event with headers preserved. Deploy monitoring dashboards and alerts.
  2. Cloud Storage source + StorageService consumer (not started) — Instantiate gcp-events-trigger-gcs for travila-uploads. Land StorageService.ReceiveEvent (subscribed handler + handleObjectFinalized / handleObjectDeleted). End-to-end test: upload → Firestore document + Lago metering event. Migrate any pre-existing storage workflows that previously called StorageManagerActor synchronously (covered by Remove Manager Actor Pattern PRD §FR-4, not duplicated here).
  3. First DevOps source, Cloud Build (not started) — Add apis/gcp/v1/events/cloud_build.proto. Instantiate the Eventarc trigger for Cloud Build status change events. Land DevOpsAgentService.ReceiveEvent (skeleton consumer that logs build outcomes; agent reasoning out of scope). Validate the recipe works end-to-end with no infra change. Promote Cloud Build from "future" to "active" in the Future Sources Catalog.

Dependency Ordering

(Diagram) Phase dependency graph: two external PRDs (Projects PRD, Remove Manager Actor Pattern PRD) appear as blockers. Phase 1 (pipeline infrastructure) depends on the Projects PRD; Phase 2 (Cloud Storage source + StorageService consumer) depends on both Phase 1 and the Remove Manager Actor Pattern PRD; Phase 3 (first DevOps source — Cloud Build) depends on Phase 2 and is marked as recipe validation rather than new functionality.

  • Phase 1 depends on the Projects PRD — the path convention tenants/{tid}/projects/{pid}/... is established by Projects; the SMT in §11.2 parses against that exact prefix.
  • Phase 2 depends on Phase 1 — the StorageService consumer needs the Kafka topic and connector running before it can subscribe.
  • Phase 2 depends on the Remove Manager Actor Pattern PRD — the day-1 consumer is the new stateless StorageService from that PRD; it doesn't exist as a destination until that PRD ships.
  • Phase 3 depends on Phase 2 — validates that the recipe is what's claimed: second source, no infra change.
  • Phase 3 is independent of _platform end-to-end support — Cloud Build is _platform-scoped, but the day-1 consumer's logging skeleton doesn't need _platform to be a valid tenant_id everywhere; full _platform support is follow-up scope per the warning in §4.2.

Out of Scope (v1)

  • Real-time blocking validation. Eventarc is asynchronous; if you need to prevent an action, use a Cloud Function or a synchronous gateway, not this pipeline.
  • Bidirectional sync. This pipeline is GCP → Kafka only. Travila → GCP writes go through service-specific code paths (e.g., the services/storage GCS client wrapper).
  • Non-GCP event sources. Inbound webhooks from Stripe, GitHub, Twilio, Pipedream go through the existing webhook gateway and the Edge Idempotency Key PRD flow, not this pipeline.
  • Firestore document events. No active consumer day 1. device_tokens, user_settings, shared_plans all have non-event ownership stories. Eventarc supports Firestore natively if a future need appears — one-line addition; the design doesn't need to plumb it in advance.
  • _platform-scoped tenancy end-to-end. Recognized only in the Edge Idempotency PRD's namespacing today; pkg/tenancy validators, Firestore security rules, and the OpenFGA model don't whitelist a _-prefixed ID. Lands with the first _platform-scoped consumer's PRD (DevOps or Security agent).
  • Replicating provider strengths. Not aiming to replicate Firestore's change-stream UX, Pub/Sub filtering, or Kafka stream processing inside this pipeline. The pipeline is the dumb pipe; intelligence lives in the consumer.
  • Unified DLQ across both surfaces. The two-surface model is intentional. If operational need eventually surfaces, a small connector reading the Pub/Sub DLQ into the Kafka DLQ is the way; not in scope day 1.

Cross-References