
Billing, Metering & Invoicing

End-to-end view of how usage becomes revenue — Lago for metering and invoicing, Stripe for collection, OPA for sync entitlement enforcement, and a single proto-FQN Kafka path that keeps producers ignorant of billing. This page consolidates the Billing, Metering & Invoicing PRD (v0.6, Draft — POC validated, architectural review applied) and its dependencies into a single reference. The design is complete; MVP rollout has not started, and real-time enforcement is deferred to Policy Engine Phase 1.

Source PRDs

This page is derived from the Billing, Metering & Invoicing PRD and its closely related dependencies:

Primary (docs/prd/):

  • Billing, Metering & Invoicing — the v0.6 design: Lago + Stripe + OPA, proto-FQN topic ingestion, MVP=LLM-only, additive-metric rule, deferred real-time enforcement

Dependencies (docs/prd/):

  • Policy Engine — owns OPA deployment topology (sidecar vs central), fail-open/closed posture, decision logging. Phase 5 of this PRD blocks on Policy Engine Phase 1 for the opa-consumer and gateway middleware.
  • Edge Idempotency Key — Lago is treated as a Provider-Translated webhook source: lago_id is lifted into the canonical Idempotency-Key at the trust boundary. Producer-side billing events use restatex.Publish()'s deterministic event_id.
  • Multi-Tenancy: Projects — establishes (tenant_id, project_id) as the metering boundary. Production target is data.entitlements[tenant_id][project_id]; POC keys on tenant_id only.
  • Multi-Tenancy: API Keys — Unkey resolves the (tenant_id, project_id) dimensions billing tags every event with.
  • Remove Manager Actor Pattern — StorageService (post-removal) is the day-1 source for the post-MVP storage_gb metric.
  • Enterprise Readiness — strategic context for plan structures, MSAs, and per-customer negotiated rates.


Architectural Direction — One Pipe to Lago, One Pipe Back, Composable Decisions

The whole design refuses to invent billing-specific abstractions when an existing platform primitive already does the job:

  • Producers stay billing-agnostic. Generation Workflow and OpenRouter publish their normal domain events (LLMGenerationCompletedEvent) via restatex.Publish() to proto-FQN-derived Kafka topics. The BillingService subscribes from the consumer side. There is no billing.usage-events topic, no shared JSON schema, no producer-side knowledge that "Lago" exists.
  • Idempotency is the journal, not a UUID. Every billing-source event carries a deterministic event_id from Restate's journaled sdkgo.UUID(ctx), which maps directly to Lago's transaction_id. Direct uuid.New() is forbidden for billing event IDs and CI-enforced via per-module forbidigo lint rules. Replays are safe by construction.
  • Inbound Lago webhooks reuse Edge Idempotency. A standalone webhook-receiver Restate service lifts lago_id into the canonical Idempotency-Key per the Provider-Translated pattern and republishes onto a per-provider Kafka topic. The receiver is a thin HTTP→Kafka translator; interpretation lives in the downstream consumer.
  • One decision, many domains. Sync entitlement enforcement runs through OPA, not a Redis flag. The gateway issues a single POST /v1/data/... per request that evaluates entitlement alongside model access, feature access, RBAC, and data residency — one composable decision, one structured audit log, no N-fan-out of per-domain caches.
  • MVP is LLM-only; new metrics are additive. Adding api_calls, storage_gb, voice_minutes, or active_users later is the same fixed recipe (new proto event + new emit site + new BillingService handler + new Lago metric). No existing event shape, handler, or topic gets refactored — guaranteed by the additive-future rule baked into Key Design Decisions.

Canonical reference: Billing, Metering & Invoicing PRD. The PRD's POC findings section catalogs measured latencies (Lago dispatch arrival → OPA-visible decision: ~30ms end-to-end, <1ms OPA decision read) validated in playground/lago-poc/.

Glossary

  • Lago — Open-source usage-based billing platform (AGPL v3, self-hosted). Owns metering aggregation, plan/charge calculation, invoice generation, Stripe orchestration. Selected over OpenMeter, Kill Bill, and Stripe Billing.
  • BillingService — New Restate service (services/billing/v1/) that consumes proto-FQN topics, maps each event to a Lago payload via restatex.Run(), and posts to /api/v1/events. Greenfield — does not yet exist in the repo.
  • LagoProxyService — New Restate service that wraps Lago's REST API for admin/query operations (customer CRUD, subscriptions, plans, wallets, invoices). Tenant-scoped at the proxy layer; Lago has no native tenant isolation.
  • webhook-receiver — Standalone Restate service for inbound third-party webhooks. Per-provider routes (/webhooks/lago first), per-provider Kafka topics (lago.webhook.v1.event first). HTTP→Kafka translator only — does not interpret events.
  • opa-consumer — Projection service that tails lago.webhook.v1.event, maps each Lago lifecycle event to an entitlement-state diff, and writes via PUT /v1/data/entitlements/{tenant_id}/{project_id} on OPA. Built alongside Policy Engine Phase 1, NOT in billing MVP.
  • LLMGenerationCompletedEvent — The primary billing event. One per workflow run with aggregated Usage (prompt/completion/total tokens, cost estimate). Published by Generation Workflow via restatex.Publish() on topic llm.v1.common.v1.llmgenerationcompletedevent.
  • Billable metric — Lago primitive: a name + aggregation (SUM, COUNT, COUNT_DISTINCT, MAX) + optional filter dimensions. MVP ships token_usage (SUM with model/type/modality filters) and image_generation (COUNT with model).
  • Filter — A dimension on a single billable metric, NOT a separate metric. Each filter combination gets its own price and appears as a separate invoice line item. The same pattern Mistral uses with Lago.
  • plan_overrides — Lago primitive on CreateSubscription for per-customer negotiated rates within a shared plan structure. Avoids per-tenant plan code pollution. Mandatory structured audit log on use.
  • Provider-Translated webhook — Edge Idempotency pattern: a third-party identifier (Lago's lago_id, Stripe's event.id) is lifted into the canonical Idempotency-Key at the receiver.
  • Additive-future rule — Every post-MVP billable signal lands as a new proto event + new handler + new Lago config. No existing event shape, handler, or topic is modified. The mechanical safety net behind "MVP = LLM-only".
  • lago.webhook.v1.event — Per-provider Kafka topic carrying lifted Lago lifecycle webhooks. Flows in production from billing MVP so the future opa-consumer has live changes to tail when it ships.
  • lago.deadletter.v1.event — Lago-outage / poison-pill DLQ. BillingService publishes here after N retry attempts; an operator alert fires on first message. Replay is idempotent because event_id is deterministic.

Billing pipeline architecture — Generation Workflow publishes LLMGenerationCompletedEvent via restatex.Publish to a proto-FQN Kafka topic; BillingService Kafka subscription maps each event to Lago via /api/v1/events with transaction_id = event_id; failures after N retries land in lago.deadletter.v1.event; Lago invoices through Stripe; Lago lifecycle webhooks return through webhook-receiver, get lifted to Idempotency-Key, publish to lago.webhook.v1.event; the future opa-consumer (Policy Engine Phase 1, dashed) tails that topic and PUTs into OPA data.entitlements; gateway sync-read calls OPA for sub-millisecond entitlement decisions

Service & Component Inventory

New Services

  • services/billing/v1/ (BillingService) — Kafka consumer on proto-FQN topics; maps each event to Lago /api/v1/events; wraps Lago calls in restate.Run() for at-least-once delivery + Lago dedup via transaction_id. (§7 The Missing Piece)
  • services/lagoproxy/v1/ (LagoProxyService) — Restate proxy in front of Lago's REST API; tenant-scoped at the handler layer; mandatory two-tenant integration tests on every List*. (§LagoProxyService API Definition + DoD)
  • services/webhook-receiver/v1/ — Generic third-party webhook ingestion (/webhooks/<provider>); HMAC slot per route (deferred for Lago in MVP); lifts provider dedupe ID into Idempotency-Key; publishes to per-provider Kafka topics. (§6 + §Production Hardening)
  • services/opa-consumer/v1/ — Projects lago.webhook.v1.event into data.entitlements[tenant_id][project_id] on OPA. Cold-start bootstraps from LagoProxyService.ListSubscriptions + ListWallets, then tails Kafka. Built in Policy Engine Phase 1, not billing MVP. (§5 + §Phase 5)

New Proto Definitions

  • LagoProxyService proto — apis/billing/v1/... (greenfield): Customer / Subscription / Plan / BillableMetric / Event / Invoice / Wallet / Coupon / CreditNote / AddOn RPCs.
  • Usage multimodal fields — additive to apis/llm/v1/common/types.proto: audio_input_tokens, audio_output_tokens, reasoning_tokens, images_generated — new field numbers, backward-compatible.
  • Post-MVP billable events — apis/... (per signal): ApiCallBilledEvent, StorageUsageReportedEvent, VoiceMinutesConsumedEvent, optional UserActivityReportedEvent — each adds a new topic and a new handler, never refactors an existing one.

Integrated Components (No Changes Required)

  • Lago — v1.44.0+ self-hosted, one shared instance per environment (one staging, one prod). API on port 3100, UI on 8085. Postgres requires the pg_partman extension. Validated end-to-end in playground/lago-poc/.
  • Stripe — Native Lago integration handles the invoice → payment flow. Stripe Tax for tax calculation. Stripe Checkout redirect for payment-method update.
  • OPA — Policy engine; entitlement is one of several policy domains. Deployment topology owned by the Policy Engine PRD.
  • webhook-receiver Kafka topic — lago.webhook.v1.event flows in MVP even though no consumer exists yet. Default Kafka retention is fine — opa-consumer's cold-start contract reads current state from Lago, not from Kafka replay.
  • Unkey — externalId = tenant_id, meta.project_id = project_id — surfaces the tenant + project dimensions billing events tag.
  • OpenBao — Stores the Lago org-level API token (one per environment); future home for per-provider HMAC secrets when HMAC enforcement turns on.
  • pkg/restatex — Publish() and PublishBatch() build the event envelope, set the deterministic event_id, and derive the topic from the proto FQN. Producers do not write Lago-specific code.

Observability Additions

  • lago.deadletter.v1.event lag/count alert — first DLQ message indicates a Lago outage or poison pill; fires before scale becomes a problem.
  • BillingService consumer lag per topic — detects backpressure if Lago throttles or the handler stalls.
  • OPA decision log per evaluation — decision_id, input, result, reasons, eval-time metrics; ships to Loki per Policy Engine PRD §FR-6.
  • plan_overrides audit log — mandatory structured log on every CreateSubscription / UpdateSubscription carrying non-empty overrides; the finance contract trail.
  • Billing-gateway request log — bare-minimum MVP audit on both internal-admin-gateway (Travila staff, cross-tenant) and console-gateway (tenant self-serve): actor, RPC, sanitized request body, correlation_id, scoped/target tenant_id. Loki retention covers forensic queries until Retraced lands.

Flow 1: Usage Event → Lago

The producer side. One generation run = one billing event with full token attribution.

Generation Workflow completes

│ Build LLMGenerationCompletedEvent { run_id, status, usage, loop_count, duration_ms }

restatex.Publish(ctx, writer, evt, WithTenantID(tid), WithEventName("llm.generation_completed"))

│ - event_context.event_id = sdkgo.UUID(ctx).String() ← deterministic, journaled
│ - event_context.tenant_id = tid
│ - event_context.emitted_at = RFC3339
│ - topic = TopicFromProto(evt) = "llm.v1.common.v1.llmgenerationcompletedevent"

Kafka (proto-FQN topic)


Restate Kafka subscription → BillingService.HandleLLMGenerationCompleted

│ - Lookup external_subscription_id from event_context.tenant_id
│ - Map Usage → token_usage events with model/type/modality filter properties
│ (one event per filter combination — text-only = 2 events, audio = up to 4, reasoning = 3)
│ - Wrap Lago POST in restate.Run() for at-least-once + retries

Lago /api/v1/events
│ - transaction_id = event_context.event_id ← Lago dedupes on this
│ - external_subscription_id = subID
│ - properties = { tokens, model, type, modality, user_id, conversation_id }

Lago aggregates → Plan charges → Invoice (end of period) → Stripe → Customer payment
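The mapping step above can be sketched in plain Go. Everything here is illustrative — the Usage field names, the filter values, and the per-combination transaction_id suffix are assumptions (the PRD pins only that the deterministic event_id is the dedup root and that zero-valued dimensions bill nothing):

```go
package main

import "fmt"

// Usage mirrors the aggregated token counts carried by
// LLMGenerationCompletedEvent (field names illustrative, not the proto shape).
type Usage struct {
	PromptTokens, CompletionTokens      int64
	AudioInputTokens, AudioOutputTokens int64
	ReasoningTokens                     int64
}

// LagoEvent is a skeleton of the /api/v1/events payload.
type LagoEvent struct {
	TransactionID string            // rooted in the deterministic event_id
	Code          string            // billable metric code
	Properties    map[string]string // filter dimensions
	Tokens        int64
}

// FanOut emits one token_usage event per non-zero filter combination:
// text-only = 2 events, audio = up to 4, reasoning = 3.
func FanOut(eventID, model string, u Usage) []LagoEvent {
	dims := []struct {
		typ, modality string
		tokens        int64
	}{
		{"input", "text", u.PromptTokens},
		{"output", "text", u.CompletionTokens},
		{"input", "audio", u.AudioInputTokens},
		{"output", "audio", u.AudioOutputTokens},
		{"output", "reasoning", u.ReasoningTokens},
	}
	var out []LagoEvent
	for _, d := range dims {
		if d.tokens == 0 {
			continue // zero/absent proto field = nothing billed, never an error
		}
		out = append(out, LagoEvent{
			// Suffix per combination so each fan-out event stays individually
			// dedupable in Lago — the exact suffix scheme is an assumption.
			TransactionID: fmt.Sprintf("%s:%s-%s", eventID, d.typ, d.modality),
			Code:          "token_usage",
			Properties:    map[string]string{"model": model, "type": d.typ, "modality": d.modality},
			Tokens:        d.tokens,
		})
	}
	return out
}

func main() {
	evts := FanOut("evt-123", "gpt-4o", Usage{PromptTokens: 900, CompletionTokens: 300})
	fmt.Println(len(evts)) // text-only run → 2 events
}
```

Skipping zero-valued dimensions is also what makes pre-rollout events (without the multimodal fields) safe — see Poison-pill events under Production Hardening.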

Why one event per workflow run, not per token

An LLMGenerationCompletedEvent is the natural unit of billable work — it carries the full aggregated Usage for a single agent loop run. Per-token or per-message events would explode Lago's event volume by 3–4 orders of magnitude with no pricing benefit (Lago aggregates with SUM regardless). The same envelope also carries loop_count and duration_ms for SLA reporting without adding a separate channel.

Why proto-FQN topics, not a billing.usage-events topic

The 2026-01 draft proposed a billing-specific JSON topic. That was removed in v0.6 because:

  • It leaked Lago's data model into every producer.
  • It bypassed restatex.Publish()'s deterministic event_id envelope and would have required a parallel idempotency story.
  • It created two ingestion pipelines for what is effectively one signal class.

The proto-FQN-topic decision lines up with the GCP Eventarc → Kafka Pipeline and means new billable signals never invent a new pipeline shape — they just add a new proto event under apis/.
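A sketch of the naming rule, assuming TopicFromProto simply lowercases the event's fully-qualified proto name — the real restatex helper may do more (validation, sanitization); this only illustrates the convention:

```go
package main

import (
	"fmt"
	"strings"
)

// TopicFromProto derives the Kafka topic from a proto FQN. Assumption: the
// rule is a plain lowercase of the fully-qualified message name, matching the
// llm.v1.common.v1.llmgenerationcompletedevent example on this page.
func TopicFromProto(fullName string) string {
	return strings.ToLower(fullName)
}

func main() {
	fmt.Println(TopicFromProto("llm.v1.common.v1.LLMGenerationCompletedEvent"))
	// → llm.v1.common.v1.llmgenerationcompletedevent
}
```

A new billable signal therefore gets its topic for free from its proto definition — no topic-naming decision, no shared registry edit.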

Flow 2: Lago Webhook → OPA Entitlement State

The asynchronous write path that keeps OPA's data.entitlements fresh as customers pay invoices, deplete wallets, or get terminated.

Lago lifecycle event (wallet.depleted_ongoing_balance, invoice.paid, etc.)


POST /webhooks/lago (cluster-internal DNS only in MVP — no public Ingress)


webhook-receiver
│ - HMAC slot present per route, disabled for Lago in MVP (cluster-internal)
│ - Lift body's lago_id into Idempotency-Key per Edge Idempotency Provider-Translated pattern
│ - Publish via restatex.Publish() onto per-provider topic

Kafka topic: lago.webhook.v1.event


opa-consumer (built in Policy Engine Phase 1; topic flows in production from MVP)
│ - Cold-start: bootstrap state by paginating LagoProxyService.ListSubscriptions + ListWallets,
│ project each tenant's current state into OPA. Lago is source of truth.
│ - Steady-state: tail Kafka, apply per-event diff
│ - Per-tenant Kafka partitioning preserves order; consumer is at-least-once
│ with last-write-wins on OPA, fine because state is monotonic per event_id

OPA Data API
PUT /v1/data/entitlements/{tenant_id}/{project_id}
data.entitlements[tenant_id][project_id] = {
blocked, block_reason, subscription_active,
wallet_balance_cents, last_updated, last_event_id
}
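The receiver's lift step, sketched with an illustrative flat body (real Lago webhook payloads nest the object; where lago_id sits in the body is an assumption here):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// liftIdempotencyKey is the Provider-Translated step for the Lago route: the
// provider's own dedupe ID (lago_id) becomes the canonical Idempotency-Key
// before the event is republished to lago.webhook.v1.event. The receiver
// never interprets the event — unparseable bodies go to the dead-letter path.
func liftIdempotencyKey(body []byte) (string, error) {
	var payload struct {
		WebhookType string `json:"webhook_type"`
		LagoID      string `json:"lago_id"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return "", err
	}
	if payload.LagoID == "" {
		return "", fmt.Errorf("webhook missing lago_id")
	}
	return payload.LagoID, nil
}

func main() {
	key, err := liftIdempotencyKey([]byte(`{"webhook_type":"invoice.paid","lago_id":"lago-evt-001"}`))
	fmt.Println(key, err) // lago-evt-001 <nil>
}
```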

Webhook → state diff mapping

Locked in this PRD; opa-consumer is built mechanically against this table.

  • wallet.depleted_ongoing_balance → blocked=true, block_reason="wallet balance depleted", wallet_balance_cents=0
  • wallet.transaction.created / wallet.updated → blocked=false, block_reason=null, wallet_balance_cents=<from body>
  • subscription.started → subscription_active=true (does not auto-clear blocked — payment failure may still apply)
  • subscription.terminated → subscription_active=false, blocked=true, block_reason="subscription terminated"
  • invoice.payment_failure → blocked=true, block_reason="invoice payment failed"
  • invoice.paid → blocked=false, block_reason=null
  • customer.payment_overdue → recorded but does not flip blocked — soft warning
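The mapping translates mechanically into the opa-consumer's projection function. A sketch with illustrative Go names (the entitlement struct mirrors the OPA document shape given above):

```go
package main

import "fmt"

// Entitlement mirrors data.entitlements[tenant_id][project_id] in OPA.
type Entitlement struct {
	Blocked            bool
	BlockReason        string
	SubscriptionActive bool
	WalletBalanceCents int64
}

// Apply projects one Lago lifecycle event onto the entitlement state per the
// locked webhook → diff table. balanceCents is only read for wallet updates.
func Apply(e Entitlement, event string, balanceCents int64) Entitlement {
	switch event {
	case "wallet.depleted_ongoing_balance":
		e.Blocked, e.BlockReason, e.WalletBalanceCents = true, "wallet balance depleted", 0
	case "wallet.transaction.created", "wallet.updated":
		e.Blocked, e.BlockReason, e.WalletBalanceCents = false, "", balanceCents
	case "subscription.started":
		e.SubscriptionActive = true // deliberately does NOT auto-clear Blocked
	case "subscription.terminated":
		e.SubscriptionActive, e.Blocked, e.BlockReason = false, true, "subscription terminated"
	case "invoice.payment_failure":
		e.Blocked, e.BlockReason = true, "invoice payment failed"
	case "invoice.paid":
		e.Blocked, e.BlockReason = false, ""
	case "customer.payment_overdue":
		// soft warning: recorded, but no state flip
	}
	return e
}

func main() {
	s := Apply(Entitlement{}, "wallet.depleted_ongoing_balance", 0)
	fmt.Println(s.Blocked, s.BlockReason)
}
```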

POC vs production keying

The validated POC at playground/lago-poc/ keys on tenant_id only — policies/billing_entitlement.rego reads data.entitlements[input.tenant_id]. The production target adds a project_id second-level key per Projects PRD §10: a small Rego rule change, a project_id field on the Kafka envelope, and one extra path segment in the PUT. Tracked as an implementation item, not a design change.

Flow 3: Real-Time Enforcement (Sync, on Every Request)

The read path. Sub-millisecond per-request decision composable with every other policy domain in the platform.

Client request


KrakenD / domain-gateway middleware
│ After deriveRequestContext() extracts tenant, project, subject
│ Before routing to downstream actors/services

POST /v1/data/billing/entitlement
body: { input: { tenant_id, project_id, action } }

│ OPA evaluates Rego against in-memory data.entitlements
│ Same call evaluates other policy domains: model access, feature access, RBAC, ...

Decision: { allow, reasons }

├─ allow=true → route to downstream
└─ allow=false → 402 Payment Required + result.reasons in body

Why OPA, not a Redis flag

The 2026-01 draft proposed webhook → Redis flag (Option D). It was rejected in v0.5 in favor of webhook → Kafka → OPA Data API (Option E):

  • Latency — Redis flag: <1ms (Redis GET). OPA: <1ms (in-memory Rego eval, Policy Engine PRD §NFR-1).
  • Composable with model access / feature access / RBAC — Redis: no, separate code paths per domain. OPA: yes, data.entitlements is one input among many to a single decision.
  • Audit trail — Redis: application logs only. OPA: per-evaluation decision log (decision_id, structured input, result, reasons, eval-time metrics), Loki-shipped.
  • Per-tenant + per-project granularity — Redis: flattened key. OPA: native nested keys, data.entitlements[tenant_id][project_id].
  • Per-action discrimination — Redis: binary flag. OPA: Rego rules consider input.action (e.g., metered actions block on wallet exhaust; non-metered ones don't).
  • Cross-policy composition — Redis: N Redis GETs per request. OPA: one POST /v1/data/... evaluates the stacked policy and returns a unified allow/deny + reasons.

OPA matches Redis on raw latency for the entitlement domain alone, and adds the ability to evaluate every other relevant policy in the same call — which a KrakenD plugin asking "is this tenant entitled AND can use this model AND is this feature enabled?" would otherwise need 3+ Redis GETs to answer.

MVP launch state — metering only

Important. Phase 5 (real-time enforcement) is deferred to Policy Engine Phase 1. Billing MVP ships without the gateway middleware wired up and without opa-consumer running.

Concrete consequences during the MVP window:

  • Plan limits are metered — every request still produces an LLMGenerationCompletedEvent, BillingService dispatches to Lago, invoicing is correct end-of-period.
  • Plan limits are not enforced in real time — a Free-tier tenant can exceed their token allocation and the gateway will not return 402.
  • Overage is invoiced as usage; no money is lost, but a runaway client could spike usage faster than ops can react.
  • Manual kill-switch: LagoProxyService.TerminateSubscription is the only out-of-band block during the MVP window.
  • The lago.webhook.v1.event topic flows in production from MVP. When opa-consumer arrives months later, it has live changes to tail.

Flow 4: Reconciliation Safety Net

A periodic 5-minute cron queries Lago's ongoing_balance for all active wallets and reconciles by writing through the same opa-consumer projection path. This catches missed or delayed webhooks. Latency is acceptable because it is a backstop, not the primary enforcement path.
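A minimal sketch of the backstop's comparison step, assuming the cron has already fetched ongoing_balance per tenant from Lago and read back the currently projected OPA state (function and map names are hypothetical):

```go
package main

import "fmt"

// Reconcile compares Lago (source of truth) against the projected OPA state
// and returns the tenants whose entitlement document must be re-written
// through the same opa-consumer projection path. Any mismatch implies a
// missed or delayed webhook.
func Reconcile(lagoBalances, opaBalances map[string]int64) []string {
	var drifted []string
	for tenant, balance := range lagoBalances {
		if opaBalances[tenant] != balance {
			drifted = append(drifted, tenant)
		}
	}
	return drifted
}

func main() {
	drifted := Reconcile(
		map[string]int64{"socayo": 0, "fitlife": 5000},
		map[string]int64{"socayo": 2500, "fitlife": 5000}, // missed wallet webhook
	)
	fmt.Println(drifted) // [socayo]
}
```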

Billable Metrics

  • token_usage — LLM tokens consumed. SUM over tokens, with model, type, modality filters; unit: tokens. MVP.
  • image_generation — AI-generated images. COUNT, with model filter; unit: images. MVP.
  • api_calls — API requests made. COUNT; unit: requests. Post-MVP (additive).
  • active_users — Monthly active users. COUNT_DISTINCT on user_id; unit: users. Post-MVP (additive).
  • storage_gb — File storage used. MAX over gb; unit: GB. Post-MVP (additive).
  • voice_minutes — Voice AI minutes. SUM over minutes; unit: minutes. Post-MVP (additive).

Filters, not separate metrics

Token pricing filters — single billable metric token_usage with three filter dimensions (model, type, modality) fans out to multiple emitted events per request; a text-only request emits 2 events, an audio request emits up to 4 (text+audio per direction), a reasoning-model request emits 3 (input + text output + reasoning output); each filter combination maps to a separate price line in the plan's charge table with ALL wildcard fallback rates

token_usage is a single billable metric with three filter dimensions (model, type, modality). Each filter combination — e.g. (gpt-4o, output, audio) at $76.80 / 1M tokens — gets its own price in the plan's charge definition and its own line item on the invoice. This is the same pattern Mistral uses in production with Lago, matching the OpenAI per-token template Lago ships out of the box.

Audio tokens cost 10–13× text tokens on some models (GPT-4o audio input: $32/M vs text input: $2.50/M). Reasoning tokens (o1/o3, DeepSeek R1, Claude extended thinking) also have separate pricing on some models. Lago's __ALL_FILTER_VALUES__ wildcard provides a fallback rate so new models that are not explicitly priced still bill at a sane default.

user_id as property, not filter

user_id ships as an event property for analytics — not as a filter. If user_id were a filter, every unique user would become a separate aggregation bucket and an Enterprise tenant could see thousands of line items on a single invoice. The separate active_users metric (COUNT_DISTINCT on user_id) handles per-seat / MAU billing cleanly when product wants it.

MVP scope = LLM-only

MVP ships with token_usage (all modalities) and image_generation, both sourced from the existing LLMGenerationCompletedEvent proto. Post-MVP metrics are governed by the additive-future rule: each new signal is a new proto event + new emit site + new BillingService handler + new Lago metric. Existing event shapes are not modified. Estimated effort per metric: ~1–2 days end-to-end.

This scope reduction keeps billing shippable without locking out future metrics. The pricing dimensions in §Pricing Model below are provisional pending product confirmation of Open Questions 1, 2, 6, 7.

Pricing Model

The billing engine handles four revenue streams, each mapped to a Lago primitive:

  • Platform fee — Lago plan subscription (base amount); monthly or annual. Examples: Pro plan $99/mo, Enterprise custom.
  • Usage-based charges — Lago plan charges (billable metrics); monthly, in arrears. Examples: tokens, API calls, storage.
  • One-time fees — Lago add-ons + one-off invoices; on demand. Examples: implementation fee, setup fee, consulting.
  • Prepaid credits — Lago wallets; upfront / top-up. Examples: trial credits, committed spend, credit packs.

Subscription tiers

  • Free — $0/mo. Includes 1k API calls, 10k text tokens, no audio/images. Overage: blocked.
  • Pro — $99/mo or $990/yr (2 months free). Includes 50k API calls, 500k text tokens, 50k audio tokens, 100 images. Overage: pay-as-you-go.
  • Enterprise — custom pricing, annual only. Included: custom (all modalities). Overage: custom negotiated rates.

Annual and monthly Pro plans share the same usage-based charges — only the base fee and interval differ. Lago handles proration automatically when a customer upgrades mid-cycle.

Per-customer negotiated rates — plan_overrides, not new plans

Per-customer pricing within a shared plan structure (e.g., Socayo signs an MSA at a discounted token rate) is modeled as Lago plan_overrides on CreateSubscription plus optional Wallet for prepaid commits. This keeps the plan catalog small (no pro_socayo, pro_fitlife, …) while supporting fully arbitrary per-customer pricing.

CreateSubscription and UpdateSubscription accept:

  • A reference to a base plan_code (mandatory).
  • Optional plan_overrides (per-charge unit price, included units, monthly/annual commit, minimum commit, etc.).
  • Optional initial Wallet grant for prepaid commits (separate CreateWallet call, contractually paired when the MSA includes a prepaid component).

Every CreateSubscription / UpdateSubscription call carrying non-empty plan_overrides MUST emit a structured audit log entry capturing tenant_id, subscription_external_id, base plan_code, the per-charge diff between base-plan and override values (structured field — not free-text), actor.user_id, and a reason string linked to the MSA / approval ticket. This is the finance contract trail.
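The structured per-charge diff can be sketched as follows — OverrideDiff and DiffOverrides are hypothetical names, and the cents-per-unit amounts are illustrative:

```go
package main

import "fmt"

// OverrideDiff is one entry of the structured per-charge diff the audit log
// must carry — a structured field, never free-text.
type OverrideDiff struct {
	ChargeCode               string
	BaseCents, OverrideCents int64
}

// DiffOverrides compares base-plan unit prices with the negotiated overrides
// and returns only the charges that actually changed.
func DiffOverrides(base, override map[string]int64) []OverrideDiff {
	var diffs []OverrideDiff
	for code, o := range override {
		if b := base[code]; b != o {
			diffs = append(diffs, OverrideDiff{ChargeCode: code, BaseCents: b, OverrideCents: o})
		}
	}
	return diffs
}

func main() {
	diffs := DiffOverrides(
		map[string]int64{"token_usage": 100, "image_generation": 500},
		map[string]int64{"token_usage": 80}, // MSA-discounted token rate
	)
	fmt.Printf("%+v\n", diffs)
}
```

The audit entry then wraps this diff together with tenant_id, subscription_external_id, base plan_code, actor.user_id, and the MSA-linked reason string.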

What plan_overrides is NOT for. Structurally different plan shapes (one tenant billed per-token, another per-active-user) cannot be expressed as overrides — that scenario is the trigger for the per-tenant Lago instance discussion in the multi-tenancy recommendation, not a job for plan_overrides.

Multi-Tenancy

Topology

Travila is the platform; each enterprise tenant (Socayo, FitLife, Wellness Co, …) is a tenant of Travila. In Lago this maps to:

  • Travila is the Lago Organization — one Lago org per environment (one staging, one prod).
  • Each enterprise tenant is a Lago Customer under that Organization.
  • Our tenant_id is the Customer's external_id. Subscriptions, invoices, and usage events all attach to the Customer.

There is exactly one Lago org-level API token per environment (Travila's), held by LagoProxyService and stored in OpenBao. The platform never holds per-tenant Lago tokens — tenant scoping is enforced at the LagoProxyService layer via metadata.tenant_id.

Organization (Travila)
└── Billing Entity (default)
    ├── Customer: socayo              ← our tenant_id = Lago external_id
    │   └── Subscription: Enterprise plan (with plan_overrides)
    ├── Customer: fitlife
    │   └── Subscription: Pro plan
    └── Customer: wellness_co
        └── Subscription: Pro plan

Billable Metrics (shared across all Customers)
├── token_usage (SUM with model+type+modality filters)
├── image_generation (COUNT with model filter)
├── api_calls (COUNT) ← post-MVP
├── active_users (COUNT_DISTINCT on user_id) ← post-MVP
├── storage_gb (MAX) ← post-MVP
└── voice_minutes (SUM) ← post-MVP

Multi-tenancy mechanism choice

Lago offers four multi-tenancy mechanisms; the choice depends on what tenants need:

  • Customer metadata (OSS) — provides application-layer filtering by metadata.tenant_id; does NOT provide native tenant isolation in Lago itself. When: MVP — free, simple, all tenants share pricing.
  • Billing Entities (premium) — provides per-entity Stripe accounts, invoice numbering/branding, dunning campaigns, tax config, email settings; does NOT provide per-tenant pricing (plans and metrics remain shared across the org). When: a tenant needs their own Stripe account or branded invoices.
  • Partner / revenue-share (premium) — provides self-billed invoices for resellers. When: reseller models.
  • Per-tenant Lago instances — provides fully independent plans, metrics, pricing, infra; costs ~$40-70/mo infra per tenant plus N migrations on Lago upgrades. When: only when tenants demand fundamentally different plan structures (e.g., one billed per-token, another per-active-user) AND pay enough to justify the operational tax.

Scope clarification. Billing Entities provides per-entity Stripe / branding / dunning / tax. It does NOT provide per-tenant pricing — plans and billable metrics remain shared across the Organization. Do not adopt Billing Entities expecting independent pricing per tenant. For per-customer rates within shared plans, use plan_overrides (see above).

Production Hardening

Lago API outage — lago.deadletter.v1.event

When Lago is unavailable:

  • BillingService wraps each Lago call in restate.Run(...). Restate retries with exponential backoff per the configured RunOptions.
  • After N retry attempts (typical starting point: 8 attempts over ~10 minutes), the handler publishes the original event envelope, the failed Lago request payload, and an error reason to lago.deadletter.v1.event.
  • An operator alert fires on first message arriving in the DLQ (Loki/Grafana alert against topic lag). One DLQ message is the signal that something is wrong; subsequent messages confirm scale.
  • An operator-driven DLQ replay tool drains the DLQ back into the BillingService handler once Lago is healthy. Replay is idempotent because event_id (deterministic per Key Design Decisions) maps to Lago's transaction_id.

This pattern is preferred over "stop consuming until Lago recovers" because the LLM-generation Kafka topic is shared with non-billing consumers (analytics, observability) and pausing it would create cross-domain backpressure.

Poison-pill events

Phase 2 adds 4 multimodal fields to llmcommon.Usage. Existing emitters pre-rollout produce events without those fields. To prevent one bad event from halting the consumer:

  • BillingService MUST treat missing or zero proto fields as valid input. No panic on absence; no validation error for audio_input_tokens == 0. Zero values map to "no audio billed", not "invalid event".
  • Events that genuinely cannot be processed (unparseable proto bytes, unknown event_name, missing tenant_id mapping) go to the same lago.deadletter.v1.event topic with the parse/lookup error attached. Same alert fires.
  • This combines with the additive-future rule: adding fields is never a forced migration of existing emitters; missing fields are silent zero-billing for that dimension until the emitter catches up.

Webhook HMAC — deferred for MVP

The webhook-receiver service ships in MVP without HMAC verification on the /webhooks/lago route. This is safe for MVP given the deployment topology:

  • Lago is self-hosted in the same Kubernetes cluster as the platform.
  • The receiver is exposed only via cluster-internal DNS (webhook-receiver.<namespace>.svc.cluster.local). No public Ingress, no NodePort, no LoadBalancer.
  • A Kubernetes NetworkPolicy restricts source pods/namespaces that may POST to the receiver.

Per-route HMAC config slots are designed in from day one, but the Lago route's slot is unset. HMAC implementation is required before any of:

  1. Adding any non-in-cluster webhook provider (e.g., Stripe, GitHub, Pipedream — these always originate outside the cluster).
  2. Public-exposing the receiver for any reason.
  3. Moving Lago to Lago Cloud or any out-of-cluster deployment.

When HMAC is enabled, the secret lives in pkg/secrets (OpenBao-backed) and the receiver supports a dual-secret window during rotation. Tracker: #1692. Whoever adds a webhook provider, public-exposes the receiver, or migrates Lago must close that issue first.

Definition of Done

Two CI-enforced contracts that convert "reviewer-must-catch" risks into static checks.

BillingService — idempotency lint

All billing-source events MUST flow through restatex.Publish() (deterministic event_id) or carry an externally-provided idempotency key. Direct UUID generation as a billing event_id causes Lago double-billing on Restate retries.

The BillingService and webhook-receiver modules ship with a per-module .golangci.yml that enables forbidigo with:

linters-settings:
  forbidigo:
    forbid:
      - p: ^uuid\.(New|NewString|NewV7|NewRandom)$
        msg: "Direct UUID generation is forbidden for billing event_ids — use restatex.Publish() (deterministic event_id from the Restate journal) or lift an externally-provided idempotency key. See docs/prd/billing-metering.md §BillingService DoD."
    analyze-types: true

Exemption process. A // nolint:forbidigo // <reason> comment is allowed only when the UUID is provably NOT a billing event_id (e.g., a trace-correlation ID, a non-billing internal record key). Reviewers MUST acknowledge the exemption explicitly in PR comments.

What this catches. An engineer reaches for uuid.New() to construct a billing event_id; Restate retries the handler; two events with different IDs hit Lago; the customer is charged twice. The lint rule fires at compile time, before merge.

What this does NOT catch. Wrong-but-deterministic keys (e.g. only tenant_id), time-dependent keys (fmt.Sprintf("%s-%d", tenant_id, time.Now().Unix())), or wrappers in non-billing packages that bypass the regex. The PRD-level rule that all billing events go through restatex.Publish() is the safeguard for these cases.

LagoProxyService — tenant-isolation tests

Lago has no native tenant isolation. The risk surface is the List* family — a handler that forgets to forward external_customer_id (or equivalent tenant filter) returns cross-tenant data.

Every List* operation MUST have an integration test that:

  1. Provisions Lago fixtures for at least two tenants (tenant_a, tenant_b) — customers, subscriptions, and the resource being listed.
  2. Calls the operation in tenant_a's RequestContext.
  3. Asserts the response contains only tenant_a's records and zero tenant_b records.

Operations covered: ListCustomers, ListSubscriptions, ListPlans, ListBillableMetrics, ListInvoices, ListEvents, ListWallets, ListCreditNotes, ListAddOns, and any future List* added.

A new List* operation without this test is not "Done" and must not merge.
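The assertion at the heart of these tests can be sketched with an in-memory stand-in for Lago. The real tests provision actual Lago fixtures and call the LagoProxyService handler; `fakeLago` and `listInvoices` below are hypothetical.

```go
// Sketch of the two-tenant isolation assertion: call a List* operation in
// tenant_a's scope, then assert zero tenant_b records come back.
package main

import "fmt"

type invoice struct{ tenant, id string }

// fakeLago stands in for fixtures covering at least two tenants.
var fakeLago = []invoice{
	{"tenant_a", "inv_1"}, {"tenant_b", "inv_2"}, {"tenant_a", "inv_3"},
}

// listInvoices models a correct handler: it always forwards the caller's
// tenant scope (external_customer_id in the real Lago API) to the query.
func listInvoices(callerTenant string) []invoice {
	var out []invoice
	for _, inv := range fakeLago {
		if inv.tenant == callerTenant {
			out = append(out, inv)
		}
	}
	return out
}

func main() {
	got := listInvoices("tenant_a")
	for _, inv := range got {
		if inv.tenant != "tenant_a" { // the core assertion: zero foreign records
			panic("cross-tenant leak: " + inv.id)
		}
	}
	fmt.Println("tenant_a invoices:", len(got))
}
```

A handler that forgets the tenant filter would return all three fixtures, and the loop's assertion fails on `inv_2` — the exact bug class this test family exists to catch.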

Why no tenant-scoped client wrapper? Most LagoProxyService operations are Get/Update/Delete*(id), where the ID itself scopes the query. The leak surface is genuinely just List*, and per-handler tests catch the bug at the same layer where it would land, while avoiding a fight with the Lago SDK's type system on every upgrade. Residual risk acknowledged: if review discipline slips on a future Lago SDK upgrade, a small CI grep-and-assert script becomes the structural fallback.

Admin Console Integration

Billing admin console integration — two consoles route to two distinct gateways that both call the shared LagoProxyService. Tenant employees use the existing Travila Console (bo-socayo/travila-console) over /admin/v1/* into services/console-gateway/, which forces tenant_id = RequestContext.tenant_id and ignores any tenant_id in the request body. Supported operations are self-serve only: list own invoices + download, current plan, usage dashboard, own wallet balance, and payment-method update via Stripe Checkout (self-serve plan upgrade is post-MVP). Travila/platform staff use the future Travila Internal Admin Portal (not yet built — staff drive RPCs via the Restate ingress until then) over /platform/admin/* into services/internal-admin-gateway/, which reads a target tenant_id from the request body so staff can act on any customer. Supported operations are cross-tenant (CreateCustomer, CreateSubscription/UpdateSubscription with plan_overrides, TerminateSubscription, VoidInvoice, VoidCreditNote, TopUpWallet manual_grant, ApplyCoupon, plan and BillableMetric CRUD, MRR analytics), with an optional reason field on high-stakes ops and a mandatory plan_overrides audit log. LagoProxyService holds the org-level Lago API token in OpenBao and carries mandatory two-tenant integration tests on every List* RPC.

Two consoles, two gateways, two audiences:

  • Travila Console (bo-socayo/travila-console, the existing per-tenant admin portal — see Admin Portal PRD for context) — used by an enterprise tenant's own employees (e.g., Socayo's billing admin). Routes through services/console-gateway/. Self-serve only and hard-scoped to the caller's tenant — view own invoices, current plan, usage dashboard, update payment method via Stripe Checkout. Lands as the concrete handlers behind the Billing page that already exists as a placeholder in the Admin Portal PRD.
  • Travila Internal Admin Portal (future, separate UI — not yet built; see API Keys PRD §Travila Internal Admin Portal) — for Travila/platform staff (sales, finance, customer success, ops). Routes through services/internal-admin-gateway/. Acts on customers — applies plan_overrides, terminates subscriptions, voids invoices, grants wallet credits, manages the plan catalog, runs MRR analytics. Until the frontend ships, Travila staff drive these RPCs via the Restate ingress against internal-admin-gateway directly.
Tenant user  → KrakenD (/admin/v1/*) → services/console-gateway/
                ├─ extracts tenant_id from RequestContext
                ├─ FORCES tenant_id = RequestContext.tenant_id
                ├─ ignores any tenant_id in the request body
                ├─ logs request to Loki (actor, rpc, scoped tenant_id)
                └─ calls LagoProxyService via Restate SDK client
                     └─ calls Lago HTTP API

Travila staff → KrakenD (/platform/admin/*) → services/internal-admin-gateway/
                ├─ reads target tenant_id from request body
                ├─ logs request to Loki (actor, rpc, target tenant_id)
                └─ calls LagoProxyService via Restate SDK client
                     └─ calls Lago HTTP API

The key behavioral difference: console-gateway derives tenant_id from RequestContext and ignores the body, so a tenant user cannot reach another tenant's billing data even by manipulating the request; internal-admin-gateway accepts a target tenant_id from the request (Travila staff can act on any customer).

Operation split

| Operation | Console / Gateway |
| --- | --- |
| Create a Lago customer for a new tenant | Travila / internal-admin-gateway |
| Assign initial subscription | Travila / internal-admin-gateway |
| Apply per-customer negotiated rates (plan_overrides) | Travila / internal-admin-gateway |
| Terminate a customer subscription | Travila / internal-admin-gateway |
| Void an invoice | Travila / internal-admin-gateway |
| Manually credit a customer wallet | Travila / internal-admin-gateway |
| Apply a coupon to a customer | Travila / internal-admin-gateway |
| Void a credit note | Travila / internal-admin-gateway |
| Create / update / delete a Plan | Travila / internal-admin-gateway |
| Create / update / delete a BillableMetric | Travila / internal-admin-gateway (or Lago UI) |
| MRR / gross-revenue analytics | Travila / internal-admin-gateway |
| List own invoices + download PDF | Tenant / console-gateway |
| Get current subscription / plan | Tenant / console-gateway |
| Get own usage dashboard | Tenant / console-gateway |
| Get own wallet balance | Tenant / console-gateway |
| Update payment method (Stripe Checkout redirect) | Tenant / console-gateway |
| Self-serve plan upgrade/downgrade | Tenant / console-gateway (post-MVP) |

MVP routes all subscription changes through Travila staff (internal-admin-gateway only). Self-serve plan tier changes are post-MVP, gated on the per-tenant console exposing a plan-picker UI.

High-stakes operations and the reason field

Some Travila-staff operations have financial or customer-impact consequences serious enough that the console form should ask the operator why. These RPCs (all on internal-admin-gateway) carry an optional free-text reason field on their request proto:

| Travila admin action | internal-admin-gateway → LagoProxyService RPC | Why it's high-stakes |
| --- | --- | --- |
| Apply per-customer negotiated rates | CreateSubscription / UpdateSubscription with plan_overrides | Finance contract trail (mandatory structured audit log per §plan_overrides) |
| Terminate a customer subscription | TerminateSubscription | Cuts off a customer |
| Void an invoice | VoidInvoice | Reverses revenue |
| Manually credit a customer wallet | TopUpWallet (type = manual_grant) | Free credit |
| Apply a coupon to a customer | ApplyCoupon | Discount application |
| Void a credit note | VoidCreditNote | Reverses a credit |
| Change a plan's pricing | UpdatePlan | Affects all customers on the plan |
| Delete a plan | DeletePlan | Affects all customers on the plan |

MVP behavior. The reason field is captured in internal-admin-gateway's request log when present; an empty value is not rejected. The console form encourages filling it in but the platform doesn't gate on it. plan_overrides is the one exception — its structured audit log line is mandatory MVP scope, not deferred to Retraced.

Self-serve operations on console-gateway do not carry a reason field — they're tenant-bounded by construction and never change cross-customer state.
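The MVP reason-field contract — log it when present, never reject when empty — can be sketched as follows. The handler and log-line shape are illustrative, not the real gateway code.

```go
// Sketch of the MVP reason-field behavior on internal-admin-gateway:
// a reason is captured in the request log when supplied, but an empty
// value is not rejected — the platform doesn't gate on it.
package main

import "fmt"

// handleHighStakes returns the request-log line that would be emitted.
func handleHighStakes(rpc, targetTenant, reason string) string {
	line := fmt.Sprintf("rpc=%s target_tenant=%s", rpc, targetTenant)
	if reason != "" {
		line += fmt.Sprintf(" reason=%q", reason)
	}
	return line // an empty reason is simply omitted, never an error
}

func main() {
	fmt.Println(handleHighStakes("VoidInvoice", "tenant_a", "duplicate charge"))
	fmt.Println(handleHighStakes("VoidInvoice", "tenant_a", "")) // still succeeds
}
```

When Retraced lands, the same captured field becomes mandatory input for high-stakes RPC validation, so nothing about the request proto has to change.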

Audit trail — bare-minimum MVP

For MVP the audit posture is bare-minimum on both gateways, leveraging existing infrastructure (no new services, no new tables):

  • Both internal-admin-gateway and console-gateway log every billing request to Loki via the standard structured logger (actor, RPC, sanitized body, correlation_id, plus the relevant tenant_id — target on internal-admin-gateway, enforced scope on console-gateway).
  • LagoProxyService logs the Lago response on success and the structured error on failure.
  • Loki retention covers forensic queries; correlation_id ties UI action → gateway log → LagoProxyService log → Lago response.

When the platform's Retraced audit-trail pipeline lands, this section gets revised: high-stakes RPCs on internal-admin-gateway gain mandatory reason validation, the plan_overrides audit payload is lifted as a Retraced event body (no schema rework — data is already captured), and other mutating RPCs on either gateway gain structured Retraced events. Self-serve mutations on console-gateway (e.g., payment-method updates) flow into the tenant's own audit log so a tenant's billing admin can see who in their org changed what. Until then, MVP is "good enough for forensic queries via Loki" — explicitly accepted as a Retraced-pre-launch trade-off.

RBAC — deferred per console

console-gateway (Travila Console — per-tenant side). Tenant-bounded scoping is enforced in code (every billing RPC forces tenant_id = RequestContext.tenant_id). Within a tenant, the existing Travila Console role model (Admin / Developer / Viewer per the Admin Portal PRD) already gives Billing the right shape (Admin: full, Developer: read, Viewer: hidden). Mapping those roles to specific self-serve billing RPCs lands when the Billing handlers ship; until then, MVP grants every authenticated tenant user who can see the Billing page access to every self-serve billing RPC.

internal-admin-gateway (Travila Internal Admin Portal — staff side). No role model yet — and the frontend isn't built yet. Until both land, the gateway authenticates the requester (staff token) but does not gate billing RPCs by role. Every staff member with a valid token can call every billing RPC. Acceptable because the staff list is small and known, and every action is logged with the actor.

Implementer note. Design both gateways' billing handlers to read role/scope from RequestContext even when no enforcement happens today — the eventual mapping is then a config change, not a refactor.
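One way to honor that note, sketched with hypothetical names: route every handler through an enforcement hook that reads the role today but is a no-op until the role mapping lands, so flipping enforcement on is a config change rather than a refactor.

```go
// Sketch of the forward-compatible handler shape: role is read from
// RequestContext now; enforceRole stays a no-op until the mapping ships.
package main

import "fmt"

type RequestContext struct {
	TenantID string
	Role     string // "admin" | "developer" | "viewer" — read even when unused
}

// enforceRole is the future gate. MVP: always allow; later the rpc→role
// mapping is loaded from config and this returns an error on mismatch.
var enforceRole = func(ctx RequestContext, rpc string) error { return nil }

func listOwnInvoices(ctx RequestContext) (string, error) {
	if err := enforceRole(ctx, "ListInvoices"); err != nil {
		return "", err
	}
	return fmt.Sprintf("invoices for %s", ctx.TenantID), nil
}

func main() {
	out, _ := listOwnInvoices(RequestContext{TenantID: "tenant_a", Role: "viewer"})
	fmt.Println(out)
}
```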

Lago UI vs the two consoles

For Travila staff:

  • Use the Travila Internal Admin Portal (internal-admin-gateway) — once built — for any operation acting on a specific customer: applying plan_overrides, terminating subscriptions, voiding invoices, granting wallet credits, applying coupons. These need the audit trail captured at the platform layer. Until the UI is built, staff drive the same RPCs via the Restate ingress against internal-admin-gateway directly.
  • Use the Lago UI for billing-config operations where the operator is intentionally configuring Lago itself — creating new plans, billable metrics, taxes, Stripe integration setup. These are global config changes, not customer-targeted.

For tenant employees:

  • Use only the Travila Console (console-gateway). Tenants never log into Lago directly and never reach internal-admin-gateway — both the Lago UI and the future Travila Internal Admin Portal are platform-staff tools.

Rollout Phases

The 5-phase plan from the PRD. Phases 1–4 ship billing MVP; Phase 5 is on the Policy Engine team's critical path, not the billing team's.

| Phase | Scope | Status |
| --- | --- | --- |
| 1. Foundation (Week 1–2) | Deploy Lago (one shared instance per env). Configure Stripe. Create billable metrics in Lago. Create Free / Pro / Enterprise plans. Test invoice generation manually. | Not started |
| 2. Event Integration — MVP=LLM-only (Week 3–4) | Add multimodal fields to llmcommon.Usage (additive). Parse prompt_tokens_details / completion_tokens_details in OpenRouter. Wire Generation Workflow modality-aware tracking. Implement BillingService.HandleLLMGenerationCompleted Kafka subscription. Implement SyncModelPricing job. Tenant-isolation tests on every LagoProxyService List*. Confirm lago.webhook.v1.event flows in production (no consumer in MVP). | Not started |
| 3. Customer Facing (Week 5–6) | Two parallel tracks — see §Admin Console Integration. Travila staff (internal-admin-gateway): add cross-tenant billing handlers (CreateCustomer, CreateSubscription / UpdateSubscription with plan_overrides, TerminateSubscription, VoidInvoice, TopUpWallet, ApplyCoupon, VoidCreditNote, plan CRUD, MRR analytics) with optional reason field and mandatory plan_overrides audit log. Tenant self-serve (console-gateway): list own invoices + download, current plan, usage dashboard, own wallet balance, payment-method update via Stripe Checkout — every handler hard-scoped to RequestContext.tenant_id, with server tests asserting that supplying a foreign tenant_id does NOT change the downstream Lago scope. Self-serve plan change is post-MVP. | Not started |
| 4. Production Hardening (Week 7–8) | DLQ pattern (lago.deadletter.v1.event) for Lago outage and poison pills. Webhook handlers for all Lago lifecycle events. Alerting (payment failures, first DLQ message, approaching plan limits). Dunning. Operator runbooks (DLQ replay, secret rotation when HMAC enables, Lago version-upgrade procedure). | Not started |
| 5. Real-Time Enforcement (blocked on Policy Engine Phase 1) | Build services/opa-consumer/v1/. Wire gateway middleware to call OPA at the billing/entitlement domain. Smoke + soak tests on staging. Operator runbook: drain opa-consumer for upgrades, force re-bootstrap from Lago, handle OPA bundle rollback. | Not started — blocked on Policy Engine PRD |

Dependency ordering

Phase dependency graph — 5 phases plus the post-MVP additive-metric track and the external Policy Engine Phase 1 dependency. Phases 1→2→3→4 are the linear MVP path. Phase 5 (real-time enforcement, dashed) depends on Phase 2's lago.webhook.v1.event topic AND on Policy Engine Phase 1's OPA infrastructure. Post-MVP metrics (api_calls, storage_gb, voice_minutes, active_users) are independent of phasing and land additively after Phase 2.

| Dependency | Reason |
| --- | --- |
| Phase 2 depends on Phase 1 | Lago must exist (with metrics + plans) before BillingService can dispatch events to it |
| Phase 5 depends on Policy Engine Phase 1 | OPA infrastructure (sidecar deployment, pkg/policy client, GCS bundle server, decision logging) must exist before opa-consumer writes entitlement state and gateway middleware reads it |
| Phase 5 depends on Phase 2 | The lago.webhook.v1.event topic (provisioned in Phase 2) must be live before opa-consumer can tail it |
| Phases 3 + 4 are independent of Phase 5 | Customer dashboards and DLQ replay don't need real-time enforcement |
| Post-MVP metric increments are independent of phasing | Each new metric (api_calls, storage_gb, voice_minutes, active_users) is the same fixed recipe and can land any time after Phase 2 |

Post-MVP additive increments

Each item below is a separate ticket opened when product confirms scope. None modifies existing event shapes, handlers, or topics.

  • api_calls: new ApiCallBilledEvent proto + gateway emit + new BillingService handler + Lago metric.
  • storage_gb: new StorageUsageReportedEvent proto + StorageService emit + new handler + Lago metric.
  • voice_minutes: new VoiceMinutesConsumedEvent proto + Pipecat / Daily emit + new handler + Lago metric.
  • active_users: COUNT_DISTINCT on user_id (piggyback on existing events first; add UserActivityReportedEvent only if non-LLM-only tenants need it).
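The active_users metric is the only one that aggregates rather than sums. A minimal sketch of what COUNT_DISTINCT(user_id) computes, with a hypothetical event shape (in production Lago's own COUNT_DISTINCT aggregation does this over ingested events):

```go
// Sketch: distinct active users per tenant, mirroring a
// COUNT_DISTINCT(user_id) aggregation over already-emitted usage events.
package main

import "fmt"

type usageEvent struct{ tenantID, userID string }

func distinctActiveUsers(events []usageEvent, tenantID string) int {
	seen := map[string]struct{}{}
	for _, e := range events {
		if e.tenantID == tenantID {
			seen[e.userID] = struct{}{} // repeat activity counts once
		}
	}
	return len(seen)
}

func main() {
	events := []usageEvent{
		{"tenant_a", "u1"}, {"tenant_a", "u1"}, {"tenant_a", "u2"}, {"tenant_b", "u9"},
	}
	fmt.Println(distinctActiveUsers(events, "tenant_a")) // u1 counted once
}
```

This is why piggybacking on existing events works: any event stream that already carries (tenant_id, user_id) can feed the metric without a new proto.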

Out of Scope for MVP

  • Real-time block-on-limit enforcement — Phase 5, blocked on Policy Engine Phase 1.
  • opa-consumer projection service — built alongside Policy Engine Phase 1 against the entitlement-state shape and webhook→diff mapping locked in this PRD.
  • Gateway middleware calling OPA at the billing/entitlement domain — built alongside opa-consumer.
  • api_calls, storage_gb, voice_minutes, active_users metrics — additive post-MVP increments, ~1–2 days each.
  • Webhook HMAC enforcement — deferred while webhook-receiver is cluster-internal-only; mandatory before any non-in-cluster provider, public exposure, or Lago Cloud migration. Tracked in #1692.
  • Per-tenant Lago instances — only adopted when a tenant requires a fundamentally different plan structure (not just rates). For different rates within a shared plan structure, use plan_overrides.
  • Billing Entities (premium) — adopted later if a tenant needs their own Stripe account or branded invoices. Does NOT enable per-tenant pricing.
  • Retraced-style structured audit pipeline — MVP audit is Loki structured logs; Retraced lifts the existing plan_overrides payload without schema rework when it lands.
  • RBAC on either gateway's billing RPCs — deferred per console. console-gateway reuses the existing Travila Console Admin/Developer/Viewer role model (Billing page already in that grid); internal-admin-gateway waits on the future Travila Internal Admin Portal's role model. Tenant-bounded scoping on console-gateway is enforced in code today regardless. Handlers on both gateways read role from RequestContext for forward compatibility.
  • Self-serve plan upgrade/downgrade on console-gateway — MVP routes all subscription changes through Travila staff (internal-admin-gateway). Self-serve tier changes land after the per-tenant console exposes a plan-picker UI.
  • Body fingerprint / cross-validation between Restate cached responses and downstream Lago events — separate concern, captured in the Edge Idempotency PRD.

Open Questions

| # | Question | Owner | Status |
| --- | --- | --- | --- |
| 1 | Exact pricing for each tier? | Product | Open |
| 2 | Free tier limits? | Product | Open |
| 3 | Grace period for failed payments before suspension? | Product | Open |
| 4 | Should we show real-time cost or end-of-period estimate? | Product | Open |
| 5 | Lago Cloud vs self-hosted? | Engineering | Resolved — self-hosted OSS, one shared instance per environment (one staging, one prod). Drivers: data-residency posture, per-event Cloud pricing tax on growth, marginal operational cost is small given the platform already runs 8+ stateful systems. |
| 6 | Should Free tier block audio/image generation entirely, or allow a small quota? | Product | Open |
| 7 | Should reasoning tokens be priced at the output rate or at a discount to encourage usage? | Product | Open |
| 8 | How to handle OpenRouter's single audio pricing field (no input/output split)? Use 2× heuristic or fetch from model detail pages? | Engineering | Open |

Items also deferred (with explicit owner): structurally different plan shapes per tenant (biz discussion — only triggers per-tenant Lago instance); per-tenant Lago revenue threshold (biz discussion); capacity sizing and load testing (revisit when onboarding more tenants).

Cross-References