
Billing, Metering & Invoicing

End-to-end view of how usage becomes revenue — Lago for metering and invoicing, Stripe for collection, OPA for sync entitlement enforcement, and a single proto-FQN Kafka path that keeps producers ignorant of billing. This page consolidates the Billing, Metering & Invoicing PRD (v0.6, Draft — POC validated, architectural review applied) and its dependencies into a single reference. The design is complete; MVP rollout has not started, and real-time enforcement is deferred to Policy Engine Phase 1.

Source PRDs

This page is derived from the Billing, Metering & Invoicing PRD and its closely related dependencies:

Primary (docs/prd/):

  • Billing, Metering & Invoicing — the v0.6 design: Lago + Stripe + OPA, proto-FQN topic ingestion, MVP=LLM-only, additive-metric rule, deferred real-time enforcement

Dependencies (docs/prd/):

  • Policy Engine — owns OPA deployment topology (sidecar vs central), fail-open/closed posture, decision logging. Phase 5 of this PRD blocks on Policy Engine Phase 1 for the opa-consumer and gateway middleware.
  • Edge Idempotency Key — Lago is treated as a Provider-Translated webhook source: lago_id is lifted into the canonical Idempotency-Key at the trust boundary. Producer-side billing events use restatex.Publish()'s deterministic event_id.
  • Multi-Tenancy: Projects — establishes (tenant_id, project_id) as the metering boundary. Production target is data.entitlements[tenant_id][project_id]; POC keys on tenant_id only.
  • Multi-Tenancy: API Keys — Unkey resolves the (tenant_id, project_id) dimensions billing tags every event with.
  • Remove Manager Actor Pattern — StorageService (post-removal) is the day-1 source for the post-MVP storage_gb metric.
  • Enterprise Readiness — strategic context for plan structures, MSAs, and per-customer negotiated rates.


Architectural Direction — One Pipe to Lago, One Pipe Back, Composable Decisions

The whole design refuses to invent billing-specific abstractions when an existing platform primitive already does the job:

  • Producers stay billing-agnostic. Generation Workflow and OpenRouter publish their normal domain events (LLMGenerationCompletedEvent) via restatex.Publish() to proto-FQN-derived Kafka topics. The BillingService subscribes from the consumer side. There is no billing.usage-events topic, no shared JSON schema, no producer-side knowledge that "Lago" exists.
  • Idempotency is the journal, not a UUID. Every billing-source event carries a deterministic event_id from Restate's journaled sdkgo.UUID(ctx), which maps directly to Lago's transaction_id. Direct uuid.New() is forbidden for billing event IDs and CI-enforced via per-module forbidigo lint rules. Replays are safe by construction.
  • Inbound Lago webhooks reuse Edge Idempotency. A standalone webhook-receiver Restate service lifts lago_id into the canonical Idempotency-Key per the Provider-Translated pattern and republishes onto a per-provider Kafka topic. The receiver is a thin HTTP→Kafka translator; interpretation lives in the downstream consumer.
  • One decision, many domains. Sync entitlement enforcement runs through OPA, not a Redis flag. The gateway issues a single POST /v1/data/... per request that evaluates entitlement alongside model access, feature access, RBAC, and data residency — one composable decision, one structured audit log, no N-fan-out of per-domain caches.
  • MVP is LLM-only; new metrics are additive. Adding api_calls, storage_gb, voice_minutes, or active_users later is the same fixed recipe (new proto event + new emit site + new BillingService handler + new Lago metric). No existing event shape, handler, or topic gets refactored — guaranteed by the additive-future rule baked into Key Design Decisions.

Canonical reference: Billing, Metering & Invoicing PRD. The PRD's POC findings section catalogs measured latencies (Lago dispatch arrival → OPA-visible decision: ~30ms end-to-end, <1ms OPA decision read) validated in playground/lago-poc/.

Glossary

  • Lago — Open-source usage-based billing platform (AGPL v3, self-hosted). Owns metering aggregation, plan/charge calculation, invoice generation, Stripe orchestration. Selected over OpenMeter, Kill Bill, and Stripe Billing.
  • BillingService — New Restate service (services/billing/v1/) that consumes proto-FQN topics, maps each event to a Lago payload via restatex.Run(), and posts to /api/v1/events. Greenfield — does not yet exist in the repo.
  • LagoProxyService — New Restate service that wraps Lago's REST API for admin/query operations (customer CRUD, subscriptions, plans, wallets, invoices). Tenant-scoped at the proxy layer; Lago has no native tenant isolation.
  • webhook-receiver — Standalone Restate service for inbound third-party webhooks. Per-provider routes (/webhooks/lago first), per-provider Kafka topics (lago.webhook.v1.event first). HTTP→Kafka translator only — does not interpret events.
  • opa-consumer — Projection service that tails lago.webhook.v1.event, maps each Lago lifecycle event to an entitlement-state diff, and writes via PUT /v1/data/entitlements/{tenant_id}/{project_id} on OPA. Built alongside Policy Engine Phase 1, NOT in billing MVP.
  • LLMGenerationCompletedEvent — The primary billing event. One per workflow run with aggregated Usage (prompt/completion/total tokens, cost estimate). Published by Generation Workflow via restatex.Publish() on topic llm.v1.common.v1.llmgenerationcompletedevent.
  • Billable metric — Lago primitive: a name + aggregation (SUM, COUNT, COUNT_DISTINCT, MAX) + optional filter dimensions. MVP ships token_usage (SUM with model/type/modality filters) and image_generation (COUNT with model).
  • Filter — A dimension on a single billable metric, NOT a separate metric. Each filter combination gets its own price and appears as a separate invoice line item. The same pattern Mistral uses with Lago.
  • plan_overrides — Lago primitive on CreateSubscription for per-customer negotiated rates within a shared plan structure. Avoids per-tenant plan code pollution. Mandatory structured audit log on use.
  • Provider-Translated webhook — Edge Idempotency pattern: a third-party identifier (Lago's lago_id, Stripe's event.id) is lifted into the canonical Idempotency-Key at the receiver.
  • Additive-future rule — Every post-MVP billable signal lands as a new proto event + new handler + new Lago config. No existing event shape, handler, or topic is modified. The mechanical safety net behind "MVP = LLM-only".
  • lago.webhook.v1.event — Per-provider Kafka topic carrying lifted Lago lifecycle webhooks. Flows in production from billing MVP so the future opa-consumer has live changes to tail when it ships.
  • lago.deadletter.v1.event — Lago-outage / poison-pill DLQ. BillingService publishes here after N retry attempts; an operator alert fires on first message. Replay is idempotent because event_id is deterministic.

Billing pipeline architecture — Generation Workflow publishes LLMGenerationCompletedEvent via restatex.Publish to a proto-FQN Kafka topic; BillingService Kafka subscription maps each event to Lago via /api/v1/events with transaction_id = event_id; failures after N retries land in lago.deadletter.v1.event; Lago invoices through Stripe; Lago lifecycle webhooks return through webhook-receiver, get lifted to Idempotency-Key, publish to lago.webhook.v1.event; the future opa-consumer (Policy Engine Phase 1, dashed) tails that topic and PUTs into OPA data.entitlements; gateway sync-read calls OPA for sub-millisecond entitlement decisions

Service & Component Inventory

New Services

  • services/billing/v1/ (BillingService) — Kafka consumer on proto-FQN topics; maps each event to Lago /api/v1/events; wraps Lago calls in restate.Run() for at-least-once delivery + Lago dedup via transaction_id. (§7 The Missing Piece)
  • services/lagoproxy/v1/ (LagoProxyService) — Restate proxy in front of Lago's REST API; tenant-scoped at the handler layer; mandatory two-tenant integration tests on every List*. (§LagoProxyService API Definition + DoD)
  • services/webhook-receiver/v1/ — Generic third-party webhook ingestion (/webhooks/<provider>); HMAC slot per route (deferred for Lago in MVP); lifts provider dedupe ID into Idempotency-Key; publishes to per-provider Kafka topics. (§6 + §Production Hardening)
  • services/opa-consumer/v1/ — Projects lago.webhook.v1.event into data.entitlements[tenant_id][project_id] on OPA. Cold-start bootstraps from LagoProxyService.ListSubscriptions + ListWallets, then tails Kafka. Built in Policy Engine Phase 1, not billing MVP. (§5 + §Phase 5)

New Proto Definitions

  • LagoProxyService proto — apis/billing/v1/... (greenfield): Customer / Subscription / Plan / BillableMetric / Event / Invoice / Wallet / Coupon / CreditNote / AddOn RPCs.
  • Usage multimodal fields — additive to apis/llm/v1/common/types.proto: audio_input_tokens, audio_output_tokens, reasoning_tokens, images_generated — new field numbers, backward-compatible.
  • Post-MVP billable events — apis/... (per signal): ApiCallBilledEvent, StorageUsageReportedEvent, VoiceMinutesConsumedEvent, optional UserActivityReportedEvent — each adds a new topic and a new handler, never refactors an existing one.

Integrated Components (No Changes Required)

  • Lago — v1.44.0+ self-hosted, one shared instance per environment (one staging, one prod). API on port 3100, UI on 8085. Postgres requires the pg_partman extension. Validated end-to-end in playground/lago-poc/.
  • Stripe — Native Lago integration handles the invoice → payment flow. Stripe Tax for tax calculation. Stripe Checkout redirect for payment-method update.
  • OPA — Policy engine; entitlement is one of several policy domains. Deployment topology owned by the Policy Engine PRD.
  • webhook-receiver Kafka topic — lago.webhook.v1.event flows in MVP even though no consumer exists yet. Default Kafka retention is fine — opa-consumer's cold-start contract reads current state from Lago, not from Kafka replay.
  • Unkey — externalId = tenant_id, meta.project_id = project_id — surfaces the tenant + project dimensions billing events tag.
  • OpenBao — Stores the Lago org-level API token (one per environment); future home for per-provider HMAC secrets when HMAC enforcement turns on.
  • pkg/restatex — Publish() and PublishBatch() build the event envelope, set the deterministic event_id, and derive the topic from the proto FQN. Producers do not write Lago-specific code.

Observability Additions

  • lago.deadletter.v1.event lag/count alert — first DLQ message indicates a Lago outage or poison pill; fires before scale becomes a problem.
  • BillingService consumer lag per topic — detects backpressure if Lago throttles or the handler stalls.
  • OPA decision log per evaluation — decision_id, input, result, reasons, eval-time metrics; ships to Loki per Policy Engine PRD §FR-6.
  • plan_overrides audit log — mandatory structured log on every CreateSubscription / UpdateSubscription carrying non-empty overrides; the finance contract trail.
  • Billing-gateway request log — bare-minimum MVP audit on both internal-admin-gateway (Travila staff, cross-tenant) and console-gateway (tenant self-serve): actor, RPC, sanitized request body, correlation_id, scoped/target tenant_id. Loki retention covers forensic queries until Retraced lands.

Flow 1: Usage Event → Lago

The producer side. One generation run = one billing event with full token attribution.

Generation Workflow completes

│ Build LLMGenerationCompletedEvent { run_id, status, usage, loop_count, duration_ms }

restatex.Publish(ctx, writer, evt, WithTenantID(tid), WithEventName("llm.generation_completed"))

│ - event_context.event_id = sdkgo.UUID(ctx).String() ← deterministic, journaled
│ - event_context.tenant_id = tid
│ - event_context.emitted_at = RFC3339
│ - topic = TopicFromProto(evt) = "llm.v1.common.v1.llmgenerationcompletedevent"

Kafka (proto-FQN topic)


Restate Kafka subscription → BillingService.HandleLLMGenerationCompleted

│ - Lookup external_subscription_id from event_context.tenant_id
│ - Map Usage → token_usage events with model/type/modality filter properties
│ (one event per filter combination — text-only = 2 events, audio = up to 4, reasoning = 3)
│ - Wrap Lago POST in restate.Run() for at-least-once + retries

Lago /api/v1/events
│ - transaction_id = event_context.event_id ← Lago dedupes on this
│ - external_subscription_id = subID
│ - properties = { tokens, model, type, modality, user_id, conversation_id }

Lago aggregates → Plan charges → Invoice (end of period) → Stripe → Customer payment
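The mapping step above can be sketched in plain Go. Everything here is illustrative — the Usage field names, the filter values, and the per-combination transaction_id suffix are assumptions (the PRD pins only that the deterministic event_id is the dedup root and that zero-valued dimensions bill nothing):

```go
package main

import "fmt"

// Usage mirrors the aggregated token counts carried by
// LLMGenerationCompletedEvent (field names illustrative, not the proto shape).
type Usage struct {
	PromptTokens, CompletionTokens      int64
	AudioInputTokens, AudioOutputTokens int64
	ReasoningTokens                     int64
}

// LagoEvent is a skeleton of the /api/v1/events payload.
type LagoEvent struct {
	TransactionID string            // rooted in the deterministic event_id
	Code          string            // billable metric code
	Properties    map[string]string // filter dimensions
	Tokens        int64
}

// FanOut emits one token_usage event per non-zero filter combination:
// text-only = 2 events, audio = up to 4, reasoning = 3.
func FanOut(eventID, model string, u Usage) []LagoEvent {
	dims := []struct {
		typ, modality string
		tokens        int64
	}{
		{"input", "text", u.PromptTokens},
		{"output", "text", u.CompletionTokens},
		{"input", "audio", u.AudioInputTokens},
		{"output", "audio", u.AudioOutputTokens},
		{"output", "reasoning", u.ReasoningTokens},
	}
	var out []LagoEvent
	for _, d := range dims {
		if d.tokens == 0 {
			continue // zero/absent proto field = nothing billed, never an error
		}
		out = append(out, LagoEvent{
			// Suffix per combination so each fan-out event stays individually
			// dedupable in Lago — the exact suffix scheme is an assumption.
			TransactionID: fmt.Sprintf("%s:%s-%s", eventID, d.typ, d.modality),
			Code:          "token_usage",
			Properties:    map[string]string{"model": model, "type": d.typ, "modality": d.modality},
			Tokens:        d.tokens,
		})
	}
	return out
}

func main() {
	evts := FanOut("evt-123", "gpt-4o", Usage{PromptTokens: 900, CompletionTokens: 300})
	fmt.Println(len(evts)) // text-only run → 2 events
}
```

Skipping zero-valued dimensions is also what makes pre-rollout events (without the multimodal fields) safe — see Poison-pill events under Production Hardening.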

Why one event per workflow run, not per token

An LLMGenerationCompletedEvent is the natural unit of billable work — it carries the full aggregated Usage for a single agent loop run. Per-token or per-message events would explode Lago's event volume by 3–4 orders of magnitude with no pricing benefit (Lago aggregates with SUM regardless). The same envelope also carries loop_count and duration_ms for SLA reporting without adding a separate channel.

Why proto-FQN topics, not a billing.usage-events topic

The 2026-01 draft proposed a billing-specific JSON topic. That was removed in v0.6 because:

  • It leaked Lago's data model into every producer.
  • It bypassed restatex.Publish()'s deterministic event_id envelope and would have required a parallel idempotency story.
  • It created two ingestion pipelines for what is effectively one signal class.

The proto-FQN-topic decision lines up with the GCP Eventarc → Kafka Pipeline and means new billable signals never invent a new pipeline shape — they just add a new proto event under apis/.
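A sketch of the naming rule, assuming TopicFromProto simply lowercases the event's fully-qualified proto name — the real restatex helper may do more (validation, sanitization); this only illustrates the convention:

```go
package main

import (
	"fmt"
	"strings"
)

// TopicFromProto derives the Kafka topic from a proto FQN. Assumption: the
// rule is a plain lowercase of the fully-qualified message name, matching the
// llm.v1.common.v1.llmgenerationcompletedevent example on this page.
func TopicFromProto(fullName string) string {
	return strings.ToLower(fullName)
}

func main() {
	fmt.Println(TopicFromProto("llm.v1.common.v1.LLMGenerationCompletedEvent"))
	// → llm.v1.common.v1.llmgenerationcompletedevent
}
```

A new billable signal therefore gets its topic for free from its proto definition — no topic-naming decision, no shared registry edit.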

Flow 2: Lago Webhook → OPA Entitlement State

The asynchronous write path that keeps OPA's data.entitlements fresh as customers pay invoices, deplete wallets, or get terminated.

Lago lifecycle event (wallet.depleted_ongoing_balance, invoice.paid, etc.)


POST /webhooks/lago (cluster-internal DNS only in MVP — no public Ingress)


webhook-receiver
│ - HMAC slot present per route, disabled for Lago in MVP (cluster-internal)
│ - Lift body's lago_id into Idempotency-Key per Edge Idempotency Provider-Translated pattern
│ - Publish via restatex.Publish() onto per-provider topic

Kafka topic: lago.webhook.v1.event


opa-consumer (built in Policy Engine Phase 1; topic flows in production from MVP)
│ - Cold-start: bootstrap state by paginating LagoProxyService.ListSubscriptions + ListWallets,
│ project each tenant's current state into OPA. Lago is source of truth.
│ - Steady-state: tail Kafka, apply per-event diff
│ - Per-tenant Kafka partitioning preserves order; consumer is at-least-once
│ with last-write-wins on OPA, fine because state is monotonic per event_id

OPA Data API
PUT /v1/data/entitlements/{tenant_id}/{project_id}
data.entitlements[tenant_id][project_id] = {
blocked, block_reason, subscription_active,
wallet_balance_cents, last_updated, last_event_id
}
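The receiver's lift step, sketched with an illustrative flat body (real Lago webhook payloads nest the object; where lago_id sits in the body is an assumption here):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// liftIdempotencyKey is the Provider-Translated step for the Lago route: the
// provider's own dedupe ID (lago_id) becomes the canonical Idempotency-Key
// before the event is republished to lago.webhook.v1.event. The receiver
// never interprets the event — unparseable bodies go to the dead-letter path.
func liftIdempotencyKey(body []byte) (string, error) {
	var payload struct {
		WebhookType string `json:"webhook_type"`
		LagoID      string `json:"lago_id"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return "", err
	}
	if payload.LagoID == "" {
		return "", fmt.Errorf("webhook missing lago_id")
	}
	return payload.LagoID, nil
}

func main() {
	key, err := liftIdempotencyKey([]byte(`{"webhook_type":"invoice.paid","lago_id":"lago-evt-001"}`))
	fmt.Println(key, err) // lago-evt-001 <nil>
}
```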

Webhook → state diff mapping

Locked in this PRD; opa-consumer is built mechanically against this table.

  • wallet.depleted_ongoing_balance → blocked=true, block_reason="wallet balance depleted", wallet_balance_cents=0
  • wallet.transaction.created / wallet.updated → blocked=false, block_reason=null, wallet_balance_cents=<from body>
  • subscription.started → subscription_active=true (does not auto-clear blocked — payment failure may still apply)
  • subscription.terminated → subscription_active=false, blocked=true, block_reason="subscription terminated"
  • invoice.payment_failure → blocked=true, block_reason="invoice payment failed"
  • invoice.paid → blocked=false, block_reason=null
  • customer.payment_overdue → recorded but does not flip blocked — soft warning
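The mapping translates mechanically into the opa-consumer's projection function. A sketch with illustrative Go names (the entitlement struct mirrors the OPA document shape given above):

```go
package main

import "fmt"

// Entitlement mirrors data.entitlements[tenant_id][project_id] in OPA.
type Entitlement struct {
	Blocked            bool
	BlockReason        string
	SubscriptionActive bool
	WalletBalanceCents int64
}

// Apply projects one Lago lifecycle event onto the entitlement state per the
// locked webhook → diff table. balanceCents is only read for wallet updates.
func Apply(e Entitlement, event string, balanceCents int64) Entitlement {
	switch event {
	case "wallet.depleted_ongoing_balance":
		e.Blocked, e.BlockReason, e.WalletBalanceCents = true, "wallet balance depleted", 0
	case "wallet.transaction.created", "wallet.updated":
		e.Blocked, e.BlockReason, e.WalletBalanceCents = false, "", balanceCents
	case "subscription.started":
		e.SubscriptionActive = true // deliberately does NOT auto-clear Blocked
	case "subscription.terminated":
		e.SubscriptionActive, e.Blocked, e.BlockReason = false, true, "subscription terminated"
	case "invoice.payment_failure":
		e.Blocked, e.BlockReason = true, "invoice payment failed"
	case "invoice.paid":
		e.Blocked, e.BlockReason = false, ""
	case "customer.payment_overdue":
		// soft warning: recorded, but no state flip
	}
	return e
}

func main() {
	s := Apply(Entitlement{}, "wallet.depleted_ongoing_balance", 0)
	fmt.Println(s.Blocked, s.BlockReason)
}
```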

POC vs production keying

The validated POC at playground/lago-poc/ keys on tenant_id only — policies/billing_entitlement.rego reads data.entitlements[input.tenant_id]. The production target adds a project_id second-level key per Projects PRD §10: a small Rego rule change, a project_id field on the Kafka envelope, and one extra path segment in the PUT. Tracked as an implementation item, not a design change.

Flow 3: Real-Time Enforcement (Sync, on Every Request)

The read path. Sub-millisecond per-request decision composable with every other policy domain in the platform.

Client request


KrakenD / domain-gateway middleware
│ After deriveRequestContext() extracts tenant, project, subject
│ Before routing to downstream actors/services

POST /v1/data/billing/entitlement
body: { input: { tenant_id, project_id, action } }

│ OPA evaluates Rego against in-memory data.entitlements
│ Same call evaluates other policy domains: model access, feature access, RBAC, ...

Decision: { allow, reasons }

├─ allow=true → route to downstream
└─ allow=false → 402 Payment Required + result.reasons in body

Why OPA, not a Redis flag

The 2026-01 draft proposed webhook → Redis flag (Option D). It was rejected in v0.5 in favor of webhook → Kafka → OPA Data API (Option E):

  • Latency — Redis flag: <1ms (Redis GET). OPA: <1ms (in-memory Rego eval, Policy Engine PRD §NFR-1).
  • Composable with model access / feature access / RBAC — Redis: no, separate code paths per domain. OPA: yes, data.entitlements is one input among many to a single decision.
  • Audit trail — Redis: application logs only. OPA: per-evaluation decision log (decision_id, structured input, result, reasons, eval-time metrics), Loki-shipped.
  • Per-tenant + per-project granularity — Redis: flattened key. OPA: native nested keys, data.entitlements[tenant_id][project_id].
  • Per-action discrimination — Redis: binary flag. OPA: Rego rules consider input.action (e.g., metered actions block on wallet exhaust; non-metered ones don't).
  • Cross-policy composition — Redis: N Redis GETs per request. OPA: one POST /v1/data/... evaluates the stacked policy and returns a unified allow/deny + reasons.

OPA matches Redis on raw latency for the entitlement domain alone, and adds the ability to evaluate every other relevant policy in the same call — which a KrakenD plugin asking "is this tenant entitled AND can use this model AND is this feature enabled?" would otherwise need 3+ Redis GETs to answer.

MVP launch state — metering only

Important. Phase 5 (real-time enforcement) is deferred to Policy Engine Phase 1. Billing MVP ships without the gateway middleware wired up and without opa-consumer running.

Concrete consequences during the MVP window:

  • Plan limits are metered — every request still produces an LLMGenerationCompletedEvent, BillingService dispatches to Lago, invoicing is correct end-of-period.
  • Plan limits are not enforced in real time — a Free-tier tenant can exceed their token allocation and the gateway will not return 402.
  • Overage is invoiced as usage; no money is lost, but a runaway client could spike usage faster than ops can react.
  • Manual kill-switch: LagoProxyService.TerminateSubscription is the only out-of-band block during the MVP window.
  • The lago.webhook.v1.event topic flows in production from MVP. When opa-consumer arrives months later, it has live changes to tail.

Flow 4: Reconciliation Safety Net

A periodic 5-minute cron queries Lago's ongoing_balance for all active wallets and reconciles by writing through the same opa-consumer projection path. This catches missed or delayed webhooks. Latency is acceptable because it is a backstop, not the primary enforcement path.
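A minimal sketch of the backstop's comparison step, assuming the cron has already fetched ongoing_balance per tenant from Lago and read back the currently projected OPA state (function and map names are hypothetical):

```go
package main

import "fmt"

// Reconcile compares Lago (source of truth) against the projected OPA state
// and returns the tenants whose entitlement document must be re-written
// through the same opa-consumer projection path. Any mismatch implies a
// missed or delayed webhook.
func Reconcile(lagoBalances, opaBalances map[string]int64) []string {
	var drifted []string
	for tenant, balance := range lagoBalances {
		if opaBalances[tenant] != balance {
			drifted = append(drifted, tenant)
		}
	}
	return drifted
}

func main() {
	drifted := Reconcile(
		map[string]int64{"socayo": 0, "fitlife": 5000},
		map[string]int64{"socayo": 2500, "fitlife": 5000}, // missed wallet webhook
	)
	fmt.Println(drifted) // [socayo]
}
```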

Billable Metrics

  • token_usage — LLM tokens consumed. SUM over tokens, with model, type, modality filters; unit: tokens. MVP.
  • image_generation — AI-generated images. COUNT, with model filter; unit: images. MVP.
  • api_calls — API requests made. COUNT; unit: requests. Post-MVP (additive).
  • active_users — Monthly active users. COUNT_DISTINCT on user_id; unit: users. Post-MVP (additive).
  • storage_gb — File storage used. MAX over gb; unit: GB. Post-MVP (additive).
  • voice_minutes — Voice AI minutes. SUM over minutes; unit: minutes. Post-MVP (additive).

Filters, not separate metrics

Token pricing filters — single billable metric token_usage with three filter dimensions (model, type, modality) fans out to multiple emitted events per request; a text-only request emits 2 events, an audio request emits up to 4 (text+audio per direction), a reasoning-model request emits 3 (input + text output + reasoning output); each filter combination maps to a separate price line in the plan's charge table with ALL wildcard fallback rates

token_usage is a single billable metric with three filter dimensions (model, type, modality). Each filter combination — e.g. (gpt-4o, output, audio) at $76.80 / 1M tokens — gets its own price in the plan's charge definition and its own line item on the invoice. This is the same pattern Mistral uses in production with Lago, matching the OpenAI per-token template Lago ships out of the box.

Audio tokens cost 10–13× text tokens on some models (GPT-4o audio input: $32/M vs text input: $2.50/M). Reasoning tokens (o1/o3, DeepSeek R1, Claude extended thinking) also have separate pricing on some models. Lago's __ALL_FILTER_VALUES__ wildcard provides a fallback rate so new models that are not explicitly priced still bill at a sane default.

user_id as property, not filter

user_id ships as an event property for analytics — not as a filter. If user_id were a filter, every unique user would become a separate aggregation bucket and an Enterprise tenant could see thousands of line items on a single invoice. The separate active_users metric (COUNT_DISTINCT on user_id) handles per-seat / MAU billing cleanly when product wants it.

MVP scope = LLM-only

MVP ships with token_usage (all modalities) and image_generation, both sourced from the existing LLMGenerationCompletedEvent proto. Post-MVP metrics are governed by the additive-future rule: each new signal is a new proto event + new emit site + new BillingService handler + new Lago metric. Existing event shapes are not modified. Estimated effort per metric: ~1–2 days end-to-end.

This scope reduction keeps billing shippable without locking out future metrics. The pricing dimensions in §Pricing Model below are provisional pending product confirmation of Open Questions 1, 2, 6, 7.

Pricing Model

The billing engine handles four revenue streams, each mapped to a Lago primitive:

  • Platform fee — Lago plan subscription (base amount); monthly or annual. Examples: Pro plan $99/mo, Enterprise custom.
  • Usage-based charges — Lago plan charges (billable metrics); monthly, in arrears. Examples: tokens, API calls, storage.
  • One-time fees — Lago add-ons + one-off invoices; on demand. Examples: implementation fee, setup fee, consulting.
  • Prepaid credits — Lago wallets; upfront / top-up. Examples: trial credits, committed spend, credit packs.

Subscription tiers

  • Free — $0/mo. Includes 1k API calls, 10k text tokens, no audio/images. Overage: blocked.
  • Pro — $99/mo or $990/yr (2 months free). Includes 50k API calls, 500k text tokens, 50k audio tokens, 100 images. Overage: pay-as-you-go.
  • Enterprise — custom pricing, annual only. Included: custom (all modalities). Overage: custom negotiated rates.

Annual and monthly Pro plans share the same usage-based charges — only the base fee and interval differ. Lago handles proration automatically when a customer upgrades mid-cycle.

Per-customer negotiated rates — plan_overrides, not new plans

Per-customer pricing within a shared plan structure (e.g., Socayo signs an MSA at a discounted token rate) is modeled as Lago plan_overrides on CreateSubscription plus optional Wallet for prepaid commits. This keeps the plan catalog small (no pro_socayo, pro_fitlife, …) while supporting fully arbitrary per-customer pricing.

CreateSubscription and UpdateSubscription accept:

  • A reference to a base plan_code (mandatory).
  • Optional plan_overrides (per-charge unit price, included units, monthly/annual commit, minimum commit, etc.).
  • Optional initial Wallet grant for prepaid commits (separate CreateWallet call, contractually paired when the MSA includes a prepaid component).

Every CreateSubscription / UpdateSubscription call carrying non-empty plan_overrides MUST emit a structured audit log entry capturing tenant_id, subscription_external_id, base plan_code, the per-charge diff between base-plan and override values (structured field — not free-text), actor.user_id, and a reason string linked to the MSA / approval ticket. This is the finance contract trail.
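The structured per-charge diff can be sketched as follows — OverrideDiff and DiffOverrides are hypothetical names, and the cents-per-unit amounts are illustrative:

```go
package main

import "fmt"

// OverrideDiff is one entry of the structured per-charge diff the audit log
// must carry — a structured field, never free-text.
type OverrideDiff struct {
	ChargeCode               string
	BaseCents, OverrideCents int64
}

// DiffOverrides compares base-plan unit prices with the negotiated overrides
// and returns only the charges that actually changed.
func DiffOverrides(base, override map[string]int64) []OverrideDiff {
	var diffs []OverrideDiff
	for code, o := range override {
		if b := base[code]; b != o {
			diffs = append(diffs, OverrideDiff{ChargeCode: code, BaseCents: b, OverrideCents: o})
		}
	}
	return diffs
}

func main() {
	diffs := DiffOverrides(
		map[string]int64{"token_usage": 100, "image_generation": 500},
		map[string]int64{"token_usage": 80}, // MSA-discounted token rate
	)
	fmt.Printf("%+v\n", diffs)
}
```

The audit entry then wraps this diff together with tenant_id, subscription_external_id, base plan_code, actor.user_id, and the MSA-linked reason string.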

What plan_overrides is NOT for. Structurally different plan shapes (one tenant billed per-token, another per-active-user) cannot be expressed as overrides — that scenario is the trigger for the per-tenant Lago instance discussion in the multi-tenancy recommendation, not a job for plan_overrides.

Multi-Tenancy

Topology

Travila is the platform; each enterprise tenant (Socayo, FitLife, Wellness Co, …) is a tenant of Travila. In Lago this maps to:

  • Travila is the Lago Organization — one Lago org per environment (one staging, one prod).
  • Each enterprise tenant is a Lago Customer under that Organization.
  • Our tenant_id is the Customer's external_id. Subscriptions, invoices, and usage events all attach to the Customer.

There is exactly one Lago org-level API token per environment (Travila's), held by LagoProxyService and stored in OpenBao. The platform never holds per-tenant Lago tokens — tenant scoping is enforced at the LagoProxyService layer via metadata.tenant_id.

Organization (Travila)
└── Billing Entity (default)
    ├── Customer: socayo              ← our tenant_id = Lago external_id
    │   └── Subscription: Enterprise plan (with plan_overrides)
    ├── Customer: fitlife
    │   └── Subscription: Pro plan
    └── Customer: wellness_co
        └── Subscription: Pro plan

Billable Metrics (shared across all Customers)
├── token_usage (SUM with model+type+modality filters)
├── image_generation (COUNT with model filter)
├── api_calls (COUNT) ← post-MVP
├── active_users (COUNT_DISTINCT on user_id) ← post-MVP
├── storage_gb (MAX) ← post-MVP
└── voice_minutes (SUM) ← post-MVP

Multi-tenancy mechanism choice

Lago offers four multi-tenancy mechanisms; the choice depends on what tenants need:

  • Customer metadata (OSS) — provides application-layer filtering by metadata.tenant_id; does NOT provide native tenant isolation in Lago itself. When: MVP — free, simple, all tenants share pricing.
  • Billing Entities (premium) — provides per-entity Stripe accounts, invoice numbering/branding, dunning campaigns, tax config, email settings; does NOT provide per-tenant pricing (plans and metrics remain shared across the org). When: a tenant needs their own Stripe account or branded invoices.
  • Partner / revenue-share (premium) — provides self-billed invoices for resellers. When: reseller models.
  • Per-tenant Lago instances — provides fully independent plans, metrics, pricing, infra; costs ~$40-70/mo infra per tenant plus N migrations on Lago upgrades. When: only when tenants demand fundamentally different plan structures (e.g., one billed per-token, another per-active-user) AND pay enough to justify the operational tax.

Scope clarification. Billing Entities provides per-entity Stripe / branding / dunning / tax. It does NOT provide per-tenant pricing — plans and billable metrics remain shared across the Organization. Do not adopt Billing Entities expecting independent pricing per tenant. For per-customer rates within shared plans, use plan_overrides (see above).

Production Hardening

Lago API outage — lago.deadletter.v1.event

When Lago is unavailable:

  • BillingService wraps each Lago call in restate.Run(...). Restate retries with exponential backoff per the configured RunOptions.
  • After N retry attempts (typical starting point: 8 attempts over ~10 minutes), the handler publishes the original event envelope, the failed Lago request payload, and an error reason to lago.deadletter.v1.event.
  • An operator alert fires on first message arriving in the DLQ (Loki/Grafana alert against topic lag). One DLQ message is the signal that something is wrong; subsequent messages confirm scale.
  • An operator-driven DLQ replay tool drains the DLQ back into the BillingService handler once Lago is healthy. Replay is idempotent because event_id (deterministic per Key Design Decisions) maps to Lago's transaction_id.

This pattern is preferred over "stop consuming until Lago recovers" because the LLM-generation Kafka topic is shared with non-billing consumers (analytics, observability) and pausing it would create cross-domain backpressure.

Poison-pill events

Phase 2 adds 4 multimodal fields to llmcommon.Usage. Existing emitters pre-rollout produce events without those fields. To prevent one bad event from halting the consumer:

  • BillingService MUST treat missing or zero proto fields as valid input. No panic on absence; no validation error for audio_input_tokens == 0. Zero values map to "no audio billed", not "invalid event".
  • Events that genuinely cannot be processed (unparseable proto bytes, unknown event_name, missing tenant_id mapping) go to the same lago.deadletter.v1.event topic with the parse/lookup error attached. Same alert fires.
  • This combines with the additive-future rule: adding fields is never a forced migration of existing emitters; missing fields are silent zero-billing for that dimension until the emitter catches up.

Webhook HMAC — deferred for MVP

The webhook-receiver service ships in MVP without HMAC verification on the /webhooks/lago route. This is safe for MVP given the deployment topology:

  • Lago is self-hosted in the same Kubernetes cluster as the platform.
  • The receiver is exposed only via cluster-internal DNS (webhook-receiver.<namespace>.svc.cluster.local). No public Ingress, no NodePort, no LoadBalancer.
  • A Kubernetes NetworkPolicy restricts source pods/namespaces that may POST to the receiver.

Per-route HMAC config slots are designed in from day one, but the Lago route's slot is unset. HMAC implementation is required before any of:

  1. Adding any non-in-cluster webhook provider (e.g., Stripe, GitHub, Pipedream — these always originate outside the cluster).
  2. Public-exposing the receiver for any reason.
  3. Moving Lago to Lago Cloud or any out-of-cluster deployment.

When HMAC is enabled, the secret lives in pkg/secrets (OpenBao-backed) and the receiver supports a dual-secret window during rotation. Tracker: #1692. Whoever adds a webhook provider, public-exposes the receiver, or migrates Lago must close that issue first.

Definition of Done

Two CI-enforced contracts that convert "reviewer-must-catch" risks into static checks.

BillingService — idempotency lint

All billing-source events MUST flow through restatex.Publish() (deterministic event_id) or carry an externally-provided idempotency key. Direct UUID generation as a billing event_id causes Lago double-billing on Restate retries.

The BillingService and webhook-receiver modules ship with a per-module .golangci.yml that enables forbidigo with:

linters-settings:
  forbidigo:
    forbid:
      - p: ^uuid\.(New|NewString|NewV7|NewRandom)$
        msg: "Direct UUID generation is forbidden for billing event_ids — use restatex.Publish() (deterministic event_id from the Restate journal) or lift an externally-provided idempotency key. See docs/prd/billing-metering.md §BillingService DoD."
    analyze-types: true

Exemption process. A // nolint:forbidigo // <reason> comment is allowed only when the UUID is provably NOT a billing event_id (e.g., a trace-correlation ID, a non-billing internal record key). Reviewers MUST acknowledge the exemption explicitly in PR comments.

What this catches. An engineer reaches for uuid.New() to construct a billing event_id; Restate retries the handler; two events with different IDs hit Lago; the customer is charged twice. The lint rule fires at compile time, before merge.

What this does NOT catch. Wrong-but-deterministic keys (e.g. only tenant_id), time-dependent keys (fmt.Sprintf("%s-%d", tenant_id, time.Now().Unix())), or wrappers in non-billing packages that bypass the regex. The PRD-level rule that all billing events go through restatex.Publish() is the safeguard for these cases.

LagoProxyService — tenant-isolation tests

Lago has no native tenant isolation. The risk surface is the List* family — a handler that forgets to forward external_customer_id (or equivalent tenant filter) returns cross-tenant data.

Every List* operation MUST have an integration test that:

  1. Provisions Lago fixtures for at least two tenants (tenant_a, tenant_b) — customers, subscriptions, and the resource being listed.
  2. Calls the operation in tenant_a's RequestContext.
  3. Asserts the response contains only tenant_a's records and zero tenant_b records.

Operations covered: ListCustomers, ListSubscriptions, ListPlans, ListBillableMetrics, ListInvoices, ListEvents, ListWallets, ListCreditNotes, ListAddOns, and any future List* added.

A new List* operation without this test is not "Done" and must not merge.
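The assertion at the heart of these tests can be sketched with an in-memory stand-in for Lago. The real tests provision actual Lago fixtures and call the LagoProxyService handler; `fakeLago` and `listInvoices` below are hypothetical.

```go
// Sketch of the two-tenant isolation assertion: call a List* operation in
// tenant_a's scope, then assert zero tenant_b records come back.
package main

import "fmt"

type invoice struct{ tenant, id string }

// fakeLago stands in for fixtures covering at least two tenants.
var fakeLago = []invoice{
	{"tenant_a", "inv_1"}, {"tenant_b", "inv_2"}, {"tenant_a", "inv_3"},
}

// listInvoices models a correct handler: it always forwards the caller's
// tenant scope (external_customer_id in the real Lago API) to the query.
func listInvoices(callerTenant string) []invoice {
	var out []invoice
	for _, inv := range fakeLago {
		if inv.tenant == callerTenant {
			out = append(out, inv)
		}
	}
	return out
}

func main() {
	got := listInvoices("tenant_a")
	for _, inv := range got {
		if inv.tenant != "tenant_a" { // the core assertion: zero foreign records
			panic("cross-tenant leak: " + inv.id)
		}
	}
	fmt.Println("tenant_a invoices:", len(got))
}
```

A handler that forgets the tenant filter would return all three fixtures, and the loop's assertion fails on `inv_2` — the exact bug class this test family exists to catch.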

Why no tenant-scoped client wrapper? Most LagoProxyService operations are Get/Update/Delete*(id), where the ID itself scopes the query. The leak surface is genuinely just List*, and per-handler tests catch the bug at the same layer where it would land, while avoiding a fight with the Lago SDK's type system on every upgrade. Residual risk acknowledged: if review discipline slips on a future Lago SDK upgrade, a small CI grep-and-assert script becomes the structural fallback.

Admin Console Integration

Billing admin console integration — two consoles route to two distinct gateways that both call the shared LagoProxyService. Tenant employees use the existing Travila Console (bo-socayo/travila-console) over /admin/v1/* into services/console-gateway/, which forces tenant_id = RequestContext.tenant_id and ignores any tenant_id in the request body. Supported operations are self-serve only: list own invoices + download, current plan, usage dashboard, own wallet balance, and payment-method update via Stripe Checkout (self-serve plan upgrade is post-MVP). Travila/platform staff use the future Travila Internal Admin Portal (not yet built — staff drive RPCs via the Restate ingress until then) over /platform/admin/* into services/internal-admin-gateway/, which reads a target tenant_id from the request body so staff can act on any customer. Supported operations are cross-tenant (CreateCustomer, CreateSubscription/UpdateSubscription with plan_overrides, TerminateSubscription, VoidInvoice, VoidCreditNote, TopUpWallet manual_grant, ApplyCoupon, plan and BillableMetric CRUD, MRR analytics), with an optional reason field on high-stakes ops and a mandatory plan_overrides audit log. LagoProxyService holds the org-level Lago API token in OpenBao and carries mandatory two-tenant integration tests on every List* RPC.

Two consoles, two gateways, two audiences:

  • Travila Console (bo-socayo/travila-console, the existing per-tenant admin portal — see Admin Portal PRD for context) — used by an enterprise tenant's own employees (e.g., Socayo's billing admin). Routes through services/console-gateway/. Self-serve only and hard-scoped to the caller's tenant — view own invoices, current plan, usage dashboard, update payment method via Stripe Checkout. Lands as the concrete handlers behind the Billing page that already exists as a placeholder in the Admin Portal PRD.
  • Travila Internal Admin Portal (future, separate UI — not yet built; see API Keys PRD §Travila Internal Admin Portal) — for Travila/platform staff (sales, finance, customer success, ops). Routes through services/internal-admin-gateway/. Acts on customers — applies plan_overrides, terminates subscriptions, voids invoices, grants wallet credits, manages the plan catalog, runs MRR analytics. Until the frontend ships, Travila staff drive these RPCs via the Restate ingress against internal-admin-gateway directly.
Tenant user  → KrakenD (/admin/v1/*) → services/console-gateway/
                ├─ extracts tenant_id from RequestContext
                ├─ FORCES tenant_id = RequestContext.tenant_id
                ├─ ignores any tenant_id in the request body
                ├─ logs request to Loki (actor, rpc, scoped tenant_id)
                └─ calls LagoProxyService via Restate SDK client
                     └─ calls Lago HTTP API

Travila staff → KrakenD (/platform/admin/*) → services/internal-admin-gateway/
                ├─ reads target tenant_id from request body
                ├─ logs request to Loki (actor, rpc, target tenant_id)
                └─ calls LagoProxyService via Restate SDK client
                     └─ calls Lago HTTP API

The key behavioral difference: console-gateway derives tenant_id from RequestContext and ignores the body, so a tenant user cannot reach another tenant's billing data even by manipulating the request; internal-admin-gateway accepts a target tenant_id from the request (Travila staff can act on any customer).

Operation split

| Operation | Console / Gateway |
| --- | --- |
| Create a Lago customer for a new tenant | Travila / internal-admin-gateway |
| Assign initial subscription | Travila / internal-admin-gateway |
| Apply per-customer negotiated rates (plan_overrides) | Travila / internal-admin-gateway |
| Terminate a customer subscription | Travila / internal-admin-gateway |
| Void an invoice | Travila / internal-admin-gateway |
| Manually credit a customer wallet | Travila / internal-admin-gateway |
| Apply a coupon to a customer | Travila / internal-admin-gateway |
| Void a credit note | Travila / internal-admin-gateway |
| Create / update / delete a Plan | Travila / internal-admin-gateway |
| Create / update / delete a BillableMetric | Travila / internal-admin-gateway (or Lago UI) |
| MRR / gross-revenue analytics | Travila / internal-admin-gateway |
| List own invoices + download PDF | Tenant / console-gateway |
| Get current subscription / plan | Tenant / console-gateway |
| Get own usage dashboard | Tenant / console-gateway |
| Get own wallet balance | Tenant / console-gateway |
| Update payment method (Stripe Checkout redirect) | Tenant / console-gateway |
| Self-serve plan upgrade/downgrade | Tenant / console-gateway (post-MVP) |

MVP routes all subscription changes through Travila staff (internal-admin-gateway only). Self-serve plan tier changes are post-MVP, gated on the per-tenant console exposing a plan-picker UI.

High-stakes operations and the reason field

Some Travila-staff operations have financial or customer-impact consequences serious enough that the console form should ask the operator why. These RPCs (all on internal-admin-gateway) carry an optional free-text reason field on their request proto:

| Travila admin action | internal-admin-gateway → LagoProxyService RPC | Why it's high-stakes |
| --- | --- | --- |
| Apply per-customer negotiated rates | CreateSubscription / UpdateSubscription with plan_overrides | Finance contract trail (mandatory structured audit log per §plan_overrides) |
| Terminate a customer subscription | TerminateSubscription | Cuts off a customer |
| Void an invoice | VoidInvoice | Reverses revenue |
| Manually credit a customer wallet | TopUpWallet (type = manual_grant) | Free credit |
| Apply a coupon to a customer | ApplyCoupon | Discount application |
| Void a credit note | VoidCreditNote | Reverses a credit |
| Change a plan's pricing | UpdatePlan | Affects all customers on the plan |
| Delete a plan | DeletePlan | Affects all customers on the plan |

MVP behavior. The reason field is captured in internal-admin-gateway's request log when present; an empty value is not rejected. The console form encourages filling it in but the platform doesn't gate on it. plan_overrides is the one exception — its structured audit log line is mandatory MVP scope, not deferred to Retraced.

Self-serve operations on console-gateway do not carry a reason field — they're tenant-bounded by construction and never change cross-customer state.
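The MVP reason-field contract — log it when present, never reject when empty — can be sketched as follows. The handler and log-line shape are illustrative, not the real gateway code.

```go
// Sketch of the MVP reason-field behavior on internal-admin-gateway:
// a reason is captured in the request log when supplied, but an empty
// value is not rejected — the platform doesn't gate on it.
package main

import "fmt"

// handleHighStakes returns the request-log line that would be emitted.
func handleHighStakes(rpc, targetTenant, reason string) string {
	line := fmt.Sprintf("rpc=%s target_tenant=%s", rpc, targetTenant)
	if reason != "" {
		line += fmt.Sprintf(" reason=%q", reason)
	}
	return line // an empty reason is simply omitted, never an error
}

func main() {
	fmt.Println(handleHighStakes("VoidInvoice", "tenant_a", "duplicate charge"))
	fmt.Println(handleHighStakes("VoidInvoice", "tenant_a", "")) // still succeeds
}
```

When Retraced lands, the same captured field becomes mandatory input for high-stakes RPC validation, so nothing about the request proto has to change.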

Audit trail — bare-minimum MVP

For MVP the audit posture is bare-minimum on both gateways, leveraging existing infrastructure (no new services, no new tables):

  • Both internal-admin-gateway and console-gateway log every billing request to Loki via the standard structured logger (actor, RPC, sanitized body, correlation_id, plus the relevant tenant_id — target on internal-admin-gateway, enforced scope on console-gateway).
  • LagoProxyService logs the Lago response on success and the structured error on failure.
  • Loki retention covers forensic queries; correlation_id ties UI action → gateway log → LagoProxyService log → Lago response.

When the platform's Retraced audit-trail pipeline lands, this section gets revised: high-stakes RPCs on internal-admin-gateway gain mandatory reason validation, the plan_overrides audit payload is lifted as a Retraced event body (no schema rework — data is already captured), and other mutating RPCs on either gateway gain structured Retraced events. Self-serve mutations on console-gateway (e.g., payment-method updates) flow into the tenant's own audit log so a tenant's billing admin can see who in their org changed what. Until then, MVP is "good enough for forensic queries via Loki" — explicitly accepted as a Retraced-pre-launch trade-off.

RBAC — deferred per console

console-gateway (Travila Console — per-tenant side). Tenant-bounded scoping is enforced in code (every billing RPC forces tenant_id = RequestContext.tenant_id). Within a tenant, the existing Travila Console role model (Admin / Developer / Viewer per the Admin Portal PRD) already gives Billing the right shape (Admin: full, Developer: read, Viewer: hidden). Mapping those roles to specific self-serve billing RPCs lands when the Billing handlers ship; until then, MVP grants every authenticated tenant user who can see the Billing page access to every self-serve billing RPC.

internal-admin-gateway (Travila Internal Admin Portal — staff side). No role model yet — and the frontend isn't built yet. Until both land, the gateway authenticates the requester (staff token) but does not gate billing RPCs by role. Every staff member with a valid token can call every billing RPC. Acceptable because the staff list is small and known, and every action is logged with the actor.

Implementer note. Design both gateways' billing handlers to read role/scope from RequestContext even when no enforcement happens today — the eventual mapping is then a config change, not a refactor.
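One way to honor that note, sketched with hypothetical names: route every handler through an enforcement hook that reads the role today but is a no-op until the role mapping lands, so flipping enforcement on is a config change rather than a refactor.

```go
// Sketch of the forward-compatible handler shape: role is read from
// RequestContext now; enforceRole stays a no-op until the mapping ships.
package main

import "fmt"

type RequestContext struct {
	TenantID string
	Role     string // "admin" | "developer" | "viewer" — read even when unused
}

// enforceRole is the future gate. MVP: always allow; later the rpc→role
// mapping is loaded from config and this returns an error on mismatch.
var enforceRole = func(ctx RequestContext, rpc string) error { return nil }

func listOwnInvoices(ctx RequestContext) (string, error) {
	if err := enforceRole(ctx, "ListInvoices"); err != nil {
		return "", err
	}
	return fmt.Sprintf("invoices for %s", ctx.TenantID), nil
}

func main() {
	out, _ := listOwnInvoices(RequestContext{TenantID: "tenant_a", Role: "viewer"})
	fmt.Println(out)
}
```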

Lago UI vs the two consoles

For Travila staff:

  • Use the Travila Internal Admin Portal (internal-admin-gateway) — once built — for any operation acting on a specific customer: applying plan_overrides, terminating subscriptions, voiding invoices, granting wallet credits, applying coupons. These need the audit trail captured at the platform layer. Until the UI is built, staff drive the same RPCs via the Restate ingress against internal-admin-gateway directly.
  • Use the Lago UI for billing-config operations where the operator is intentionally configuring Lago itself — creating new plans, billable metrics, taxes, Stripe integration setup. These are global config changes, not customer-targeted.

For tenant employees:

  • Use only the Travila Console (console-gateway). Tenants never log into Lago directly and never reach internal-admin-gateway — both the Lago UI and the future Travila Internal Admin Portal are platform-staff tools.

Rollout Phases

The 5-phase plan from the PRD. Phases 1–4 ship billing MVP; Phase 5 is on the Policy Engine team's critical path, not the billing team's.

| Phase | Scope | Status |
| --- | --- | --- |
| 1. Foundation (Week 1–2) | Deploy Lago (one shared instance per env). Configure Stripe. Create billable metrics in Lago. Create Free / Pro / Enterprise plans. Test invoice generation manually. | Not started |
| 2. Event Integration — MVP=LLM-only (Week 3–4) | Add multimodal fields to llmcommon.Usage (additive). Parse prompt_tokens_details / completion_tokens_details in OpenRouter. Wire Generation Workflow modality-aware tracking. Implement BillingService.HandleLLMGenerationCompleted Kafka subscription. Implement SyncModelPricing job. Tenant-isolation tests on every LagoProxyService List*. Confirm lago.webhook.v1.event flows in production (no consumer in MVP). | Not started |
| 3. Customer Facing (Week 5–6) | Two parallel tracks — see §Admin Console Integration. Travila staff (internal-admin-gateway): add cross-tenant billing handlers (CreateCustomer, CreateSubscription / UpdateSubscription with plan_overrides, TerminateSubscription, VoidInvoice, TopUpWallet, ApplyCoupon, VoidCreditNote, plan CRUD, MRR analytics) with optional reason field and mandatory plan_overrides audit log. Tenant self-serve (console-gateway): list own invoices + download, current plan, usage dashboard, own wallet balance, payment-method update via Stripe Checkout — every handler hard-scoped to RequestContext.tenant_id, with server tests asserting that supplying a foreign tenant_id does NOT change the downstream Lago scope. Self-serve plan change is post-MVP. | Not started |
| 4. Production Hardening (Week 7–8) | DLQ pattern (lago.deadletter.v1.event) for Lago outage and poison pills. Webhook handlers for all Lago lifecycle events. Alerting (payment failures, first DLQ message, approaching plan limits). Dunning. Operator runbooks (DLQ replay, secret rotation when HMAC enables, Lago version-upgrade procedure). | Not started |
| 5. Real-Time Enforcement (blocked on Policy Engine Phase 1) | Build services/opa-consumer/v1/. Wire gateway middleware to call OPA at the billing/entitlement domain. Smoke + soak tests on staging. Operator runbook: drain opa-consumer for upgrades, force re-bootstrap from Lago, handle OPA bundle rollback. | Not started — blocked on Policy Engine PRD |

Dependency ordering

Phase dependency graph — 5 phases plus the post-MVP additive-metric track and the external Policy Engine Phase 1 dependency. Phases 1→2→3→4 are the linear MVP path. Phase 5 (real-time enforcement, dashed) depends on Phase 2's lago.webhook.v1.event topic AND on Policy Engine Phase 1's OPA infrastructure. Post-MVP metrics (api_calls, storage_gb, voice_minutes, active_users) are independent of phasing and land additively after Phase 2.

| Dependency | Reason |
| --- | --- |
| Phase 2 depends on Phase 1 | Lago must exist (with metrics + plans) before BillingService can dispatch events to it |
| Phase 5 depends on Policy Engine Phase 1 | OPA infrastructure (sidecar deployment, pkg/policy client, GCS bundle server, decision logging) must exist before opa-consumer writes entitlement state and gateway middleware reads it |
| Phase 5 depends on Phase 2 | The lago.webhook.v1.event topic (provisioned in Phase 2) must be live before opa-consumer can tail it |
| Phases 3 + 4 are independent of Phase 5 | Customer dashboards and DLQ replay don't need real-time enforcement |
| Post-MVP metric increments are independent of phasing | Each new metric (api_calls, storage_gb, voice_minutes, active_users) is the same fixed recipe and can land any time after Phase 2 |

Post-MVP additive increments

Each item below is a separate ticket opened when product confirms scope. None modifies existing event shapes, handlers, or topics.

  • api_calls: new ApiCallBilledEvent proto + gateway emit + new BillingService handler + Lago metric.
  • storage_gb: new StorageUsageReportedEvent proto + StorageService emit + new handler + Lago metric.
  • voice_minutes: new VoiceMinutesConsumedEvent proto + Pipecat / Daily emit + new handler + Lago metric.
  • active_users: COUNT_DISTINCT on user_id (piggyback on existing events first; add UserActivityReportedEvent only if non-LLM-only tenants need it).
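The active_users metric is the only one that aggregates rather than sums. A minimal sketch of what COUNT_DISTINCT(user_id) computes, with a hypothetical event shape (in production Lago's own COUNT_DISTINCT aggregation does this over ingested events):

```go
// Sketch: distinct active users per tenant, mirroring a
// COUNT_DISTINCT(user_id) aggregation over already-emitted usage events.
package main

import "fmt"

type usageEvent struct{ tenantID, userID string }

func distinctActiveUsers(events []usageEvent, tenantID string) int {
	seen := map[string]struct{}{}
	for _, e := range events {
		if e.tenantID == tenantID {
			seen[e.userID] = struct{}{} // repeat activity counts once
		}
	}
	return len(seen)
}

func main() {
	events := []usageEvent{
		{"tenant_a", "u1"}, {"tenant_a", "u1"}, {"tenant_a", "u2"}, {"tenant_b", "u9"},
	}
	fmt.Println(distinctActiveUsers(events, "tenant_a")) // u1 counted once
}
```

This is why piggybacking on existing events works: any event stream that already carries (tenant_id, user_id) can feed the metric without a new proto.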

Out of Scope for MVP

  • Real-time block-on-limit enforcement — Phase 5, blocked on Policy Engine Phase 1.
  • opa-consumer projection service — built alongside Policy Engine Phase 1 against the entitlement-state shape and webhook→diff mapping locked in this PRD.
  • Gateway middleware calling OPA at the billing/entitlement domain — built alongside opa-consumer.
  • api_calls, storage_gb, voice_minutes, active_users metrics — additive post-MVP increments, ~1–2 days each.
  • Webhook HMAC enforcement — deferred while webhook-receiver is cluster-internal-only; mandatory before any non-in-cluster provider, public exposure, or Lago Cloud migration. Tracked in #1692.
  • Per-tenant Lago instances — only adopted when a tenant requires a fundamentally different plan structure (not just rates). For different rates within a shared plan structure, use plan_overrides.
  • Billing Entities (premium) — adopted later if a tenant needs their own Stripe account or branded invoices. Does NOT enable per-tenant pricing.
  • Retraced-style structured audit pipeline — MVP audit is Loki structured logs; Retraced lifts the existing plan_overrides payload without schema rework when it lands.
  • RBAC on either gateway's billing RPCs — deferred per console. console-gateway reuses the existing Travila Console Admin/Developer/Viewer role model (Billing page already in that grid); internal-admin-gateway waits on the future Travila Internal Admin Portal's role model. Tenant-bounded scoping on console-gateway is enforced in code today regardless. Handlers on both gateways read role from RequestContext for forward compatibility.
  • Self-serve plan upgrade/downgrade on console-gateway — MVP routes all subscription changes through Travila staff (internal-admin-gateway). Self-serve tier changes land after the per-tenant console exposes a plan-picker UI.
  • Body fingerprint / cross-validation between Restate cached responses and downstream Lago events — separate concern, captured in the Edge Idempotency PRD.

Open Questions

| # | Question | Owner | Status |
| --- | --- | --- | --- |
| 1 | Exact pricing for each tier? | Product | Open |
| 2 | Free tier limits? | Product | Open |
| 3 | Grace period for failed payments before suspension? | Product | Open |
| 4 | Should we show real-time cost or end-of-period estimate? | Product | Open |
| 5 | Lago Cloud vs self-hosted? | Engineering | Resolved — self-hosted OSS, one shared instance per environment (one staging, one prod). Drivers: data-residency posture, per-event Cloud pricing tax on growth, marginal operational cost is small given the platform already runs 8+ stateful systems. |
| 6 | Should Free tier block audio/image generation entirely, or allow a small quota? | Product | Open |
| 7 | Should reasoning tokens be priced at the output rate or at a discount to encourage usage? | Product | Open |
| 8 | How to handle OpenRouter's single audio pricing field (no input/output split)? Use 2× heuristic or fetch from model detail pages? | Engineering | Open |

Items also deferred (with explicit owner): structurally different plan shapes per tenant (biz discussion — only triggers per-tenant Lago instance); per-tenant Lago revenue threshold (biz discussion); capacity sizing and load testing (revisit when onboarding more tenants).

Cross-References