Model Routing & Pre-Filter
This guide explains model routing pre-filtering: candidate models are validated and filtered before a generation workflow starts, so your application gets immediate, synchronous error feedback when every candidate model is invalid or fails the specified criteria.
If you don't set modelRoutingFilter, everything works exactly as before. You can adopt the filter incrementally.
What's New
| Feature | What It Does | Client Impact |
|---|---|---|
| Model validation | Validates model IDs against the OpenRouter catalog before starting generation | Handle synchronous error responses from send-message |
| Metadata filtering | Filter models by capabilities (context length, modalities, cost, etc.) | Add optional modelRoutingFilter to GenerationConfig |
Benefits
- Immediate error feedback — invalid model IDs return a synchronous error from send-message, no need to wait for an async failure via Firebase RTDB
- Capability-based routing — ensure only models that support your requirements (e.g., image input, minimum context window) are used
- Cost control — set maximum cost per token to prevent expensive models from being selected
- Automatic fallback — invalid or filtered-out models are removed from the candidate list; generation proceeds with the remaining valid models
Synchronous Error Handling
This is the most important change for clients.
Previously, if a model was invalid, send-message would fire-and-forget the generation workflow, which would fail asynchronously — clients would only learn about the failure via Firebase RTDB (or not at all).
Now, send-message validates all candidate models synchronously before starting the workflow. If ALL candidates are invalid or filtered out, it returns an immediate error in the HTTP response:
```
POST /api/v1/llm/gateway/send-message (model: "invalid/model-xyz")
        ↓
Backend: Validate model IDs → ALL invalid
        ↓
HTTP Response: 400 — "all candidate models were filtered out: [invalid/model-xyz]"
        ↓
No generation workflow started. No RTDB update. No async failure.
```
Client Flow Change
Before:
1. Call send-message → always returns 200 with runId
2. Listen to RTDB for response
3. If model invalid → RTDB never updates (or shows error asynchronously)
After:
1. Call send-message → may return 400 if ALL models invalid
2. If 400 → show error to user immediately (do NOT listen to RTDB)
3. If 200 → proceed to listen to RTDB as usual
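The updated flow can be sketched in client code. This is an illustrative sketch only: the response shape (`runId` on success, `code`/`message` on error) is an assumption inferred from the error format documented in this guide, and the returned action tags are hypothetical names.

```typescript
// Decide what the client should do based on the send-message HTTP response.
// 400/422 now mean "no workflow was started" — do NOT attach an RTDB listener.
type SendMessageResult =
  | { action: "listen"; runId: string }        // 200: listen to RTDB as before
  | { action: "show_error"; message: string }; // 400/422: surface immediately

function handleSendMessageResponse(
  status: number,
  body: { runId?: string; code?: number; message?: string },
): SendMessageResult {
  if (status === 400 || status === 422) {
    // All candidate models were invalid or filtered out.
    return {
      action: "show_error",
      message: body.message ?? "all candidate models were filtered out",
    };
  }
  // Success: proceed to the async flow, same as before this change.
  return { action: "listen", runId: body.runId ?? "" };
}
```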
Error Response Format
When all candidates are filtered out due to invalid IDs, send-message returns HTTP 400:
```json
{
  "code": 400,
  "message": "all candidate models were filtered out: [invalid/model-xyz, deprecated/old-model]"
}
```
When metadata filters eliminate all candidates, the error includes per-model reasons (HTTP 422):
```json
{
  "code": 422,
  "message": "all candidate models were filtered out: [google/gemini-2.5-flash: context_length, openai/gpt-4o-mini: input_modality]"
}
```
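The 422 message packs per-model reasons into a bracketed list. If you want to surface these individually, here is a parsing sketch; it assumes the message format shown in the example stays stable, which is not guaranteed.

```typescript
// Parse "all candidate models were filtered out: [model: reason, ...]"
// into a { model: reason } map.
function parseFilterReasons(message: string): Record<string, string> {
  const out: Record<string, string> = {};
  const bracketed = message.match(/\[(.*)\]/);
  if (!bracketed) return out;
  for (const entry of bracketed[1].split(",")) {
    // Split on the LAST colon: model IDs can themselves contain ":" (e.g. ":nitro").
    const i = entry.lastIndexOf(":");
    if (i > 0) out[entry.slice(0, i).trim()] = entry.slice(i + 1).trim();
  }
  return out;
}
```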
Error Summary
| Error | HTTP Code | Meaning | Action |
|---|---|---|---|
| All models invalid (ID not in catalog) | 400 | Every model ID is wrong/deprecated | Show error immediately. Do NOT listen to RTDB. Check model IDs. |
| All models filtered by metadata | 422 | Models exist but don't meet filter criteria | Show error immediately. Do NOT listen to RTDB. Relax filter or add more models. |
| Some models filtered, some valid | 200 | Invalid/filtered models silently removed, generation proceeds with remaining | Normal flow. RTDB listener as usual. |
| No modelRoutingFilter set | 200 | No metadata filtering, only ID validation | Normal flow. Same as before metadata filtering. |
This also applies to send-message-sync. The sync endpoint returns the same 400/422 error inline when all models are filtered.
Where to Set the Filter
The modelRoutingFilter lives on GenerationConfig — the same object used for model, models, responseFormat, etc. It can be set at two levels:
1. Thread-Level Default
Set via create-thread in default_generation_config.model_routing_filter. Applies to every message in the thread.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-thread \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Long research thread",
    "conversation_key": "research-001",
    "default_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro",
        "openai/gpt-4o"
      ],
      "model_routing_filter": {
        "min_context_length": 128000
      }
    }
  }'
```
When to use: When you want consistent capability requirements across all messages in a thread (e.g., "always require >=128k context for this long-context thread").
2. Per-Message Override
Set via send-message in override_generation_config.model_routing_filter. Applies to a single message only.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "research-001",
    "user_message": {
      "role": "user",
      "content": [
        { "type": "CONTENT_PART_TYPE_IMAGE", "content": "<base64_image>" },
        { "type": "CONTENT_PART_TYPE_TEXT", "content": "What is in this image?" }
      ]
    },
    "override_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro",
        "openai/gpt-4o"
      ],
      "model_routing_filter": {
        "required_input_modalities": ["image"]
      }
    }
  }'
```
When to use: When a specific message needs different model capabilities (e.g., "this message includes an image, require image input support").
override_generation_config uses full replacement semantics. Include all fields you want to preserve, not just the new model_routing_filter.
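Because of the full-replacement semantics, it is safest to build the override from the thread default rather than sending the filter alone. A minimal sketch, assuming a TypeScript client; the `GenerationConfig` field names are taken from the request bodies in this guide, but the exact type shape is an assumption.

```typescript
// Fields seen in this guide's request bodies (shape is an assumption).
interface GenerationConfig {
  models?: string[];
  response_format?: unknown;
  model_routing_filter?: Record<string, unknown>;
}

// Copy every field from the thread default, then swap in the new filter,
// so full replacement does not silently drop models/response_format/etc.
function buildOverride(
  threadDefault: GenerationConfig,
  filter: Record<string, unknown>,
): GenerationConfig {
  return { ...threadDefault, model_routing_filter: filter };
}
```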
ModelRoutingFilter Fields
All fields are optional. Unset/zero-value fields are ignored (no filtering on that dimension). When multiple fields are set, they are ANDed — a model must pass every specified filter.
| Field | Type | Description | Behavior |
|---|---|---|---|
| min_context_length | int64 | Minimum context window in tokens | Models below this are excluded |
| min_max_completion_tokens | int64 | Minimum max output tokens | Models reporting 0 (unknown) pass through |
| required_input_modalities | string[] | e.g., ["image", "audio"] | Model must support ALL listed. Models with empty modalities are excluded |
| required_output_modalities | string[] | e.g., ["image"] | Model must support ALL listed. Models with empty modalities are excluded |
| max_prompt_cost | double | Max cost per prompt token (e.g., 0.000003) | 0 = no limit. Unparseable pricing = pass |
| max_completion_cost | double | Max cost per completion token | 0 = no limit. Unparseable pricing = pass |
| exclude_moderated | bool | Skip models with content moderation | Only filters when set to true |
| required_parameters | string[] | e.g., ["tools", "response_format"] | Model must support ALL listed |
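The AND semantics can be sketched as a predicate over catalog metadata. This is an illustration only: the `ModelMeta` shape below is an assumption about what the backend reads from the catalog, not its actual representation, and only a subset of filter fields is shown.

```typescript
// Assumed catalog metadata for one model (hypothetical shape).
interface ModelMeta {
  contextLength: number;
  inputModalities: string[];
  promptCost: number; // cost per prompt token
}

interface ModelRoutingFilter {
  min_context_length?: number;
  required_input_modalities?: string[];
  max_prompt_cost?: number;
}

// Unset/zero fields are skipped; every set field must pass (AND).
function passesFilter(m: ModelMeta, f: ModelRoutingFilter): boolean {
  if (f.min_context_length && m.contextLength < f.min_context_length) return false;
  if (f.required_input_modalities && f.required_input_modalities.length > 0) {
    // Models with empty modality metadata are excluded when modalities are required.
    if (m.inputModalities.length === 0) return false;
    if (!f.required_input_modalities.every((x) => m.inputModalities.includes(x))) return false;
  }
  // A max_prompt_cost of 0 means "no limit".
  if (f.max_prompt_cost && m.promptCost > f.max_prompt_cost) return false;
  return true;
}
```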
Examples
Long-Context Thread
Ensure all models in the fallback chain support long conversations:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-thread \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Long research thread",
    "conversation_key": "research-001",
    "default_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro",
        "openai/gpt-4o"
      ],
      "model_routing_filter": {
        "min_context_length": 128000
      }
    }
  }'
```
If any model in the models list has less than 128k context, it's silently removed. If all are removed, send-message returns an immediate error.
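The remove-then-fallback behavior can be sketched as follows. `passes` stands in for whatever check applies (ID validation or metadata filtering) — a hypothetical predicate, not an actual API.

```typescript
// Drop filtered-out candidates; generation proceeds with whatever remains.
// Only an empty result is an error, mirroring the synchronous 400/422.
function selectCandidates(
  models: string[],
  passes: (model: string) => boolean,
): string[] {
  const kept = models.filter(passes);
  if (kept.length === 0) {
    throw new Error(`all candidate models were filtered out: [${models.join(", ")}]`);
  }
  return kept;
}
```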
Cost-Capped Generation
Limit to cheap models for bulk/background tasks:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "bulk-task-001",
    "user_message": {
      "role": "user",
      "content": [{ "type": "CONTENT_PART_TYPE_TEXT", "content": "Summarize this" }]
    },
    "override_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "openai/gpt-4o-mini",
        "anthropic/claude-haiku-3.5"
      ],
      "model_routing_filter": {
        "max_prompt_cost": 0.000005,
        "max_completion_cost": 0.00002
      }
    }
  }'
```
Structured Output with Tool Support
Ensure the model supports both tools and response_format parameters:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "health-plan-001",
    "user_message": {
      "role": "user",
      "content": [{ "type": "CONTENT_PART_TYPE_TEXT", "content": "Generate my health plan" }]
    },
    "override_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro"
      ],
      "response_format": {
        "json_schema": { "..." : "schema" },
        "schema_name": "health_plan"
      },
      "model_routing_filter": {
        "required_parameters": ["tools", "response_format"]
      }
    }
  }'
```
Backwards Compatibility
| Scenario | Behavior |
|---|---|
| Existing threads without modelRoutingFilter | No metadata filtering. ID validation still active. |
| modelRoutingFilter with all zero/unset fields | No filtering — equivalent to not setting it |
| modelRoutingFilter on override_generation_config | Applies to that message only. Next message reverts to thread default. |
| Single valid model with no models array | Filter applies to the single model. If it fails, immediate error. |
| model + models with filter | Filter applies to the merged candidate list (model first, then models). |
| Existing error handling for send-message | If your code already handles non-200 responses, it will catch the new 400/422 errors automatically. |
Related
- Conversations Guide — Thread lifecycle, generation config, and message sending
- LLM Gateway API Reference — Full endpoint reference