Model Routing & Pre-Filter

This guide explains model routing pre-filtering: candidate models are validated and filtered before a generation workflow starts, so your application gets immediate, synchronous error feedback when every candidate model is invalid or fails the specified criteria.

Backwards Compatible

If you don't set modelRoutingFilter, everything works exactly as before. You can adopt the filter incrementally.

What's New

| Feature | What It Does | Client Impact |
| --- | --- | --- |
| Model validation | Validates model IDs against the OpenRouter catalog before starting generation | Handle synchronous error responses from send-message |
| Metadata filtering | Filters models by capabilities (context length, modalities, cost, etc.) | Add optional modelRoutingFilter to GenerationConfig |

Benefits

  • Immediate error feedback — invalid model IDs return a synchronous error from send-message, no need to wait for an async failure via Firebase RTDB
  • Capability-based routing — ensure only models that support your requirements (e.g., image input, minimum context window) are used
  • Cost control — set maximum cost per token to prevent expensive models from being selected
  • Automatic fallback — invalid or filtered-out models are removed from the candidate list; generation proceeds with the remaining valid models

Synchronous Error Handling

This is the most important change for clients.

Previously, if a model was invalid, send-message would fire-and-forget the generation workflow, which would fail asynchronously — clients would only learn about the failure via Firebase RTDB (or not at all).

Now, send-message validates all candidate models synchronously before starting the workflow. If ALL candidates are invalid or filtered out, it returns an immediate error in the HTTP response:

POST /api/v1/llm/gateway/send-message  (model: "invalid/model-xyz")

Backend: Validate model IDs → ALL invalid

HTTP Response: 400 — "all candidate models were filtered out: [invalid/model-xyz]"

No generation workflow started. No RTDB update. No async failure.
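Conceptually, the validation pass is a small pre-filter over the candidate list before anything is scheduled. A minimal Python sketch of that behavior (illustrative only, not the actual backend implementation; the catalog is assumed to be a set of known model IDs, and the error wording mirrors the responses shown in this guide):

```python
def prefilter(candidates, catalog):
    """Drop candidate model IDs that are not in the catalog.

    Returns (valid_models, error_message). error_message is set only
    when every candidate was removed, mirroring the synchronous 400
    that send-message returns before any workflow starts.
    """
    invalid = [m for m in candidates if m not in catalog]
    valid = [m for m in candidates if m in catalog]
    if not valid:
        return [], "all candidate models were filtered out: [%s]" % ", ".join(invalid)
    # Some candidates survived: generation proceeds with these.
    return valid, None
```

With a partially valid list, invalid IDs are silently dropped and generation proceeds; only an empty result produces the synchronous error.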

Client Flow Change

Before:
1. Call send-message → always returns 200 with runId
2. Listen to RTDB for response
3. If model invalid → RTDB never updates (or shows error asynchronously)

After:
1. Call send-message → may return 400 if ALL models invalid
2. If 400 → show error to user immediately (do NOT listen to RTDB)
3. If 200 → proceed to listen to RTDB as usual
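In client code, the new branch can be sketched like this (a minimal Python sketch; on_error and on_run stand in for your app's own error display and RTDB-listener setup, which are outside the API):

```python
def handle_send_message(status_code, body, on_error, on_run):
    """Branch on the synchronous send-message response.

    400/422 mean every candidate model was filtered out: surface the
    error immediately and do NOT attach an RTDB listener.
    """
    if status_code in (400, 422):
        on_error(body.get("message", "all candidate models were filtered out"))
        return None
    # 200: generation started -- listen to RTDB for the result as before.
    run_id = body.get("runId")
    on_run(run_id)
    return run_id
```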

Error Response Format

When all candidates are filtered out due to invalid IDs, send-message returns HTTP 400:

```json
{
  "code": 400,
  "message": "all candidate models were filtered out: [invalid/model-xyz, deprecated/old-model]"
}
```

When metadata filters eliminate all candidates, the error includes per-model reasons (HTTP 422):

```json
{
  "code": 422,
  "message": "all candidate models were filtered out: [google/gemini-2.5-flash: context_length, openai/gpt-4o-mini: input_modality]"
}
```
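If you want to surface per-model reasons in your UI, the 422 message can be unpacked with a small parser (an illustrative, best-effort sketch: the bracketed "model: reason" layout is taken from the example above and is not a guaranteed stable format):

```python
import re

def parse_filter_reasons(message):
    """Extract {model_id: reason} pairs from a 422 filter error message.

    Expects the format shown in this guide:
    "all candidate models were filtered out: [model-a: reason, model-b: reason]"
    400 messages list bare model IDs with no reasons, so they yield {}.
    """
    match = re.search(r"\[(.*)\]", message)
    if not match:
        return {}
    reasons = {}
    for entry in match.group(1).split(", "):
        # rpartition handles model IDs that themselves contain ":" (e.g. ":nitro").
        model, sep, reason = entry.rpartition(": ")
        if sep:
            reasons[model] = reason
    return reasons
```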

Error Summary

| Error | HTTP Code | Meaning | Action |
| --- | --- | --- | --- |
| All models invalid (ID not in catalog) | 400 | Every model ID is wrong/deprecated | Show error immediately. Do NOT listen to RTDB. Check model IDs. |
| All models filtered by metadata | 422 | Models exist but don't meet filter criteria | Show error immediately. Do NOT listen to RTDB. Relax the filter or add more models. |
| Some models filtered, some valid | 200 | Invalid/filtered models silently removed; generation proceeds with the remaining | Normal flow. RTDB listener as usual. |
| No modelRoutingFilter set | 200 | No metadata filtering, only ID validation | Normal flow. Same as before metadata filtering. |
Important: This also applies to send-message-sync. The sync endpoint returns the same 400/422 error inline when all models are filtered out.

Where to Set the Filter

The modelRoutingFilter lives on GenerationConfig — the same object used for model, models, responseFormat, etc. It can be set at two levels:

1. Thread-Level Default

Set via create-thread in default_generation_config.model_routing_filter. Applies to every message in the thread.

```shell
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-thread \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Long research thread",
    "conversation_key": "research-001",
    "default_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro",
        "openai/gpt-4o"
      ],
      "model_routing_filter": {
        "min_context_length": 128000
      }
    }
  }'
```

When to use: When you want consistent capability requirements across all messages in a thread (e.g., "always require >=128k context for this long-context thread").

2. Per-Message Override

Set via send-message in override_generation_config.model_routing_filter. Applies to a single message only.

```shell
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "research-001",
    "user_message": {
      "role": "user",
      "content": [
        { "type": "CONTENT_PART_TYPE_IMAGE", "content": "<base64_image>" },
        { "type": "CONTENT_PART_TYPE_TEXT", "content": "What is in this image?" }
      ]
    },
    "override_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro",
        "openai/gpt-4o"
      ],
      "model_routing_filter": {
        "required_input_modalities": ["image"]
      }
    }
  }'
```

When to use: When a specific message needs different model capabilities (e.g., "this message includes an image, require image input support").

Full Replacement Semantics

override_generation_config uses full replacement semantics. Include all fields you want to preserve, not just the new model_routing_filter.
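One safe pattern is to copy the thread default and change only the filter, so full replacement never drops fields you meant to keep (a sketch; how your app obtains and caches the thread's default_generation_config is up to you):

```python
import copy

def build_override(default_config, routing_filter):
    """Build an override_generation_config without dropping fields.

    Because overrides use full replacement semantics, copy everything
    from the thread default, then set model_routing_filter on the copy.
    """
    override = copy.deepcopy(default_config)
    override["model_routing_filter"] = routing_filter
    return override
```

Sending only {"model_routing_filter": ...} as the override would replace the whole config and discard the thread's models list; building on a deep copy avoids that.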

ModelRoutingFilter Fields

All fields are optional. Unset/zero-value fields are ignored (no filtering on that dimension). When multiple fields are set, they are ANDed — a model must pass every specified filter.

| Field | Type | Description | Behavior |
| --- | --- | --- | --- |
| min_context_length | int64 | Minimum context window in tokens | Models below this are excluded |
| min_max_completion_tokens | int64 | Minimum max output tokens | Models reporting 0 (unknown) pass through |
| required_input_modalities | string[] | e.g., ["image", "audio"] | Model must support ALL listed. Models with empty modalities are excluded |
| required_output_modalities | string[] | e.g., ["image"] | Model must support ALL listed. Models with empty modalities are excluded |
| max_prompt_cost | double | Max cost per prompt token (e.g., 0.000003) | 0 = no limit. Unparseable pricing = pass |
| max_completion_cost | double | Max cost per completion token | 0 = no limit. Unparseable pricing = pass |
| exclude_moderated | bool | Skip models with content moderation | Only filters when set to true |
| required_parameters | string[] | e.g., ["tools", "response_format"] | Model must support ALL listed |
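The AND semantics can be sketched as a predicate over a model's catalog metadata (illustrative only: the metadata field names here are assumptions, only a few filter fields are shown, and the real backend also applies the unknown-value rules from the table above):

```python
def passes_filter(meta, f):
    """Return True if a model's metadata passes every set filter field.

    `meta` is a dict of assumed catalog fields; unset filter fields
    (missing / zero / empty) are skipped, and set fields are ANDed.
    """
    if f.get("min_context_length") and meta.get("context_length", 0) < f["min_context_length"]:
        return False
    if f.get("required_input_modalities"):
        # Model must support ALL listed; empty model modalities fail here.
        if not set(f["required_input_modalities"]) <= set(meta.get("input_modalities", [])):
            return False
    # 0 / missing cost means "no limit" on the filter side.
    if f.get("max_prompt_cost") and meta.get("prompt_cost", 0) > f["max_prompt_cost"]:
        return False
    if f.get("exclude_moderated") and meta.get("is_moderated", False):
        return False
    return True
```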

Examples

Long-Context Thread

Ensure all models in the fallback chain support long conversations:

```shell
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-thread \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Long research thread",
    "conversation_key": "research-001",
    "default_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro",
        "openai/gpt-4o"
      ],
      "model_routing_filter": {
        "min_context_length": 128000
      }
    }
  }'
```

If any model in the models list has less than 128k context, it's silently removed. If all are removed, send-message returns an immediate error.

Cost-Capped Generation

Limit to cheap models for bulk/background tasks:

```shell
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "bulk-task-001",
    "user_message": {
      "role": "user",
      "content": [{ "type": "CONTENT_PART_TYPE_TEXT", "content": "Summarize this" }]
    },
    "override_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "openai/gpt-4o-mini",
        "anthropic/claude-haiku-3.5"
      ],
      "model_routing_filter": {
        "max_prompt_cost": 0.000005,
        "max_completion_cost": 0.00002
      }
    }
  }'
```

Structured Output with Tool Support

Ensure the model supports both tools and response_format parameters:

```shell
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "health-plan-001",
    "user_message": {
      "role": "user",
      "content": [{ "type": "CONTENT_PART_TYPE_TEXT", "content": "Generate my health plan" }]
    },
    "override_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro"
      ],
      "response_format": {
        "json_schema": { "..." : "schema" },
        "schema_name": "health_plan"
      },
      "model_routing_filter": {
        "required_parameters": ["tools", "response_format"]
      }
    }
  }'
```

Backwards Compatibility

| Scenario | Behavior |
| --- | --- |
| Existing threads without modelRoutingFilter | No metadata filtering. ID validation still active. |
| modelRoutingFilter with all zero/unset fields | No filtering — equivalent to not setting it |
| modelRoutingFilter on override_generation_config | Applies to that message only. Next message reverts to thread default. |
| Single valid model with no models array | Filter applies to the single model. If it fails, immediate error. |
| model + models with filter | Filter applies to the merged candidate list (model first, then models). |
| Existing error handling for send-message | If your code already handles non-200 responses, it will catch the new 400/422 errors automatically. |
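The merged candidate list from the model + models scenario can be sketched as follows (the ordering comes from the table: model first, then models; de-duplication is an assumption for illustration):

```python
def merge_candidates(model, models):
    """Merge `model` and `models` into one ordered, de-duplicated list.

    `model` (if set) comes first; entries from `models` follow in order.
    """
    merged = []
    for m in [model] + list(models or []):
        if m and m not in merged:
            merged.append(m)
    return merged
```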