Model Routing & Pre-Filter
This guide explains model routing pre-filtering: candidate models are validated and filtered before a generation workflow starts, so your application gets immediate, synchronous error feedback when every candidate model is invalid or fails the specified criteria.
If you don't set modelRoutingFilter, everything works exactly as before. You can adopt the filter incrementally.
What's New
| Feature | What It Does | Client Impact |
|---|---|---|
| Model validation | Validates model IDs against the OpenRouter catalog before starting generation | Handle synchronous error responses from send-message |
| Metadata filtering | Filter models by capabilities (context length, modalities, cost, etc.) | Add optional modelRoutingFilter to GenerationConfig |
Benefits
- Immediate error feedback — invalid model IDs return a synchronous error from send-message, no need to wait for an async failure via Firebase RTDB
- Capability-based routing — ensure only models that support your requirements (e.g., image input, minimum context window) are used
- Cost control — set maximum cost per token to prevent expensive models from being selected
- Automatic fallback — invalid or filtered-out models are removed from the candidate list; generation proceeds with the remaining valid models
Synchronous Error Handling
This is the most important change for clients.
Previously, if a model was invalid, send-message would fire-and-forget the generation workflow, which would fail asynchronously — clients would only learn about the failure via Firebase RTDB (or not at all).
Now, send-message validates all candidate models synchronously before starting the workflow. If ALL candidates are invalid or filtered out, it returns an immediate error in the HTTP response:
```
POST /api/v1/llm/gateway/send-message (model: "invalid/model-xyz")
        ↓
Backend: Validate model IDs → ALL invalid
        ↓
HTTP Response: 400 — "all candidate models were filtered out: [invalid/model-xyz]"
        ↓
No generation workflow started. No RTDB update. No async failure.
```
Client Flow Change
Before:
1. Call send-message → always returns 200 with runId
2. Listen to RTDB for response
3. If model invalid → RTDB never updates (or shows error asynchronously)
After:
1. Call send-message → may return 400 if ALL models invalid
2. If 400 → show error to user immediately (do NOT listen to RTDB)
3. If 200 → proceed to listen to RTDB as usual
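The updated flow can be sketched in client code. This is an illustrative sketch only: the response shape (`runId` on success, `code`/`message` on error) is an assumption inferred from the error format documented in this guide, and the returned action tags are hypothetical names.

```typescript
// Decide what the client should do based on the send-message HTTP response.
// 400/422 now mean "no workflow was started" — do NOT attach an RTDB listener.
type SendMessageResult =
  | { action: "listen"; runId: string }        // 200: listen to RTDB as before
  | { action: "show_error"; message: string }; // 400/422: surface immediately

function handleSendMessageResponse(
  status: number,
  body: { runId?: string; code?: number; message?: string },
): SendMessageResult {
  if (status === 400 || status === 422) {
    // All candidate models were invalid or filtered out.
    return {
      action: "show_error",
      message: body.message ?? "all candidate models were filtered out",
    };
  }
  // Success: proceed to the async flow, same as before this change.
  return { action: "listen", runId: body.runId ?? "" };
}
```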
Error Response Format
When all candidates are filtered out due to invalid IDs, send-message returns HTTP 400:
```json
{
  "code": 400,
  "message": "all candidate models were filtered out: [invalid/model-xyz, deprecated/old-model]"
}
```
When metadata filters eliminate all candidates, the error includes per-model reasons (HTTP 422):
```json
{
  "code": 422,
  "message": "all candidate models were filtered out: [google/gemini-2.5-flash: context_length, openai/gpt-4o-mini: input_modality]"
}
```
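The 422 message packs per-model reasons into a bracketed list. If you want to surface these individually, here is a parsing sketch; it assumes the message format shown in the example stays stable, which is not guaranteed.

```typescript
// Parse "all candidate models were filtered out: [model: reason, ...]"
// into a { model: reason } map.
function parseFilterReasons(message: string): Record<string, string> {
  const out: Record<string, string> = {};
  const bracketed = message.match(/\[(.*)\]/);
  if (!bracketed) return out;
  for (const entry of bracketed[1].split(",")) {
    // Split on the LAST colon: model IDs can themselves contain ":" (e.g. ":nitro").
    const i = entry.lastIndexOf(":");
    if (i > 0) out[entry.slice(0, i).trim()] = entry.slice(i + 1).trim();
  }
  return out;
}
```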
Error Summary
| Error | HTTP Code | Meaning | Action |
|---|---|---|---|
| All models invalid (ID not in catalog) | 400 | Every model ID is wrong/deprecated | Show error immediately. Do NOT listen to RTDB. Check model IDs. |
| All models filtered by metadata | 422 | Models exist but don't meet filter criteria | Show error immediately. Do NOT listen to RTDB. Relax filter or add more models. |
| Some models filtered, some valid | 200 | Invalid/filtered models silently removed, generation proceeds with remaining | Normal flow. RTDB listener as usual. |
| No modelRoutingFilter set | 200 | No metadata filtering, only ID validation | Normal flow. Same as before metadata filtering. |
This also applies to send-message-sync. The sync endpoint returns the same 400/422 error inline when all models are filtered.
Where to Set the Filter
The modelRoutingFilter lives on GenerationConfig — the same object used for model, models, responseFormat, etc. It can be set at two levels:
1. Thread-Level Default
Set via create-thread in default_generation_config.model_routing_filter. Applies to every message in the thread.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-thread \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Long research thread",
    "conversation_key": "research-001",
    "default_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro",
        "openai/gpt-4o"
      ],
      "model_routing_filter": {
        "min_context_length": 128000
      }
    }
  }'
```
When to use: When you want consistent capability requirements across all messages in a thread (e.g., "always require >=128k context for this long-context thread").
2. Per-Message Override
Set via send-message in override_generation_config.model_routing_filter. Applies to a single message only.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "research-001",
    "user_message": {
      "role": "user",
      "content": [
        { "type": "CONTENT_PART_TYPE_IMAGE", "content": "<base64_image>" },
        { "type": "CONTENT_PART_TYPE_TEXT", "content": "What is in this image?" }
      ]
    },
    "override_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro",
        "openai/gpt-4o"
      ],
      "model_routing_filter": {
        "required_input_modalities": ["image"]
      }
    }
  }'
```
When to use: When a specific message needs different model capabilities (e.g., "this message includes an image, require image input support").
override_generation_config uses full replacement semantics. Include all fields you want to preserve, not just the new model_routing_filter.
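Because of the full-replacement semantics, it is safest to build the override from the thread default rather than sending the filter alone. A minimal sketch, assuming a TypeScript client; the `GenerationConfig` field names are taken from the request bodies in this guide, but the exact type shape is an assumption.

```typescript
// Fields seen in this guide's request bodies (shape is an assumption).
interface GenerationConfig {
  models?: string[];
  response_format?: unknown;
  model_routing_filter?: Record<string, unknown>;
}

// Copy every field from the thread default, then swap in the new filter,
// so full replacement does not silently drop models/response_format/etc.
function buildOverride(
  threadDefault: GenerationConfig,
  filter: Record<string, unknown>,
): GenerationConfig {
  return { ...threadDefault, model_routing_filter: filter };
}
```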
ModelRoutingFilter Fields
All fields are optional. Unset/zero-value fields are ignored (no filtering on that dimension). When multiple fields are set, they are ANDed — a model must pass every specified filter.
| Field | Type | Description | Behavior |
|---|---|---|---|
| min_context_length | int64 | Minimum context window in tokens | Models below this are excluded |
| min_max_completion_tokens | int64 | Minimum max output tokens | Models reporting 0 (unknown) pass through |
| required_input_modalities | string[] | e.g., ["image", "audio"] | Model must support ALL listed. Models with empty modalities are excluded |
| required_output_modalities | string[] | e.g., ["image"] | Model must support ALL listed. Models with empty modalities are excluded |
| max_prompt_cost | double | Max cost per prompt token (e.g., 0.000003) | 0 = no limit. Unparseable pricing = pass |
| max_completion_cost | double | Max cost per completion token | 0 = no limit. Unparseable pricing = pass |
| exclude_moderated | bool | Skip models with content moderation | Only filters when set to true |
| required_parameters | string[] | e.g., ["tools", "response_format"] | Model must support ALL listed |
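The AND semantics can be sketched as a predicate over catalog metadata. This is an illustration only: the `ModelMeta` shape below is an assumption about what the backend reads from the catalog, not its actual representation, and only a subset of filter fields is shown.

```typescript
// Assumed catalog metadata for one model (hypothetical shape).
interface ModelMeta {
  contextLength: number;
  inputModalities: string[];
  promptCost: number; // cost per prompt token
}

interface ModelRoutingFilter {
  min_context_length?: number;
  required_input_modalities?: string[];
  max_prompt_cost?: number;
}

// Unset/zero fields are skipped; every set field must pass (AND).
function passesFilter(m: ModelMeta, f: ModelRoutingFilter): boolean {
  if (f.min_context_length && m.contextLength < f.min_context_length) return false;
  if (f.required_input_modalities && f.required_input_modalities.length > 0) {
    // Models with empty modality metadata are excluded when modalities are required.
    if (m.inputModalities.length === 0) return false;
    if (!f.required_input_modalities.every((x) => m.inputModalities.includes(x))) return false;
  }
  // A max_prompt_cost of 0 means "no limit".
  if (f.max_prompt_cost && m.promptCost > f.max_prompt_cost) return false;
  return true;
}
```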
Examples
Long-Context Thread
Ensure all models in the fallback chain support long conversations:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-thread \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Long research thread",
    "conversation_key": "research-001",
    "default_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro",
        "openai/gpt-4o"
      ],
      "model_routing_filter": {
        "min_context_length": 128000
      }
    }
  }'
```
If any model in the models list has less than 128k context, it's silently removed. If all are removed, send-message returns an immediate error.
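The remove-then-fallback behavior can be sketched as follows. `passes` stands in for whatever check applies (ID validation or metadata filtering) — a hypothetical predicate, not an actual API.

```typescript
// Drop filtered-out candidates; generation proceeds with whatever remains.
// Only an empty result is an error, mirroring the synchronous 400/422.
function selectCandidates(
  models: string[],
  passes: (model: string) => boolean,
): string[] {
  const kept = models.filter(passes);
  if (kept.length === 0) {
    throw new Error(`all candidate models were filtered out: [${models.join(", ")}]`);
  }
  return kept;
}
```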
Cost-Capped Generation
Limit to cheap models for bulk/background tasks:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "bulk-task-001",
    "user_message": {
      "role": "user",
      "content": [{ "type": "CONTENT_PART_TYPE_TEXT", "content": "Summarize this" }]
    },
    "override_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "openai/gpt-4o-mini",
        "anthropic/claude-haiku-3.5"
      ],
      "model_routing_filter": {
        "max_prompt_cost": 0.000005,
        "max_completion_cost": 0.00002
      }
    }
  }'
```
Structured Output with Tool Support
Ensure the model supports both tools and response_format parameters:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: $API_KEY" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "health-plan-001",
    "user_message": {
      "role": "user",
      "content": [{ "type": "CONTENT_PART_TYPE_TEXT", "content": "Generate my health plan" }]
    },
    "override_generation_config": {
      "models": [
        "google/gemini-2.5-flash:nitro",
        "anthropic/claude-sonnet-4.5:nitro"
      ],
      "response_format": {
        "json_schema": { "..." : "schema" },
        "schema_name": "health_plan"
      },
      "model_routing_filter": {
        "required_parameters": ["tools", "response_format"]
      }
    }
  }'
```
Backwards Compatibility
| Scenario | Behavior |
|---|---|
| Existing threads without modelRoutingFilter | No metadata filtering. ID validation still active. |
| modelRoutingFilter with all zero/unset fields | No filtering — equivalent to not setting it |
| modelRoutingFilter on override_generation_config | Applies to that message only. Next message reverts to thread default. |
| Single valid model with no models array | Filter applies to the single model. If it fails, immediate error. |
| model + models with filter | Filter applies to the merged candidate list (model first, then models). |
| Existing error handling for send-message | If your code already handles non-200 responses, it will catch the new 400/422 errors automatically. |
Related
- Conversations Guide — Thread lifecycle, generation config, and message sending
- LLM Gateway API Reference — Full endpoint reference