# Conversations
This guide explains how conversations work on the GYBC platform — threads, messages, generation, and configuration.
## Core Concepts

### Threads

A thread is a conversation container. Each thread has:

- A unique `conversation_key` (your chosen identifier)
- A display name
- Its own message history
- Independent settings and generation config
Threads are scoped to the authenticated user. One user can have many threads.
### Messages

Messages are the content units within a thread. Each message has a role and content:

| Role | Description |
|---|---|
| `user` | Messages from the end user |
| `assistant` | AI-generated responses |
| `system` | System instructions (injected by the platform or via settings) |
| `tool` | Results from MCP tool calls |
### Conversation State

Every thread maintains state that includes:

- Message history — the full ordered list of messages
- Generation config — model, temperature, max tokens, etc.
- Settings — system prompt, context management, interrupt policy
- Status — whether a generation run is active (`RUNNING`, `IDLE`, etc.)
- Pending tool approvals — tools waiting for user approval before execution
## Conversation Lifecycle

### 1. Create a Thread

```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-thread \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Chat",
    "conversation_key": "support-chat-001"
  }'
```
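Every endpoint in this guide is a POST to the same gateway base with the same two headers, so a thin request builder covers all of them. A minimal Python sketch (the `build_request` helper and its signature are local conveniences for illustration, not part of any platform SDK):

```python
import json

# Gateway base URL from the examples in this guide.
API_BASE = "https://api.yocaso.dev/api/v1/llm/gateway"

def build_request(endpoint: str, api_key: str, payload: dict):
    """Assemble the URL, headers, and JSON body for one gateway call."""
    url = f"{API_BASE}/{endpoint}"
    headers = {
        "X-API-Key": api_key,
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(payload)

# Equivalent of the create-thread curl call above.
url, headers, body = build_request(
    "create-thread",
    "sk_live_your_key_here",
    {"name": "Support Chat", "conversation_key": "support-chat-001"},
)
```

Pass the three values to any HTTP client, e.g. `requests.post(url, headers=headers, data=body)`.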
The `conversation_key` is your identifier — use something meaningful to your application.
### 2. Send a Message

```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "user_message": {
      "role": "user",
      "content": "How do I reset my password?"
    }
  }'
```
When you send a message:
- The message is appended to the thread's history
- A generation run starts asynchronously
- The LLM processes the conversation history and generates a response
- If the model calls tools, tool execution happens automatically (or awaits approval)
- The assistant response is appended to history
- The response is published as streaming chunks via Kafka
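Because the run is asynchronous, a client that is not consuming the Kafka stream can instead poll conversation-state until the run finishes. A sketch of that loop, assuming the state payload exposes the status values listed above; the transport is injected as a callable here so the example runs without a network:

```python
import time

def wait_until_idle(get_state, conversation_key, poll_interval=0.5, timeout=30.0):
    """Poll until the conversation's generation run is no longer RUNNING."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state(conversation_key)
        if state.get("status") != "RUNNING":
            return state
        time.sleep(poll_interval)
    raise TimeoutError(f"run still active for {conversation_key!r}")

# Stub transport standing in for the conversation-state endpoint:
# two RUNNING polls, then IDLE with the assistant reply appended.
_states = iter([
    {"status": "RUNNING"},
    {"status": "RUNNING"},
    {"status": "IDLE", "messages": [{"role": "assistant", "content": "Done."}]},
])
final = wait_until_idle(lambda key: next(_states), "support-chat-001", poll_interval=0)
```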
3. Retrieve State
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/conversation-state \
-H "X-API-Key: sk_live_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"conversation_key": "support-chat-001"
}'
Returns the full conversation state including all messages, settings, and current status.
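A common follow-up is pulling the latest assistant reply out of that state payload. A sketch, assuming the response carries a `messages` list ordered oldest-first as described under Conversation State (the exact response shape is not pinned down in this guide):

```python
def last_assistant_message(state: dict):
    """Return the content of the most recent assistant message, if any."""
    for message in reversed(state.get("messages", [])):
        if message.get("role") == "assistant":
            return message.get("content")
    return None

state = {
    "status": "IDLE",
    "messages": [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Use the reset link on the login page."},
    ],
}
reply = last_assistant_message(state)
```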
### 4. List Threads

```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/list-threads \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{}'
```
Returns all threads for the authenticated user.
## Generation Config
Control how the AI generates responses by updating the default generation config.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/update-default-generation-config \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "generation_config": {
      "model": "openai/gpt-4o",
      "temperature": 0.7,
      "max_tokens": 2048,
      "top_p": 0.9
    }
  }'
```
Key generation config fields:
| Field | Description | Default |
|---|---|---|
| `model` | LLM model to use (OpenRouter model ID) | Platform default |
| `temperature` | Randomness (0.0 = deterministic, 2.0 = creative) | 1.0 |
| `max_tokens` | Maximum tokens in the response | Model default |
| `top_p` | Nucleus sampling threshold | 1.0 |
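It can be worth validating a config client-side before sending it, since the ranges in the table are easy to get wrong. A sketch based only on the ranges above (`validate_generation_config` is a local helper, not a platform API):

```python
def validate_generation_config(config: dict) -> dict:
    """Map each out-of-range field to an error message; empty dict means OK."""
    errors = {}
    temperature = config.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        errors["temperature"] = "must be between 0.0 and 2.0"
    top_p = config.get("top_p")
    if top_p is not None and not 0.0 < top_p <= 1.0:
        errors["top_p"] = "must be in (0.0, 1.0]"
    max_tokens = config.get("max_tokens")
    if max_tokens is not None and max_tokens < 1:
        errors["max_tokens"] = "must be a positive integer"
    return errors
```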
## Conversation Settings
Update the conversation's system prompt, interrupt policy, and other settings.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/update-settings \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "settings": {
      "system_prompt": "You are a helpful customer support agent for Acme Corp.",
      "interrupt_policy": "QUEUE"
    }
  }'
```
### System Prompt
The `system_prompt` is prepended to every LLM request for this conversation. Use it to set the AI's persona, rules, and context.
### Interrupt Policy
Controls what happens when a user sends a new message while a generation run is already in progress:
| Policy | Behavior |
|---|---|
| `QUEUE` | Queue the new message and process it after the current run completes |
| `INTERRUPT` | Cancel the current run and start a new one with the latest message |
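The two policies can be modeled as a small state machine: one slot for the active run plus a queue. This toy model mirrors the table above; it is a mental model for client authors, not the platform's implementation:

```python
from collections import deque

class ConversationRunner:
    """Toy model of the QUEUE and INTERRUPT policies."""

    def __init__(self, policy: str):
        self.policy = policy
        self.active = None      # message currently being processed
        self.queue = deque()    # messages waiting (QUEUE policy)
        self.cancelled = []     # runs cancelled (INTERRUPT policy)

    def send(self, message: str):
        if self.active is None:
            self.active = message
        elif self.policy == "QUEUE":
            self.queue.append(message)
        elif self.policy == "INTERRUPT":
            self.cancelled.append(self.active)
            self.active = message

    def finish(self):
        """Complete the active run; promote the next queued message, if any."""
        done = self.active
        self.active = self.queue.popleft() if self.queue else None
        return done
```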
## Context Management
For long conversations, the message history can exceed the model's context window. Context management settings control how this is handled.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/update-context-management-settings \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "context_management_settings": {
      "strategy": "SLIDING_WINDOW",
      "max_history_messages": 50
    }
  }'
```
| Strategy | Description |
|---|---|
| `SLIDING_WINDOW` | Keep the most recent N messages |
| `SUMMARIZE` | Summarize older messages to preserve context while reducing token count |
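As a mental model, `SLIDING_WINDOW` amounts to keeping the tail of the history. A sketch of one plausible reading (whether the system prompt counts toward `max_history_messages` is not specified here, so this version always preserves it):

```python
def apply_sliding_window(messages: list, max_history_messages: int) -> list:
    """Keep system messages plus the most recent N non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_history_messages:]

history = [{"role": "system", "content": "You are a support agent."}]
history += [{"role": "user", "content": f"question {n}"} for n in range(5)]
trimmed = apply_sliding_window(history, max_history_messages=2)
```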
## Streaming
Message generation is asynchronous. AI responses are published as streaming chunks via Kafka, enabling real-time delivery to connected clients.
The flow:
- `send-message` starts a generation run and returns immediately
- The LLM generates tokens incrementally
- Each chunk is published to a Kafka topic
- Your client consumes chunks for real-time display
- Once complete, the full response is stored in the conversation state
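On the consuming side, the client's job is to append deltas as they arrive and stop at the final chunk. The chunk schema below (a `delta` of text plus a `done` flag) is an assumption for illustration; this guide does not specify the Kafka message format:

```python
def assemble_response(chunks) -> str:
    """Concatenate streamed deltas into the final assistant message."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.get("delta", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Chunks as they might arrive from the Kafka topic.
streamed = [
    {"delta": "Use the reset "},
    {"delta": "link on the login page."},
    {"delta": "", "done": True},
]
text = assemble_response(streamed)
```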
## Tool Calling
When the LLM decides to use a tool (via MCP), the platform can either:
- Auto-execute the tool and feed results back to the LLM
- Request approval from the user before execution
### Checking Pending Approvals
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/list-pending-approvals \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001"
  }'
```
### Submitting Approvals
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/submit-tool-approvals \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "approvals": [
      {
        "tool_call_id": "call_abc123",
        "approved": true
      }
    ]
  }'
```
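In practice you would pair list-pending-approvals with submit-tool-approvals: fetch what is pending, decide per tool call, then post the decisions back. A sketch of the decision step (the `tool_name` field on pending entries is an assumption; only `tool_call_id` and `approved` appear in the examples above):

```python
def build_approvals(pending: list, allow) -> list:
    """Turn a pending-approvals listing into a submit-tool-approvals payload."""
    return [
        {"tool_call_id": p["tool_call_id"], "approved": bool(allow(p))}
        for p in pending
    ]

pending = [
    {"tool_call_id": "call_abc123", "tool_name": "lookup_order"},
    {"tool_call_id": "call_def456", "tool_name": "issue_refund"},
]
# Example policy: auto-approve the read-only tool, hold everything else.
approvals = build_approvals(pending, allow=lambda p: p["tool_name"] == "lookup_order")
```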
### Client-Side Tools
For tools that should execute on the client (e.g., UI actions), submit results back to the conversation:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/submit-client-tool-results \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "tool_results": [
      {
        "tool_call_id": "call_xyz789",
        "result": "{\"status\": \"confirmed\"}"
      }
    ]
  }'
```
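Note that `result` is a JSON-encoded string, not a nested object, so client code should serialize the payload before submitting. A small helper sketch:

```python
import json

def tool_result(tool_call_id: str, payload: dict) -> dict:
    """Package one client-side tool outcome; `result` must be a JSON string."""
    return {"tool_call_id": tool_call_id, "result": json.dumps(payload)}

entry = tool_result("call_xyz789", {"status": "confirmed"})
```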
See the MCP Tools Guide for more on tool calling.
## Voice Sessions
Create a real-time voice session for a conversation using Daily and Pipecat:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-daily-session \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001"
  }'
```
The response includes a Daily room URL and token for the client to join the voice session.
## User Impersonation
Backend services with the `users:impersonate` scope can operate on behalf of specific users:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "X-On-Behalf-Of: user_123" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "user_message": {
      "role": "user",
      "content": "Hello"
    }
  }'
```
See the API Key Integration Guide for details on scopes.
## Related
- Quickstart — Make your first API call
- MCP Tools Guide — Tool calling in conversations
- Memory Guide — Semantic memory for conversations
- LLM Gateway API Reference — Full endpoint reference