Conversations

This guide explains how conversations work on the GYBC platform — threads, messages, generation, and configuration.

Core Concepts

Threads

A thread is a conversation container. Each thread has:

  • A unique conversation_key (your chosen identifier)
  • A display name
  • Its own message history
  • Independent settings and generation config

Threads are scoped to the authenticated user. One user can have many threads.

Messages

Messages are the content units within a thread. Each message has a role and content:

Role        Description
user        Messages from the end user
assistant   AI-generated responses
system      System instructions (injected by the platform or via settings)
tool        Results from MCP tool calls

Conversation State

Every thread maintains state that includes:

  • Message history — the full ordered list of messages
  • Generation config — model, temperature, max tokens, etc.
  • Settings — system prompt, context management, interrupt policy
  • Status — whether a generation run is active (RUNNING, IDLE, etc.)
  • Pending tool approvals — tools waiting for user approval before execution
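
Put together, a thread's state can be pictured as one structured object. The shape below is illustrative, not the exact wire format — field names mirror the concepts above, but the actual API response may differ:

```python
# Illustrative shape of a conversation state object (a mental model,
# not the platform's schema).
state = {
    "conversation_key": "support-chat-001",
    "name": "Support Chat",
    "status": "IDLE",  # RUNNING while a generation run is active
    "messages": [
        {"role": "system", "content": "You are a helpful support agent."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Use the reset link on the login page."},
    ],
    "generation_config": {"model": "openai/gpt-4o", "temperature": 0.7},
    "settings": {"interrupt_policy": "QUEUE"},
    "pending_tool_approvals": [],
}

# Message history is ordered, so the latest turn is always last.
assert state["messages"][-1]["role"] == "assistant"
```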

Conversation Lifecycle

1. Create a Thread

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-thread \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Chat",
    "conversation_key": "support-chat-001"
  }'

The conversation_key is your identifier — use something meaningful to your application.

2. Send a Message

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "user_message": {
      "role": "user",
      "content": "How do I reset my password?"
    }
  }'

When you send a message:

  1. The message is appended to the thread's history
  2. A generation run starts asynchronously
  3. The LLM processes the conversation history and generates a response
  4. If the model calls tools, tool execution happens automatically (or awaits approval)
  5. The assistant response is appended to history
  6. The response is published as streaming chunks via Kafka
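
The steps above can be simulated client-side to see how history and status evolve during a run. This sketch stubs out the LLM, tool-execution, and Kafka pieces — in the real platform they happen asynchronously on the server:

```python
def run_generation(thread, user_message, generate):
    """Simulate one send-message cycle on a local thread dict.

    `generate` stands in for steps 2-4 (LLM processing and any tool
    calls); it receives the full history and returns the reply text.
    """
    thread["messages"].append({"role": "user", "content": user_message})  # step 1
    thread["status"] = "RUNNING"
    reply = generate(thread["messages"])                                  # steps 2-4
    thread["messages"].append({"role": "assistant", "content": reply})    # step 5
    thread["status"] = "IDLE"  # step 6 would stream chunks out via Kafka
    return reply

thread = {"messages": [], "status": "IDLE"}
reply = run_generation(
    thread,
    "How do I reset my password?",
    generate=lambda history: "Use the reset link on the login page.",
)
assert thread["status"] == "IDLE" and len(thread["messages"]) == 2
```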

3. Retrieve State

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/conversation-state \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001"
  }'

Returns the full conversation state including all messages, settings, and current status.

4. List Threads

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/list-threads \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{}'

Returns all threads for the authenticated user.

Generation Config

Control how the AI generates responses by updating the default generation config.

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/update-default-generation-config \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "generation_config": {
      "model": "openai/gpt-4o",
      "temperature": 0.7,
      "max_tokens": 2048,
      "top_p": 0.9
    }
  }'

Key generation config fields:

Field        Description                                        Default
model        LLM model to use (OpenRouter model ID)             Platform default
temperature  Randomness (0.0 = deterministic, 2.0 = creative)   1.0
max_tokens   Maximum tokens in the response                     Model default
top_p        Nucleus sampling threshold                         1.0
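
The ranges in the table can be checked client-side before calling the API. A minimal sketch — the bounds follow the table above, though the API may enforce its own limits:

```python
def validate_generation_config(config):
    """Raise ValueError for out-of-range sampling parameters."""
    t = config.get("temperature")
    if t is not None and not 0.0 <= t <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    p = config.get("top_p")
    if p is not None and not 0.0 < p <= 1.0:
        raise ValueError("top_p must be in (0.0, 1.0]")
    m = config.get("max_tokens")
    if m is not None and m < 1:
        raise ValueError("max_tokens must be positive")
    return config

validate_generation_config(
    {"model": "openai/gpt-4o", "temperature": 0.7, "max_tokens": 2048, "top_p": 0.9}
)
```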

Conversation Settings

Update the conversation's system prompt, interrupt policy, and other settings.

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/update-settings \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "settings": {
      "system_prompt": "You are a helpful customer support agent for Acme Corp.",
      "interrupt_policy": "QUEUE"
    }
  }'

System Prompt

The system_prompt is prepended to every LLM request for this conversation. Use it to set the AI's persona, rules, and context.

Interrupt Policy

Controls what happens when a user sends a new message while a generation run is already in progress:

Policy      Behavior
QUEUE       Queue the new message and process it after the current run completes
INTERRUPT   Cancel the current run and start a new one with the latest message
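
The two policies reduce to a small decision function. This is an illustrative sketch of the semantics — the actual scheduling happens server-side:

```python
def handle_incoming(policy, run_active):
    """Decide what to do with a new message under the given interrupt policy."""
    if not run_active:
        return "start"               # no run in progress: just start one
    if policy == "QUEUE":
        return "queue"               # process after the current run completes
    if policy == "INTERRUPT":
        return "cancel_and_restart"  # drop the current run, use the latest message
    raise ValueError(f"unknown policy: {policy}")

assert handle_incoming("QUEUE", run_active=True) == "queue"
assert handle_incoming("INTERRUPT", run_active=True) == "cancel_and_restart"
```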

Context Management

For long conversations, the message history can exceed the model's context window. Context management settings control how this is handled.

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/update-context-management-settings \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "context_management_settings": {
      "strategy": "SLIDING_WINDOW",
      "max_history_messages": 50
    }
  }'

Strategy         Description
SLIDING_WINDOW   Keep the most recent N messages
SUMMARIZE        Summarize older messages to preserve context while reducing token count
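
A SLIDING_WINDOW trim is easy to picture. The sketch below keeps the most recent N messages while pinning a leading system message — whether the platform pins the system prompt this way is an assumption, not documented behavior:

```python
def sliding_window(messages, max_history):
    """Keep the last `max_history` messages, pinning any leading system message."""
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_history:]

history = [{"role": "system", "content": "Be helpful."}] + [
    {"role": "user", "content": f"question {i}"} for i in range(100)
]
trimmed = sliding_window(history, max_history=50)
assert len(trimmed) == 51 and trimmed[0]["role"] == "system"
```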

Streaming

Message generation is asynchronous. AI responses are published as streaming chunks via Kafka, enabling real-time delivery to connected clients.

The flow:

  1. send-message starts a generation run and returns immediately
  2. The LLM generates tokens incrementally
  3. Each chunk is published to a Kafka topic
  4. Your client consumes chunks for real-time display
  5. Once complete, the full response is stored in the conversation state
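
On the consuming side, chunks arrive incrementally and are concatenated into the final response. A minimal sketch, assuming each record carries a text delta and a done flag — the actual chunk schema may differ:

```python
def assemble(chunks):
    """Concatenate streamed deltas, stopping at the chunk marked final."""
    parts = []
    for chunk in chunks:
        parts.append(chunk["delta"])  # a real client would render each delta here
        if chunk.get("done"):
            break
    return "".join(parts)

stream = [
    {"delta": "You can reset "},
    {"delta": "your password from "},
    {"delta": "the login page.", "done": True},
]
assert assemble(stream) == "You can reset your password from the login page."
```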

Tool Calling

When the LLM decides to use a tool (via MCP), the platform can either:

  • Auto-execute the tool and feed results back to the LLM
  • Request approval from the user before execution
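
A common client pattern is to auto-approve a known-safe allowlist and surface everything else to the user. A hypothetical sketch — the allowlist and the routing logic are application choices, not platform behavior:

```python
SAFE_TOOLS = {"search_docs", "get_order_status"}  # hypothetical allowlist

def route_tool_calls(tool_calls):
    """Split tool calls into auto-approved and pending-user-approval lists."""
    auto, pending = [], []
    for call in tool_calls:
        (auto if call["tool_name"] in SAFE_TOOLS else pending).append(call)
    return auto, pending

auto, pending = route_tool_calls([
    {"tool_call_id": "call_1", "tool_name": "search_docs"},
    {"tool_call_id": "call_2", "tool_name": "issue_refund"},
])
assert [c["tool_call_id"] for c in pending] == ["call_2"]
```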

Checking Pending Approvals

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/list-pending-approvals \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001"
  }'

Submitting Approvals

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/submit-tool-approvals \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "approvals": [
      {
        "tool_call_id": "call_abc123",
        "approved": true
      }
    ]
  }'

Client-Side Tools

For tools that should execute on the client (e.g., UI actions), submit results back to the conversation:

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/submit-client-tool-results \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "tool_results": [
      {
        "tool_call_id": "call_xyz789",
        "result": "{\"status\": \"confirmed\"}"
      }
    ]
  }'

See the MCP Tools Guide for more on tool calling.

Voice Sessions

Create a real-time voice session for a conversation using Daily and Pipecat:

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-daily-session \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001"
  }'

The response includes a Daily room URL and token for the client to join the voice session.

User Impersonation

Backend services with the users:impersonate scope can operate on behalf of specific users:

curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "X-On-Behalf-Of: user_123" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "user_message": {
      "role": "user",
      "content": "Hello"
    }
  }'

See the API Key Integration Guide for details on scopes.