# Conversations
This guide explains how conversations work on the GYBC platform — threads, messages, generation, and configuration.
## Core Concepts

### Threads

A thread is a conversation container. Each thread has:

- A unique `conversation_key` (your chosen identifier)
- A display name
- Its own message history
- Independent settings and generation config
Threads are scoped to the authenticated user. One user can have many threads.
### Messages

Messages are the content units within a thread. Each message has a role and content:

| Role | Description |
|---|---|
| `user` | Messages from the end user |
| `assistant` | AI-generated responses |
| `system` | System instructions (injected by the platform or via settings) |
| `tool` | Results from MCP tool calls |
### Conversation State

Every thread maintains state that includes:

- Message history — the full ordered list of messages
- Generation config — model, temperature, max tokens, etc.
- Settings — system prompt, context management, interrupt policy
- Status — whether a generation run is active (`RUNNING`, `IDLE`, etc.)
- Pending tool approvals — tools waiting for user approval before execution
## Conversation Lifecycle

### 1. Create a Thread

```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-thread \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Chat",
    "conversation_key": "support-chat-001"
  }'
```
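Every endpoint in this guide is a POST to the same gateway base with the same two headers, so a thin request builder covers all of them. A minimal Python sketch (the `build_request` helper and its signature are local conveniences for illustration, not part of any platform SDK):

```python
import json

# Gateway base URL from the examples in this guide.
API_BASE = "https://api.yocaso.dev/api/v1/llm/gateway"

def build_request(endpoint: str, api_key: str, payload: dict):
    """Assemble the URL, headers, and JSON body for one gateway call."""
    url = f"{API_BASE}/{endpoint}"
    headers = {
        "X-API-Key": api_key,
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(payload)

# Equivalent of the create-thread curl call above.
url, headers, body = build_request(
    "create-thread",
    "sk_live_your_key_here",
    {"name": "Support Chat", "conversation_key": "support-chat-001"},
)
```

Pass the three values to any HTTP client, e.g. `requests.post(url, headers=headers, data=body)`.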
The `conversation_key` is your identifier — use something meaningful to your application.
### 2. Send a Message

```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "user_message": {
      "role": "user",
      "content": "How do I reset my password?"
    }
  }'
```
When you send a message:
- The message is appended to the thread's history
- A generation run starts asynchronously
- The LLM processes the conversation history and generates a response
- If the model calls tools, tool execution happens automatically (or awaits approval)
- The assistant response is appended to history
- The response is published as streaming chunks via Kafka
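Because the run is asynchronous, a client that is not consuming the Kafka stream can instead poll conversation-state until the run finishes. A sketch of that loop, assuming the state payload exposes the status values listed above; the transport is injected as a callable here so the example runs without a network:

```python
import time

def wait_until_idle(get_state, conversation_key, poll_interval=0.5, timeout=30.0):
    """Poll until the conversation's generation run is no longer RUNNING."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state(conversation_key)
        if state.get("status") != "RUNNING":
            return state
        time.sleep(poll_interval)
    raise TimeoutError(f"run still active for {conversation_key!r}")

# Stub transport standing in for the conversation-state endpoint:
# two RUNNING polls, then IDLE with the assistant reply appended.
_states = iter([
    {"status": "RUNNING"},
    {"status": "RUNNING"},
    {"status": "IDLE", "messages": [{"role": "assistant", "content": "Done."}]},
])
final = wait_until_idle(lambda key: next(_states), "support-chat-001", poll_interval=0)
```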
3. Retrieve State
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/conversation-state \
-H "X-API-Key: sk_live_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"conversation_key": "support-chat-001"
}'
Returns the full conversation state including all messages, settings, and current status.
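A common follow-up is pulling the latest assistant reply out of that state payload. A sketch, assuming the response carries a `messages` list ordered oldest-first as described under Conversation State (the exact response shape is not pinned down in this guide):

```python
def last_assistant_message(state: dict):
    """Return the content of the most recent assistant message, if any."""
    for message in reversed(state.get("messages", [])):
        if message.get("role") == "assistant":
            return message.get("content")
    return None

state = {
    "status": "IDLE",
    "messages": [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Use the reset link on the login page."},
    ],
}
reply = last_assistant_message(state)
```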
### 4. List Threads

```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/list-threads \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{}'
```
Returns all threads for the authenticated user.
## Generation Config
Control how the AI generates responses by updating the default generation config.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/update-default-generation-config \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "generation_config": {
      "model": "openai/gpt-4o",
      "temperature": 0.7,
      "max_tokens": 2048,
      "top_p": 0.9
    }
  }'
```
Key generation config fields:
| Field | Description | Default |
|---|---|---|
| `model` | LLM model to use (OpenRouter model ID) | Platform default |
| `temperature` | Randomness (0.0 = deterministic, 2.0 = creative) | 1.0 |
| `max_tokens` | Maximum tokens in the response | Model default |
| `top_p` | Nucleus sampling threshold | 1.0 |
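It can be worth validating a config client-side before sending it, since the ranges in the table are easy to get wrong. A sketch based only on the ranges above (`validate_generation_config` is a local helper, not a platform API):

```python
def validate_generation_config(config: dict) -> dict:
    """Map each out-of-range field to an error message; empty dict means OK."""
    errors = {}
    temperature = config.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        errors["temperature"] = "must be between 0.0 and 2.0"
    top_p = config.get("top_p")
    if top_p is not None and not 0.0 < top_p <= 1.0:
        errors["top_p"] = "must be in (0.0, 1.0]"
    max_tokens = config.get("max_tokens")
    if max_tokens is not None and max_tokens < 1:
        errors["max_tokens"] = "must be a positive integer"
    return errors
```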
## Conversation Settings
Update the conversation's system prompt, interrupt policy, and other settings.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/update-settings \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "settings": {
      "system_prompt": "You are a helpful customer support agent for Acme Corp.",
      "interrupt_policy": "QUEUE"
    }
  }'
```
### System Prompt
The `system_prompt` is prepended to every LLM request for this conversation. Use it to set the AI's persona, rules, and context.
### Interrupt Policy
Controls what happens when a user sends a new message while a generation run is already in progress:
| Policy | Behavior |
|---|---|
| `QUEUE` | Queue the new message and process it after the current run completes |
| `INTERRUPT` | Cancel the current run and start a new one with the latest message |
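The two policies can be modeled as a small state machine: one slot for the active run plus a queue. This toy model mirrors the table above; it is a mental model for client authors, not the platform's implementation:

```python
from collections import deque

class ConversationRunner:
    """Toy model of the QUEUE and INTERRUPT policies."""

    def __init__(self, policy: str):
        self.policy = policy
        self.active = None      # message currently being processed
        self.queue = deque()    # messages waiting (QUEUE policy)
        self.cancelled = []     # runs cancelled (INTERRUPT policy)

    def send(self, message: str):
        if self.active is None:
            self.active = message
        elif self.policy == "QUEUE":
            self.queue.append(message)
        elif self.policy == "INTERRUPT":
            self.cancelled.append(self.active)
            self.active = message

    def finish(self):
        """Complete the active run; promote the next queued message, if any."""
        done = self.active
        self.active = self.queue.popleft() if self.queue else None
        return done
```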
## Context Management
For long conversations, the message history can exceed the model's context window. Context management settings control how this is handled.
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/update-context-management-settings \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "context_management_settings": {
      "strategy": "SLIDING_WINDOW",
      "max_history_messages": 50
    }
  }'
```
| Strategy | Description |
|---|---|
| `SLIDING_WINDOW` | Keep the most recent N messages |
| `SUMMARIZE` | Summarize older messages to preserve context while reducing token count |
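As a mental model, `SLIDING_WINDOW` amounts to keeping the tail of the history. A sketch of one plausible reading (whether the system prompt counts toward `max_history_messages` is not specified here, so this version always preserves it):

```python
def apply_sliding_window(messages: list, max_history_messages: int) -> list:
    """Keep system messages plus the most recent N non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_history_messages:]

history = [{"role": "system", "content": "You are a support agent."}]
history += [{"role": "user", "content": f"question {n}"} for n in range(5)]
trimmed = apply_sliding_window(history, max_history_messages=2)
```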
## Streaming
Message generation is asynchronous. AI responses are published as streaming chunks via Kafka, enabling real-time delivery to connected clients.
The flow:
- `send-message` starts a generation run and returns immediately
- The LLM generates tokens incrementally
- Each chunk is published to a Kafka topic
- Your client consumes chunks for real-time display
- Once complete, the full response is stored in the conversation state
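On the consuming side, the client's job is to append deltas as they arrive and stop at the final chunk. The chunk schema below (a `delta` of text plus a `done` flag) is an assumption for illustration; this guide does not specify the Kafka message format:

```python
def assemble_response(chunks) -> str:
    """Concatenate streamed deltas into the final assistant message."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.get("delta", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Chunks as they might arrive from the Kafka topic.
streamed = [
    {"delta": "Use the reset "},
    {"delta": "link on the login page."},
    {"delta": "", "done": True},
]
text = assemble_response(streamed)
```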
## Tool Calling
When the LLM decides to use a tool (via MCP), the platform can either:
- Auto-execute the tool and feed results back to the LLM
- Request approval from the user before execution
### Checking Pending Approvals
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/list-pending-approvals \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001"
  }'
```
### Submitting Approvals
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/submit-tool-approvals \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "approvals": [
      {
        "tool_call_id": "call_abc123",
        "approved": true
      }
    ]
  }'
```
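In practice you would pair list-pending-approvals with submit-tool-approvals: fetch what is pending, decide per tool call, then post the decisions back. A sketch of the decision step (the `tool_name` field on pending entries is an assumption; only `tool_call_id` and `approved` appear in the examples above):

```python
def build_approvals(pending: list, allow) -> list:
    """Turn a pending-approvals listing into a submit-tool-approvals payload."""
    return [
        {"tool_call_id": p["tool_call_id"], "approved": bool(allow(p))}
        for p in pending
    ]

pending = [
    {"tool_call_id": "call_abc123", "tool_name": "lookup_order"},
    {"tool_call_id": "call_def456", "tool_name": "issue_refund"},
]
# Example policy: auto-approve the read-only tool, hold everything else.
approvals = build_approvals(pending, allow=lambda p: p["tool_name"] == "lookup_order")
```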
### Client-Side Tools
For tools that should execute on the client (e.g., UI actions), submit results back to the conversation:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/submit-client-tool-results \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "tool_results": [
      {
        "tool_call_id": "call_xyz789",
        "result": "{\"status\": \"confirmed\"}"
      }
    ]
  }'
```
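Note that `result` is a JSON-encoded string, not a nested object, so client code should serialize the payload before submitting. A small helper sketch:

```python
import json

def tool_result(tool_call_id: str, payload: dict) -> dict:
    """Package one client-side tool outcome; `result` must be a JSON string."""
    return {"tool_call_id": tool_call_id, "result": json.dumps(payload)}

entry = tool_result("call_xyz789", {"status": "confirmed"})
```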
See the MCP Tools Guide for more on tool calling.
## Voice Sessions
Create a real-time voice session for a conversation using Daily and Pipecat:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/create-daily-session \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001"
  }'
```
The response includes a Daily room URL and token for the client to join the voice session.
## User Impersonation
Backend services with the `users:impersonate` scope can operate on behalf of specific users:
```bash
curl -X POST https://api.yocaso.dev/api/v1/llm/gateway/send-message \
  -H "X-API-Key: sk_live_your_key_here" \
  -H "X-On-Behalf-Of: user_123" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_key": "support-chat-001",
    "user_message": {
      "role": "user",
      "content": "Hello"
    }
  }'
```
See the API Key Integration Guide for details on scopes.
## Related
- Quickstart — Make your first API call
- MCP Tools Guide — Tool calling in conversations
- Memory Guide — Semantic memory for conversations
- LLM Gateway API Reference — Full endpoint reference