Skip to content
Xplore
Chat Agents

Build a chatbot that learns from your conversations.

Simulate real support scenarios. Train your chatbot on tone, accuracy, escalation, and resolution — using your actual conversation patterns. Deploy when it passes your quality bar.

0.91
Tone score — support-v4
Training run
0.88
Accuracy — factual correctness
Training run
0.92
Escalation precision
Training run
0.03
Refusal rate
Below threshold
Evaluate

You know why a chatbot fails — and where.

Tone and accuracy aren't the same metric. Score them separately, weight them for your business. A chatbot that's polite but wrong fails differently than one that's correct but cold — your eval catches both.

Chat agent eval · support-v4
Tone
0.91
Accuracy
0.88
Resolution
0.84
Escalation
0.92
Refusal rate
0.97
Weighted 0.90
Chatbots
For: Customer support
Train: tone, accuracy, escalation
refusal_rate 0.03
tone_score 0.89
accuracy 0.91
Training progress
1.0 0.0 iterations
Config diff
Iteration 8 9
trainer: openclaw · chat-support
+ tool: verify_source_citation
"Cross-check facts against original document"
~ rule: escalation_policy
threshold: 0.4 → 0.6
~ instruction: verification section
added: "Always cite page number"
score: 0.83 → 0.90 +0.07 promoted
[Re]train

Your chatbot gets better with every training cycle.

Train on your actual conversations, knowledge base, and edge cases. The trainer iterates on tone rules, accuracy thresholds, and escalation logic. You see every change as a diff — and the score delta that resulted.

Deploy

Only chatbots that pass your tone bar reach customers.

Set promotion thresholds on tone, accuracy, and escalation. Candidates stay on the training branch until they clear every gate. Versioned. Rollback in one click.

Agent overview — chat agent versions, promotion history
Production controls · live
tone_quality enabled
agent: chatbot-prod
last 24h: 2,341 runs · avg: 0.89
alerts: 0
accuracy enabled
agent: chatbot-prod
last 24h: 2,341 runs · avg: 0.88
alerts: 0
escalation enabled
agent: chatbot-prod
last 24h: 2,341 runs · avg: 0.92
⚠ drift: escalation_precision 0.92 → 0.81
Control

Tone drops — you know. Retraining starts automatically.

Live scoring on every production conversation. Drift detection catches when tone degrades, accuracy slips, or escalation precision drops. Alerts fire. Retraining triggers. Customers never notice.