Evaluate · [Re]train · Deploy · Control

Improve the whole agent — not just the prompt.

Instructions, tools, rules, data access, workflows, and policies. Xplore evaluates, trains, deploys, and controls across every layer of agent behavior.

Architecture

Everything that shapes agent behavior is scored, versioned, and improvable.

Most platforms tune prompts. Xplore treats the entire agent configuration as a trainable surface — scored, versioned, and improved across every iteration.

Instructions

System prompts, task descriptions, behavioral guidelines — the words that shape agent behavior.

Tools

Python functions, API integrations, data connectors — the capabilities an agent can invoke.

Rules

Escalation policies, safety guardrails, compliance constraints — the boundaries agents operate within.

Data access

Which databases, APIs, and services the agent can reach — and how it queries them.

Workflows

Step ordering, branching logic, parallel execution — the structure of how agents work.

Policies

Promotion gates, cost limits, certification requirements — the governance layer over everything.
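One way to picture "scored, versioned, and improvable" is to treat the six layers above as a single immutable configuration whose version is derived from its content. The sketch below is illustrative only: `AgentConfig`, its field names, and the hashing scheme are hypothetical and are not taken from Xplore's actual schema or API.

```python
from dataclasses import asdict, dataclass, replace
import hashlib
import json


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the six layers (not Xplore's real schema)."""
    instructions: str = ""
    tools: tuple = ()        # names of capabilities the agent can invoke
    rules: tuple = ()        # guardrails and escalation policies
    data_access: tuple = ()  # databases, APIs, and services the agent can reach
    workflow: tuple = ()     # ordered step names
    policies: tuple = ()     # promotion gates, cost limits

    def version_id(self) -> str:
        """Content hash: a change to any layer yields a new version id."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Illustrative values only.
base = AgentConfig(
    instructions="Answer billing questions.",
    tools=("lookup_invoice",),
)
# Changing a rule, not the prompt, still produces a distinct version.
candidate = replace(base, rules=("escalate_refunds_over_500",))
```

Because the hash covers every layer, a new rule or tool is tracked the same way as a prompt edit, which is the point the section makes about the whole configuration being a trainable surface.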

Deeper training unlocks more of the agent for optimization.

The ladder runs from prompt tuning (L0) to team synthesis (L5), and each level unlocks more of the trainable surface. L0–L3 are live today; L4–L5 are on the roadmap.

Training depth ladder

L0 · Prompt tuning · Edits instructions, rules, and priorities · ● live

L1 · Diagnostic training · Reads traces, finds root causes, then edits · ● live

L2 · Architecture training · Changes the agent workflow by adding steps and branches · ● live

L3 · Tool creation · Writes new tools and extends agent capabilities · ● live

L4 · Curriculum evolution · Improves the tests themselves · ○ roadmap

L5 · Team synthesis · Creates specialized sub-agents · ○ roadmap
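The ladder can be read as an ordered enumeration in which each level adds to what the trainer may edit. The following is a minimal sketch under stated assumptions: the level names and live/roadmap split come from the ladder above, while `TrainingDepth`, `NEWLY_UNLOCKED`, and the exact surface labels are hypothetical and inferred from the one-line descriptions, not from Xplore's API.

```python
from enum import IntEnum
from typing import Dict, FrozenSet


class TrainingDepth(IntEnum):
    PROMPT_TUNING = 0
    DIAGNOSTIC_TRAINING = 1
    ARCHITECTURE_TRAINING = 2
    TOOL_CREATION = 3
    CURRICULUM_EVOLUTION = 4
    TEAM_SYNTHESIS = 5


# Per the ladder: L0 through L3 are live, L4 and L5 are on the roadmap.
LIVE_THROUGH = TrainingDepth.TOOL_CREATION

# Assumed mapping of what each level newly unlocks (labels are illustrative).
NEWLY_UNLOCKED: Dict[TrainingDepth, FrozenSet[str]] = {
    TrainingDepth.PROMPT_TUNING: frozenset({"instructions", "rules", "priorities"}),
    TrainingDepth.DIAGNOSTIC_TRAINING: frozenset(),  # same surface, informed by traces
    TrainingDepth.ARCHITECTURE_TRAINING: frozenset({"workflows"}),
    TrainingDepth.TOOL_CREATION: frozenset({"tools"}),
    TrainingDepth.CURRICULUM_EVOLUTION: frozenset({"tests"}),
    TrainingDepth.TEAM_SYNTHESIS: frozenset({"sub_agents"}),
}


def is_live(depth: TrainingDepth) -> bool:
    """True for levels the page marks as live today."""
    return depth <= LIVE_THROUGH


def editable_surface(depth: TrainingDepth) -> FrozenSet[str]:
    """Union of everything unlocked at this level and all levels below it."""
    unlocked = set()
    for level in TrainingDepth:
        if level <= depth:
            unlocked |= NEWLY_UNLOCKED[level]
    return frozenset(unlocked)
```

Treating the surface as cumulative follows the statement that each level unlocks more of it; whether a deeper level always includes every shallower edit is an assumption of this sketch.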

Scores drive training. Training produces versions. Monitoring closes the loop.

Evaluate, [Re]train, Deploy, Control — each verb operates across the full trainable surface. The architecture ensures they connect: evaluation results feed training, training produces deployable versions, deployment gates enforce standards, monitoring feeds back into retraining.

Evaluate

Scoring

40+ composable evaluators run against the full trainable surface. Not just prompt output — tool usage, rule compliance, workflow correctness.
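"Composable" here can be illustrated with evaluators modeled as plain functions from a run trace to a score between 0 and 1, combined by weighted average. This is a sketch of the general idea, not Xplore's evaluator interface: the trace keys (`tool_calls`, `steps`) and every function name below are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Trace = Dict[str, Any]
Evaluator = Callable[[Trace], float]


def used_allowed_tools(allowed: Set[str]) -> Evaluator:
    """Score the fraction of tool calls that were on the allow-list."""
    def score(trace: Trace) -> float:
        calls = trace.get("tool_calls", [])
        if not calls:
            return 1.0  # no calls means nothing disallowed was invoked
        return sum(1 for call in calls if call in allowed) / len(calls)
    return score


def steps_in_order(expected: List[str]) -> Evaluator:
    """Score 1.0 only if the workflow steps ran in exactly the expected order."""
    def score(trace: Trace) -> float:
        return 1.0 if list(trace.get("steps", [])) == expected else 0.0
    return score


def compose(weighted: List[Tuple[Evaluator, float]]) -> Evaluator:
    """Combine evaluators into one by weighted average of their scores."""
    total_weight = sum(weight for _, weight in weighted)
    def score(trace: Trace) -> float:
        return sum(ev(trace) * weight for ev, weight in weighted) / total_weight
    return score


# Illustrative trace: one allowed and one disallowed tool call, steps in order.
trace = {"tool_calls": ["lookup_invoice", "delete_db"], "steps": ["fetch", "answer"]}
chain = compose([
    (used_allowed_tools({"lookup_invoice"}), 1.0),
    (steps_in_order(["fetch", "answer"]), 1.0),
])
```

Neither evaluator looks at the text the agent produced, which mirrors the claim that scoring covers tool usage and workflow correctness as well as prompt output.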

[Re]train

Training

Six depth levels, from prompt tuning (L0) to team synthesis (L5), with L0–L3 live today. The trainer edits every layer of the trainable surface, not just instructions.

Deploy

Promotion

Gated release with regression certifications. Every version carries a full evaluation snapshot.
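A regression gate of this kind can be sketched as a comparison between two evaluation snapshots: the candidate is promoted only if no score falls below the baseline by more than a tolerance. The function names, snapshot shape, and default tolerance below are hypothetical and chosen for illustration; the page does not describe how Xplore implements its gates.

```python
from typing import Dict, List

Snapshot = Dict[str, float]  # evaluator name -> score in [0, 1]


def regressions(baseline: Snapshot, candidate: Snapshot, tolerance: float = 0.0) -> List[str]:
    """Return the evaluators whose score dropped by more than `tolerance`.

    An evaluator missing from the candidate snapshot counts as a score of 0.0.
    """
    return sorted(
        name
        for name, base_score in baseline.items()
        if candidate.get(name, 0.0) < base_score - tolerance
    )


def can_promote(baseline: Snapshot, candidate: Snapshot, tolerance: float = 0.01) -> bool:
    """Gate: promote only when nothing regressed beyond the tolerance."""
    return not regressions(baseline, candidate, tolerance)


# Illustrative snapshots: tool usage improved, rule compliance regressed.
baseline = {"tool_usage": 0.90, "rule_compliance": 1.00}
candidate = {"tool_usage": 0.95, "rule_compliance": 0.97}
```

In this example the candidate is blocked even though one score improved, because the gate checks each evaluator separately instead of averaging them.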

Control

Monitoring

Live certifications, drift detection, cost tracking. The same evaluation chains used in training, running continuously in production.

Your existing stack stays. Xplore adds the reliability layer.

Your LLM provider stays. Your data platform stays. Your orchestration framework stays. Xplore sits at the runtime layer — resolving agent configuration, executing evaluation, managing training, and enforcing production controls.

Stack position

L4 · Orchestration · Your framework

L3 · Observability · Your tooling

L2 · Runtime · Xplore

L1 · Data platform · Your infra

L0 · Models · Your provider