Xplore
Evaluate · [Re]train · Deploy · Control

Improve the whole agent — not just the prompt.

Instructions, tools, rules, data access, workflows, and policies. Xplore evaluates, trains, deploys, and controls across every layer of agent behavior.

Architecture

Everything that shapes agent behavior is scored, versioned, and improvable.

Most platforms tune prompts. Xplore treats the entire agent configuration as a trainable surface — scored, versioned, and improved across every iteration.

Instructions

System prompts, task descriptions, behavioral guidelines — the words that shape agent behavior.

Tools

Python functions, API integrations, data connectors — the capabilities an agent can invoke.

Rules

Escalation policies, safety guardrails, compliance constraints — the boundaries agents operate within.

Data access

Which databases, APIs, and services the agent can reach — and how it queries them.

Workflows

Step ordering, branching logic, parallel execution — the structure of how agents work.

Policies

Promotion gates, cost limits, certification requirements — the governance layer over everything.
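One way to picture the six layers above as a single versioned unit is a small immutable record. This is a minimal sketch under stated assumptions, not Xplore's actual schema: `AgentConfig`, its field names, the sample values, and the content-hash versioning are all hypothetical.

```python
from dataclasses import dataclass, asdict, replace
import hashlib
import json


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the six layers; not Xplore's real schema."""
    instructions: str
    tools: tuple[str, ...] = ()
    rules: tuple[str, ...] = ()
    data_access: tuple[str, ...] = ()
    workflow: tuple[str, ...] = ()
    policies: tuple[tuple[str, float], ...] = ()

    def version(self) -> str:
        """Content hash: a change to any layer yields a new version id."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Illustrative values only.
base = AgentConfig(
    instructions="Answer billing questions politely.",
    tools=("lookup_invoice",),
    rules=("escalate_refunds_over_500",),
    data_access=("billing_db",),
    workflow=("classify", "lookup", "respond"),
    policies=(("max_cost_usd", 0.05),),
)

# Editing a non-prompt layer (here, the workflow) still produces a new version.
candidate = replace(base, workflow=("classify", "lookup", "verify", "respond"))
print(base.version(), candidate.version())
```

Hashing the whole record rather than only the prompt is the point of the sketch: a workflow or policy change is tracked the same way as an instruction change.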

Architecture

Deeper training unlocks more of the agent for optimization.

Training depth runs from prompt tuning (L0) to team synthesis (L5), and each level unlocks more of the trainable surface. L0–L3 are live today; L4–L5 are on the roadmap.

Training depth ladder
L0 · Prompt tuning · Edits instructions, rules, priorities · ● live
L1 · Diagnostic training · Reads traces, finds root causes, then edits · ● live
L2 · Architecture training · Changes agent workflow: adds steps, branches · ● live
L3 · Tool creation · Writes new tools, extends agent capabilities · ● live
L4 · Curriculum evolution · Improves the tests themselves · ○ roadmap
L5 · Team synthesis · Creates specialized sub-agents · ○ roadmap
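The ladder can be read as a cumulative permission model: a trainer at a given depth may edit everything unlocked at that depth and below. The sketch below encodes that reading. The `Depth` enum, the `UNLOCKS` mapping, and the surface names are assumptions made for illustration, not an Xplore API.

```python
from enum import IntEnum


class Depth(IntEnum):
    PROMPT_TUNING = 0
    DIAGNOSTIC_TRAINING = 1
    ARCHITECTURE_TRAINING = 2
    TOOL_CREATION = 3
    CURRICULUM_EVOLUTION = 4
    TEAM_SYNTHESIS = 5


# Highest level described as live; L4 and L5 are listed as roadmap.
MAX_LIVE = Depth.TOOL_CREATION

# Hypothetical mapping: what each level adds to the editable surface.
UNLOCKS = {
    Depth.PROMPT_TUNING: {"instructions", "rules", "priorities"},
    Depth.DIAGNOSTIC_TRAINING: set(),  # same surfaces, guided by trace analysis
    Depth.ARCHITECTURE_TRAINING: {"workflow"},
    Depth.TOOL_CREATION: {"tools"},
    Depth.CURRICULUM_EVOLUTION: {"tests"},
    Depth.TEAM_SYNTHESIS: {"sub_agents"},
}


def editable_surfaces(depth: Depth) -> set[str]:
    """Union of everything unlocked at `depth` and every level below it."""
    if depth > MAX_LIVE:
        raise ValueError(f"L{int(depth)} is on the roadmap, not live")
    surfaces: set[str] = set()
    for level in Depth:
        if level <= depth:
            surfaces |= UNLOCKS[level]
    return surfaces


print(sorted(editable_surfaces(Depth.ARCHITECTURE_TRAINING)))
```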
Architecture

Scores drive training. Training produces versions. Monitoring closes the loop.

Each of the four verbs (Evaluate, [Re]train, Deploy, Control) operates across the full trainable surface. The architecture connects them: evaluation results feed training, training produces deployable versions, deployment gates enforce standards, and monitoring feeds back into retraining.
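The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not Xplore's implementation: `evaluate`, `passes_gate`, `improve`, the toy evaluators, and the dict-based configuration are all hypothetical stand-ins.

```python
from typing import Callable

Config = dict  # stand-in for a full agent configuration
Evaluator = Callable[[Config], float]  # returns a score in [0, 1]


def evaluate(config: Config, evaluators: dict[str, Evaluator]) -> dict[str, float]:
    """Run every evaluator against the whole configuration."""
    return {name: fn(config) for name, fn in evaluators.items()}


def passes_gate(candidate: dict[str, float], baseline: dict[str, float],
                tolerance: float = 0.0) -> bool:
    """Regression gate: no score may fall below baseline by more than tolerance."""
    return all(candidate[name] >= baseline[name] - tolerance for name in baseline)


def improve(config: Config, evaluators: dict[str, Evaluator],
            propose: Callable[[Config, dict[str, float]], Config],
            rounds: int = 3) -> list[tuple[Config, dict[str, float]]]:
    """Evaluate, train, gate, promote; returns the history of promoted versions."""
    scores = evaluate(config, evaluators)
    history = [(config, scores)]
    for _ in range(rounds):
        candidate = propose(config, scores)  # training step, driven by scores
        candidate_scores = evaluate(candidate, evaluators)
        if candidate_scores != scores and passes_gate(candidate_scores, scores):
            config, scores = candidate, candidate_scores  # promote
            history.append((config, scores))  # each version keeps its snapshot
    return history


# Toy evaluators: one checks the workflow, one checks the instructions.
evaluators = {
    "has_verify_step": lambda c: 1.0 if "verify" in c["workflow"] else 0.0,
    "prompt_is_short": lambda c: 1.0 if len(c["instructions"]) <= 80 else 0.0,
}


def propose(config: Config, scores: dict[str, float]) -> Config:
    """Toy trainer: fixes the workflow when the workflow evaluator fails."""
    if scores["has_verify_step"] < 1.0:
        return {**config, "workflow": config["workflow"] + ["verify"]}
    return config


start = {"instructions": "Answer billing questions.", "workflow": ["classify", "respond"]}
history = improve(start, evaluators, propose)
print(len(history), history[-1][0]["workflow"])
```

In this toy run the trainer edits the workflow rather than the prompt, and each promoted version is stored together with the scores that justified it.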

Evaluate
Scoring

40+ composable evaluators run against the full trainable surface. They score more than prompt output: tool usage, rule compliance, and workflow correctness are evaluated too.

[Re]train
Training

Six depth levels, from prompt tuning to team synthesis, with L0–L3 live today. The trainer edits every surface, not just instructions.

Deploy
Promotion

Gated release with regression certifications. Every version carries a full evaluation snapshot.

Control
Monitoring

Live certifications, drift detection, cost tracking. The same evaluation chains used in training, running continuously in production.
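Drift detection of the kind mentioned under Control can be as simple as comparing a rolling mean of production scores with the score recorded at promotion. The sketch below assumes that approach. `DriftMonitor`, its parameters, and the sample scores are hypothetical, and a production system would likely use a more robust statistic.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags drift when the rolling mean falls too far below the promotion baseline."""

    def __init__(self, baseline: float, window: int = 50, max_drop: float = 0.05):
        self.baseline = baseline  # score from the version's evaluation snapshot
        self.max_drop = max_drop  # allowed drop before drift is flagged
        self.window = window
        self.scores: deque[float] = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Add one production score; return True if drift is flagged."""
        self.scores.append(score)
        if len(self.scores) < self.window:
            return False  # not enough data yet
        return mean(self.scores) < self.baseline - self.max_drop


monitor = DriftMonitor(baseline=0.90, window=5, max_drop=0.05)
flags = [monitor.record(s) for s in [0.92, 0.91, 0.90, 0.80, 0.75, 0.70, 0.70]]
print(flags)
```

A flagged window is the signal that would feed back into retraining.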

Architecture

Your existing stack stays. Xplore adds the reliability layer.

Your LLM provider stays. Your data platform stays. Your orchestration framework stays. Xplore sits at the runtime layer — resolving agent configuration, executing evaluation, managing training, and enforcing production controls.

Stack position
L4 · Orchestration · Your framework
L3 · Observability · Your tooling
L2 · Runtime · Xplore
L1 · Data platform · Your infra
L0 · Models · Your provider