Live RAG execution
Agent Playground
Run the same question across versions and watch the agent move from wrong to safer-but-slow to production-ready.
Conversation
v1-production · acting as employee
employee
How many annual leave days do full-time employees receive?
Enterprise Knowledge Assistant
Submit the question to run it with the selected version and role.
Evaluation prompts
Run production-grade checks across v2, v3, and v4 to inspect behavior changes.
Judge view
Not run yet
Pick a version and ask a question to see whether this agent is ready to ship.
Why this happened
Operational evidence, not hidden reasoning
Evidence used
The documents that shaped the answer
No context was retrieved or exposed.