Agent / Enterprise Knowledge Assistant
Release evaluation suite · 15 scenarios
Inspectable evaluation evidence

Failure explorer

Plain-English reasons: wrong answer, missing evidence, unsafe access, or production latency.

Failed scenarios
6
E03 · multi-document

Production deployment requirements

failed
Test contract
What approvals and checks are required before a production deployment?
Role
engineer
Expected behavior
answer
Expected evidence
DOC-DEPLOY-GUIDE, DOC-CHANGE-POLICY
Actual answer
Pass automated checks and get service-owner approval.
1761 ms · 1073 input tokens · 145 output tokens
Retrieved evidence
Documents visible to the generation step
Production Deployment Guide
DOC-DEPLOY-GUIDE

Before deployment: pass automated checks, attach a rollback plan, confirm monitoring, and obtain service-owner approval.

engineering · relevance 0.94
Grader findings
MISSING_EVIDENCE

The answer omitted mandatory change-policy approvals.

correctness
58%
retrieval recall
50%
groundedness
70%
citation accuracy
68%
abstention
100%
access control
100%
Policy graders + semantic quality judge
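
The 50% retrieval recall is consistent with the test contract: two documents were expected and one was retrieved. A minimal sketch of that arithmetic, assuming recall is the fraction of expected document IDs that appear among the retrieved ones. The suite's actual grader is not shown in this report, and `retrieval_recall` is a hypothetical helper.

```python
def retrieval_recall(expected: set[str], retrieved: set[str]) -> float:
    """Fraction of expected document IDs present in the retrieved set."""
    if not expected:
        # Assumption: a scenario with no expected evidence counts as full recall.
        return 1.0
    return len(expected & retrieved) / len(expected)


# Document IDs taken from scenario E03.
expected = {"DOC-DEPLOY-GUIDE", "DOC-CHANGE-POLICY"}
retrieved = {"DOC-DEPLOY-GUIDE"}

recall = retrieval_recall(expected, retrieved)
print(f"{recall:.0%}")  # 50%
```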
Root cause & fix

topK was reduced from 3 to 1, so only one of the two required documents was retrieved.

Restore multi-document retrieval with topK ≥ 3.
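
The effect of the topK change can be sketched by truncating a ranked candidate list. This is an illustration under stated assumptions, not the assistant's retriever: only the 0.94 score for DOC-DEPLOY-GUIDE comes from this report, while the other scores, the DOC-ONCALL-RUNBOOK entry, and the `retrieve` helper are hypothetical. The `top_k` parameter stands in for the `topK` setting.

```python
# Ranked candidates as (doc_id, relevance).
CANDIDATES = [
    ("DOC-DEPLOY-GUIDE", 0.94),    # score reported for scenario E03
    ("DOC-CHANGE-POLICY", 0.81),   # hypothetical score
    ("DOC-ONCALL-RUNBOOK", 0.40),  # hypothetical distractor document
]

# Evidence the E03 test contract expects.
REQUIRED = {"DOC-DEPLOY-GUIDE", "DOC-CHANGE-POLICY"}


def retrieve(candidates, top_k):
    """Return the IDs of the top_k candidates by descending relevance."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]


# With top_k=1 the change policy never reaches the generation step.
missing_at_1 = REQUIRED - set(retrieve(CANDIDATES, top_k=1))
# With top_k=3 both required documents are visible.
missing_at_3 = REQUIRED - set(retrieve(CANDIDATES, top_k=3))

print(sorted(missing_at_1))  # ['DOC-CHANGE-POLICY']
print(sorted(missing_at_3))  # []
```

A check of this shape could run in the release suite as a regression guard for multi-document scenarios, so a future topK reduction fails fast instead of surfacing as MISSING_EVIDENCE.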