Agent / Enterprise Knowledge Assistant
Release evaluation suite · 15 scenarios
PR #184 · retrieval/fast-path

Pipeline run

Traditional checks passed. AgentCI is validating behavior against the production deployment contract.

Deployment blocked

2 critical behavior regressions and 5 release gates failed. The agent is not safe to ship.

Inspect failed answer
Release pipeline
Production replay · same evaluation contract as the release gate
Build
Passed
Load suite
Passed
Run 15 scenarios
Passed
Grade traces
Passed
Compare
Passed
Quality gate
Passed
Deploy
Blocked
Quality gates
Release policy · critical failures always block
blocked
Candidate metrics
v2-candidate vs v1-production
Pass rate
60.0%
-26.7 pts
Correctness
73.6%
-19.1 pts
Groundedness
80.1%
-13.3 pts
Access violations
2
+2.0
P95 latency
2,380 ms
-880 ms
Estimated cost / run
$0.057
$-0.027