Testing & evaluation
A layered quality system: unit tests, evals, tracing, red-teaming, and benchmarking — in one harness.
Section: testing-evaluation · scene id testing-evaluation-overview · tutorial 04-testing-evaluation