← gallery

Eval harness

The capstone: one pipeline runs tasks with tracing, grades, safety-tests, benchmarks, and reports.

Section: testing-evaluation · scene id eval-harness · tutorial 04-testing-evaluation/07-eval-harness