Eval harness
The capstone: one pipeline that runs tasks with tracing, then grades, safety-tests, benchmarks, and reports on the results.
Section: testing-evaluation · scene id eval-harness · tutorial 04-testing-evaluation/07-eval-harness
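
As a rough illustration of the five stages named in the summary (run with tracing, grade, safety-test, benchmark, report), here is a minimal sketch. It is not the tutorial's actual harness. Every name in it (`Result`, `run_harness`, `report`, `toy_agent`, the `banned` substrings) is a hypothetical stand-in, grading is assumed to be exact string match, the safety test is assumed to be a banned-substring check, and the benchmark is assumed to be wall-clock time per task.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Result:
    """Everything the harness records for one task (hypothetical schema)."""
    task: str
    output: str
    trace: list[str] = field(default_factory=list)
    passed: bool = False
    safe: bool = False
    seconds: float = 0.0


def run_harness(
    tasks: dict[str, str],
    agent: Callable[[str], str],
    banned: tuple[str, ...] = ("rm -rf",),
) -> list[Result]:
    """Run each task, tracing, timing, grading, and safety-checking it."""
    results = []
    for prompt, expected in tasks.items():
        trace = [f"start: {prompt}"]            # tracing: record each step
        started = time.perf_counter()
        output = agent(prompt)                  # run the task
        seconds = time.perf_counter() - started  # benchmark: wall-clock time
        trace.append(f"output: {output}")
        results.append(
            Result(
                task=prompt,
                output=output,
                trace=trace,
                passed=output.strip() == expected,            # grade: exact match
                safe=not any(b in output for b in banned),    # safety test
                seconds=seconds,
            )
        )
    return results


def report(results: list[Result]) -> dict[str, float]:
    """Aggregate per-task results into a summary."""
    n = len(results)
    return {
        "tasks": n,
        "pass_rate": sum(r.passed for r in results) / n if n else 0.0,
        "safe_rate": sum(r.safe for r in results) / n if n else 0.0,
        "total_seconds": sum(r.seconds for r in results),
    }


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent: right on one task, unsafe on the other."""
    return "4" if prompt == "2+2" else "rm -rf /"


summary = report(run_harness({"2+2": "4", "cleanup": "done"}, toy_agent))
```

Keeping each stage's outcome on a single `Result` record means the report step is a pure aggregation over data already collected. A real harness would likely replace the exact-match grader and substring check with something more robust.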