Benchmarking

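Run the same set of tasks against every model, measure accuracy, latency, and cost for each, and pick a Pareto-optimal configuration: one that no other configuration equals or beats on every metric while being strictly better on at least one.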
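The run-measure-select loop can be sketched in Python as follows. This is a minimal sketch, not a definitive harness. It assumes each model is wrapped as a plain callable from prompt to answer, uses exact-match scoring, and applies a flat per-call price. `ModelConfig`, `run_benchmark`, `dominates`, and `pareto_front` are hypothetical names that do not come from any particular library.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical wrapper: a model is any prompt -> answer callable with a flat per-call price."""
    name: str
    call: Callable[[str], str]
    cost_per_call_usd: float


@dataclass(frozen=True)
class Result:
    name: str
    accuracy: float   # fraction of tasks answered correctly (higher is better)
    latency_s: float  # mean wall-clock seconds per task (lower is better)
    cost_usd: float   # total cost for the whole task set (lower is better)


def run_benchmark(config: ModelConfig, tasks: list[tuple[str, str]]) -> Result:
    """Run every (prompt, expected) task through one model and score it by exact match."""
    correct, elapsed = 0, 0.0
    for prompt, expected in tasks:
        start = time.perf_counter()
        answer = config.call(prompt)
        elapsed += time.perf_counter() - start
        if answer.strip() == expected:
            correct += 1
    n = len(tasks)
    return Result(config.name, correct / n, elapsed / n, config.cost_per_call_usd * n)


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on every metric and strictly better on at least one."""
    no_worse = (a.accuracy >= b.accuracy and a.latency_s <= b.latency_s
                and a.cost_usd <= b.cost_usd)
    better = (a.accuracy > b.accuracy or a.latency_s < b.latency_s
              or a.cost_usd < b.cost_usd)
    return no_worse and better


def pareto_front(results: list[Result]) -> list[Result]:
    """Keep only results that no other result dominates (a result never dominates itself)."""
    return [r for r in results if not any(dominates(other, r) for other in results)]


# Usage with two stand-in "models" (hypothetical names and prices):
tasks = [("2+2", "4"), ("capital of France", "Paris")]
lookup = {"2+2": "4", "capital of France": "Paris"}
big = ModelConfig("big", lambda p: lookup[p], cost_per_call_usd=0.01)
small = ModelConfig("small", lambda p: "4", cost_per_call_usd=0.001)

results = [run_benchmark(m, tasks) for m in (big, small)]
print([r.name for r in pareto_front(results)])  # ['big', 'small']
```

Both stand-ins land on the frontier because each wins on a different metric: `big` is more accurate and `small` is cheaper. A configuration that is slower, pricier, and no more accurate than another would be filtered out. The frontier is usually a set rather than a single winner, so the final pick among its members still depends on which trade-off the application can afford. Exact-match scoring and mean latency are simplifying assumptions here.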
