Benchmarking
Run the same tasks across every model, measure accuracy/latency/cost, and pick the Pareto-optimal config.
Section: testing-evaluation · scene id benchmarking · tutorial 04-testing-evaluation/05-benchmarking
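
As a minimal sketch of the last step, the snippet below picks the Pareto-optimal configs from a set of benchmark results. It assumes accuracy is maximized while latency and cost are minimized. The names `BenchResult`, `dominates`, and `pareto_front`, the model labels, and all numbers are hypothetical placeholders for illustration, not measurements or part of any real harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """Aggregate scores for one config over the shared task set (hypothetical schema)."""
    config: str
    accuracy: float   # fraction of tasks passed; higher is better
    latency_s: float  # mean seconds per task; lower is better
    cost_usd: float   # mean dollars per task; lower is better


def dominates(a: BenchResult, b: BenchResult) -> bool:
    """True if `a` is at least as good as `b` on every metric and strictly better on one."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.latency_s <= b.latency_s
        and a.cost_usd <= b.cost_usd
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.latency_s < b.latency_s
        or a.cost_usd < b.cost_usd
    )
    return no_worse and strictly_better


def pareto_front(results: list[BenchResult]) -> list[BenchResult]:
    """Keep every result that no other result dominates, preserving input order."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Made-up numbers, purely to exercise the function.
results = [
    BenchResult("model-a", accuracy=0.92, latency_s=2.0, cost_usd=0.030),
    BenchResult("model-b", accuracy=0.88, latency_s=0.8, cost_usd=0.010),
    BenchResult("model-c", accuracy=0.85, latency_s=1.5, cost_usd=0.020),  # dominated by model-b
    BenchResult("model-d", accuracy=0.70, latency_s=0.3, cost_usd=0.002),
]

front = pareto_front(results)
for r in front:
    print(r.config, r.accuracy, r.latency_s, r.cost_usd)
```

The front usually contains more than one config, since each survivor trades one metric against another. Choosing among them still requires a stated priority, such as a latency budget or a cost ceiling.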