Turn production traces into evals, compare prompts and models, and improve quality with every release.