Building an automated evals workflow that works (and open-sourcing it)
How we built Kaleidoscope: a structured workflow for realistic, scalable, and human-aligned contextual AI evaluations.
Our experiments and insights from tinkering at the frontier of AI