Evals - ai@govtech

Stress-Testing Government Communications with AI Personas Grounded in Real Voices

Government communications can misfire in ways their authors never intended. We built an app that lets officers stress-test drafts against AI personas grounded in real Singapore voices, so they can catch what they missed in minutes rather than weeks.

Evals, Responsible AI

The Lab

MetaEvaluator: Systematically Evaluate Your LLM Judges

Measure how well your app is performing and, more importantly, where it's failing.

Evals

The Lab

Benchmarking GPT-5 & GPT-OSS: A Responsible AI Approach

Evaluating dimensions often overlooked by traditional benchmarks.

Responsible AI, Evals