The Lab
Building an automated Evals workflow that works (and open-sourcing it). How we built Kaleidoscope, a structured workflow for realistic, scalable, and human-aligned contextual AI evaluations. Tags: Responsible AI
The Road Under the Harness. Previously I wrote about building a harness for yourself. This one is about the environment you're building in, and why, at enterprise scale, individual wins have nowhere to accumulate if the platform underneath doesn't exist. Tags: Agentic, Infrastructure
Scaling the Pentesting Team with AI. Engineering Multi-Agent Architectures for Autonomous Penetration Testing. Tags: Agentic, Security
Harnessing the harness. On building your own multi-agent orchestrator, and why owning the infrastructure around AI matters. Tags: Agentic
Video Generation Landscape Analysis: The Road to Informative Video. We tested 2026 SOTA models and found a "usability gap". Tags: Multimodal
Yes, you’re absolutely right… Right? A mini survey on LLM sycophancy. Ever spoken to an AI and felt like it was responding with insincere praise? Tags: Responsible AI
MetaEvaluator: Systematically Evaluate Your LLM Judges. Measure how well your app is performing and, more importantly, where it's failing. Tags: Evals
Building for Agentic AI - Agent SDKs & Design Patterns. The true value of AI agents lies in loops and self-correction rather than raw reasoning power. Tags: Agentic
A deeper look into using MCP in the enterprise. A universal "USB-C" for AI? Tags: Agentic, Infrastructure
Benchmarking GPT-5 & GPT-OSS: A Responsible AI Approach. Evaluating dimensions often overlooked by traditional benchmarks. Tags: Responsible AI, Evals
(Part 2) LLM Safety Alignment for the Singapore Context using Supervised Fine-tuning and RLHF-based Methods. Safety must be "baked in". Tags: Responsible AI
(Part 1) LLM Safety Alignment for the Singapore Context using Supervised Fine-tuning and RLHF-based Methods. The process of "teaching" models to be safe. Tags: Responsible AI
Eliciting Toxic Singlish from r1. A red-teaming exercise that proves even "reasoning" models can be coaxed. Tags: Responsible AI