AI Practice

Agentic

Harnessing the harness

On building your own multi-agent orchestrator, and why owning the infrastructure around AI matters.
Multimodal

Video Generation Landscape Analysis: The Road to Informative Video

We tested 2026 SOTA models and found a "usability gap".
Responsible AI

Yes, you’re absolutely right… Right? A mini survey on LLM sycophancy

Ever spoken to an AI and felt like it was responding with insincere praise?
Robotics Machine Learning

The Realities of Robot Deployment: What It Takes for Embodied AI to Succeed

The "hype" around robots ignores the unstructured-environment problem.
Responsible AI

MetaEvaluator: Systematically Evaluate Your LLM Judges

Measure how well your app is performing and, more importantly, where it's failing.
Machine Learning

MLOps Transformation: Moving from Stage 0 to Stage 3 (Part II)

A maturity roadmap and a cultural shift.
Agentic

Building for Agentic AI - Agent SDKs & Design Patterns

The true value of AI agents lies in loops and self-correction rather than raw reasoning power.
Machine Learning

Building MLOps Bridges: Our Journey in Uplifting Agencies

A practical guide to MLOps adoption across Government teams.
General

Building a Better RAG Pipeline for HR Policy Q&A: What Worked and What Didn’t

We tested the most effective approaches.
Responsible AI

Benchmarking GPT-5 & GPT-OSS: A Responsible AI Approach

Evaluating dimensions often overlooked by traditional benchmarks.
General

“The Bots Are Here. Now What?” How Knowledge Management Became the Key to Powering GenAI Solutions

Available LLMs are powerful enough. What we are missing is the knowledge to fuel them.
Responsible AI

Introducing LionGuard 2: Multilingual LLM Guardrail for Singapore

We improved its coverage and robustness.
Responsible AI

RabakBench: Multilingual AI Safety Evaluation Made Local

Global safety guardrails are often blind to local dialects and sensitivities.

Validating Annotation Agreement between Humans and LLMs

Who judges the judge? At GovTech's AI Practice, we've embraced what's known as "LLM-as-a-judge": employing LLMs as evaluators across our AI workflows. It has become a powerful tool in our evaluation toolkit, used extensively across multiple areas, including judging other LLM outputs…
Responsible AI

Does your LLM know when to say “I don’t know”?

A model's refusal to answer may sometimes be more valuable.
Responsible AI

Fine-Tuning Language Models for Long-Context Data: Automated Stance Analysis of Citizen Discussions

Addressing the technical challenges of processing high-volume public feedback for policy-making.
Machine Learning

MLOps Transformation: Moving from Stage 0 to Stage 3 (Part I)

As much a cultural shift as a technical one.
General

Evaluating MOE’s SLS Learning Assistant: Using Synthetic Data and LLMs to Benchmark Faithfulness and Factuality

Safer, faster testing of student-facing AI before real-world deployment.
Infrastructure

From Infrastructure to Intelligence (Part 1): Strategic Foundations for AI Model Hosting and Agent-Based Architectures on the Cloud

What began as simple chatbot prototypes has evolved into full-fledged agent architectures.
Agentic

The other side of Agentic AI

An agent's utility is capped by its environment interface rather than just its reasoning capabilities.
Responsible AI

Securing Guardrails with Automated Red Teaming

Manual testing is no longer scalable.
Responsible AI

(Part 2) LLM Safety Alignment for the Singapore Context using Supervised Fine-tuning and RLHF-based Methods

Safety must be "baked in".
Responsible AI

(Part 1) LLM Safety Alignment for the Singapore Context using Supervised Fine-tuning and RLHF-based Methods

The process of "teaching" models to be safe.
Responsible AI

Eliciting Toxic Singlish from r1

A red-teaming exercise showing that even "reasoning" models can be coaxed into toxic output.