GOV.UK Chat is a Retrieval Augmented Generation (RAG) chatbot developed by the Government Digital Service (GDS) within the Department for Science, Innovation and Technology (DSIT) in the United Kingdom. The system is designed to provide citizens and businesses with quick, personalised answers to questions about government services, regulations, and guidance by drawing on the approximately 700,000 pages of content published on the GOV.UK website. Rather than requiring users to navigate through multiple pages of search results, GOV.UK Chat allows them to pose natural language questions and receive synthesised, conversational responses grounded in official government content.
The system works through a multi-stage RAG pipeline. When a user submits a question, the system first retrieves relevant content chunks from a vector database containing approximately 100,000 GOV.UK pages processed into roughly 700,000 chunks totalling 36.9 gigabytes of data. The retrieved content is then passed to a large language model which generates a natural language response based solely on the retrieved government content. Before the response reaches the user, it passes through safety guardrails that filter for quality, appropriateness, and adherence to tolerance thresholds. Each answer is presented alongside 'check this answer' links to the original GOV.UK source pages, enabling users to independently verify the information provided.
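The retrieve, generate, guardrail, and cite stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GDS implementation. The bag-of-words `embed` function, the `generate` stub, the `passes_guardrails` check, and the example URLs are all hypothetical stand-ins for the real embedding model, LLM call, guardrail step, and GOV.UK content.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str             # source page for the 'check this answer' link (made-up URLs)
    text: str            # chunk text
    vector: list[float]  # embedding of the chunk text


def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words embedding (assumption); a real system calls an embedding model."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question_vec: list[float], index: list[Chunk], k: int = 2) -> list[Chunk]:
    """Stage 1: return the k chunks most similar to the question."""
    ranked = sorted(index, key=lambda c: cosine(question_vec, c.vector), reverse=True)
    return ranked[:k]


def generate(question: str, chunks: list[Chunk]) -> str:
    """Stage 2 stand-in for the LLM: here it only joins the retrieved text."""
    return " ".join(c.text for c in chunks)


def passes_guardrails(answer: str) -> bool:
    """Stage 3 stand-in for output guardrails: here it only rejects empty answers."""
    return bool(answer.strip())


def answer_question(question: str, index: list[Chunk], vocab: list[str], k: int = 2) -> dict:
    """Run the pipeline and attach source links to any answer that passes the guardrail."""
    chunks = retrieve(embed(question, vocab), index, k)
    draft = generate(question, chunks)
    if not passes_guardrails(draft):
        return {"answer": None, "sources": []}
    return {"answer": draft, "sources": [c.url for c in chunks]}
```

Returning the source URLs alongside the answer mirrors the 'check this answer' links: the caller always receives the pages the response was grounded in, or no answer at all when the guardrail rejects the draft.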
The project began in July 2023 as an internal experiment within GDS. The initial technical architecture used OpenAI's GPT-3.5-turbo-16k model accessed via API, with a Qdrant vector store for document retrieval and the LangChain framework for integration, all hosted on Google Cloud. The system was tested through five phased experiments, beginning with internal testing and progressing to a scaled pilot in which 1,000 invited users reached the chatbot through 'magic links' on selected GOV.UK business pages. During this initial phase, nearly 70 percent of users found the chatbot's responses useful and approximately 65 percent were satisfied with their experience, while answer accuracy reached approximately 80 percent.
The technology stack evolved significantly as the project matured. By the time of the Algorithmic Transparency Record filing, the system had migrated to Anthropic's Claude Sonnet 4 model (model ID eu.anthropic.claude-sonnet-4-20250514-v1:0) hosted on AWS Bedrock in the Ireland EU region with cross-regional inference capability. The embedding model was updated to Amazon Titan Embed Text v2. The application is built with Ruby on Rails and deployed on Kubernetes within AWS, with an AWS RDS PostgreSQL database (encrypted at rest) for data storage and Amazon OpenSearch for search indexing. Google BigQuery is used for analytics.
In November 2024, GDS launched a private beta of GOV.UK Chat, providing access to business users through links on selected business-related GOV.UK pages, with a waiting list managing capacity. The focus on business content was deliberate, as it represents a domain where users frequently need to navigate complex cross-departmental policies and regulations. Testing during this phase included up to 2,000 users over four-week periods. In July 2025, GOV.UK Chat was selected as one of the Prime Minister's AI Exemplars and designated as one of five 'kickstarters' in the UK blueprint for modern digital government.
Safety and security have been central to the system's development. GDS conducted a Data Protection Impact Assessment (completed September 2025) and a Secure by Design framework review. The system implements multiple layers of protection: incoming user queries are screened via regex for phone numbers, email addresses, and card numbers, with queries containing personal data rejected outright; GOV.UK pages likely to contain personal data are filtered out before vectorisation; and response guardrails perform LLM-based filtering to catch outputs outside tolerance levels. Extensive red teaming exercises were conducted in collaboration with cyber security experts from CDDO, Number 10 Data Science, and the i.AI team, as well as the AI Security Institute. During testing, more than 500 jailbreak attempts were blocked by existing safeguards, though the team acknowledges that it cannot guarantee that no jailbreak attempt will ever succeed.
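The regex screening of incoming queries might look something like the sketch below. The patterns are illustrative assumptions only: the production patterns are not given here, and real-world phone and card formats vary more than these simple expressions cover. The function names `find_personal_data` and `screen_query` are likewise hypothetical.

```python
import re

# Illustrative patterns (assumptions), not the patterns used in production.
PII_PATTERNS = {
    # Something that looks like name@domain.tld
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens
    "card_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
    # UK-style number: leading 0 or +44, then 9 or 10 further digits
    "phone": re.compile(r"(?:\+44\s?|\b0)\d(?:[\s-]?\d){8,9}\b"),
}


def find_personal_data(query: str) -> list[str]:
    """Return the names of every pattern that matches somewhere in the query."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(query)]


def screen_query(query: str) -> bool:
    """Return True if the query may proceed, False if it should be rejected outright."""
    return not find_personal_data(query)
```

Rejecting the whole query, rather than redacting the matched text, follows the behaviour described above: a query containing personal data never reaches retrieval or the language model.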
The system's accuracy improved substantially over its development lifecycle. Answer accuracy rose from approximately 80 percent in the initial experiment to 90 percent in later evaluations, attributed both to advances in the underlying LLM technology and to GDS's own data science work on retrieval, chunking strategies, alternative embedding models, re-ranking, and improved prompt engineering. Evaluation uses a hybrid approach combining automated metrics (precision, recall, LLM-as-Judge) with manual assessment by subject matter experts from across government, including HMRC specialists who scored accuracy against content designer-written reference answers.
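As a rough illustration of the automated side of this evaluation, precision and recall can be computed by comparing what the system retrieved against a labelled set of relevant items. The sketch below assumes the metrics are applied to retrieved chunk identifiers, which is not stated here. The function name and the identifiers in the usage example are hypothetical.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one question.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    Either value is reported as 0.0 when its denominator is zero.
    """
    retrieved_set = set(retrieved)  # ignore duplicate retrievals
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Usage: four chunks retrieved, two of which are among the three labelled relevant.
p, r = precision_recall(["c1", "c2", "c3", "c4"], {"c1", "c3", "c9"})
```

In this example, precision is 2/4 and recall is 2/3. The two metrics pull in different directions, which is one reason to pair them with LLM-as-Judge scoring and expert review rather than rely on either alone.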
From late 2025 into 2026, GDS began planning a wider rollout, starting with deployment in the GOV.UK app (available on iOS and Android) before extending across the GOV.UK website. The team has also begun exploring experimental agentic AI capabilities, envisioning an evolution from a system that merely provides answers to one that can facilitate simple government transactions and hand off to departmental customer support when needed. Anthropic provided general advice and engineering support under a Joint Innovation Vehicle procurement arrangement, though no data access was granted for development purposes.