GBR-002

GOV.UK Chat

United Kingdom · Europe & Central Asia · High income · Pilot / Controlled Trial Phase · Confirmed

Government Digital Service (GDS), Department for Science, Innovation and Technology (DSIT)

At a Glance

What it does LLMs for content creation, transformation and modality conversion — User communication and interaction
Who runs it Government Digital Service (GDS), Department for Science, Innovation and Technology (DSIT)
Programme GOV.UK Chat
Confidence Confirmed
Deployment Status Pilot / Controlled Trial Phase
Key Risks Model-related risks
Key Outcomes Initial experiment: nearly 70% of users found responses useful, approximately 65% satisfied with experience, 80% accuracy threshold achieved.
Source Quality 5 sources — Government website / press release, Report (government / official)

GOV.UK Chat is a Retrieval Augmented Generation (RAG) chatbot developed by the Government Digital Service (GDS) within the Department for Science, Innovation and Technology (DSIT) in the United Kingdom. The system is designed to provide citizens and businesses with quick, personalised answers to questions about government services, regulations, and guidance by drawing on the approximately 700,000 pages of content published on the GOV.UK website. Rather than requiring users to navigate through multiple pages of search results, GOV.UK Chat allows them to pose natural language questions and receive synthesised, conversational responses grounded in official government content.

The system works through a multi-stage RAG pipeline. When a user submits a question, the system first retrieves relevant content chunks from a vector database containing approximately 100,000 GOV.UK pages processed into roughly 700,000 chunks totalling 36.9 gigabytes of data. The retrieved content is then passed to a large language model which generates a natural language response based solely on the retrieved government content. Before the response reaches the user, it passes through safety guardrails that filter for quality, appropriateness, and adherence to tolerance thresholds. Each answer is presented alongside 'check this answer' links to the original GOV.UK source pages, enabling users to independently verify the information provided.
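The retrieve-generate-filter flow described above can be sketched in a few lines of Python. Everything here is illustrative: the toy in-memory vector store, two-dimensional embeddings, and keyword guardrail are stand-ins for the production retrieval stack and Bedrock-hosted LLM, and all function names and URLs in the sample data are hypothetical.

```python
# Minimal sketch of a retrieve-generate-filter RAG pipeline (illustrative only).
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, top_k=2):
    """Rank stored content chunks by similarity to the embedded query."""
    ranked = sorted(store, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:top_k]

def generate(question, chunks):
    """Stand-in for the LLM call: an answer grounded only in retrieved chunks,
    returned together with source links for independent verification."""
    context = " ".join(c["text"] for c in chunks)
    return f"Based on GOV.UK guidance: {context}", [c["url"] for c in chunks]

def guardrail(answer):
    """Stand-in for the response filter applied before delivery to the user."""
    return answer if "forbidden" not in answer.lower() else None

store = [
    {"text": "Register a company online within 24 hours.", "vec": [1.0, 0.1],
     "url": "https://www.gov.uk/limited-company-formation"},
    {"text": "Pay Corporation Tax within 9 months.", "vec": [0.1, 1.0],
     "url": "https://www.gov.uk/corporation-tax"},
]
chunks = retrieve([0.9, 0.2], store, top_k=1)
answer, sources = generate("How do I register a company?", chunks)
print(guardrail(answer), sources)
```

The 'check this answer' behaviour corresponds to returning the source URLs alongside the generated text, so verification does not depend on trusting the model output.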

The project began in July 2023 as an internal experiment within GDS. The initial technical architecture used OpenAI's GPT-3.5-turbo-16k model accessed via API, with a Qdrant vector store for document retrieval and the LangChain framework for integration, all hosted on Google Cloud. The system was tested through five phased experiments, beginning with internal testing and progressing to a scaled pilot with 1,000 invited users accessed through 'magic links' on selected GOV.UK business pages. During this initial phase, nearly 70 percent of users found the chatbot's responses useful and approximately 65 percent were satisfied with their experience, while the system achieved an accuracy threshold of 80 percent.

The technology stack evolved significantly as the project matured. By the time of the Algorithmic Transparency Record filing, the system had migrated to Anthropic's Claude Sonnet model (specifically Claude Sonnet-4, model ID eu.anthropic.claude-sonnet-4-20250514-v1:0) hosted on AWS Bedrock in the Ireland EU region with cross-regional inference capability. The embedding model was updated to Amazon Titan Embed Text v2. The application infrastructure runs on Ruby on Rails deployed on Kubernetes within AWS, with an AWS RDS PostgreSQL database (encrypted at rest) for data storage and Amazon OpenSearch for search indexing. Google BigQuery is used for analytics.
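As a rough illustration of how such a stack is driven, the snippet below assembles a request body in the Anthropic Messages format that AWS Bedrock accepts for Claude models. The model ID is the one stated in the transparency record; the system prompt, question, and parameters are invented for this sketch, and no network call is made (a real deployment would submit the body via the bedrock-runtime API).

```python
import json

# Model ID as published in the Algorithmic Transparency Record.
MODEL_ID = "eu.anthropic.claude-sonnet-4-20250514-v1:0"

def build_request(question, retrieved_chunks):
    """Assemble an Anthropic Messages payload grounding the answer in
    retrieved GOV.UK content. Prompt wording here is illustrative."""
    context = "\n\n".join(retrieved_chunks)
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "system": "Answer using ONLY the GOV.UK content provided.",
        "messages": [
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }
    return json.dumps(body)

payload = build_request("When is Corporation Tax due?",
                        ["Pay Corporation Tax within 9 months of year end."])
print(json.loads(payload)["messages"][0]["role"])
```

Keeping the system instruction and retrieved context in the request, rather than relying on model memory, is what makes the responses "grounded exclusively in official government guidance".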

In November 2024, GDS launched a private beta of GOV.UK Chat, providing access to business users through links on selected business-related GOV.UK pages, with a waiting list managing capacity. The focus on business content was deliberate, as it represents a domain where users frequently need to navigate complex cross-departmental policies and regulations. Testing during this phase included up to 2,000 users over four-week periods. In July 2025, GOV.UK Chat was selected as one of the Prime Minister's AI Exemplars and designated as one of five 'kickstarters' in the UK blueprint for modern digital government.

Safety and security have been central to the system's development. GDS conducted a Data Protection Impact Assessment (completed September 2025) and a Secure by Design framework review. The system implements multiple layers of protection: incoming user queries are screened via regex for phone numbers, email addresses, and card numbers, with queries containing personal data rejected outright; GOV.UK pages likely to contain personal data are filtered out before vectorisation; and response guardrails perform LLM-based filtering to catch outputs outside tolerance levels. Extensive red teaming exercises were conducted in collaboration with cyber security experts from CDDO, Number 10 Data Science, and the i.AI team, as well as the AI Security Institute. During testing, more than 500 jailbreak attempts were successfully blocked by existing safeguards, though the team acknowledges that it is not possible to guarantee no jailbreaking attempts will ever be successful.
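The query-screening step described above can be illustrated with a few regular expressions. These patterns are loose approximations written for this sketch; the actual filters GDS uses are not published.

```python
import re

# Loose, illustrative patterns only; the production regexes are not published.
PII_PATTERNS = [
    re.compile(r"(?:\+44|0)\d{9,10}\b"),         # UK-style phone numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
    re.compile(r"\b(?:\d[ -]?){13,19}\b"),       # card-like digit runs
]

def screen_query(query):
    """Reject a query outright if it appears to contain personal data,
    mirroring the published behaviour of rejecting such queries before
    any retrieval or LLM call takes place."""
    if any(p.search(query) for p in PII_PATTERNS):
        return None  # rejected outright
    return query

print(screen_query("How do I register for VAT?"))  # passes through unchanged
print(screen_query("Email me at jo@example.com"))  # None (rejected)
```

Rejecting the query before it reaches the retrieval or generation stages means personal data never enters prompts, logs, or the vector store, which is a stronger guarantee than redacting it afterwards.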

The system's accuracy improved substantially over its development lifecycle. Answer accuracy rose from approximately 80 percent in the initial experiment to 90 percent in later evaluations, attributed both to advances in the underlying LLM technology and to GDS's own data science work on retrieval, chunking strategies, alternative embedding models, re-ranking, and improved prompt engineering. Evaluation uses a hybrid approach combining automated metrics (precision, recall, LLM-as-Judge) with manual assessment by subject matter experts from across government, including HMRC specialists who scored accuracy against content designer-written reference answers.
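The automated side of that evaluation can be made concrete with standard retrieval metrics. The helper below is a generic precision/recall computation over chunk identifiers, not GDS's actual evaluation code, and the LLM-as-Judge component is omitted.

```python
def precision_recall(retrieved, relevant):
    """Retrieval metrics of the kind used in hybrid RAG evaluation:
    precision = fraction of retrieved chunks that are relevant,
    recall    = fraction of relevant chunks that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# One of three retrieved chunks is relevant; one of two relevant chunks found.
p, r = precision_recall(["c1", "c2", "c3"], ["c1", "c4"])
print(p, r)  # precision 1/3, recall 1/2
```

Automated scores like these are cheap to compute at scale, which is why they are paired with manual expert assessment rather than replacing it.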

By late 2025 and into 2026, GDS began planning wider rollout, starting with deployment in the GOV.UK app (available on iOS and Android) before extending across the GOV.UK website. The team has also begun exploring experimental agentic AI capabilities, envisioning an evolution from a system that merely provides answers to one that can facilitate simple government transactions and hand off to departmental customer support when needed. Anthropic provided general advice and engineering support under a Joint Innovation Vehicle procurement arrangement, though no data access was granted for development purposes.

Classifications follow the DCI AI Hub Taxonomy.

Social Protection Functions

Implementation/delivery chain
Outreach/communications/sensitisation (primary)
SP Pillar (Primary) The social protection branch: social assistance, social insurance, or labour market programmes. Social assistance
Programme Name GOV.UK Chat
Programme Type The type of social protection programme, classified under social assistance, social insurance, or labour market programmes. Other
System Level Where in the social protection system the AI is applied: policy level, programme design, or implementation/delivery chain. Implementation/delivery chain
Programme Description GOV.UK Chat is a RAG-based AI chatbot built by the Government Digital Service (GDS) to help citizens and businesses find information across the GOV.UK website through natural language queries. The system retrieves relevant content from approximately 700,000 GOV.UK content chunks and generates conversational responses grounded exclusively in official government guidance. It evolved from an internal experiment in July 2023 through private beta in November 2024 to planned wider rollout across the GOV.UK website and app in 2026.
Implementation Type How the AI output is produced: Classical ML, Deep learning, Foundation model, or Hybrid. Affects validation, compute requirements, and governance profile. Foundation model
Lifecycle Stage Current stage in the AI lifecycle, from problem identification through to monitoring, maintenance and decommissioning. Integration and Deployment
Model Provenance Origin of the AI model: developed in-house, adapted from open-source, commercial/proprietary, or accessed via third-party API. API-accessed third-party
Compute Environment Where the AI system runs: on-premise, government cloud, commercial cloud, or edge/device. Commercial cloud
Compute Provider The specific cloud or infrastructure provider hosting the AI system. AWS (Amazon Web Services); previously Google Cloud during initial experiment
Sovereignty Quadrant Classification of data and compute sovereignty: I (Sovereign), II (Federated/Hybrid), III (Cloud with safeguards), or IV (Shared Innovation Zone). III — Compute-Intensive Cloud with safeguards
Data Residency Where the data used by the AI system is stored: domestic, regional, or international. Regional
Data Residency Detail Additional detail on the specific data hosting arrangements and jurisdictions. AWS Bedrock hosted in Ireland EU region with cross-regional inference; application infrastructure on AWS; analytics on Google BigQuery
Cross-Border Transfer Whether data crosses national borders, and if so, whether documented safeguards are in place. With documented safeguards
Is Agentic Whether the system autonomously plans and executes multi-step workflows, selecting tools and chaining actions with limited human intervention. Partial
Agentic Pipeline Description of the chained workflow steps in the agentic pipeline. Current system is a standard RAG pipeline (retrieve-generate-filter) without autonomous action. However, GDS has begun exploring experimental agentic AI capabilities to evolve from providing answers to facilitating simple government transactions and departmental handoffs.
Agentic Autonomy Degree of autonomy: fully autonomous, semi-autonomous (human checkpoints), or supervised (human approval at each step). Supervised
Override Points Where in the pipeline human review or override is triggered. Response guardrails filter all LLM outputs before delivery to users; GOV.UK AI Team monitors via admin system with audit logging; users encouraged to verify via source links
Decision Criticality The rights impact of the decision the AI supports. High criticality requires HITL oversight; moderate requires HOTL; low may operate HOOTL. Low
Human Oversight Type Level of human involvement: Human-in-the-Loop (active review), Human-on-the-Loop (monitoring), or Human-out-of-the-Loop (periodic audit). HOTL
Development Process Whether the AI system was developed fully in-house, through a mix of in-house and third-party, or fully by an external provider. Mix of in-house and third-party
Highest Risk Category The most significant structural risk source identified: data, model, operational, governance, or market/sovereignty risks. Model-related risks
Risk Assessment Status Whether a formal risk assessment, informal assessment, or independent audit has been conducted for this system. Formal assessment

Impact Dimensions

Autonomy, human dignity and due process
Systemic and societal
  • DPIA/AIA conducted
  • Data minimisation controls
  • Human oversight protocol
Category: Unstructured and text-based content
Sensitivity: Non-personal
Cross-System Linkage: Single source (no linkage)
Availability: Currently available and used
Key Constraints: Approximately 700,000 content chunks from ~100,000 GOV.UK pages (36.9 GB); pages likely to contain personal data are filtered out before vectorisation; content updated regularly, requiring re-indexing

Central Digital and Data Office (2025) Artificial Intelligence Playbook for the UK Government. London: CDDO. Available at: https://www.gov.uk/government/publications/ai-playbook-for-the-uk-government/artificial-intelligence-playbook-for-the-uk-government-html (Accessed: 24 March 2026).

Source type: Government website / press release

Department for Science, Innovation and Technology (2025) 'DSIT: GOV.UK Chat', Algorithmic Transparency Recording Standard. Available at: https://www.gov.uk/algorithmic-transparency-records/dsit-gov-dot-uk-chat (Accessed: 24 March 2026).

Source type: Report (government / official)

Dub, S. and Davey, J. (2024) 'We're running a private beta of GOV.UK Chat', Inside GOV.UK Blog, 5 November. Available at: https://insidegovuk.blog.gov.uk/2024/11/05/were-running-a-private-beta-of-gov-uk-chat/ (Accessed: 24 March 2026).

Source type: Government website / press release

GDS (2024) 'The findings of our first generative AI experiment: GOV.UK Chat', Inside GOV.UK Blog, 18 January. Available at: https://insidegovuk.blog.gov.uk/2024/01/18/the-findings-of-our-first-generative-ai-experiment-gov-uk-chat/ (Accessed: 24 March 2026).

Source type: Government website / press release

GDS (2025) 'GOV.UK has entered the Chat: our vision for GOV.UK Chat', Inside GOV.UK Blog, 16 December. Available at: https://insidegovuk.blog.gov.uk/2025/12/16/gov-uk-has-entered-the-chat-our-vision-for-gov-uk-chat/ (Accessed: 24 March 2026).

Source type: Government website / press release
Deployment Status How far the system has progressed into real-world operational use, from concept/exploration through to scaled and institutionalised. Pilot / Controlled Trial Phase
Year Initiated The year the AI system was first initiated or development began. 2023
Scale / Coverage The scale and geographic or population coverage of the deployment. Private beta with up to 2,000 users per 4-week testing period; 1,000 users in initial scaled pilot; up to 15,000 planned for next phase; targeting rollout across GOV.UK website and app serving millions of users
Funding Source The source(s) of funding for the AI system development and deployment. Government (GDS/DSIT budget); Anthropic engineering support procured via Joint Innovation Vehicle
Technical Partners External technology vendors, academic partners, or development partners involved. Anthropic (LLM provider via AWS Bedrock, general advice and engineering support under Joint Innovation Vehicle procurement); AWS (cloud infrastructure, Bedrock hosting, RDS PostgreSQL, OpenSearch); previously OpenAI (GPT-3.5-turbo-16k and later GPT-4o/GPT-4o mini during earlier phases); Google Cloud (initial hosting and BigQuery analytics)
Outcomes / Results Initial experiment: nearly 70% of users found responses useful, approximately 65% were satisfied with the experience, and the 80% accuracy threshold was achieved. Subsequent improvements raised accuracy from 76% to 90%. During beta testing, more than 500 jailbreak attempts were successfully blocked. After onboarding, just under 80% of research participants understood that GOV.UK Chat can contain inaccurate information. Users preferred GOV.UK Chat over traditional search and navigation, particularly for complex cross-departmental queries.
Challenges Hallucination risk remains inherent to LLM-based systems and cannot be fully eliminated despite RAG grounding. Initial chunking approach using whole pages caused token limit errors with long pages. Quality assurance at scale is labour-intensive; manual evaluation techniques used in early phases are not practical for full deployment. Users may over-trust GOV.UK Chat responses due to the credibility of the GOV.UK brand. Accuracy below 100% on a government website raises concerns given the duty of care associated with official guidance.

How to Cite

DCI AI Hub (2026). 'GOV.UK Chat', AI Hub AI Tracker, case GBR-002. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/GBR-002 [Accessed: 1 April 2026].

Change History

Created 30 Mar 2026, 08:39
by v2-import (import)