Case Study

RAG Knowledge Platform at a Government Entity - 40% Retrieval Accuracy Lift

Designed and deployed a Retrieval-Augmented Generation platform integrating SharePoint, Weaviate, and a local Llama 3.1 model served through Ollama at a UAE government entity - improving internal document retrieval accuracy by ~40% and lifting knowledge-management relevance by ~35%.

2023 - 2025 · Project Manager - AI & Digital Transformation · AI · RAG · LLM · Enterprise Search · Government

Outcomes

  • Improved internal document search accuracy by approximately 40% over the baseline keyword search
  • Lifted knowledge-management relevance by approximately 35% across legal, policy, and operational corpora
  • Architected an event-driven microservice stack (Kafka + mediator pattern), cutting API overhead by 25%
  • Achieved 99.8% uptime and 20% faster inference under production load through API tuning and capacity testing
  • Embedded the platform inside the entity's AI Governance & Validation Framework - bias monitoring, audit trails, DESC compliance
  • Delivered ROI dashboards visualising AI adoption, cost savings, and KPI impact for IT directors and agency leadership

Context

The entity’s knowledge lives across hundreds of SharePoint sites, policy repositories, and operational document stores. Traditional keyword search couldn’t keep up - staff spent meaningful time hunting for things they knew existed somewhere. A flagship transformation programme made this acutely visible: AI initiatives needed grounded, governance-compliant retrieval across legal, policy, and operational corpora, not another opaque search box.

I led the design and delivery of a Retrieval-Augmented Generation (RAG) platform over the entity’s enterprise content, with an explicit data-residency-first architecture.

Architecture

The system layers four concerns:

  1. Ingestion and chunking - content from SharePoint and other sources processed, cleaned, and chunked. Document structure (headings, sections, tables) preserved as metadata for context-aware retrieval.

  2. Vector indexing in Weaviate - chunks embedded and indexed semantically, enabling retrieval by meaning rather than exact match.

  3. Local LLM with retrieval grounding (Llama 3.1 served through Ollama) - answers are generated from the retrieved context, with citations back to source documents. Local inference keeps sensitive material on-premises.

  4. Event-driven service backbone - Kafka-based microservice architecture with a mediator pattern, cutting API overhead by ~25% and enabling async workflow automation across downstream systems.
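
As an illustration of how the first three layers fit together, here is a minimal sketch in plain Python. It is not the platform's code: the `Chunk` type, the `#`-style heading convention, the sample policy text, and the file name `hr-policy.md` are all hypothetical, and a bag-of-words cosine similarity stands in for the real embedding model and the Weaviate index. The Kafka backbone and the LLM call itself are left out; the sketch stops at the grounded prompt that would be sent to the local model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc: str      # source document name (hypothetical)
    heading: str  # nearest heading, preserved as metadata
    text: str


def chunk_by_heading(doc: str, raw: str) -> list[Chunk]:
    """Split a document into one chunk per heading-delimited section."""
    chunks, heading, buf = [], "(no heading)", []
    for line in raw.splitlines():
        if line.startswith("# "):
            if buf:
                chunks.append(Chunk(doc, heading, " ".join(buf)))
            heading, buf = line[2:].strip(), []
        elif line.strip():
            buf.append(line.strip())
    if buf:
        chunks.append(Chunk(doc, heading, " ".join(buf)))
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Rank chunks by similarity to the query; the heading is part of the match."""
    q = embed(query)
    score = lambda c: cosine(q, embed(c.heading + " " + c.text))
    return sorted(chunks, key=score, reverse=True)[:k]


def build_prompt(query: str, hits: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i}] ({c.doc} > {c.heading}) {c.text}" for i, c in enumerate(hits, 1)
    )
    return (
        "Answer using only the numbered context and cite sources as [n].\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


RAW = """# Annual leave
Employees accrue annual leave monthly.
# Remote work
Remote work requests need line manager approval.
"""

chunks = chunk_by_heading("hr-policy.md", RAW)
question = "Who approves remote work requests?"
hits = retrieve(question, chunks, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Carrying the document name and heading on every chunk is what makes the citation in the final answer traceable to a specific section rather than to a whole file.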

Result

  • Retrieval accuracy up ~40% over the baseline keyword search on a representative test set of real user queries.
  • Knowledge-management relevance up ~35%.
  • Production stability at 99.8% uptime with 20% faster inference after API and capacity optimisation.
  • Adoption supported by ROI dashboards reviewed regularly by IT directors and agency leadership.
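
For readers who want to see how a lift figure of this kind can be computed, the sketch below assumes a hit-rate-at-k metric and a relative (not absolute) lift; the case study does not state which metric or definition was used. The queries, document IDs, and resulting numbers are toy values for illustration only, not the project's data.

```python
def hit_rate_at_k(results: dict[str, list[str]],
                  relevant: dict[str, set[str]],
                  k: int = 2) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(1 for q, ranked in results.items() if relevant[q] & set(ranked[:k]))
    return hits / len(results)


def relative_lift(baseline: float, candidate: float) -> float:
    """Improvement of candidate over baseline, as a fraction of the baseline."""
    return (candidate - baseline) / baseline


# Toy ground truth: which document answers each test query.
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d3"}, "q4": {"d4"}}

# Toy ranked results from the two systems under comparison.
keyword_results = {"q1": ["d1", "d9"], "q2": ["d8", "d7"],
                   "q3": ["d3", "d6"], "q4": ["d5", "d6"]}
rag_results = {"q1": ["d1", "d2"], "q2": ["d2", "d8"],
               "q3": ["d6", "d3"], "q4": ["d9", "d5"]}

baseline = hit_rate_at_k(keyword_results, relevant)
candidate = hit_rate_at_k(rag_results, relevant)
lift = relative_lift(baseline, candidate)
print(f"baseline={baseline:.2f} candidate={candidate:.2f} lift={lift:.0%}")
```

Running both systems against the same fixed query set with the same relevance labels is what makes the comparison meaningful; changing either between runs invalidates the lift.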

Beyond the headline numbers, the platform changed how teams interact with knowledge: shorter time to answer, more “I didn’t know that existed” moments, and trust in citations because every answer points back to source documents.

Governance

Governance carried equal weight to retrieval quality. The platform was deployed inside the AI Governance & Validation Framework I established at the entity, covering retrieval KPIs, bias monitoring, audit trails, and compliance with DESC cloud-security requirements and UAE government AI ethics principles. For regulated environments this isn’t optional: a RAG system without governance is a liability, whatever its accuracy looks like in a notebook.

Lessons

RAG sounds straightforward on a slide. In practice, it’s a chain of decisions - chunking strategy, embedding choice, retrieval ranking, prompt design, citation rendering, evaluation methodology - and any weak link can quietly tank quality. The accuracy lift came from disciplined attention to all of them, not from any single magic ingredient.

Local LLM inference also changes the conversation. It removes the data-residency objection that blocks many cloud-only AI deployments, and it forces a more thoughtful conversation about model choice, hardware sizing, and cost - which tends to produce better systems anyway.