AI Integration, RAG, Go, AWS
January 2025

AI Support Copilot

RAG-based AI support copilot with source citations, tenant permissions, and sub-300ms latency.

<300ms latency, 99.9% uptime
Go, AWS, PostgreSQL, Redis, OpenAI, Cognito, ECS Fargate

AI Support Copilot - Solution Overview

What this project solves

A SaaS product had scattered support knowledge across docs, FAQs, runbooks, release notes, and resolved tickets. Users and support agents wasted time hunting for answers or escalating issues that should have been solved instantly. The goal was to build an AI support copilot that could answer product questions accurately, cite its sources, respect tenant and role permissions, and slot into an existing product without turning into a hallucinating chatbot.

What had to be true

  • Only authorised users could query documents they were allowed to see
  • Answers needed source citations
  • New docs had to become searchable quickly
  • The system had to work for both end users and internal support staff
  • Every AI response needed logging, feedback, and cost visibility
  • The hot path had to stay fast enough for normal in-app chat usage

Stack

  • Go
  • AWS
  • PostgreSQL with pgvector
  • Redis
  • S3
  • OpenAI or Anthropic
  • Cognito
  • ECS Fargate
  • CloudWatch

Solution in plain English

The system was split into two flows:

  1. Document ingestion

    • Documents were uploaded or synced into S3
    • Text was extracted, chunked, embedded, and stored in PostgreSQL
    • Each chunk carried tenant, product-area, and permission metadata
  2. Question answering

    • A user asked a question in the app
    • The backend authenticated the user, resolved their permissions, retrieved the most relevant chunks, built a grounded prompt, called the LLM, and returned an answer with citations
    • The whole exchange was logged for analytics, debugging, and feedback

High-level architecture

flowchart LR
    U[User] --> FE[Web App]
    FE --> API[Go API]
    API --> AUTH[Cognito]
    API --> REDIS[Redis]
    API --> PG[(PostgreSQL + pgvector)]
    API --> LLM[LLM Provider]
    DOCS[Docs / PDFs / HTML / KB] --> INGEST[Ingestion Worker]
    INGEST --> S3[S3]
    INGEST --> PG
    INGEST --> REDIS
    API --> LOG[Conversation / Feedback Log]
    LOG --> PG
    ADMIN[Admin UI] --> API

Ingestion flow

flowchart TD
    A[Document Added] --> B[Store in S3]
    B --> C[Extract Text]
    C --> D[Clean and Chunk Text]
    D --> E[Generate Embeddings]
    E --> F[Store Chunks + Metadata in PostgreSQL]
    F --> G[Refresh Search Cache]
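
The "Clean and Chunk Text" step can be sketched as a sliding window over words. This is a minimal illustration, not the project's actual chunker: the `Chunk` type, the `ChunkText` function, and the word-based window with overlap are all assumptions, since the write-up does not say how chunks were sized.

```go
package main

import "strings"

// Chunk is a hypothetical shape for one searchable fragment. The field
// names are assumptions, not the project's actual schema.
type Chunk struct {
	TenantID    string
	ProductArea string
	Index       int
	Text        string
}

// ChunkText splits text into windows of at most size words, where each
// window repeats the last overlap words of the previous one so that a
// sentence cut at a boundary still appears whole in one chunk.
func ChunkText(tenantID, productArea, text string, size, overlap int) []Chunk {
	if size <= 0 {
		return nil
	}
	if overlap < 0 || overlap >= size {
		overlap = 0 // guard against a window that never advances
	}
	words := strings.Fields(text) // also collapses runs of whitespace
	step := size - overlap
	var chunks []Chunk
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, Chunk{
			TenantID:    tenantID,
			ProductArea: productArea,
			Index:       len(chunks),
			Text:        strings.Join(words[start:end], " "),
		})
		if end == len(words) {
			break // the last window reached the end of the text
		}
	}
	return chunks
}
```

Carrying the tenant and product area on every chunk at this stage is what lets the later search filter on them without a join back to the source document.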

Chat request flow

sequenceDiagram
    participant U as User
    participant API as Go API
    participant PG as PostgreSQL
    participant R as Redis
    participant LLM as LLM Provider
    U->>API: POST /chat/ask
    API->>API: Validate JWT
    API->>R: Check cached answer / hot context
    API->>PG: Retrieve relevant chunks filtered by tenant/role
    PG-->>API: Top matching chunks
    API->>LLM: Prompt with question + retrieved context
    LLM-->>API: Answer
    API->>PG: Store question, answer, citations, token usage
    API-->>U: Answer + citations

Core components

api-service

Handles:

  • authentication
  • conversation endpoints
  • retrieval
  • prompt building
  • LLM calls
  • answer logging
  • feedback capture

ingestion-worker

Handles:

  • file fetch / sync
  • text extraction
  • chunking
  • embedding creation
  • metadata enrichment
  • reindexing

admin-ui

Handles:

  • content sync status
  • failed ingestions
  • prompt/answer inspection
  • user feedback review
  • cost tracking

Data model

| Table | Purpose |
|---|---|
| documents | Source file or page metadata |
| document_chunks | Searchable chunks with embeddings |
| document_permissions | Tenant / role / product scoping |
| conversations | Chat sessions |
| messages | User and assistant messages |
| message_citations | Which chunks supported which answer |
| feedback | Helpful / not helpful ratings |
| ingestion_runs | Sync and indexing runs |
| audit_logs | Admin and system audit trail |

Key design decisions

1. Retrieval first, generation second

The model never answered from memory alone. It answered from retrieved chunks.
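
One way to enforce this is in the prompt itself: number the retrieved chunks, tell the model to use only those, and give it an explicit way out when they do not contain the answer. The sketch below is an assumption about what such a builder could look like; `RetrievedChunk`, `BuildGroundedPrompt`, and the instruction wording are illustrative, not the project's real prompt template.

```go
package main

import (
	"fmt"
	"strings"
)

// RetrievedChunk is a hypothetical view of a search hit as seen by the
// prompt builder.
type RetrievedChunk struct {
	Title string
	Text  string
}

// BuildGroundedPrompt renders the instructions, the numbered sources,
// and the question into a single prompt string.
func BuildGroundedPrompt(question string, chunks []RetrievedChunk) string {
	var b strings.Builder
	b.WriteString("Answer using only the sources below. ")
	b.WriteString("Cite sources by number, e.g. [1]. ")
	b.WriteString("If the sources do not contain the answer, say so.\n\n")
	for i, c := range chunks {
		// Source numbers are 1-based so they read naturally in citations.
		fmt.Fprintf(&b, "[%d] %s\n%s\n\n", i+1, c.Title, c.Text)
	}
	fmt.Fprintf(&b, "Question: %s\n", question)
	return b.String()
}
```

Numbering the sources also gives the citation step something concrete to map back to a document.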

2. Permissions were applied during retrieval, not after

Tenant and role filters were part of the search itself rather than a cleanup pass over its results. If a chunk was not visible to the user, it never entered the prompt.
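
With pgvector, filtering at retrieval time can be expressed by putting the tenant and scope conditions in the same query as the similarity ordering. The sketch below assumes a `document_chunks` table with `tenant_id`, `scope`, `content`, and `embedding` columns. Only the table name comes from the data model; the column names and helper functions are hypothetical. `<=>` is pgvector's cosine distance operator.

```go
package main

import (
	"strconv"
	"strings"
)

// scopedSearchSQL filters by tenant and scope in the WHERE clause, so
// rows the user cannot see are excluded before results are returned.
// Column names are assumptions made for this sketch.
const scopedSearchSQL = `
SELECT id, document_id, content
FROM document_chunks
WHERE tenant_id = $2
  AND scope = ANY($3)
ORDER BY embedding <=> $1::vector
LIMIT $4`

// vectorLiteral renders an embedding in pgvector's text input format,
// for example "[0.5,1,-2.25]".
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, f := range v {
		parts[i] = strconv.FormatFloat(float64(f), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// ScopedSearchArgs returns the positional arguments for scopedSearchSQL
// in placeholder order: embedding, tenant, scopes, limit.
func ScopedSearchArgs(embedding []float32, tenantID string, scopes []string, limit int) []any {
	return []any{vectorLiteral(embedding), tenantID, scopes, limit}
}
```

How the `[]string` of scopes binds to `ANY($3)` depends on the driver: pgx maps Go slices to PostgreSQL arrays directly, while lib/pq needs the value wrapped in `pq.Array`.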

3. Citations were mandatory

Every answer had links back to source docs or titled references.

4. Conversation history was trimmed

Only the useful recent messages plus the retrieved context were sent to the model.
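
A simple way to trim history is to walk backwards from the newest message and stop once a token budget is spent. The sketch below assumes that approach. The write-up does not say how "useful" messages were selected, and `Message`, `TrimHistory`, and the four-characters-per-token estimate are illustrative stand-ins for a real tokenizer.

```go
package main

// Message is a hypothetical chat turn.
type Message struct {
	Role string
	Text string
}

// estimateTokens is a rough heuristic (about four characters per token).
// A real system would use the provider's tokenizer instead.
func estimateTokens(s string) int {
	return len(s)/4 + 1
}

// TrimHistory keeps the longest run of most recent messages whose
// estimated token total fits within budget, in chronological order.
func TrimHistory(history []Message, budget int) []Message {
	used := 0
	start := len(history)
	for i := len(history) - 1; i >= 0; i-- {
		cost := estimateTokens(history[i].Text)
		if used+cost > budget {
			break // older messages would exceed the budget
		}
		used += cost
		start = i
	}
	return history[start:]
}
```

The budget for history would be set with the retrieved context in mind, since both share the model's context window.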

5. Hot paths were cached carefully

Redis cached:

  • recent conversation context
  • top document fragments for popular queries
  • tenant-scoped config
  • prompt templates
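
Caching document fragments for popular queries is only safe if the cache key carries the same tenant and scope boundaries as the search, otherwise one tenant's cached result could be served to another. The sketch below shows one possible key scheme. `CacheKey` and the `chunks:` prefix are hypothetical, not the project's actual key layout.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"
)

// CacheKey builds a Redis key for cached search results. The tenant ID
// is kept readable for per-tenant invalidation; the scopes and the
// normalized query are hashed together so that users with different
// permissions never share an entry.
func CacheKey(tenantID string, scopes []string, query string) string {
	sorted := append([]string(nil), scopes...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	// Lowercase and collapse whitespace so trivial variations hit the same entry.
	normalized := strings.Join(strings.Fields(strings.ToLower(query)), " ")
	sum := sha256.Sum256([]byte(strings.Join(sorted, ",") + "\x00" + normalized))
	return "chunks:" + tenantID + ":" + hex.EncodeToString(sum[:8])
}
```

Sorting the scopes makes the key independent of the order in which permissions were resolved.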

Minimal API shape

User endpoints

  • POST /chat/ask
  • GET /chat/conversations/:id
  • POST /chat/feedback
  • GET /chat/sources/:messageId
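
For illustration, the request and response bodies of `POST /chat/ask` might look like the types below. The `Question` and `ConversationID` fields match the ones used in the retrieval code; the JSON field names, the `Citation` and `AskResponse` shapes, and both helper functions are assumptions rather than the real wire format.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// AskRequest is the body of POST /chat/ask. JSON names are assumed.
type AskRequest struct {
	ConversationID string `json:"conversation_id,omitempty"`
	Question       string `json:"question"`
}

// Citation points back to a source document for an answer.
type Citation struct {
	DocumentID string `json:"document_id"`
	Title      string `json:"title"`
	URL        string `json:"url,omitempty"`
}

// AskResponse is a hypothetical response body with mandatory citations.
type AskResponse struct {
	MessageID string     `json:"message_id"`
	Answer    string     `json:"answer"`
	Citations []Citation `json:"citations"`
}

// DecodeAskRequest parses the body and rejects blank questions.
func DecodeAskRequest(body []byte) (AskRequest, error) {
	var req AskRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return AskRequest{}, err
	}
	if strings.TrimSpace(req.Question) == "" {
		return AskRequest{}, errors.New("question is required")
	}
	return req, nil
}

// EncodeAskResponse serializes the response as JSON.
func EncodeAskResponse(r AskResponse) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}
```

Returning a message ID alongside the answer would let the client call `POST /chat/feedback` and `GET /chat/sources/:messageId` for that specific response.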

Admin endpoints

  • POST /admin/documents/sync
  • GET /admin/ingestion-runs
  • GET /admin/feedback
  • GET /admin/prompts/:conversationId

Example retrieval code

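// Ask answers a support question from retrieved, permission-filtered chunks.
// It needs the "context" and "fmt" packages.
func (s *ChatService) Ask(ctx context.Context, user User, req AskRequest) (*Answer, error) {
    // Resolve what this user may see before any search runs.
    allowedScopes := s.PermissionService.ResolveScopes(user)

    // Tenant and scope filters are part of the search itself, so restricted
    // chunks never reach the prompt.
    chunks, err := s.Search.FindRelevantChunks(ctx, SearchQuery{
        TenantID: user.TenantID,
        Scopes:   allowedScopes,
        Query:    req.Question,
        Limit:    6,
    })
    if err != nil {
        return nil, fmt.Errorf("find relevant chunks: %w", err)
    }

    // Build a grounded prompt from the question and the retrieved chunks.
    prompt := s.Prompts.BuildSupportPrompt(req.Question, chunks)

    llmResp, err := s.LLM.Generate(ctx, prompt)
    if err != nil {
        return nil, fmt.Errorf("generate answer: %w", err)
    }

    // Citations come from the retrieved chunks, not from the model output.
    answer := &Answer{
        Text:      llmResp.Text,
        Citations: mapChunksToCitations(chunks),
    }

    // Persist the exchange, including token usage, for analytics and cost tracking.
    if err := s.Conversations.Store(ctx, user.ID, req.ConversationID, req.Question, answer, llmResp.Usage); err != nil {
        return nil, fmt.Errorf("store conversation: %w", err)
    }

    return answer, nil
}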
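
The `mapChunksToCitations` helper called above is not shown in the write-up. A plausible version, sketched here under assumed `Chunk` and `Citation` shapes, collapses several chunks from the same document into a single citation while keeping retrieval order.

```go
package main

// Chunk is a hypothetical search hit; only the fields needed for
// citations are shown.
type Chunk struct {
	DocumentID string
	Title      string
	URL        string
}

// Citation is a hypothetical reference back to a source document.
type Citation struct {
	DocumentID string
	Title      string
	URL        string
}

// mapChunksToCitations returns one citation per distinct document, in
// the order the documents first appear in the ranked chunks.
func mapChunksToCitations(chunks []Chunk) []Citation {
	seen := make(map[string]bool, len(chunks))
	citations := make([]Citation, 0, len(chunks))
	for _, c := range chunks {
		if seen[c.DocumentID] {
			continue // this document is already cited
		}
		seen[c.DocumentID] = true
		citations = append(citations, Citation{
			DocumentID: c.DocumentID,
			Title:      c.Title,
			URL:        c.URL,
		})
	}
	return citations
}
```

Because the citations are derived from what was retrieved rather than parsed out of the model's text, every link is guaranteed to point at a document the user is allowed to see.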