AI Support Copilot - Solution Overview

What this project solves

A SaaS product's support knowledge was scattered across docs, FAQs, runbooks, release notes, and resolved tickets. Users and support agents wasted time hunting for answers or escalating issues that the existing material already covered. The goal was to build an AI support copilot that could answer product questions accurately, cite its sources, respect tenant and role permissions, and slot into an existing product without turning into a hallucinating chatbot.

What had to be true

Users could only query documents they were authorised to see
Answers needed source citations
New docs had to become searchable quickly
The system had to work for both end users and internal support staff
Every AI response needed logging, feedback, and cost visibility
The hot path had to stay fast enough for normal in-app chat usage

Stack

Go
AWS
PostgreSQL with pgvector
Redis
S3
OpenAI or Anthropic
Cognito
ECS Fargate
CloudWatch

Solution in plain English

The system was split into two flows:

Document ingestion
- Documents were uploaded or synced into S3
- Text was extracted, chunked, embedded, and stored in PostgreSQL
- Each chunk carried tenant, product-area, and permission metadata

Question answering
- A user asked a question in the app
- The backend authenticated the user, resolved their permissions, retrieved the most relevant chunks, built a grounded prompt, called the LLM, and returned an answer with citations
- The whole exchange was logged for analytics, debugging, and feedback

High-level architecture

flowchart LR
    U[User] --> FE[Web App]
    FE --> API[Go API]
    API --> AUTH[Cognito]
    API --> REDIS[Redis]
    API --> PG[(PostgreSQL + pgvector)]
    API --> LLM[LLM Provider]
    DOCS[Docs / PDFs / HTML / KB] --> INGEST[Ingestion Worker]
    INGEST --> S3[S3]
    INGEST --> PG
    INGEST --> REDIS
    API --> LOG[Conversation / Feedback Log]
    LOG --> PG
    ADMIN[Admin UI] --> API

Ingestion flow

flowchart TD
    A[Document Added] --> B[Store in S3]
    B --> C[Extract Text]
    C --> D[Clean and Chunk Text]
    D --> E[Generate Embeddings]
    E --> F[Store Chunks + Metadata in PostgreSQL]
    F --> G[Refresh Search Cache]
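
The "Clean and Chunk Text" step could be sketched as below. This is a minimal illustration, assuming word-based chunks with a fixed overlap. The function name `chunkWords` and its `size` and `overlap` parameters are hypothetical; the source does not say how chunks were actually sized.

```go
package main

import "strings"

// chunkWords splits text into chunks of at most size words. Each chunk
// starts overlap words before the end of the previous one, so text near a
// boundary appears in both neighbouring chunks.
// Hypothetical helper: the chunking strategy of the real system is not stated.
func chunkWords(text string, size, overlap int) []string {
	words := strings.Fields(text) // also collapses runs of whitespace
	if size <= 0 || overlap < 0 || overlap >= size || len(words) == 0 {
		return nil
	}
	step := size - overlap
	var chunks []string
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // the last chunk already reaches the end of the text
		}
	}
	return chunks
}
```

Overlap trades index size for recall: a larger overlap stores more duplicate text but makes it less likely that an answer is split across two chunks.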

Chat request flow

sequenceDiagram
    participant U as User
    participant API as Go API
    participant PG as PostgreSQL
    participant R as Redis
    participant LLM as LLM Provider
    U->>API: POST /chat/ask
    API->>API: Validate JWT
    API->>R: Check cached answer / hot context
    API->>PG: Retrieve relevant chunks filtered by tenant/role
    PG-->>API: Top matching chunks
    API->>LLM: Prompt with question + retrieved context
    LLM-->>API: Answer
    API->>PG: Store question, answer, citations, token usage
    API-->>U: Answer + citations

Core components

`api-service`

Handles:

authentication
conversation endpoints
retrieval
prompt building
LLM calls
answer logging
feedback capture

`ingestion-worker`

Handles:

file fetch / sync
text extraction
chunking
embedding creation
metadata enrichment
reindexing

`admin-ui`

Handles:

content sync status
failed ingestions
prompt/answer inspection
user feedback review
cost tracking

Data model

Table	Purpose
`documents`	Source file or page metadata
`document_chunks`	Searchable chunks with embeddings
`document_permissions`	Tenant / role / product scoping
`conversations`	Chat sessions
`messages`	User and assistant messages
`message_citations`	Which chunks supported which answer
`feedback`	Helpful / not helpful ratings
`ingestion_runs`	Sync and indexing runs
`audit_logs`	Admin and system audit trail
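
To make the table list more concrete, two of the rows might map onto Go types roughly as follows. Only the table names come from the list above; every field name and type here is an assumption made for illustration.

```go
package main

import "time"

// DocumentChunk mirrors one row of document_chunks.
// All fields are hypothetical; the source names the table but not its columns.
type DocumentChunk struct {
	ID          int64
	DocumentID  int64 // the documents row this chunk was cut from
	TenantID    string
	ProductArea string
	Content     string
	Embedding   []float32 // would be stored in a pgvector column
	CreatedAt   time.Time
}

// MessageCitation mirrors one row of message_citations, linking an assistant
// message to a chunk that supported it.
type MessageCitation struct {
	MessageID int64
	ChunkID   int64
	Rank      int // position of the chunk in the retrieval result (assumed)
}
```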

Key design decisions

1. Retrieval first, generation second

The model never answered from memory alone. It answered from retrieved chunks.
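
A grounded prompt of this kind might be assembled as in the sketch below. The `Chunk` fields, the function name `buildSupportPrompt`, and the instruction wording are all assumptions; the actual prompt templates are not shown in this overview.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical, trimmed-down view of a retrieved chunk.
type Chunk struct {
	Title   string
	Content string
}

// buildSupportPrompt numbers each retrieved chunk and tells the model to
// answer only from those sources. The wording is illustrative.
func buildSupportPrompt(question string, chunks []Chunk) string {
	var b strings.Builder
	b.WriteString("Answer the question using only the numbered sources below.\n")
	b.WriteString("Cite sources as [n]. If the sources do not contain the answer, say so.\n\n")
	for i, c := range chunks {
		fmt.Fprintf(&b, "[%d] %s\n%s\n\n", i+1, c.Title, c.Content)
	}
	fmt.Fprintf(&b, "Question: %s\n", question)
	return b.String()
}
```

Numbering the sources gives the model a stable handle for each chunk, which makes it easier to map its citations back to stored chunk records.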

2. Permissions were applied during retrieval

The tenant and role filter was part of the search itself, not a pass over its results. If a chunk was not visible to the user, it never entered the prompt.
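
One way to keep the filter inside the search is to put it in the same SQL statement as the vector ordering. The sketch below assumes hypothetical column names (`tenant_id`, `scope`, `embedding`, `document_id`) and a hypothetical `Embedding` field on the query. `<=>` is pgvector's cosine distance operator.

```go
package main

// SearchQuery carries the filter and the query vector.
// Embedding is an assumed field holding the vector in pgvector text form,
// for example "[0.1,0.2]".
type SearchQuery struct {
	TenantID  string
	Scopes    []string
	Embedding string
	Limit     int
}

// chunkSearchSQL filters by tenant and scope in the same statement that ranks
// by vector distance, so rows the user cannot see are never returned.
// Column names are assumptions, not the project's real schema.
const chunkSearchSQL = `
SELECT c.id, c.document_id, c.content
FROM document_chunks c
WHERE c.tenant_id = $1
  AND EXISTS (
    SELECT 1 FROM document_permissions p
    WHERE p.document_id = c.document_id
      AND p.scope = ANY($2)
  )
ORDER BY c.embedding <=> $3::vector
LIMIT $4`

// buildChunkSearch returns the statement and its positional arguments.
// How a Go slice binds to $2 depends on the database driver in use.
func buildChunkSearch(q SearchQuery) (string, []any) {
	return chunkSearchSQL, []any{q.TenantID, q.Scopes, q.Embedding, q.Limit}
}
```

Filtering after the search would also risk returning fewer than `Limit` usable chunks, because hidden rows would consume slots in the result before being dropped.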

3. Citations were mandatory

Every answer had links back to source docs or titled references.
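
The `mapChunksToCitations` helper used in the retrieval example is not shown in this overview. A minimal sketch, assuming one citation per source document and hypothetical `Chunk` and `Citation` fields, could look like this:

```go
package main

// Chunk and Citation are hypothetical shapes; the real types are not shown.
type Chunk struct {
	DocumentID int64
	Title      string
	URL        string
}

type Citation struct {
	DocumentID int64
	Title      string
	URL        string
}

// mapChunksToCitations returns one citation per source document, in order of
// first appearance, so several chunks from the same document yield one link.
func mapChunksToCitations(chunks []Chunk) []Citation {
	seen := make(map[int64]bool, len(chunks))
	citations := make([]Citation, 0, len(chunks))
	for _, c := range chunks {
		if seen[c.DocumentID] {
			continue
		}
		seen[c.DocumentID] = true
		citations = append(citations, Citation{
			DocumentID: c.DocumentID,
			Title:      c.Title,
			URL:        c.URL,
		})
	}
	return citations
}
```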

4. Conversation history was trimmed

Only the useful recent messages plus the retrieved context were sent to the model.
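
Trimming can be as simple as walking the history backwards until a token budget is used up. The sketch below assumes a rough four-characters-per-token estimate, which is only a heuristic; `Message`, `estimateTokens`, and `trimHistory` are hypothetical names, and a real system would more likely use the provider's tokenizer.

```go
package main

// Message is a hypothetical chat message shape.
type Message struct {
	Role    string
	Content string
}

// estimateTokens is a crude size estimate (about four characters per token).
// It is an assumption for illustration, not a real tokenizer.
func estimateTokens(s string) int {
	return len(s)/4 + 1
}

// trimHistory keeps the most recent messages whose estimated size fits
// within budget, preserving chronological order.
func trimHistory(history []Message, budget int) []Message {
	used := 0
	start := len(history)
	for i := len(history) - 1; i >= 0; i-- {
		cost := estimateTokens(history[i].Content)
		if used+cost > budget {
			break
		}
		used += cost
		start = i
	}
	return history[start:]
}
```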

5. Hot paths were cached carefully

Redis cached:

recent conversation context
top document fragments for popular queries
tenant-scoped config
prompt templates
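
Caching retrieved fragments is only safe if a cached entry can never be served to a user with different visibility. One way to sketch that, assuming a hypothetical key layout, is to build the Redis key from the tenant, the sorted permission scopes, and a normalised form of the question:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"
)

// cacheKey derives a key for cached fragments. The "chunks:" prefix and the
// overall layout are assumptions. Scopes are sorted and the question is
// lower-cased and whitespace-normalised so equivalent requests share a key,
// while different tenants or scope sets never do.
func cacheKey(tenantID string, scopes []string, question string) string {
	sorted := append([]string(nil), scopes...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	normalized := strings.Join(strings.Fields(strings.ToLower(question)), " ")
	sum := sha256.Sum256([]byte(strings.Join(sorted, ",") + "\x00" + normalized))
	return "chunks:" + tenantID + ":" + hex.EncodeToString(sum[:])
}
```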

Minimal API shape

User endpoints

POST /chat/ask
GET /chat/conversations/:id
POST /chat/feedback
GET /chat/sources/:messageId

Admin endpoints

POST /admin/documents/sync
GET /admin/ingestion-runs
GET /admin/feedback
GET /admin/prompts/:conversationId
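
The request and response bodies for `POST /chat/ask` are not specified here. A plausible shape, with every JSON field name being an assumption, is sketched below along with a small decoder that rejects empty questions.

```go
package main

import (
	"encoding/json"
	"errors"
)

// AskRequest is an assumed JSON body for POST /chat/ask.
type AskRequest struct {
	ConversationID string `json:"conversation_id,omitempty"`
	Question       string `json:"question"`
}

// Citation and AskResponse are assumed response shapes.
type Citation struct {
	Title string `json:"title"`
	URL   string `json:"url,omitempty"`
}

type AskResponse struct {
	MessageID string     `json:"message_id"`
	Answer    string     `json:"answer"`
	Citations []Citation `json:"citations"`
}

// decodeAskRequest parses the body and rejects requests without a question.
func decodeAskRequest(body []byte) (AskRequest, error) {
	var req AskRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return AskRequest{}, err
	}
	if req.Question == "" {
		return AskRequest{}, errors.New("question is required")
	}
	return req, nil
}
```

Returning a `message_id` in the response would let the client call `GET /chat/sources/:messageId` and `POST /chat/feedback` against the same answer.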

Example retrieval code

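import (
    "context"
    "fmt"
)

// Ask answers a support question from retrieved, permission-filtered chunks.
func (s *ChatService) Ask(ctx context.Context, user User, req AskRequest) (*Answer, error) {
    // Resolve what this user may see before any search runs.
    allowedScopes := s.PermissionService.ResolveScopes(user)

    // Tenant and scope filters are part of the search, so chunks the user
    // cannot see never reach the prompt.
    chunks, err := s.Search.FindRelevantChunks(ctx, SearchQuery{
        TenantID: user.TenantID,
        Scopes:   allowedScopes,
        Query:    req.Question,
        Limit:    6,
    })
    if err != nil {
        return nil, fmt.Errorf("find relevant chunks: %w", err)
    }

    // Build a prompt grounded in the retrieved chunks.
    prompt := s.Prompts.BuildSupportPrompt(req.Question, chunks)

    llmResp, err := s.LLM.Generate(ctx, prompt)
    if err != nil {
        return nil, fmt.Errorf("generate answer: %w", err)
    }

    // Citations are derived from the chunks that were placed in the prompt.
    answer := &Answer{
        Text:      llmResp.Text,
        Citations: mapChunksToCitations(chunks),
    }

    // Persist the exchange and token usage for analytics, debugging, and feedback.
    if err := s.Conversations.Store(ctx, user.ID, req.ConversationID, req.Question, answer, llmResp.Usage); err != nil {
        return nil, fmt.Errorf("store conversation: %w", err)
    }

    return answer, nil
}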