Architecture · October 2024

Vimeo Video Platform Integration

A low-latency backend integration layer mirroring Vimeo's folder tree into PostgreSQL, serving authorised content with Redis caching, Kafka-driven analytics, and HubSpot chat context.

p95 ≤ 300ms · No Vimeo calls on read path · Kafka-driven analytics · Role-based entitlements
Go · PostgreSQL · Redis · Kafka · AWS · Amazon MSK

Vimeo Video Platform - Solution Architecture

Mission

Build a low-latency backend integration layer that turns Vimeo into an in-app video library: synchronise a deeply nested Vimeo folder tree into the platform, serve only authorised content to signed-in users, stay aligned with changes the client makes directly in Vimeo, and support ranking, analytics, audit logging, Kafka-driven events, and per-video HubSpot chat context.

Constraints

  • Backend stack: Go
  • Cloud: AWS
  • Primary database: PostgreSQL
  • Optional cache: Redis
  • Video source of truth: Vimeo
  • User authentication: Google / Apple
  • Target backend latency for library/detail endpoints: p95 ≤ 300ms
  • No reliance on front-end caching
  • Vimeo folder tree can change at any time
  • Users can only see videos permitted by role/profile

Core Design

  • Never call Vimeo on the user read path for library browsing.
  • Mirror Vimeo folders/videos into PostgreSQL.
  • Use Redis for hot metadata, ranked feeds, and entitlement-friendly read acceleration.
  • Use Kafka for async processing:
    • Vimeo sync events
    • analytics events
    • audit events
    • ranking feature events
  • Use both:
    • webhook-driven sync for freshness
    • scheduled reconciliation for correctness
  • Enforce authorisation in backend before returning video playback metadata.
  • Let Vimeo deliver the media; backend delivers metadata, structure, entitlements, ranking, and context.

AWS Services

  • Amazon Cognito
    • Google / Apple federated sign-in
  • Amazon ECS Fargate
    • api-service
    • sync-service
    • analytics-consumer
    • audit-consumer
    • admin-service
  • Amazon RDS for PostgreSQL
  • Amazon ElastiCache for Redis
  • Amazon MSK
  • Amazon EventBridge Scheduler
  • AWS Secrets Manager
  • Amazon CloudWatch + OpenTelemetry

High-Level Architecture

```mermaid
flowchart LR
  U[User App] --> ALB[ALB]
  ALB --> API[Go API Service]
  API --> COG[Cognito<br/>Google/Apple Sign-In]
  API --> REDIS[Redis]
  API --> PG[(PostgreSQL)]
  API --> HUB[HubSpot API]
  API --> KAFKA[MSK / Kafka]
  VIMEO[Vimeo API + Webhooks] --> SYNC[Go Sync Service]
  SYNC --> PG
  SYNC --> REDIS
  SYNC --> KAFKA
  SYNC --> VIMEO
  API --> PLAYBACK[Playback Metadata Endpoint]
  PLAYBACK --> PG
  PLAYBACK --> REDIS
  API --> ANALYTICS[Analytics Ingest]
  ANALYTICS --> KAFKA
  KAFKA --> AC[Analytics Consumer]
  AC --> PG
  KAFKA --> ADC[Audit Consumer]
  ADC --> PG
  KAFKA --> RC[Ranking Pipeline]
  RC --> PG
  RC --> REDIS
  ADMIN[Admin Dashboard] --> API
```

Services

api-service (Go)

  • Authenticates requests using Cognito JWTs
  • Maps external identity to internal user
  • Serves:
    • library tree
    • folder contents
    • video detail
    • playback metadata
    • ranked video feed
    • HubSpot chat context
    • analytics ingest
    • admin read endpoints

sync-service (Go)

  • Consumes Vimeo webhooks
  • Verifies signatures
  • Writes raw provider events
  • Produces Kafka events
  • Reconciles Vimeo tree on schedule
  • Upserts folders/videos into PostgreSQL
  • Invalidates/rebuilds Redis caches
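The signature-verification step can be sketched as a small HMAC helper. This assumes an HMAC-SHA256 shared-secret scheme with a hex-encoded signature header; the exact mechanism Vimeo uses should be confirmed against its webhook documentation before relying on this shape:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// SignWebhookBody computes the hex HMAC-SHA256 digest of a payload. The same
// primitive serves the provider side (signing) and our verification side.
func SignWebhookBody(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// VerifyWebhookSignature compares the received signature header against the
// expected digest in constant time (hmac.Equal), so forged payloads are
// rejected before any sync work or Kafka publish happens.
func VerifyWebhookSignature(secret, body []byte, signatureHex string) bool {
	expected := SignWebhookBody(secret, body)
	return hmac.Equal([]byte(expected), []byte(signatureHex))
}
```

Rejected requests should still be logged (and ideally counted in CloudWatch), since a burst of bad signatures is itself a security signal.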

analytics-consumer (Go)

  • Consumes watch events from Kafka
  • Builds watch sessions / aggregates
  • Writes analytics tables in PostgreSQL

audit-consumer (Go)

  • Consumes audit events from Kafka
  • Writes immutable audit records

ranking-pipeline

  • Inputs:
    • user profile
    • entitlements
    • watch behaviour
    • content metadata
  • Outputs:
    • ranked video IDs per user
  • Writes:
    • user_video_rankings in PostgreSQL
    • hot ranked feeds in Redis

admin-service

  • Sync health
  • Failed events
  • Replay actions
  • Folder/video drift visibility
  • Audit views

Read Path

Rule

  • User requests must never require live traversal of the Vimeo folder tree.
  • All library/folder/video pages are served from local indexed state.

Library Request Flow

```mermaid
sequenceDiagram
  participant U as User App
  participant API as Go API
  participant R as Redis
  participant P as PostgreSQL
  U->>API: GET /library/home
  API->>API: Validate Cognito JWT
  API->>API: Resolve internal user + role/profile
  API->>R: Get ranked feed / library snapshot
  alt cache hit
    R-->>API: ranked accessible videos
  else cache miss
    API->>P: Query ranked accessible videos
    P-->>API: result set
    API->>R: write cache
  end
  API-->>U: folders/videos payload
```
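The hit/miss branch above is a standard cache-aside read. A minimal Go sketch, with `Cache` and `Store` as stand-ins for Redis and PostgreSQL (the in-memory types are illustrative fakes, not the real clients):

```go
package main

import (
	"errors"
	"sync"
)

// Cache and Store are minimal stand-ins for Redis and PostgreSQL; the real
// service would back them with a Redis client and SQL queries.
type Cache interface {
	Get(key string) ([]byte, bool)
	Set(key string, val []byte)
}

type Store interface {
	QueryLibrary(role, profileHash string) ([]byte, error)
}

// LoadLibrary is the cache-aside read path: Redis first, PostgreSQL on miss,
// then write-back. Vimeo is never contacted here.
func LoadLibrary(c Cache, s Store, role, profileHash, version string) ([]byte, error) {
	key := "library:root:" + role + ":" + profileHash + ":" + version
	if payload, ok := c.Get(key); ok {
		return payload, nil // cache hit: no database work
	}
	payload, err := s.QueryLibrary(role, profileHash)
	if err != nil {
		return nil, err
	}
	c.Set(key, payload) // write-back so the next read is a hit
	return payload, nil
}

// memCache and countingStore are in-memory fakes used for illustration.
type memCache struct {
	mu sync.Mutex
	m  map[string][]byte
}

func (c *memCache) Get(k string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.m[k]
	return v, ok
}

func (c *memCache) Set(k string, v []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = v
}

type countingStore struct{ calls int }

func (s *countingStore) QueryLibrary(role, profileHash string) ([]byte, error) {
	s.calls++
	if role == "" {
		return nil, errors.New("missing role")
	}
	return []byte(`{"folders":[]}`), nil
}
```

Keeping the key segmented by role/profile/version means two users with the same entitlement segment share a cache entry, which is what keeps the p95 target realistic.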

Video Detail / Playback Flow

```mermaid
sequenceDiagram
  participant U as User App
  participant API as Go API
  participant R as Redis
  participant P as PostgreSQL
  U->>API: GET /videos/:id
  API->>API: Validate JWT
  API->>R: Get video detail + entitlement snapshot
  alt cache miss
    API->>P: Load video, folder, entitlement, rank, chat context
    P-->>API: metadata
    API->>R: write cache
  end
  API->>API: Authorise access
  API-->>U: Vimeo player metadata + HubSpot context + analytics token
```

Sync Path

Rule

  • Webhooks provide freshness.
  • Reconciliation provides correctness.
  • Neither is trusted alone.

Sync Flow

```mermaid
flowchart TD
  A[Vimeo Webhook] --> B[Verify Signature]
  B --> C[Persist Raw Event]
  C --> D[Publish Kafka Event]
  D --> E[Sync Consumer]
  E --> F[Fetch Latest Vimeo Object]
  F --> G[Upsert Folders / Videos]
  G --> H[Invalidate Redis]
  G --> I[Write Audit Event]
  I --> J[Audit Consumer]
  J --> K[(PostgreSQL)]
```

Reconciliation Flow

```mermaid
flowchart TD
  S[EventBridge Schedule] --> R1[Reconciliation Job]
  R1 --> R2[Start from root Vimeo folder]
  R2 --> R3[Traverse children recursively]
  R3 --> R4[Compare Vimeo state to local state]
  R4 --> R5[Upsert new/changed folders]
  R4 --> R6[Mark deleted/moved/unavailable items]
  R5 --> R7[Invalidate caches]
  R6 --> R7
  R7 --> R8[Emit audit + sync metrics]
```

Vimeo Traversal Strategy

  • Do not fetch all Vimeo videos globally.
  • Start from the known root folder.
  • Recursively traverse child folders.
  • For each folder:
    • fetch direct child folders
    • fetch direct videos
  • Persist:
    • folder hierarchy
    • Vimeo IDs
    • parent-child relations
    • content metadata
    • visibility state
  • On reconciliation:
    • mark missing objects as deleted/unavailable
    • detect moved folders/videos by parent/path changes
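The traversal above can be sketched as a depth-first walk over an abstract folder client. `FolderClient`, `Mirror`, and `mapClient` are illustrative names (not real Vimeo SDK types); a real job would page through the API, guard against cycles, and upsert rows as it goes:

```go
package main

// FolderClient abstracts the Vimeo folder API: list child folders and list
// the videos directly inside a folder.
type FolderClient interface {
	ChildFolders(folderID string) ([]string, error)
	FolderVideos(folderID string) ([]string, error)
}

// Mirror is the local state the sync job builds: folder parent links and
// video placements, keyed by provider IDs.
type Mirror struct {
	ParentOf map[string]string   // child folder ID -> parent folder ID
	Videos   map[string][]string // folder ID -> video IDs placed in it
}

// Traverse walks the tree from the known root only (never a global fetch),
// depth-first, recording hierarchy and placements for later diffing.
func Traverse(c FolderClient, root string) (*Mirror, error) {
	m := &Mirror{ParentOf: map[string]string{}, Videos: map[string][]string{}}
	var walk func(id string) error
	walk = func(id string) error {
		vids, err := c.FolderVideos(id)
		if err != nil {
			return err
		}
		m.Videos[id] = vids
		children, err := c.ChildFolders(id)
		if err != nil {
			return err
		}
		for _, child := range children {
			m.ParentOf[child] = id
			if err := walk(child); err != nil {
				return err
			}
		}
		return nil
	}
	return m, walk(root)
}

// mapClient is an in-memory stand-in used for illustration.
type mapClient struct {
	children map[string][]string
	videos   map[string][]string
}

func (c mapClient) ChildFolders(id string) ([]string, error) { return c.children[id], nil }
func (c mapClient) FolderVideos(id string) ([]string, error) { return c.videos[id], nil }
```

Reconciliation then compares the resulting `Mirror` to the local tables: IDs present locally but absent from the mirror are marked deleted/unavailable; IDs whose recorded parent differs are treated as moves.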

Data Model

Core Tables

| Table | Purpose |
| --- | --- |
| users | Internal user profile |
| user_identities | Cognito / Google / Apple identity mappings |
| roles | Role catalogue |
| user_roles | User-to-role mapping |
| folders | Vimeo folder mirror |
| videos | Vimeo video mirror |
| folder_videos | Video placement inside folders |
| entitlements | Role/profile access rules for folders/videos |
| user_video_rankings | Ranked videos per user |
| watch_events | Raw playback-related events |
| watch_sessions | Sessionised viewing facts |
| provider_events | Raw Vimeo webhook/reconciliation events |
| sync_runs | Reconciliation runs and outcomes |
| audit_logs | Immutable operational/security audit trail |
| hubspot_context | Video/user-to-chat context mappings |
| dead_letters | Failed provider/analytics events |

Folder Modelling

Use PostgreSQL with:

  • adjacency columns:
    • id
    • parent_id
    • vimeo_folder_id
  • plus path column:
    • path
  • plus indexes for:
    • parent_id
    • path
    • vimeo_folder_id

This gives:

  • fast tree traversal
  • easy subtree queries
  • efficient cache rebuilds
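A small sketch of how the `path` column supports subtree work; the helper names are hypothetical, and the `LIKE` query in the comment shows the SQL shape the index serves:

```go
package main

import "strings"

// FolderPath materialises a folder's path from its ancestor chain, e.g.
// FolderPath("root", "marketing", "webinars") -> "/root/marketing/webinars".
// Stored in the indexed `path` column, it lets one query fetch a subtree:
//
//	SELECT id FROM folders WHERE path LIKE '/root/marketing/%';
func FolderPath(ancestorIDs ...string) string {
	return "/" + strings.Join(ancestorIDs, "/")
}

// InSubtree reports whether a folder path lives under the given subtree root,
// mirroring the LIKE-prefix semantics used for subtree cache invalidation.
// The "/" suffix on the prefix check stops "/root/marketing2" from matching
// the "/root/marketing" subtree.
func InSubtree(path, subtreeRoot string) bool {
	return path == subtreeRoot || strings.HasPrefix(path, subtreeRoot+"/")
}
```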

Video Modelling

  • videos
    • id
    • vimeo_video_id
    • title
    • description
    • thumbnail_url
    • duration_seconds
    • status
    • visibility
    • embed_key
    • updated_at
  • folder_videos
    • folder_id
    • video_id
    • sort_order

Entitlements

  • Access can be granted at:
    • folder level
    • video level
    • role level
    • profile segment level
  • Backend resolves effective access before returning content.

Redis Strategy

Cache

Use Redis for:

  • library snapshots:
    • library:root:{role}:{profile_hash}:{version}
  • folder contents:
    • folder:{folder_id}:{role}:{profile_hash}:{version}
  • video detail:
    • video:{video_id}:{user_segment}:{version}
  • ranked feeds:
    • ranked:{user_id}:{version}

Invalidation

  • On folder/video update:
    • bump a content version
    • invalidate affected folder/video keys
  • On entitlement update:
    • bump entitlement version
  • On ranking update:
    • replace only ranked-feed keys
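The key shapes and version-bump idea can be made concrete with a couple of helpers (assuming an integer version counter tracked per content/entitlement segment, which is an implementation choice, not a given):

```go
package main

import "fmt"

// LibraryKey and FolderKey build the versioned key shapes listed above.
// Bumping the version on a content or entitlement change makes every stale
// key unreachable at once: readers compute keys with the new version, and
// old entries simply age out via TTL instead of being deleted one by one.
func LibraryKey(role, profileHash string, version int) string {
	return fmt.Sprintf("library:root:%s:%s:%d", role, profileHash, version)
}

func FolderKey(folderID, role, profileHash string, version int) string {
	return fmt.Sprintf("folder:%s:%s:%s:%d", folderID, role, profileHash, version)
}
```

Ranked-feed keys are the exception: they are replaced in place by the ranking pipeline rather than versioned, since each user's feed has exactly one live value.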

Authorisation Model

  • Identity: Cognito JWT from Google / Apple federation
  • Internal auth:
    • backend maps subject to internal user
    • backend resolves roles and profile flags
  • Access resolution:
    • user can browse only folders/videos allowed by entitlement rules
    • video detail endpoint re-checks access even if listed
  • Playback:
    • backend returns Vimeo playback metadata only after access passes
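Effective-access resolution might look like the following deny-by-default sketch. The `Rule` shape is a simplification of the entitlements table (profile-segment grants are omitted for brevity):

```go
package main

// Rule grants a role access to a folder subtree or to a single video.
type Rule struct {
	Role     string
	FolderID string // if set, grants the folder and its contents
	VideoID  string // if set, grants exactly one video
}

// CanWatch resolves effective access for one video: the user's roles are
// checked against video-level grants and against folder-level grants
// covering any folder on the video's ancestor chain. Deny by default.
// The detail endpoint calls this even for videos that were already listed.
func CanWatch(userRoles []string, videoID string, ancestorFolders []string, rules []Rule) bool {
	roleSet := map[string]bool{}
	for _, r := range userRoles {
		roleSet[r] = true
	}
	folderSet := map[string]bool{}
	for _, f := range ancestorFolders {
		folderSet[f] = true
	}
	for _, rule := range rules {
		if !roleSet[rule.Role] {
			continue // rule is for a role this user lacks
		}
		if rule.VideoID != "" && rule.VideoID == videoID {
			return true // direct video grant
		}
		if rule.FolderID != "" && folderSet[rule.FolderID] {
			return true // inherited via containing folder
		}
	}
	return false
}
```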

Analytics

Ingest

  • App emits:
    • video_opened
    • playback_started
    • heartbeat
    • paused
    • seeked
    • completed
    • chat_opened
  • API validates:
    • user
    • video
    • session token
  • API publishes to Kafka immediately
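The validate-then-publish step can be sketched as below; the payload field names are assumptions about the app's event shape, not a fixed contract:

```go
package main

import (
	"encoding/json"
	"errors"
	"time"
)

// WatchEvent is an assumed ingest payload shape for playback/chat events.
type WatchEvent struct {
	UserID   string    `json:"user_id"`
	VideoID  string    `json:"video_id"`
	Type     string    `json:"type"`
	Position float64   `json:"position_seconds"`
	At       time.Time `json:"at"`
}

// allowedTypes mirrors the event vocabulary the app emits.
var allowedTypes = map[string]bool{
	"video_opened": true, "playback_started": true, "heartbeat": true,
	"paused": true, "seeked": true, "completed": true, "chat_opened": true,
}

// ValidateEvent rejects malformed events before they reach Kafka, keeping
// the ingest handler cheap: decode, check, publish, return 202.
func ValidateEvent(raw []byte) (*WatchEvent, error) {
	var e WatchEvent
	if err := json.Unmarshal(raw, &e); err != nil {
		return nil, err
	}
	if e.UserID == "" || e.VideoID == "" {
		return nil, errors.New("missing user or video id")
	}
	if !allowedTypes[e.Type] {
		return nil, errors.New("unknown event type: " + e.Type)
	}
	return &e, nil
}
```

Session-token verification and user/video existence checks would sit alongside this, but the type whitelist alone filters most garbage before it costs a Kafka write.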

Analytics Flow

```mermaid
flowchart LR
  APP[User App] --> INGEST[Analytics Ingest API]
  INGEST --> KAFKA[Kafka Topic: analytics.events]
  KAFKA --> CONSUMER[Analytics Consumer]
  CONSUMER --> SESS[Session Builder]
  SESS --> PG[(PostgreSQL)]
  CONSUMER --> AUDIT[Audit Topic]
```

Storage

  • watch_events
    • raw event stream
  • watch_sessions
    • start time
    • end time
    • last position
    • watched seconds
    • completion ratio
  • materialised aggregates:
    • per user
    • per video
    • per folder/category

Audit Logging

Log:

  • logins
  • entitlement changes
  • admin replay actions
  • webhook receipt
  • sync failures
  • content moves/deletions
  • video access decisions if required
  • HubSpot context generation if sensitive

Write audit events through Kafka, then persist immutably in PostgreSQL.

Kafka Topics

| Topic | Purpose |
| --- | --- |
| vimeo.webhooks.raw | Raw provider events |
| vimeo.sync.normalized | Normalised sync actions |
| analytics.events | Watch/chat events |
| audit.events | Operational/security audit |
| ranking.features | Ranking input events |
| ranking.updated | New ranking outputs |
| deadletters.vimeo | Failed sync events |
| deadletters.analytics | Failed analytics events |

Retry / DLQ / Idempotency

Idempotency

Use:

  • unique key on provider event ID where available
  • otherwise derived key:
    • provider + object_id + event_type + source_timestamp
  • unique insert into provider_events
  • consumers process only once per event key
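The derived-key rule is a pure function; pairing its output with a UNIQUE constraint on provider_events (and `INSERT ... ON CONFLICT DO NOTHING`) makes inserts naturally deduplicating:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// EventKey returns the provider's own event ID when present; otherwise it
// derives a stable key by hashing provider + object + event type + source
// timestamp. The same event delivered twice (webhook retry, reconciliation
// overlap) always yields the same key, so the unique index rejects the
// duplicate and consumers process each logical event once.
func EventKey(providerEventID, provider, objectID, eventType, sourceTS string) string {
	if providerEventID != "" {
		return providerEventID
	}
	h := sha256.Sum256([]byte(provider + "|" + objectID + "|" + eventType + "|" + sourceTS))
	return hex.EncodeToString(h[:])
}
```

The `|` separator prevents ambiguous concatenations (e.g. `"a" + "bc"` vs `"ab" + "c"`) from colliding.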

Retries

  • transient failures:
    • exponential backoff
    • retry topic or delayed requeue
  • permanent failures:
    • move to dead-letter topic/table

DLQ Handling

  • failed events appear in admin dashboard
  • admin can:
    • inspect payload
    • replay event
    • mark ignored
    • attach note

Rate-Limit Handling

Vimeo / HubSpot API client

  • central provider client in Go
  • token-bucket limiter per provider credential
  • jittered exponential backoff on 429
  • circuit-breaker behaviour on repeated upstream failure
  • scheduled reconciliation throttled by budgeted API call rate

HubSpot Chat Context

Goal

  • When a user opens a video, HubSpot chat can be contextualised with:
    • user identity
    • current video
    • folder/category
    • rank / recommendation reason if desired

Flow

```mermaid
sequenceDiagram
  participant U as User App
  participant API as Go API
  participant P as PostgreSQL
  participant H as HubSpot
  U->>API: GET /videos/:id/chat-context
  API->>API: Validate JWT + entitlement
  API->>P: Load user + video + content context
  P-->>API: context
  API->>H: Create/update contextual attributes
  H-->>API: widget/session context
  API-->>U: HubSpot chat config
```

Admin Dashboard

Views

  • sync run history
  • webhook receipt status
  • failed events / dead letters
  • folder tree diff
  • deleted / unavailable videos
  • cache version state
  • analytics lag
  • Kafka consumer lag
  • audit log search

Actions

  • replay failed sync event
  • trigger folder/video reconciliation
  • invalidate cache subtree
  • mark video hidden internally
  • inspect entitlement resolution

API Shape

Public / App APIs

  • GET /library/home
  • GET /folders/:id
  • GET /videos/:id
  • GET /videos/:id/playback
  • GET /videos/:id/chat-context
  • POST /analytics/events

Admin APIs

  • GET /admin/sync-runs
  • GET /admin/provider-events
  • GET /admin/dead-letters
  • POST /admin/replay/:event_id
  • POST /admin/reconcile/root
  • POST /admin/cache/invalidate

Deployment

```mermaid
flowchart TB
  subgraph AWS
    ALB[ALB]
    ECS1[ECS Fargate - API]
    ECS2[ECS Fargate - Sync]
    ECS3[ECS Fargate - Consumers]
    RDS[RDS PostgreSQL]
    EC[ElastiCache Redis]
    MSK[MSK Kafka]
    EVB[EventBridge Scheduler]
    CW[CloudWatch / OTel]
    SEC[Secrets Manager]
  end
  ALB --> ECS1
  ECS1 --> RDS
  ECS1 --> EC
  ECS1 --> MSK
  ECS2 --> RDS
  ECS2 --> EC
  ECS2 --> MSK
  ECS2 --> EVB
  ECS3 --> RDS
  ECS3 --> MSK
  ECS1 --> SEC
  ECS2 --> SEC
  ECS3 --> SEC
  ECS1 --> CW
  ECS2 --> CW
  ECS3 --> CW
```

Why This Meets the Requirements

  • Low latency:
    • user reads hit Redis / PostgreSQL, not Vimeo
  • No front-end caching required:
    • backend pre-indexes and caches everything needed
  • Vimeo stays source of truth:
    • sync + reconciliation keep backend aligned
  • Role/profile restrictions:
    • enforced in backend before listing/playback
  • Ranking:
    • precomputed and stored for fast retrieval
  • Analytics/audit:
    • event-driven through Kafka
  • HubSpot chat:
    • contextual endpoint per video
  • Reliability:
    • idempotency, retries, DLQs, replay tools
  • Operability:
    • admin dashboard + sync visibility

Main Trade-Offs

  • More moving parts than a naive direct-Vimeo read approach
  • Slight staleness risk between Vimeo change and local sync
  • Extra cache invalidation complexity
  • Kafka adds operational weight, but is justified for:
    • analytics
    • audit logs
    • decoupled sync/ranking pipelines

Non-Negotiable Rule

  • Vimeo is never queried on the critical user browse path unless there is an explicit admin/debug fallback.
  • All user-facing library performance depends on local indexed state.

Suggested Build Order

  1. folders / videos mirror in PostgreSQL
  2. Vimeo reconciliation job from root folder
  3. user auth + entitlement model
  4. library APIs
  5. Redis caching
  6. playback metadata endpoint
  7. Kafka analytics ingest
  8. audit pipeline
  9. HubSpot context
  10. ranking pipeline
  11. admin dashboard
