Architecture · October 2024

Vimeo Video Platform Integration

A low-latency backend integration layer mirroring Vimeo's folder tree into PostgreSQL, serving authorised content with Redis caching, Kafka-driven analytics, and HubSpot chat context.

p95 ≤ 300ms · No Vimeo calls on read path · Kafka-driven analytics · Role-based entitlements
Go · PostgreSQL · Redis · Kafka · AWS · Amazon MSK

Vimeo Video Platform - Solution Architecture

Mission

Build a low-latency backend integration layer that turns Vimeo into an in-app video library: synchronise a deeply nested Vimeo folder tree into the platform, serve only authorised content to signed-in users, stay aligned with changes the client makes directly in Vimeo, and support ranking, analytics, audit logging, Kafka-driven events, and per-video HubSpot chat context.

Constraints

  • Backend stack: Go
  • Cloud: AWS
  • Primary database: PostgreSQL
  • Optional cache: Redis
  • Video source of truth: Vimeo
  • User authentication: Google / Apple
  • Target backend latency for library/detail endpoints: p95 ≤ 300ms
  • No reliance on front-end caching
  • Vimeo folder tree can change at any time
  • Users can only see videos permitted by role/profile

Core Design

  • Never call Vimeo on the user read path for library browsing.
  • Mirror Vimeo folders/videos into PostgreSQL.
  • Use Redis for hot metadata, ranked feeds, and entitlement-friendly read acceleration.
  • Use Kafka for async processing:
    • Vimeo sync events
    • analytics events
    • audit events
    • ranking feature events
  • Use both:
    • webhook-driven sync for freshness
    • scheduled reconciliation for correctness
  • Enforce authorisation in backend before returning video playback metadata.
  • Let Vimeo deliver the media; backend delivers metadata, structure, entitlements, ranking, and context.

AWS Services

  • Amazon Cognito
    • Google / Apple federated sign-in
  • Amazon ECS Fargate
    • api-service
    • sync-service
    • analytics-consumer
    • audit-consumer
    • admin-service
  • Amazon RDS for PostgreSQL
  • Amazon ElastiCache for Redis
  • Amazon MSK
  • Amazon EventBridge Scheduler
  • AWS Secrets Manager
  • Amazon CloudWatch + OpenTelemetry

High-Level Architecture

```mermaid
flowchart LR
  U[User App] --> ALB[ALB]
  ALB --> API[Go API Service]
  API --> COG[Cognito<br/>Google/Apple Sign-In]
  API --> REDIS[Redis]
  API --> PG[(PostgreSQL)]
  API --> HUB[HubSpot API]
  API --> KAFKA[MSK / Kafka]
  VIMEO[Vimeo API + Webhooks] --> SYNC[Go Sync Service]
  SYNC --> PG
  SYNC --> REDIS
  SYNC --> KAFKA
  SYNC --> VIMEO
  API --> PLAYBACK[Playback Metadata Endpoint]
  PLAYBACK --> PG
  PLAYBACK --> REDIS
  API --> ANALYTICS[Analytics Ingest]
  ANALYTICS --> KAFKA
  KAFKA --> AC[Analytics Consumer]
  AC --> PG
  KAFKA --> ADC[Audit Consumer]
  ADC --> PG
  KAFKA --> RC[Ranking Pipeline]
  RC --> PG
  RC --> REDIS
  ADMIN[Admin Dashboard] --> API
```

Services

api-service (Go)

  • Authenticates requests using Cognito JWTs
  • Maps external identity to internal user
  • Serves:
    • library tree
    • folder contents
    • video detail
    • playback metadata
    • ranked video feed
    • HubSpot chat context
    • analytics ingest
    • admin read endpoints

sync-service (Go)

  • Consumes Vimeo webhooks
  • Verifies signatures
  • Writes raw provider events
  • Produces Kafka events
  • Reconciles Vimeo tree on schedule
  • Upserts folders/videos into PostgreSQL
  • Invalidates/rebuilds Redis caches
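The signature-verification step can be sketched as a small HMAC helper. This assumes an HMAC-SHA256 shared-secret scheme with a hex-encoded signature header; the exact mechanism Vimeo uses should be confirmed against its webhook documentation before relying on this shape:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// SignWebhookBody computes the hex HMAC-SHA256 digest of a payload. The same
// primitive serves the provider side (signing) and our verification side.
func SignWebhookBody(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// VerifyWebhookSignature compares the received signature header against the
// expected digest in constant time (hmac.Equal), so forged payloads are
// rejected before any sync work or Kafka publish happens.
func VerifyWebhookSignature(secret, body []byte, signatureHex string) bool {
	expected := SignWebhookBody(secret, body)
	return hmac.Equal([]byte(expected), []byte(signatureHex))
}
```

Rejected requests should still be logged (and ideally counted in CloudWatch), since a burst of bad signatures is itself a security signal.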

analytics-consumer (Go)

  • Consumes watch events from Kafka
  • Builds watch sessions / aggregates
  • Writes analytics tables in PostgreSQL

audit-consumer (Go)

  • Consumes audit events from Kafka
  • Writes immutable audit records

ranking-pipeline

  • Inputs:
    • user profile
    • entitlements
    • watch behaviour
    • content metadata
  • Outputs:
    • ranked video IDs per user
  • Writes:
    • user_video_rankings in PostgreSQL
    • hot ranked feeds in Redis

admin-service

  • Sync health
  • Failed events
  • Replay actions
  • Folder/video drift visibility
  • Audit views

Read Path

Rule

  • User requests must never require live traversal of the Vimeo folder tree.
  • All library/folder/video pages are served from local indexed state.

Library Request Flow

```mermaid
sequenceDiagram
  participant U as User App
  participant API as Go API
  participant R as Redis
  participant P as PostgreSQL
  U->>API: GET /library/home
  API->>API: Validate Cognito JWT
  API->>API: Resolve internal user + role/profile
  API->>R: Get ranked feed / library snapshot
  alt cache hit
    R-->>API: ranked accessible videos
  else cache miss
    API->>P: Query ranked accessible videos
    P-->>API: result set
    API->>R: write cache
  end
  API-->>U: folders/videos payload
```
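The hit/miss branch above is a standard cache-aside read. A minimal Go sketch, with `Cache` and `Store` as stand-ins for Redis and PostgreSQL (the in-memory types are illustrative fakes, not the real clients):

```go
package main

import (
	"errors"
	"sync"
)

// Cache and Store are minimal stand-ins for Redis and PostgreSQL; the real
// service would back them with a Redis client and SQL queries.
type Cache interface {
	Get(key string) ([]byte, bool)
	Set(key string, val []byte)
}

type Store interface {
	QueryLibrary(role, profileHash string) ([]byte, error)
}

// LoadLibrary is the cache-aside read path: Redis first, PostgreSQL on miss,
// then write-back. Vimeo is never contacted here.
func LoadLibrary(c Cache, s Store, role, profileHash, version string) ([]byte, error) {
	key := "library:root:" + role + ":" + profileHash + ":" + version
	if payload, ok := c.Get(key); ok {
		return payload, nil // cache hit: no database work
	}
	payload, err := s.QueryLibrary(role, profileHash)
	if err != nil {
		return nil, err
	}
	c.Set(key, payload) // write-back so the next read is a hit
	return payload, nil
}

// memCache and countingStore are in-memory fakes used for illustration.
type memCache struct {
	mu sync.Mutex
	m  map[string][]byte
}

func (c *memCache) Get(k string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.m[k]
	return v, ok
}

func (c *memCache) Set(k string, v []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = v
}

type countingStore struct{ calls int }

func (s *countingStore) QueryLibrary(role, profileHash string) ([]byte, error) {
	s.calls++
	if role == "" {
		return nil, errors.New("missing role")
	}
	return []byte(`{"folders":[]}`), nil
}
```

Keeping the key segmented by role/profile/version means two users with the same entitlement segment share a cache entry, which is what keeps the p95 target realistic.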

Video Detail / Playback Flow

```mermaid
sequenceDiagram
  participant U as User App
  participant API as Go API
  participant R as Redis
  participant P as PostgreSQL
  U->>API: GET /videos/:id
  API->>API: Validate JWT
  API->>R: Get video detail + entitlement snapshot
  alt cache miss
    API->>P: Load video, folder, entitlement, rank, chat context
    P-->>API: metadata
    API->>R: write cache
  end
  API->>API: Authorise access
  API-->>U: Vimeo player metadata + HubSpot context + analytics token
```

Sync Path

Rule

  • Webhooks provide freshness.
  • Reconciliation provides correctness.
  • Neither is trusted alone.

Sync Flow

```mermaid
flowchart TD
  A[Vimeo Webhook] --> B[Verify Signature]
  B --> C[Persist Raw Event]
  C --> D[Publish Kafka Event]
  D --> E[Sync Consumer]
  E --> F[Fetch Latest Vimeo Object]
  F --> G[Upsert Folders / Videos]
  G --> H[Invalidate Redis]
  G --> I[Write Audit Event]
  I --> J[Audit Consumer]
  J --> K[(PostgreSQL)]
```

Reconciliation Flow

```mermaid
flowchart TD
  S[EventBridge Schedule] --> R1[Reconciliation Job]
  R1 --> R2[Start from root Vimeo folder]
  R2 --> R3[Traverse children recursively]
  R3 --> R4[Compare Vimeo state to local state]
  R4 --> R5[Upsert new/changed folders]
  R4 --> R6[Mark deleted/moved/unavailable items]
  R5 --> R7[Invalidate caches]
  R6 --> R7
  R7 --> R8[Emit audit + sync metrics]
```

Vimeo Traversal Strategy

  • Do not fetch all Vimeo videos globally.
  • Start from the known root folder.
  • Recursively traverse child folders.
  • For each folder:
    • fetch direct child folders
    • fetch direct videos
  • Persist:
    • folder hierarchy
    • Vimeo IDs
    • parent-child relations
    • content metadata
    • visibility state
  • On reconciliation:
    • mark missing objects as deleted/unavailable
    • detect moved folders/videos by parent/path changes
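The traversal above can be sketched as a depth-first walk over an abstract folder client. `FolderClient`, `Mirror`, and `mapClient` are illustrative names (not real Vimeo SDK types); a real job would page through the API, guard against cycles, and upsert rows as it goes:

```go
package main

// FolderClient abstracts the Vimeo folder API: list child folders and list
// the videos directly inside a folder.
type FolderClient interface {
	ChildFolders(folderID string) ([]string, error)
	FolderVideos(folderID string) ([]string, error)
}

// Mirror is the local state the sync job builds: folder parent links and
// video placements, keyed by provider IDs.
type Mirror struct {
	ParentOf map[string]string   // child folder ID -> parent folder ID
	Videos   map[string][]string // folder ID -> video IDs placed in it
}

// Traverse walks the tree from the known root only (never a global fetch),
// depth-first, recording hierarchy and placements for later diffing.
func Traverse(c FolderClient, root string) (*Mirror, error) {
	m := &Mirror{ParentOf: map[string]string{}, Videos: map[string][]string{}}
	var walk func(id string) error
	walk = func(id string) error {
		vids, err := c.FolderVideos(id)
		if err != nil {
			return err
		}
		m.Videos[id] = vids
		children, err := c.ChildFolders(id)
		if err != nil {
			return err
		}
		for _, child := range children {
			m.ParentOf[child] = id
			if err := walk(child); err != nil {
				return err
			}
		}
		return nil
	}
	return m, walk(root)
}

// mapClient is an in-memory stand-in used for illustration.
type mapClient struct {
	children map[string][]string
	videos   map[string][]string
}

func (c mapClient) ChildFolders(id string) ([]string, error) { return c.children[id], nil }
func (c mapClient) FolderVideos(id string) ([]string, error) { return c.videos[id], nil }
```

Reconciliation then compares the resulting `Mirror` to the local tables: IDs present locally but absent from the mirror are marked deleted/unavailable; IDs whose recorded parent differs are treated as moves.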

Data Model

Core Tables

| Table | Purpose |
| --- | --- |
| users | Internal user profile |
| user_identities | Cognito / Google / Apple identity mappings |
| roles | Role catalogue |
| user_roles | User-to-role mapping |
| folders | Vimeo folder mirror |
| videos | Vimeo video mirror |
| folder_videos | Video placement inside folders |
| entitlements | Role/profile access rules for folders/videos |
| user_video_rankings | Ranked videos per user |
| watch_events | Raw playback-related events |
| watch_sessions | Sessionised viewing facts |
| provider_events | Raw Vimeo webhook/reconciliation events |
| sync_runs | Reconciliation runs and outcomes |
| audit_logs | Immutable operational/security audit trail |
| hubspot_context | Video/user-to-chat context mappings |
| dead_letters | Failed provider/analytics events |

Folder Modelling

Use PostgreSQL with:

  • adjacency columns:
    • id
    • parent_id
    • vimeo_folder_id
  • plus path column:
    • path
  • plus indexes for:
    • parent_id
    • path
    • vimeo_folder_id

This gives:

  • fast tree traversal
  • easy subtree queries
  • efficient cache rebuilds
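A small sketch of how the `path` column supports subtree work; the helper names are hypothetical, and the `LIKE` query in the comment shows the SQL shape the index serves:

```go
package main

import "strings"

// FolderPath materialises a folder's path from its ancestor chain, e.g.
// FolderPath("root", "marketing", "webinars") -> "/root/marketing/webinars".
// Stored in the indexed `path` column, it lets one query fetch a subtree:
//
//	SELECT id FROM folders WHERE path LIKE '/root/marketing/%';
func FolderPath(ancestorIDs ...string) string {
	return "/" + strings.Join(ancestorIDs, "/")
}

// InSubtree reports whether a folder path lives under the given subtree root,
// mirroring the LIKE-prefix semantics used for subtree cache invalidation.
// The "/" suffix on the prefix check stops "/root/marketing2" from matching
// the "/root/marketing" subtree.
func InSubtree(path, subtreeRoot string) bool {
	return path == subtreeRoot || strings.HasPrefix(path, subtreeRoot+"/")
}
```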

Video Modelling

  • videos
    • id
    • vimeo_video_id
    • title
    • description
    • thumbnail_url
    • duration_seconds
    • status
    • visibility
    • embed_key
    • updated_at
  • folder_videos
    • folder_id
    • video_id
    • sort_order

Entitlements

  • Access can be granted at:
    • folder level
    • video level
    • role level
    • profile segment level
  • Backend resolves effective access before returning content.

Redis Strategy

Cache

Use Redis for:

  • library snapshots:
    • library:root:{role}:{profile_hash}:{version}
  • folder contents:
    • folder:{folder_id}:{role}:{profile_hash}:{version}
  • video detail:
    • video:{video_id}:{user_segment}:{version}
  • ranked feeds:
    • ranked:{user_id}:{version}

Invalidation

  • On folder/video update:
    • bump a content version
    • invalidate affected folder/video keys
  • On entitlement update:
    • bump entitlement version
  • On ranking update:
    • replace only ranked-feed keys
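The key shapes and version-bump idea can be made concrete with a couple of helpers (assuming an integer version counter tracked per content/entitlement segment, which is an implementation choice, not a given):

```go
package main

import "fmt"

// LibraryKey and FolderKey build the versioned key shapes listed above.
// Bumping the version on a content or entitlement change makes every stale
// key unreachable at once: readers compute keys with the new version, and
// old entries simply age out via TTL instead of being deleted one by one.
func LibraryKey(role, profileHash string, version int) string {
	return fmt.Sprintf("library:root:%s:%s:%d", role, profileHash, version)
}

func FolderKey(folderID, role, profileHash string, version int) string {
	return fmt.Sprintf("folder:%s:%s:%s:%d", folderID, role, profileHash, version)
}
```

Ranked-feed keys are the exception: they are replaced in place by the ranking pipeline rather than versioned, since each user's feed has exactly one live value.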

Authorisation Model

  • Identity: Cognito JWT from Google / Apple federation
  • Internal auth:
    • backend maps subject to internal user
    • backend resolves roles and profile flags
  • Access resolution:
    • user can browse only folders/videos allowed by entitlement rules
    • video detail endpoint re-checks access even if listed
  • Playback:
    • backend returns Vimeo playback metadata only after access passes
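Effective-access resolution might look like the following deny-by-default sketch. The `Rule` shape is a simplification of the entitlements table (profile-segment grants are omitted for brevity):

```go
package main

// Rule grants a role access to a folder subtree or to a single video.
type Rule struct {
	Role     string
	FolderID string // if set, grants the folder and its contents
	VideoID  string // if set, grants exactly one video
}

// CanWatch resolves effective access for one video: the user's roles are
// checked against video-level grants and against folder-level grants
// covering any folder on the video's ancestor chain. Deny by default.
// The detail endpoint calls this even for videos that were already listed.
func CanWatch(userRoles []string, videoID string, ancestorFolders []string, rules []Rule) bool {
	roleSet := map[string]bool{}
	for _, r := range userRoles {
		roleSet[r] = true
	}
	folderSet := map[string]bool{}
	for _, f := range ancestorFolders {
		folderSet[f] = true
	}
	for _, rule := range rules {
		if !roleSet[rule.Role] {
			continue // rule is for a role this user lacks
		}
		if rule.VideoID != "" && rule.VideoID == videoID {
			return true // direct video grant
		}
		if rule.FolderID != "" && folderSet[rule.FolderID] {
			return true // inherited via containing folder
		}
	}
	return false
}
```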

Analytics

Ingest

  • App emits:
    • video_opened
    • playback_started
    • heartbeat
    • paused
    • seeked
    • completed
    • chat_opened
  • API validates:
    • user
    • video
    • session token
  • API publishes to Kafka immediately
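The validate-then-publish step can be sketched as below; the payload field names are assumptions about the app's event shape, not a fixed contract:

```go
package main

import (
	"encoding/json"
	"errors"
	"time"
)

// WatchEvent is an assumed ingest payload shape for playback/chat events.
type WatchEvent struct {
	UserID   string    `json:"user_id"`
	VideoID  string    `json:"video_id"`
	Type     string    `json:"type"`
	Position float64   `json:"position_seconds"`
	At       time.Time `json:"at"`
}

// allowedTypes mirrors the event vocabulary the app emits.
var allowedTypes = map[string]bool{
	"video_opened": true, "playback_started": true, "heartbeat": true,
	"paused": true, "seeked": true, "completed": true, "chat_opened": true,
}

// ValidateEvent rejects malformed events before they reach Kafka, keeping
// the ingest handler cheap: decode, check, publish, return 202.
func ValidateEvent(raw []byte) (*WatchEvent, error) {
	var e WatchEvent
	if err := json.Unmarshal(raw, &e); err != nil {
		return nil, err
	}
	if e.UserID == "" || e.VideoID == "" {
		return nil, errors.New("missing user or video id")
	}
	if !allowedTypes[e.Type] {
		return nil, errors.New("unknown event type: " + e.Type)
	}
	return &e, nil
}
```

Session-token verification and user/video existence checks would sit alongside this, but the type whitelist alone filters most garbage before it costs a Kafka write.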

Analytics Flow

```mermaid
flowchart LR
  APP[User App] --> INGEST[Analytics Ingest API]
  INGEST --> KAFKA[Kafka Topic: analytics.events]
  KAFKA --> CONSUMER[Analytics Consumer]
  CONSUMER --> SESS[Session Builder]
  SESS --> PG[(PostgreSQL)]
  CONSUMER --> AUDIT[Audit Topic]
```

Storage

  • watch_events
    • raw event stream
  • watch_sessions
    • start time
    • end time
    • last position
    • watched seconds
    • completion ratio
  • materialised aggregates:
    • per user
    • per video
    • per folder/category

Audit Logging

Log:

  • logins
  • entitlement changes
  • admin replay actions
  • webhook receipt
  • sync failures
  • content moves/deletions
  • video access decisions if required
  • HubSpot context generation if sensitive

Write audit events through Kafka, then persist immutably in PostgreSQL.

Kafka Topics

| Topic | Purpose |
| --- | --- |
| vimeo.webhooks.raw | Raw provider events |
| vimeo.sync.normalized | Normalised sync actions |
| analytics.events | Watch/chat events |
| audit.events | Operational/security audit |
| ranking.features | Ranking input events |
| ranking.updated | New ranking outputs |
| deadletters.vimeo | Failed sync events |
| deadletters.analytics | Failed analytics events |

Retry / DLQ / Idempotency

Idempotency

Use:

  • unique key on provider event ID where available
  • otherwise derived key:
    • provider + object_id + event_type + source_timestamp
  • unique insert into provider_events
  • consumers process only once per event key
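The derived-key rule is a pure function; pairing its output with a UNIQUE constraint on provider_events (and `INSERT ... ON CONFLICT DO NOTHING`) makes inserts naturally deduplicating:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// EventKey returns the provider's own event ID when present; otherwise it
// derives a stable key by hashing provider + object + event type + source
// timestamp. The same event delivered twice (webhook retry, reconciliation
// overlap) always yields the same key, so the unique index rejects the
// duplicate and consumers process each logical event once.
func EventKey(providerEventID, provider, objectID, eventType, sourceTS string) string {
	if providerEventID != "" {
		return providerEventID
	}
	h := sha256.Sum256([]byte(provider + "|" + objectID + "|" + eventType + "|" + sourceTS))
	return hex.EncodeToString(h[:])
}
```

The `|` separator prevents ambiguous concatenations (e.g. `"a" + "bc"` vs `"ab" + "c"`) from colliding.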

Retries

  • transient failures:
    • exponential backoff
    • retry topic or delayed requeue
  • permanent failures:
    • move to dead-letter topic/table

DLQ Handling

  • failed events appear in admin dashboard
  • admin can:
    • inspect payload
    • replay event
    • mark ignored
    • attach note

Rate-Limit Handling

Vimeo / HubSpot API client

  • central provider client in Go
  • token-bucket limiter per provider credential
  • jittered exponential backoff on 429
  • circuit-breaker behaviour on repeated upstream failure
  • scheduled reconciliation throttled by budgeted API call rate

HubSpot Chat Context

Goal

  • When a user opens a video, HubSpot chat can be contextualised with:
    • user identity
    • current video
    • folder/category
    • rank / recommendation reason if desired

Flow

```mermaid
sequenceDiagram
  participant U as User App
  participant API as Go API
  participant P as PostgreSQL
  participant H as HubSpot
  U->>API: GET /videos/:id/chat-context
  API->>API: Validate JWT + entitlement
  API->>P: Load user + video + content context
  P-->>API: context
  API->>H: Create/update contextual attributes
  H-->>API: widget/session context
  API-->>U: HubSpot chat config
```

Admin Dashboard

Views

  • sync run history
  • webhook receipt status
  • failed events / dead letters
  • folder tree diff
  • deleted / unavailable videos
  • cache version state
  • analytics lag
  • Kafka consumer lag
  • audit log search

Actions

  • replay failed sync event
  • trigger folder/video reconciliation
  • invalidate cache subtree
  • mark video hidden internally
  • inspect entitlement resolution

API Shape

Public / App APIs

  • GET /library/home
  • GET /folders/:id
  • GET /videos/:id
  • GET /videos/:id/playback
  • GET /videos/:id/chat-context
  • POST /analytics/events

Admin APIs

  • GET /admin/sync-runs
  • GET /admin/provider-events
  • GET /admin/dead-letters
  • POST /admin/replay/:event_id
  • POST /admin/reconcile/root
  • POST /admin/cache/invalidate

Deployment

```mermaid
flowchart TB
  subgraph AWS
    ALB[ALB]
    ECS1[ECS Fargate - API]
    ECS2[ECS Fargate - Sync]
    ECS3[ECS Fargate - Consumers]
    RDS[RDS PostgreSQL]
    EC[ElastiCache Redis]
    MSK[MSK Kafka]
    EVB[EventBridge Scheduler]
    CW[CloudWatch / OTel]
    SEC[Secrets Manager]
  end
  ALB --> ECS1
  ECS1 --> RDS
  ECS1 --> EC
  ECS1 --> MSK
  ECS2 --> RDS
  ECS2 --> EC
  ECS2 --> MSK
  ECS2 --> EVB
  ECS3 --> RDS
  ECS3 --> MSK
  ECS1 --> SEC
  ECS2 --> SEC
  ECS3 --> SEC
  ECS1 --> CW
  ECS2 --> CW
  ECS3 --> CW
```

Why This Meets the Requirements

  • Low latency:
    • user reads hit Redis / PostgreSQL, not Vimeo
  • No front-end caching required:
    • backend pre-indexes and caches everything needed
  • Vimeo stays source of truth:
    • sync + reconciliation keep backend aligned
  • Role/profile restrictions:
    • enforced in backend before listing/playback
  • Ranking:
    • precomputed and stored for fast retrieval
  • Analytics/audit:
    • event-driven through Kafka
  • HubSpot chat:
    • contextual endpoint per video
  • Reliability:
    • idempotency, retries, DLQs, replay tools
  • Operability:
    • admin dashboard + sync visibility

Main Trade-Offs

  • More moving parts than a naive direct-Vimeo read approach
  • Slight staleness risk between Vimeo change and local sync
  • Extra cache invalidation complexity
  • Kafka adds operational weight, but is justified for:
    • analytics
    • audit logs
    • decoupled sync/ranking pipelines

Non-Negotiable Rule

  • Vimeo is never queried on the critical user browse path unless there is an explicit admin/debug fallback.
  • All user-facing library performance depends on local indexed state.

Suggested Build Order

  1. folders / videos mirror in PostgreSQL
  2. Vimeo reconciliation job from root folder
  3. user auth + entitlement model
  4. library APIs
  5. Redis caching
  6. playback metadata endpoint
  7. Kafka analytics ingest
  8. audit pipeline
  9. HubSpot context
  10. ranking pipeline
  11. admin dashboard
