Architecture

Learn about the internal architecture and components of the S3 Documentation MCP server.

The system consists of 6 main components working together to provide RAG capabilities:

🌐 MCP Server

HTTP server implementing the MCP protocol. Routes requests to tools and resources.

☁️ S3Loader

Manages S3 communication. Lists, downloads, and tracks file changes via ETags.

🔄 SyncService

Orchestrates synchronization between S3 and vector store. Detects changes.

🧠 VectorStore

Chunks documents, generates embeddings, stores vectors, performs similarity search.

📡 Embedding Providers

Generate vector embeddings via Ollama (local) or OpenAI (cloud).

📚 ResourceService

Manages MCP Resources API for file discovery and direct access.

High-level flow:

MCP Client → MCP Server
MCP Server → SyncService → S3Loader ↔ S3 Bucket
SyncService → VectorStore → HNSWLib Index
VectorStore → Embedding Provider (Ollama / OpenAI)

🌐 MCP Server

The main HTTP server that implements the Model Context Protocol.

Responsibilities:

  1. Accept MCP requests over HTTP
  2. Route tool calls to appropriate handlers
  3. Manage Resources API
  4. Handle authentication (optional)
  5. Provide health checks

Technologies:

  • Express.js for HTTP server
  • @modelcontextprotocol/sdk for MCP protocol

Endpoints:

  • POST /mcp - MCP protocol endpoint
  • GET /health - Health check
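
A minimal sketch of this layer, assuming Express and a recent @modelcontextprotocol/sdk with the streamable HTTP transport (tool and resource registration is elided, and the server name is illustrative):

```typescript
import express from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';

const app = express();
app.use(express.json());

// Build an MCP server with the documented tools and resources registered.
function buildServer(): McpServer {
  const mcp = new McpServer({ name: 's3-documentation-mcp', version: '1.0.0' });
  // ...register search_documentation, refresh_index, and resources here...
  return mcp;
}

app.post('/mcp', async (req, res) => {
  // Stateless handling: a fresh server/transport pair per request; the SDK
  // routes the JSON-RPC payload to the registered handlers.
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  await buildServer().connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.get('/health', (_req, res) => {
  res.json({ status: 'ok' }); // the real handler also reports index statistics
});

app.listen(3000);
```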

☁️ S3Loader

Manages communication with S3-compatible storage.

List Files

Scans bucket for .md files and retrieves metadata with ETags

Download Content

Fetches file contents in parallel for efficiency

Track Changes

Uses ETags to detect modifications without re-downloading

Handle Errors

Robust retry logic and error handling

Technologies:

  • AWS SDK v3 (@aws-sdk/client-s3)

S3 Sync Flow:

  1. List Files: SyncService asks S3Loader to call ListObjectsV2 on the S3 bucket
  2. Get Metadata: S3 returns file list with ETags
  3. Compare: S3Loader compares ETags to detect changes
  4. Download: Only changed files are downloaded via GetObject
  5. Return: File contents sent back to SyncService
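
As a rough illustration of steps 1–4 using @aws-sdk/client-s3 (the helper names are assumptions, not the server's actual internals):

```typescript
// List every .md object with its ETag, then download only what changed.
import {
  S3Client,
  ListObjectsV2Command,
  GetObjectCommand,
} from '@aws-sdk/client-s3';

// Region and credentials come from the usual AWS environment variables.
const s3 = new S3Client({});

// Steps 1-2: list all markdown files and collect key -> ETag.
async function listMarkdownFiles(bucket: string): Promise<Map<string, string>> {
  const etags = new Map<string, string>();
  let token: string | undefined;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, ContinuationToken: token })
    );
    for (const obj of page.Contents ?? []) {
      if (obj.Key?.endsWith('.md') && obj.ETag) etags.set(obj.Key, obj.ETag);
    }
    token = page.NextContinuationToken;
  } while (token);
  return etags;
}

// Step 4: fetch the body of a single changed file.
async function downloadFile(bucket: string, key: string): Promise<string> {
  const res = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  return await res.Body!.transformToString();
}
```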

🔄 SyncService

Orchestrates the synchronization between S3 and the vector store.

Sync Modes:

When: At server startup

Behavior:

  • Full sync if vector store is empty
  • Incremental sync if data exists
  • Auto-detects first run

Best for: Most deployments

Incremental Sync Algorithm:

  1. Load current ETags from vector store
  2. List all files in S3 with their ETags
  3. Compare to detect:
    • New: Files in S3 but not in store
    • Modified: Files with different ETags
    • Deleted: Files in store but not in S3
  4. Process only changed files
  5. Update vector store with changes
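
A sketch of the comparison in step 3, assuming both sides are available as key → ETag maps (the types and names are illustrative):

```typescript
// Result of comparing the indexed state against the current S3 listing.
interface ChangeSet {
  added: string[];    // in S3 but not in the store
  modified: string[]; // ETag differs
  deleted: string[];  // in the store but no longer in S3
}

function detectChanges(
  storedEtags: Map<string, string>, // key -> ETag currently indexed
  s3Etags: Map<string, string>      // key -> ETag reported by ListObjectsV2
): ChangeSet {
  const changes: ChangeSet = { added: [], modified: [], deleted: [] };
  for (const [key, etag] of s3Etags) {
    const stored = storedEtags.get(key);
    if (stored === undefined) changes.added.push(key);
    else if (stored !== etag) changes.modified.push(key);
  }
  for (const key of storedEtags.keys()) {
    if (!s3Etags.has(key)) changes.deleted.push(key);
  }
  return changes;
}
```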

Full Sync Algorithm:

  1. Clear entire vector store
  2. Download all files from S3
  3. Process all files from scratch
  4. Rebuild vector store completely

🧠 VectorStore

Manages document chunking, embedding generation, and vector similarity search.

Document Processing Pipeline:

Markdown File → Text Splitter → Chunks → Embedding Provider → Vectors (embeddings) → HNSWLib Index → Search Results

Chunking Strategy:

| Parameter | Value | Notes |
|-----------|-------|-------|
| Chunk size | 1000 characters (default) | Configurable via RAG_CHUNK_SIZE |
| Overlap | 200 characters (default) | Prevents context loss at boundaries |
| Method | Recursive character splitting | Respects markdown structure |
| Preserves | Markdown formatting | Code blocks, headings, lists intact |
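
A minimal sketch of this chunking step using langchain's recursive splitter (the import path varies by langchain version; the sample document and metadata are placeholders):

```typescript
// Split a markdown document into overlapping chunks that respect
// headings, lists, and code fences.
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

const markdownContent = '# Example\n\nSome documentation text to be chunked...';

const splitter = RecursiveCharacterTextSplitter.fromLanguage('markdown', {
  chunkSize: Number(process.env.RAG_CHUNK_SIZE ?? 1000),
  chunkOverlap: 200,
});

// Each resulting chunk carries metadata used later for retrieval.
const chunks = await splitter.createDocuments(
  [markdownContent],
  [{ source: 'docs/file.md' }]
);
console.log(`${chunks.length} chunks produced`);
```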

Technologies:

  • hnswlib-node for vector indexing
  • langchain for document processing

Index Storage:

./data/hnswlib-store/
├── args.json # Index configuration
├── docstore.json # Document metadata
└── hnswlib.index # Vector index (binary)

What is HNSWLib?

HNSWLib is a fast, in-memory vector search library based on Hierarchical Navigable Small World (HNSW) graphs:

  • ⚡ Fast: Approximate nearest neighbor search in milliseconds
  • 💾 Simple: Stores indices as local files
  • 🪶 Efficient: Low memory footprint
  • 🎯 Accurate: High recall with cosine similarity

📡 Embedding Providers

Embedding providers generate vector embeddings for text chunks.

Ollama Configuration:

  • Endpoint: http://localhost:11434/api/embeddings
  • Model: nomic-embed-text
  • Dimensions: 768

API Flow:

  1. VectorStore sends text to OllamaProvider
  2. OllamaProvider calls POST /api/embeddings
  3. Ollama returns 768-dimensional vector
  4. Vector stored in HNSWLib index
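
The underlying HTTP call is roughly the following (a sketch using Node's built-in fetch; error handling in the real provider may be more elaborate):

```typescript
// Ask a local Ollama instance for the embedding of one text chunk.
async function embed(text: string): Promise<number[]> {
  const res = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'nomic-embed-text', prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama embedding request failed: ${res.status}`);
  const data = (await res.json()) as { embedding: number[] };
  return data.embedding; // 768 dimensions for nomic-embed-text
}
```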

Advantages:

  • ✅ Free, unlimited usage
  • ✅ 100% private and local
  • ✅ Works offline
  • ✅ Fast local API calls

📚 ResourceService

Manages the MCP Resources API for file discovery and access.

Capabilities:

List Resources

Returns all indexed files with metadata (size, chunks, modified date)

Generate URIs

Creates unique s3doc:// URIs for each file

Provide Descriptions

Human-readable metadata for each resource

Read Files

Retrieves full file contents by URI

Resource Format:

{
  "uri": "s3doc://docs/file.md",
  "name": "file.md",
  "description": "File Name - Size: X KB, Chunks: Y, Modified: Z",
  "mimeType": "text/markdown"
}

Search Flow:

  1. Client Request: MCP client calls search_documentation with a query
  2. Query Processing: MCP Server routes to VectorStore
  3. Generate Embedding: VectorStore sends query to Embedding Provider
  4. Get Vector: Provider returns query embedding
  5. Search Index: VectorStore queries HNSWLib for similar vectors
  6. Retrieve Chunks: HNSWLib returns most similar document chunks
  7. Format Results: VectorStore formats with metadata and scores
  8. Return Response: MCP Server sends results to client
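
Steps 3–7 map closely onto langchain's HNSWLib wrapper; a self-contained sketch, assuming the index persisted under ./data/hnswlib-store and the Ollama model described above (package names assume the split @langchain/* packages; older langchain versions expose the same classes under langchain/vectorstores/hnswlib and langchain/embeddings/ollama):

```typescript
// Load the persisted index, embed the query, and return scored chunks.
import { HNSWLib } from '@langchain/community/vectorstores/hnswlib';
import { OllamaEmbeddings } from '@langchain/ollama';

const embeddings = new OllamaEmbeddings({
  model: 'nomic-embed-text',
  baseUrl: 'http://localhost:11434',
});

const store = await HNSWLib.load('./data/hnswlib-store', embeddings);

// k nearest chunks by cosine similarity, with scores.
const results = await store.similaritySearchWithScore('How is authentication configured?', 5);
for (const [doc, score] of results) {
  console.log(score.toFixed(3), doc.metadata.source, doc.pageContent.slice(0, 80));
}
```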

Typical Latency:

  • Embedding generation: 10-50ms
  • Vector search: 10-30ms
  • Total: 50-100ms end-to-end

Sync Flow:

  1. Trigger: Client calls refresh_index, or an automatic sync fires
  2. List Files: SyncService asks S3Loader to list all files
  3. Fetch Metadata: S3Loader calls S3 ListObjectsV2
  4. Return ETags: S3 returns file list with ETags
  5. Detect Changes: SyncService compares ETags to detect new/modified/deleted
  6. Download Changes: S3Loader fetches only changed files via GetObject
  7. Process Files: VectorStore chunks and embeds changed documents
  8. Update Index: HNSWLib index is updated with new vectors
  9. Return Stats: SyncService reports sync statistics to client
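
Putting the pieces together, one incremental pass might look like this sketch (it reuses the listMarkdownFiles, downloadFile, and detectChanges helpers sketched earlier; VectorStoreLike is a hypothetical stand-in for the server's VectorStore):

```typescript
// Hypothetical view of the vector store operations used during a sync.
interface VectorStoreLike {
  getIndexedEtags(): Promise<Map<string, string>>;        // key -> ETag
  removeDocument(key: string): Promise<void>;             // drop a file's chunks
  addDocument(key: string, content: string, etag: string): Promise<void>;
  save(): Promise<void>;                                   // persist HNSWLib index
}

async function incrementalSync(bucket: string, store: VectorStoreLike) {
  const stored = await store.getIndexedEtags();            // ETags already indexed
  const remote = await listMarkdownFiles(bucket);          // ETags currently in S3
  const { added, modified, deleted } = detectChanges(stored, remote);

  for (const key of deleted) await store.removeDocument(key);
  for (const key of modified) await store.removeDocument(key); // drop stale chunks
  for (const key of [...added, ...modified]) {
    const content = await downloadFile(bucket, key);        // GetObject
    await store.addDocument(key, content, remote.get(key)!);
  }

  await store.save();
  return { added: added.length, modified: modified.length, deleted: deleted.length };
}
```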

Typical Sync Times:

  • Incremental (10 files changed): ~30 seconds
  • Full reindex (500 files): ~5 minutes (Ollama) or ~2 minutes (OpenAI)

Full reindex time by corpus size:

| Files | Total size | Ollama | OpenAI |
|-------|------------|--------|--------|
| 100 | 5 MB | ~1 min | ~30 sec |
| 500 | 25 MB | ~5 min | ~2 min |
| 1000 | 50 MB | ~10 min | ~4 min |

Measured on M1 MacBook Pro

Search latency breakdown:

| Operation | Latency | Notes |
|-----------|---------|-------|
| Embedding generation | 10-50 ms | Varies by provider |
| Vector search | 10-30 ms | HNSWLib lookup |
| Total latency | 50-100 ms | End-to-end search time |

Approximate index size:

| Component | 100 files | 1000 files |
|-----------|-----------|------------|
| Vector index | ~5 MB | ~50 MB |
| Docstore metadata | ~100 KB | ~1 MB |
| Total | ~5 MB | ~51 MB |

Scalability:

| Aspect | Design | Notes |
|--------|--------|-------|
| Target scale | Personal use, small teams | < 5000 files |
| Storage | File-based (HNSWLib) | Simple and portable |
| Search | In-memory vectors | Very fast lookups |
| Concurrency | Multiple searches | Handles concurrent users |

If you need enterprise scale:

  1. Vector Database: Replace HNSWLib with Pinecone, Weaviate, or Qdrant
  2. Distributed Storage: Use PostgreSQL or MongoDB for metadata
  3. Caching Layer: Add Redis for frequent queries
  4. Load Balancing: Deploy multiple server instances
  5. Background Workers: Separate sync workers from search API

Technology Stack

| Technology | Version / Scope | Role |
|------------|-----------------|------|
| Node.js | >= 18 | JavaScript runtime |
| TypeScript | Full typing | Type safety throughout |

Runtime dependencies:

  • @modelcontextprotocol/sdk - MCP protocol implementation
  • express - HTTP server and routing
  • @aws-sdk/client-s3 - S3 client library
  • hnswlib-node - Fast vector indexing and search
  • langchain - Document processing and chunking
  • ollama - Local embedding generation
  • openai - Cloud embedding API

Development dependencies:

  • vitest - Fast unit testing
  • eslint - Code linting
  • prettier - Code formatting
  • typescript - Type checking

Authentication

When ENABLE_AUTH=true:

  1. HTTP request arrives at MCP server
  2. Middleware checks for API key in:
    • Authorization: Bearer <key> header, or
    • ?api_key=<key> query parameter
  3. If valid: Request proceeds to handler
  4. If invalid/missing: Returns 401 Unauthorized
  5. Exception: /health always accessible
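
A minimal sketch of that middleware (the API_KEY variable name and the JSON error body are assumptions; only the documented behavior is reproduced):

```typescript
import type { Request, Response, NextFunction } from 'express';

export function apiKeyAuth(req: Request, res: Response, next: NextFunction) {
  if (req.path === '/health') return next(); // health check is always accessible

  // Accept either "Authorization: Bearer <key>" or "?api_key=<key>".
  const header = req.headers.authorization;
  const bearer = header?.startsWith('Bearer ') ? header.slice(7) : undefined;
  const key = bearer ?? (req.query.api_key as string | undefined);

  if (key && key === process.env.API_KEY) return next();
  res.status(401).json({ error: 'Unauthorized' });
}

// app.use(apiKeyAuth); // applied when ENABLE_AUTH=true
```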

S3 Credentials:

  • Stored securely in the .env file
  • Never logged or exposed
  • Used only for S3 API calls
  • Handled according to AWS security best practices

Log Levels:

| Level | Meaning | Examples |
|-------|---------|----------|
| INFO | Normal operations | Sync, search, startup |
| WARN | Non-critical issues | Fallbacks, deprecations |
| ERROR | Critical failures | S3 errors, sync failures |

Health Check:

curl http://localhost:3000/health

Returns:

  • Server status
  • Vector store status
  • Document count
  • Chunk count
  • Last sync time
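
As a rough TypeScript view of that payload (property names are assumptions; only the fields listed above are modeled):

```typescript
// Hypothetical shape of the /health response.
interface HealthResponse {
  status: 'ok' | 'error';          // server status
  vectorStore: 'ready' | 'empty';  // vector store status
  documents: number;               // indexed document count
  chunks: number;                  // indexed chunk count
  lastSync: string | null;         // ISO timestamp of the last sync
}
```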

Metrics

The following are currently logged but not exported:

  • Search latency per query
  • Sync duration and file counts
  • Document and chunk counts
  • Error rates and types

Design Decisions

Why a file-based vector store (HNSWLib)?

  • Simplicity: File-based storage, no database setup required
  • Performance: Fast approximate nearest neighbor search
  • Portability: Easy to back up, version, and migrate
  • Cost: No cloud database costs

Why two embedding providers?

Flexibility: users choose based on their needs:

  • Ollama: Privacy, offline, free
  • OpenAI: Accuracy, multilingual, cloud

No vendor lock-in. Switch anytime.

Why S3-compatible storage?

  • Universal: De facto standard for object storage
  • Compatible: Works with many providers (AWS, MinIO, Cloudflare R2, etc.)
  • Cost-effective: Cheap storage, pay for what you use
  • Reliable: Built-in durability and availability

Potential improvements (not implemented):

📊 Metrics Export

Prometheus metrics for monitoring

🔍 Query Analytics

Track popular queries and patterns

🔄 Webhook Support

React to S3 events in real-time

🌐 Multi-language

Better non-English support

🎯 Relevance Feedback

Learn from user feedback

📈 Usage Dashboards

Visual analytics and insights