Architecture

Learn about the internal architecture and components of the S3 Documentation MCP server.

The system consists of 6 main components working together to provide RAG capabilities:

🌐 MCP Server

HTTP server implementing the MCP protocol. Routes requests to tools and resources.

☁️ S3Loader

Manages S3 communication. Lists, downloads, and tracks file changes via ETags.

🔄 SyncService

Orchestrates synchronization between S3 and vector store. Detects changes.

🧠 VectorStore

Chunks documents, generates embeddings, stores vectors, performs similarity search.

📡 Embedding Providers

Generate vector embeddings via Ollama (local) or OpenAI (cloud).

📚 ResourceService

Manages MCP Resources API for file discovery and direct access.

High-level flow:

MCP Client → MCP Server
MCP Server → SyncService → S3Loader ↔ S3 Bucket
SyncService → VectorStore → HNSWLib Index
VectorStore → Embedding Provider (Ollama / OpenAI)

🌐 MCP Server

The main HTTP server that implements the Model Context Protocol.

Responsibilities:

  1. Accept MCP requests over HTTP
  2. Route tool calls to appropriate handlers
  3. Manage Resources API
  4. Handle authentication (optional)
  5. Provide health checks

Technologies:

  • Express.js for HTTP server
  • @modelcontextprotocol/sdk for MCP protocol

Endpoints:

  • POST /mcp - MCP protocol endpoint
  • GET /health - Health check
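
A minimal sketch of this layer, assuming Express and a recent @modelcontextprotocol/sdk with the streamable HTTP transport (tool and resource registration is elided, and the server name is illustrative):

```typescript
import express from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';

const app = express();
app.use(express.json());

// Build an MCP server with the documented tools and resources registered.
function buildServer(): McpServer {
  const mcp = new McpServer({ name: 's3-documentation-mcp', version: '1.0.0' });
  // ...register search_documentation, refresh_index, and resources here...
  return mcp;
}

app.post('/mcp', async (req, res) => {
  // Stateless handling: a fresh server/transport pair per request; the SDK
  // routes the JSON-RPC payload to the registered handlers.
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  await buildServer().connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.get('/health', (_req, res) => {
  res.json({ status: 'ok' }); // the real handler also reports index statistics
});

app.listen(3000);
```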

☁️ S3Loader

Manages communication with S3-compatible storage.

List Files

Scans bucket for .md files and retrieves metadata with ETags

Download Content

Fetches file contents in parallel for efficiency

Track Changes

Uses ETags to detect modifications without re-downloading

Handle Errors

Robust retry logic and error handling

Technologies:

  • AWS SDK v3 (@aws-sdk/client-s3)

S3 Sync Flow:

  1. List Files: SyncService asks S3Loader to call ListObjectsV2 on the S3 bucket
  2. Get Metadata: S3 returns file list with ETags
  3. Compare: S3Loader compares ETags to detect changes
  4. Download: Only changed files are downloaded via GetObject
  5. Return: File contents sent back to SyncService
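
As a rough illustration of steps 1–4 using @aws-sdk/client-s3 (the helper names are assumptions, not the server's actual internals):

```typescript
// List every .md object with its ETag, then download only what changed.
import {
  S3Client,
  ListObjectsV2Command,
  GetObjectCommand,
} from '@aws-sdk/client-s3';

// Region and credentials come from the usual AWS environment variables.
const s3 = new S3Client({});

// Steps 1-2: list all markdown files and collect key -> ETag.
async function listMarkdownFiles(bucket: string): Promise<Map<string, string>> {
  const etags = new Map<string, string>();
  let token: string | undefined;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, ContinuationToken: token })
    );
    for (const obj of page.Contents ?? []) {
      if (obj.Key?.endsWith('.md') && obj.ETag) etags.set(obj.Key, obj.ETag);
    }
    token = page.NextContinuationToken;
  } while (token);
  return etags;
}

// Step 4: fetch the body of a single changed file.
async function downloadFile(bucket: string, key: string): Promise<string> {
  const res = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  return await res.Body!.transformToString();
}
```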

🔄 SyncService

Orchestrates the synchronization between S3 and the vector store.

Sync Modes:

When: At server startup

Behavior:

  • Full sync if vector store is empty
  • Incremental sync if data exists
  • Auto-detects first run

Best for: Most deployments

Incremental Sync Algorithm:

  1. Load current ETags from vector store
  2. List all files in S3 with their ETags
  3. Compare to detect:
    • New: Files in S3 but not in store
    • Modified: Files with different ETags
    • Deleted: Files in store but not in S3
  4. Process only changed files
  5. Update vector store with changes
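
A sketch of the comparison in step 3, assuming both sides are available as key → ETag maps (the types and names are illustrative):

```typescript
// Result of comparing the indexed state against the current S3 listing.
interface ChangeSet {
  added: string[];    // in S3 but not in the store
  modified: string[]; // ETag differs
  deleted: string[];  // in the store but no longer in S3
}

function detectChanges(
  storedEtags: Map<string, string>, // key -> ETag currently indexed
  s3Etags: Map<string, string>      // key -> ETag reported by ListObjectsV2
): ChangeSet {
  const changes: ChangeSet = { added: [], modified: [], deleted: [] };
  for (const [key, etag] of s3Etags) {
    const stored = storedEtags.get(key);
    if (stored === undefined) changes.added.push(key);
    else if (stored !== etag) changes.modified.push(key);
  }
  for (const key of storedEtags.keys()) {
    if (!s3Etags.has(key)) changes.deleted.push(key);
  }
  return changes;
}
```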

Full Sync Algorithm:

  1. Clear entire vector store
  2. Download all files from S3
  3. Process all files from scratch
  4. Rebuild vector store completely

🧠 VectorStore

Manages document chunking, embedding generation, and vector similarity search.

Document Processing Pipeline:

Markdown File → Text Splitter → Chunks → Embedding Provider → Vectors (embeddings) → HNSWLib Index → Search Results

Chunking Strategy:

| Parameter | Value | Notes |
|-----------|-------|-------|
| Chunk size | 1000 characters (default) | Configurable via RAG_CHUNK_SIZE |
| Overlap | 200 characters (default) | Prevents context loss at boundaries |
| Method | Recursive character splitting | Respects markdown structure |
| Preserves | Markdown formatting | Code blocks, headings, lists intact |
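
A minimal sketch of this chunking step using langchain's recursive splitter (the import path varies by langchain version; the sample document and metadata are placeholders):

```typescript
// Split a markdown document into overlapping chunks that respect
// headings, lists, and code fences.
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

const markdownContent = '# Example\n\nSome documentation text to be chunked...';

const splitter = RecursiveCharacterTextSplitter.fromLanguage('markdown', {
  chunkSize: Number(process.env.RAG_CHUNK_SIZE ?? 1000),
  chunkOverlap: 200,
});

// Each resulting chunk carries metadata used later for retrieval.
const chunks = await splitter.createDocuments(
  [markdownContent],
  [{ source: 'docs/file.md' }]
);
console.log(`${chunks.length} chunks produced`);
```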

Technologies:

  • hnswlib-node for vector indexing
  • langchain for document processing

Index Storage:

./data/hnswlib-store/
├── args.json # Index configuration
├── docstore.json # Document metadata
└── hnswlib.index # Vector index (binary)

What is HNSWLib?

HNSWLib is a fast, in-memory vector search library based on Hierarchical Navigable Small World (HNSW) graphs:

  • ⚡ Fast: Approximate nearest neighbor search in milliseconds
  • 💾 Simple: Stores indices as local files
  • 🪶 Efficient: Low memory footprint
  • 🎯 Accurate: High recall with cosine similarity

📡 Embedding Providers

Embedding providers generate vector embeddings for text chunks.

Ollama Configuration:

  • Endpoint: http://localhost:11434/api/embeddings
  • Model: nomic-embed-text
  • Dimensions: 768

API Flow:

  1. VectorStore sends text to OllamaProvider
  2. OllamaProvider calls POST /api/embeddings
  3. Ollama returns 768-dimensional vector
  4. Vector stored in HNSWLib index
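
The underlying HTTP call is roughly the following (a sketch using Node's built-in fetch; error handling in the real provider may be more elaborate):

```typescript
// Ask a local Ollama instance for the embedding of one text chunk.
async function embed(text: string): Promise<number[]> {
  const res = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'nomic-embed-text', prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama embedding request failed: ${res.status}`);
  const data = (await res.json()) as { embedding: number[] };
  return data.embedding; // 768 dimensions for nomic-embed-text
}
```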

Advantages:

  • ✅ Free, unlimited usage
  • ✅ 100% private and local
  • ✅ Works offline
  • ✅ Fast local API calls

📚 ResourceService

Manages the MCP Resources API for file discovery and access.

Capabilities:

List Resources

Returns all indexed files with metadata (size, chunks, modified date)

Generate URIs

Creates unique s3doc:// URIs for each file

Provide Descriptions

Human-readable metadata for each resource

Read Files

Retrieves full file contents by URI

Resource Format:

{
  "uri": "s3doc://docs/file.md",
  "name": "file.md",
  "description": "File Name - Size: X KB, Chunks: Y, Modified: Z",
  "mimeType": "text/markdown"
}

Search Flow:

  1. Client Request: MCP client calls search_documentation with a query
  2. Query Processing: MCP Server routes to VectorStore
  3. Generate Embedding: VectorStore sends query to Embedding Provider
  4. Get Vector: Provider returns query embedding
  5. Search Index: VectorStore queries HNSWLib for similar vectors
  6. Retrieve Chunks: HNSWLib returns most similar document chunks
  7. Format Results: VectorStore formats with metadata and scores
  8. Return Response: MCP Server sends results to client
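
Steps 3–7 map closely onto langchain's HNSWLib wrapper; a self-contained sketch, assuming the index persisted under ./data/hnswlib-store and the Ollama model described above (package names assume the split @langchain/* packages; older langchain versions expose the same classes under langchain/vectorstores/hnswlib and langchain/embeddings/ollama):

```typescript
// Load the persisted index, embed the query, and return scored chunks.
import { HNSWLib } from '@langchain/community/vectorstores/hnswlib';
import { OllamaEmbeddings } from '@langchain/ollama';

const embeddings = new OllamaEmbeddings({
  model: 'nomic-embed-text',
  baseUrl: 'http://localhost:11434',
});

const store = await HNSWLib.load('./data/hnswlib-store', embeddings);

// k nearest chunks by cosine similarity, with scores.
const results = await store.similaritySearchWithScore('How is authentication configured?', 5);
for (const [doc, score] of results) {
  console.log(score.toFixed(3), doc.metadata.source, doc.pageContent.slice(0, 80));
}
```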

Typical Latency:

  • Embedding generation: 10-50ms
  • Vector search: 10-30ms
  • Total: 50-100ms end-to-end

Sync Flow:

  1. Trigger: Client calls refresh_index, or an automatic sync fires
  2. List Files: SyncService asks S3Loader to list all files
  3. Fetch Metadata: S3Loader calls S3 ListObjectsV2
  4. Return ETags: S3 returns file list with ETags
  5. Detect Changes: SyncService compares ETags to detect new/modified/deleted
  6. Download Changes: S3Loader fetches only changed files via GetObject
  7. Process Files: VectorStore chunks and embeds changed documents
  8. Update Index: HNSWLib index is updated with new vectors
  9. Return Stats: SyncService reports sync statistics to client
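
Putting the pieces together, one incremental pass might look like this sketch (it reuses the listMarkdownFiles, downloadFile, and detectChanges helpers sketched earlier; VectorStoreLike is a hypothetical stand-in for the server's VectorStore):

```typescript
// Hypothetical view of the vector store operations used during a sync.
interface VectorStoreLike {
  getIndexedEtags(): Promise<Map<string, string>>;        // key -> ETag
  removeDocument(key: string): Promise<void>;             // drop a file's chunks
  addDocument(key: string, content: string, etag: string): Promise<void>;
  save(): Promise<void>;                                   // persist HNSWLib index
}

async function incrementalSync(bucket: string, store: VectorStoreLike) {
  const stored = await store.getIndexedEtags();            // ETags already indexed
  const remote = await listMarkdownFiles(bucket);          // ETags currently in S3
  const { added, modified, deleted } = detectChanges(stored, remote);

  for (const key of deleted) await store.removeDocument(key);
  for (const key of modified) await store.removeDocument(key); // drop stale chunks
  for (const key of [...added, ...modified]) {
    const content = await downloadFile(bucket, key);        // GetObject
    await store.addDocument(key, content, remote.get(key)!);
  }

  await store.save();
  return { added: added.length, modified: modified.length, deleted: deleted.length };
}
```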

Typical Sync Times:

  • Incremental (10 files changed): ~30 seconds
  • Full reindex (500 files): ~5 minutes (Ollama) or ~2 minutes (OpenAI)

Full reindex time by corpus size:

| Files | Total size | Ollama | OpenAI |
|-------|------------|--------|--------|
| 100 | 5 MB | ~1 min | ~30 sec |
| 500 | 25 MB | ~5 min | ~2 min |
| 1000 | 50 MB | ~10 min | ~4 min |

Measured on M1 MacBook Pro

Search latency breakdown:

| Operation | Latency | Notes |
|-----------|---------|-------|
| Embedding generation | 10-50 ms | Varies by provider |
| Vector search | 10-30 ms | HNSWLib lookup |
| Total latency | 50-100 ms | End-to-end search time |

Approximate index size:

| Component | 100 files | 1000 files |
|-----------|-----------|------------|
| Vector index | ~5 MB | ~50 MB |
| Docstore metadata | ~100 KB | ~1 MB |
| Total | ~5 MB | ~51 MB |

Scalability:

| Aspect | Design | Notes |
|--------|--------|-------|
| Target scale | Personal use, small teams | < 5000 files |
| Storage | File-based (HNSWLib) | Simple and portable |
| Search | In-memory vectors | Very fast lookups |
| Concurrency | Multiple searches | Handles concurrent users |

If you need enterprise scale:

  1. Vector Database: Replace HNSWLib with Pinecone, Weaviate, or Qdrant
  2. Distributed Storage: Use PostgreSQL or MongoDB for metadata
  3. Caching Layer: Add Redis for frequent queries
  4. Load Balancing: Deploy multiple server instances
  5. Background Workers: Separate sync workers from search API

Technology Stack

| Technology | Version / Scope | Role |
|------------|-----------------|------|
| Node.js | >= 18 | JavaScript runtime |
| TypeScript | Full typing | Type safety throughout |

Runtime dependencies:

  • @modelcontextprotocol/sdk - MCP protocol implementation
  • express - HTTP server and routing
  • @aws-sdk/client-s3 - S3 client library
  • hnswlib-node - Fast vector indexing and search
  • langchain - Document processing and chunking
  • ollama - Local embedding generation
  • openai - Cloud embedding API

Development dependencies:

  • vitest - Fast unit testing
  • eslint - Code linting
  • prettier - Code formatting
  • typescript - Type checking

Authentication

When ENABLE_AUTH=true:

  1. HTTP request arrives at MCP server
  2. Middleware checks for API key in:
    • Authorization: Bearer <key> header, or
    • ?api_key=<key> query parameter
  3. If valid: Request proceeds to handler
  4. If invalid/missing: Returns 401 Unauthorized
  5. Exception: /health always accessible
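
A minimal sketch of that middleware (the API_KEY variable name and the JSON error body are assumptions; only the documented behavior is reproduced):

```typescript
import type { Request, Response, NextFunction } from 'express';

export function apiKeyAuth(req: Request, res: Response, next: NextFunction) {
  if (req.path === '/health') return next(); // health check is always accessible

  // Accept either "Authorization: Bearer <key>" or "?api_key=<key>".
  const header = req.headers.authorization;
  const bearer = header?.startsWith('Bearer ') ? header.slice(7) : undefined;
  const key = bearer ?? (req.query.api_key as string | undefined);

  if (key && key === process.env.API_KEY) return next();
  res.status(401).json({ error: 'Unauthorized' });
}

// app.use(apiKeyAuth); // applied when ENABLE_AUTH=true
```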

S3 Credentials:

  • Stored securely in the .env file
  • Never logged or exposed
  • Used only for S3 API calls
  • Handled according to AWS security best practices

Log Levels:

| Level | Meaning | Examples |
|-------|---------|----------|
| INFO | Normal operations | Sync, search, startup |
| WARN | Non-critical issues | Fallbacks, deprecations |
| ERROR | Critical failures | S3 errors, sync failures |

Health Check:

curl http://localhost:3000/health

Returns:

  • Server status
  • Vector store status
  • Document count
  • Chunk count
  • Last sync time
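
As a rough TypeScript view of that payload (property names are assumptions; only the fields listed above are modeled):

```typescript
// Hypothetical shape of the /health response.
interface HealthResponse {
  status: 'ok' | 'error';          // server status
  vectorStore: 'ready' | 'empty';  // vector store status
  documents: number;               // indexed document count
  chunks: number;                  // indexed chunk count
  lastSync: string | null;         // ISO timestamp of the last sync
}
```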

Metrics

The following are currently logged but not exported:

  • Search latency per query
  • Sync duration and file counts
  • Document and chunk counts
  • Error rates and types

Design Decisions

Why a file-based vector store (HNSWLib)?

  • Simplicity: File-based storage, no database setup required
  • Performance: Fast approximate nearest neighbor search
  • Portability: Easy to back up, version, and migrate
  • Cost: No cloud database costs

Why two embedding providers?

Flexibility: users choose based on their needs:

  • Ollama: Privacy, offline, free
  • OpenAI: Accuracy, multilingual, cloud

No vendor lock-in. Switch anytime.

Why S3-compatible storage?

  • Universal: De facto standard for object storage
  • Compatible: Works with many providers (AWS, MinIO, Cloudflare R2, etc.)
  • Cost-effective: Cheap storage, pay for what you use
  • Reliable: Built-in durability and availability

Potential improvements (not implemented):

📊 Metrics Export

Prometheus metrics for monitoring

🔍 Query Analytics

Track popular queries and patterns

🔄 Webhook Support

React to S3 events in real-time

🌐 Multi-language

Better non-English support

🎯 Relevance Feedback

Learn from user feedback

📈 Usage Dashboards

Visual analytics and insights