Embedding Providers

The S3 Documentation MCP server supports two embedding providers: Ollama (local) and OpenAI (cloud).

| Feature | Ollama | OpenAI |
| --- | --- | --- |
| Cost | Free | ~$0.00002/1K tokens |
| Privacy | 100% local | Data sent to OpenAI |
| Offline | ✅ Yes | ❌ No |
| Accuracy | Good | Excellent |
| Multilingual | Good | Excellent |
| Setup | Install + model download | API key only |
| Resources | Local CPU/GPU | Cloud-based |

Ollama

Recommended for: local development, privacy-conscious deployments, offline usage

  1. Install Ollama from https://ollama.ai

  2. Pull the embedding model:

     ```sh
     ollama pull nomic-embed-text
     ```

  3. Configure in .env:

     ```sh
     EMBEDDING_PROVIDER=ollama
     OLLAMA_BASE_URL=http://localhost:11434
     OLLAMA_EMBEDDING_MODEL=nomic-embed-text
     ```
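Once the model is pulled, you can sanity-check the endpoint outside the server. A minimal sketch assuming Ollama's `/api/embeddings` endpoint and the default base URL; `build_embed_request` and `embed` are illustrative helpers, not part of the server:

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"

def build_embed_request(model, text):
    """Payload shape for Ollama's /api/embeddings endpoint (single prompt)."""
    return {"model": model, "prompt": text}

def embed(text, model="nomic-embed-text"):
    """POST to a running Ollama instance and return the embedding vector."""
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/embeddings",
        data=json.dumps(build_embed_request(model, text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

# embed("hello world") returns a 768-dimensional vector with nomic-embed-text
```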

When using Docker, use host.docker.internal to access Ollama running on the host:

```sh
OLLAMA_BASE_URL=http://host.docker.internal:11434
```

Or in docker-compose.yml:

```yaml
environment:
  - OLLAMA_BASE_URL=http://host.docker.internal:11434
```
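The host-vs-container distinction above can also be handled in code. A sketch of the idea only: the `/.dockerenv` check is a common heuristic for detecting a container, and `ollama_base_url` is a hypothetical helper, not part of the server:

```python
import os

def ollama_base_url(in_docker=None):
    """Resolve the Ollama base URL: an explicit OLLAMA_BASE_URL wins;
    inside Docker, reach the host via host.docker.internal; else localhost."""
    url = os.environ.get("OLLAMA_BASE_URL")
    if url:
        return url
    if in_docker is None:
        in_docker = os.path.exists("/.dockerenv")  # common container heuristic
    host = "host.docker.internal" if in_docker else "localhost"
    return f"http://{host}:11434"
```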
Advantages:

  • Free: No API costs, unlimited usage
  • Private: All data stays on your machine
  • Offline: Works without internet connection
  • Fast: Direct local API calls
  • No Rate Limits: Process as much as you want

Trade-offs:

  • ⚠️ Requires Ollama installation and model download (~270MB)
  • ⚠️ Uses local CPU/GPU resources
  • ⚠️ Slightly lower accuracy than cloud models

The nomic-embed-text model:

  • Dimensions: 768
  • Size: ~270MB
  • Performance: Excellent for English, good for other languages
  • Speed: Very fast on modern CPUs
  • License: Apache 2.0 (fully open-source)
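Whichever provider generates the vectors, semantic search then ranks stored document chunks by similarity to the query vector, most commonly cosine similarity. A generic sketch (toy 3-dimensional vectors stand in for real 768-dimensional ones):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # → 1.0
```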

OpenAI

Recommended for: production deployments, multilingual content, maximum accuracy

  1. Get an API key from the OpenAI Platform

  2. Add credits to your account

  3. Configure in .env:

     ```sh
     EMBEDDING_PROVIDER=openai
     OPENAI_API_KEY=sk-...your-key...
     OPENAI_EMBEDDING_MODEL=text-embedding-3-small
     ```
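With the official openai Python SDK (v1+), embeddings come from `client.embeddings.create()`. The sketch below only builds the call's arguments, so it runs without a key; `embedding_kwargs` is a hypothetical helper, and the optional `dimensions` parameter (accepted by the v3 embedding models) is shown as an assumption you can omit:

```python
def embedding_kwargs(text, model="text-embedding-3-small", dimensions=None):
    """Build arguments for client.embeddings.create(); passing `dimensions`
    asks the v3 models for a shortened vector (omit it for the default size)."""
    kwargs = {"model": model, "input": text}
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    return kwargs

# Usage, with OPENAI_API_KEY set in the environment:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.embeddings.create(**embedding_kwargs("hello world"))
#   vector = resp.data[0].embedding  # 1536 floats for text-embedding-3-small
```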

text-embedding-3-small

```sh
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
```
  • Dimensions: 1536
  • Cost: ~$0.00002/1K tokens
  • Performance: High accuracy
  • Best for: Most use cases, cost-sensitive deployments

text-embedding-3-large

```sh
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
```
  • Dimensions: 3072
  • Cost: ~$0.00013/1K tokens
  • Performance: Maximum accuracy
  • Best for: Multilingual content, maximum precision

Advantages:

  • High Accuracy: State-of-the-art embeddings
  • Multilingual: Excellent support for 20+ languages
  • No Local Resources: Runs entirely in the cloud
  • Low Latency: Fast API responses
  • Scalable: No local hardware limits

Trade-offs:

  • ⚠️ Requires API key and credits
  • ⚠️ Data sent to OpenAI servers
  • ⚠️ Requires internet connection
  • ⚠️ Rate limits apply (though very generous)

Typical documentation indexing costs:

| Documentation Size | Tokens (approx.) | Cost (text-embedding-3-small) |
| --- | --- | --- |
| 100 pages | ~250K tokens | ~$0.005 |
| 500 pages | ~1.25M tokens | ~$0.025 |
| 1000 pages | ~2.5M tokens | ~$0.05 |

Search costs are negligible (~$0.00001 per query).
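The table's figures follow directly from the per-token rate. A quick sanity check, assuming roughly 2,500 tokens per page (the ratio implied by the table above):

```python
RATE_SMALL = 0.00002 / 1000  # ~$0.00002 per 1K tokens, as a per-token rate
TOKENS_PER_PAGE = 2500       # assumption implied by the table above

def indexing_cost(pages, rate=RATE_SMALL):
    """Estimated one-time embedding cost in USD for a documentation set."""
    return pages * TOKENS_PER_PAGE * rate

for pages in (100, 500, 1000):
    print(f"{pages} pages -> ${indexing_cost(pages):.3f}")
```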

If you set EMBEDDING_PROVIDER=openai but don’t provide a valid OPENAI_API_KEY, the server will:

  1. ⚠️ Log a warning
  2. 🔄 Automatically fall back to Ollama (if configured)
  3. ❌ Fail to start if neither provider is available

This ensures the server can always start with a working configuration.
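That startup behavior can be summarized as a small selection function. A sketch of the documented order only; the names and structure are illustrative, not the server's internals:

```python
def select_provider(preferred, openai_key, ollama_configured):
    """Mirror the documented startup logic: OpenAI requires a valid key;
    otherwise fall back to Ollama; fail if neither provider is usable."""
    if preferred == "openai" and openai_key:
        return "openai"
    if ollama_configured:
        return "ollama"  # logged as a warning + automatic fallback
    raise RuntimeError("no embedding provider available; refusing to start")

print(select_provider("openai", openai_key=None, ollama_configured=True))  # → ollama
```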

Real-world performance on a typical documentation set (500 pages):

| Provider | Indexing Time | Search Time | Accuracy |
| --- | --- | --- | --- |
| Ollama (nomic-embed-text) | ~5 min | ~50ms | Good ⭐⭐⭐⭐ |
| OpenAI (text-embedding-3-small) | ~2 min | ~100ms | Excellent ⭐⭐⭐⭐⭐ |
| OpenAI (text-embedding-3-large) | ~2 min | ~100ms | Best ⭐⭐⭐⭐⭐ |

Times measured on: M1 MacBook Pro (Ollama), standard network connection (OpenAI)

```sh
EMBEDDING_PROVIDER=ollama
```

Fast, free, and private. Perfect for testing and iteration.

```sh
EMBEDDING_PROVIDER=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
```

Great accuracy at minimal cost.

```sh
EMBEDDING_PROVIDER=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
```

Maximum accuracy across all languages.

```sh
EMBEDDING_PROVIDER=ollama
```

Keep all data on-premises.