# Embedding Providers
The S3 Documentation MCP server supports two embedding providers: Ollama (local) and OpenAI (cloud).
## Comparison

| Feature | Ollama | OpenAI |
|---|---|---|
| Cost | Free | ~$0.00002/1K tokens |
| Privacy | 100% local | Data sent to OpenAI |
| Offline | ✅ Yes | ❌ No |
| Accuracy | Good | Excellent |
| Multilingual | Good | Excellent |
| Setup | Install + model download | API key only |
| Resources | Local CPU/GPU | Cloud-based |
## Ollama (Local)

**Recommended for:** local development, privacy-conscious deployments, offline usage

1. Install Ollama from https://ollama.ai

2. Pull the embedding model:

   ```bash
   ollama pull nomic-embed-text
   ```

3. Configure in `.env`:

   ```bash
   EMBEDDING_PROVIDER=ollama
   OLLAMA_BASE_URL=http://localhost:11434
   OLLAMA_EMBEDDING_MODEL=nomic-embed-text
   ```
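With the steps above done, the server talks to Ollama over its REST API. A minimal sketch of the request shape for Ollama's `/api/embeddings` endpoint (the endpoint and field names follow Ollama's public API; the server's own client code is not shown here, so this is illustrative only):

```python
import json

def ollama_embed_request(text: str,
                         model: str = "nomic-embed-text",
                         base_url: str = "http://localhost:11434"):
    """Return the URL and JSON body for an Ollama embedding request."""
    url = f"{base_url}/api/embeddings"
    body = json.dumps({"model": model, "prompt": text})
    return url, body

url, body = ollama_embed_request("What does the indexer do?")
# POST `body` to `url`; the response JSON carries an "embedding" list
# (768 floats for nomic-embed-text).
```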
### Docker Configuration

When using Docker, use `host.docker.internal` to access Ollama running on the host:

```bash
OLLAMA_BASE_URL=http://host.docker.internal:11434
```

Or in `docker-compose.yml`:

```yaml
environment:
  - OLLAMA_BASE_URL=http://host.docker.internal:11434
```

**Pros:**

- ✅ Free: No API costs, unlimited usage
- ✅ Private: All data stays on your machine
- ✅ Offline: Works without internet connection
- ✅ Fast: Direct local API calls
- ✅ No Rate Limits: Process as much as you want
**Cons:**

- ⚠️ Requires Ollama installation and model download (~270MB)
- ⚠️ Uses local CPU/GPU resources
- ⚠️ Slightly lower accuracy than cloud models
### About nomic-embed-text

The `nomic-embed-text` model has the following characteristics:

- Dimensions: 768
- Size: ~270MB
- Performance: Excellent for English, good for other languages
- Speed: Very fast on modern CPUs
- License: Apache 2.0 (fully open-source)
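Semantic search over these embeddings typically ranks document chunks by cosine similarity. A minimal sketch in pure Python (the server's actual scoring code is not shown in this doc; toy 3-dimensional vectors stand in for real 768-dimensional ones):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [0.1, 0.3, 0.5]
chunk = [0.1, 0.3, 0.5]
cosine_similarity(query, chunk)  # identical vectors score 1.0
```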
## OpenAI (Cloud)

**Recommended for:** production deployments, multilingual content, maximum accuracy

1. Get an API key from the OpenAI Platform

2. Add credits to your account

3. Configure in `.env`:

   ```bash
   EMBEDDING_PROVIDER=openai
   OPENAI_API_KEY=sk-...your-key...
   OPENAI_EMBEDDING_MODEL=text-embedding-3-small
   ```
## Model Options

### text-embedding-3-small (Recommended)

```bash
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
```

- Dimensions: 1536
- Cost: ~$0.00002/1K tokens
- Performance: High accuracy
- Best for: Most use cases, cost-sensitive deployments
### text-embedding-3-large

```bash
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
```

- Dimensions: 3072
- Cost: ~$0.00013/1K tokens
- Performance: Maximum accuracy
- Best for: Multilingual content, maximum precision
**Pros:**

- ✅ High Accuracy: State-of-the-art embeddings
- ✅ Multilingual: Excellent support for 20+ languages
- ✅ No Local Resources: Runs entirely in the cloud
- ✅ Consistent Performance: Fast API responses regardless of local hardware
- ✅ Scalable: No local hardware limits
**Cons:**

- ⚠️ Requires API key and credits
- ⚠️ Data sent to OpenAI servers
- ⚠️ Requires internet connection
- ⚠️ Rate limits apply (though very generous)
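Since rate limits apply, embedding clients commonly retry failed calls with exponential backoff. A generic sketch (the error type and delays here are placeholders, not this server's actual implementation):

```python
import time

def with_backoff(call, attempts=4, base_delay=0.5):
    """Retry `call` on a rate-limit error, doubling the wait each attempt."""
    for i in range(attempts):
        try:
            return call()
        except RuntimeError:  # stand-in for the provider's rate-limit error
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)  # waits 0.5s, 1s, 2s, ...
```

In real client code you would catch the provider's specific rate-limit exception rather than a generic `RuntimeError`.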
## Cost Estimation

Typical documentation indexing costs:
| Documentation Size | Tokens (approx.) | Cost (text-embedding-3-small) |
|---|---|---|
| 100 pages | ~250K tokens | ~$0.005 |
| 500 pages | ~1.25M tokens | ~$0.025 |
| 1000 pages | ~2.5M tokens | ~$0.05 |
Search costs are negligible (~$0.00001 per query).
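The figures in the table follow directly from the per-token prices. A quick calculator, using the prices quoted above (which may change over time):

```python
# Approximate USD prices per 1K tokens, as quoted in this doc.
PRICE_PER_1K = {
    "text-embedding-3-small": 0.00002,
    "text-embedding-3-large": 0.00013,
}

def embedding_cost(tokens: int, model: str = "text-embedding-3-small") -> float:
    """Estimated cost in USD to embed `tokens` tokens with `model`."""
    return tokens / 1000 * PRICE_PER_1K[model]

embedding_cost(250_000)    # 100 pages  -> ~$0.005
embedding_cost(2_500_000)  # 1000 pages -> ~$0.05
```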
## Fallback Behavior

If you set `EMBEDDING_PROVIDER=openai` but don't provide a valid `OPENAI_API_KEY`, the server will:
- ⚠️ Log a warning
- 🔄 Automatically fall back to Ollama (if configured)
- ❌ Fail to start if neither provider is available
This ensures the server never starts with a broken embedding configuration.
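The behavior above can be sketched as provider-selection logic (illustrative only; `choose_provider` and the surrounding names are not taken from the server's source):

```python
import logging

def choose_provider(env: dict) -> str:
    """Pick an embedding provider per the fallback rules described above."""
    provider = env.get("EMBEDDING_PROVIDER", "ollama")
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        logging.warning("OPENAI_API_KEY missing; falling back to Ollama")
        provider = "ollama"
    if provider == "ollama" and not env.get("OLLAMA_BASE_URL"):
        raise RuntimeError("No embedding provider available")  # fail to start
    return provider

choose_provider({"EMBEDDING_PROVIDER": "openai",
                 "OLLAMA_BASE_URL": "http://localhost:11434"})  # -> "ollama"
```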
## Switching Providers
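Because the providers' models produce different vector dimensions (768 for `nomic-embed-text` vs 1536 or 3072 for OpenAI), switching providers generally requires re-indexing your documentation. A hypothetical check (`needs_reindex` is illustrative, not part of the server; the dimensions come from the specs in this doc):

```python
# Vector dimensions per model, taken from the model specs above.
DIMS = {
    "nomic-embed-text": 768,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def needs_reindex(old_model: str, new_model: str) -> bool:
    """Stored vectors become unusable when the dimension changes."""
    return DIMS[old_model] != DIMS[new_model]

needs_reindex("nomic-embed-text", "text-embedding-3-small")  # True
```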
## Performance Comparison

Real-world performance on a typical documentation set (500 pages):
| Provider | Indexing Time | Search Time | Accuracy |
|---|---|---|---|
| Ollama (nomic-embed-text) | ~5 min | ~50ms | Good ⭐⭐⭐⭐ |
| OpenAI (text-embedding-3-small) | ~2 min | ~100ms | Excellent ⭐⭐⭐⭐⭐ |
| OpenAI (text-embedding-3-large) | ~2 min | ~100ms | Best ⭐⭐⭐⭐⭐ |
Times measured on an M1 MacBook Pro (Ollama) and over a standard network connection (OpenAI).
## Recommendations

### For Local Development

```bash
EMBEDDING_PROVIDER=ollama
```

Fast, free, and private. Perfect for testing and iteration.
### For Production (English-only)

```bash
EMBEDDING_PROVIDER=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
```

Great accuracy at minimal cost.
### For Production (Multilingual)

```bash
EMBEDDING_PROVIDER=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
```

Maximum accuracy across all languages.
### For Privacy-Critical Deployments

```bash
EMBEDDING_PROVIDER=ollama
```

Keep all data on-premises.