# Embedders

Configure embedding providers for grepai.
Embedders convert text (code chunks) into vector representations that enable semantic search.
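To make that concrete: once chunks and queries are embedded, semantic search reduces to comparing vectors, typically by cosine similarity. The sketch below only illustrates the idea; `cosineSimilarity` is a hypothetical helper, not grepai's actual search code.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity scores how closely two embedding vectors point in
// the same direction: ~1.0 means very similar, ~0.0 means unrelated.
// Both vectors are assumed to have the same length.
func cosineSimilarity(a, b []float32) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Toy 3-dimensional vectors; real models emit 384 to 3072 dimensions.
	query := []float32{0.2, 0.8, 0.1}
	chunk := []float32{0.25, 0.75, 0.05}
	fmt.Printf("similarity: %.3f\n", cosineSimilarity(query, chunk))
}
```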
## Available Embedders
| Provider | Type | Pros | Cons |
|---|---|---|---|
| Ollama | Local | Privacy, free, no internet | Requires local resources |
| LM Studio | Local | Privacy, OpenAI-compatible API, GUI | Requires local resources |
| OpenAI | Cloud | High quality, fast | Costs money, sends code to cloud |
## Ollama (Local)

### Setup

- Install Ollama:

  ```bash
  # macOS
  brew install ollama

  # Linux
  curl -fsSL https://ollama.com/install.sh | sh
  ```

- Start the server:

  ```bash
  ollama serve
  ```

- Pull an embedding model:

  ```bash
  ollama pull nomic-embed-text
  ```
### Configuration

During `grepai init`, you will be prompted for the endpoint URL (default: `http://localhost:11434`). This allows connecting to a remote Ollama server, a Docker container, or an instance running on a custom port.

```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768
```
### Available Models

| Model | Dimensions | Speed | Quality | Languages |
|---|---|---|---|---|
| `nomic-embed-text` | 768 | Fast | Good | English |
| `nomic-embed-text-v2-moe` | 768 | Fast | Better | ~100 languages |
| `bge-m3` | 1024 | Medium | Excellent | ~100 languages |
| `mxbai-embed-large` | 1024 | Medium | Better | English |
| `all-minilm` | 384 | Very Fast | Basic | English |
### Multilingual Support

The default model `nomic-embed-text` is optimized for English. For non-English codebases or mixed-language projects (Korean, Chinese, French, etc.), use a multilingual model.

Recommended: `nomic-embed-text-v2-moe`. It produces the same 768-dimensional vectors as the default, making it a drop-in replacement.

```bash
ollama pull nomic-embed-text-v2-moe
```

```yaml
embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe
  dimensions: 768
```
### Troubleshooting

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Test embedding
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Hello world"
}'
```
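The same test can also be run programmatically. This is a sketch under the assumptions shown in the curl example above (endpoint and request/response shapes); `embedWithOllama` is a hypothetical helper, not grepai's internal client.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// embedWithOllama posts one text to Ollama's /api/embeddings endpoint
// and returns the resulting vector.
func embedWithOllama(endpoint, model, text string) ([]float32, error) {
	body, err := json.Marshal(map[string]string{
		"model":  model,
		"prompt": text,
	})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(endpoint+"/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Ollama answers with a single "embedding" array of floats.
	var out struct {
		Embedding []float32 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

func main() {
	vec, err := embedWithOllama("http://localhost:11434", "nomic-embed-text", "Hello world")
	if err != nil {
		panic(err)
	}
	fmt.Printf("got %d dimensions\n", len(vec)) // expect 768 for nomic-embed-text
}
```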
## LM Studio (Local)
LM Studio provides an OpenAI-compatible API for running embedding models locally with a user-friendly GUI.
### Setup

- Download and install LM Studio
- Start LM Studio and load an embedding model (e.g., `nomic-embed-text`)
- Enable the local server (default: `http://127.0.0.1:1234`)
### Configuration

During `grepai init`, you will be prompted for the endpoint URL (default: `http://127.0.0.1:1234`). This allows connecting to a remote LM Studio instance or a custom port.

```yaml
embedder:
  provider: lmstudio
  model: text-embedding-nomic-embed-text-v1.5
  endpoint: http://127.0.0.1:1234
```
### Available Models

Any embedding model supported by LM Studio, including:

| Model | Dimensions | Notes |
|---|---|---|
| `nomic-embed-text-v1.5` | 768 | Good general purpose |
| `bge-small-en-v1.5` | 384 | Fast, smaller |
| `bge-large-en-v1.5` | 1024 | Higher quality |
### Troubleshooting

```bash
# Check if LM Studio server is running
curl http://127.0.0.1:1234/v1/models

# Test embedding
curl http://127.0.0.1:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-nomic-embed-text-v1.5",
    "input": ["Hello world"]
  }'
```
## OpenAI (Cloud)

### Setup

- Get an API key from OpenAI Platform
- Set the environment variable:

  ```bash
  export OPENAI_API_KEY=sk-...
  ```
### Configuration

```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  dimensions: 1536
```
### Azure OpenAI / Microsoft Foundry

For Azure OpenAI or other OpenAI-compatible providers, use a custom endpoint:

```yaml
embedder:
  provider: openai
  model: text-embedding-ada-002
  endpoint: https://YOUR-RESOURCE.openai.azure.com/v1
  api_key: ${AZURE_OPENAI_API_KEY}
  dimensions: 1536
```
### Available Models

| Model | Dimensions | Price (per 1M tokens) |
|---|---|---|
| `text-embedding-3-small` | 1536 | $0.02 |
| `text-embedding-3-large` | 3072 | $0.13 |
### Parallelism & Rate Limiting

OpenAI embeddings support parallel batch processing with adaptive rate limiting:

```yaml
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  parallelism: 4  # Concurrent API requests (default: 4)
```
How it works:

- Batches are processed concurrently, up to the `parallelism` limit
- On a rate limit response (429), parallelism is automatically reduced and the request is retried with backoff
- After successful requests, parallelism is gradually restored (see the sketch after this list)
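As an illustration of that loop, here is a sketch under stated assumptions; `adaptiveLimiter`, `embedBatch`, and `ErrRateLimited` are hypothetical names, not grepai's actual implementation.

```go
// Package ratelimit sketches the adaptive behavior described above.
package ratelimit

import (
	"context"
	"errors"
	"sync"
	"time"
)

// ErrRateLimited stands in for an HTTP 429 response from the API.
var ErrRateLimited = errors.New("rate limited (429)")

// adaptiveLimiter tracks how many requests may run concurrently:
// it shrinks on 429s and grows back toward the configured maximum.
type adaptiveLimiter struct {
	mu      sync.Mutex
	current int // effective parallelism right now
	max     int // configured parallelism (e.g. 4)
}

func (l *adaptiveLimiter) onRateLimit() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.current > 1 {
		l.current /= 2 // back off: halve concurrency
	}
}

func (l *adaptiveLimiter) onSuccess() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.current < l.max {
		l.current++ // gradually restore concurrency
	}
}

// embedBatch runs one batch, retrying with exponential backoff
// whenever the provider answers 429.
func embedBatch(ctx context.Context, l *adaptiveLimiter, call func() error) error {
	backoff := time.Second
	for {
		err := call()
		if !errors.Is(err, ErrRateLimited) {
			if err == nil {
				l.onSuccess()
			}
			return err // success, or a non-rate-limit failure
		}
		l.onRateLimit()
		select {
		case <-time.After(backoff):
			backoff *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```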
#### Recommended Parallelism by Tier
Higher OpenAI tiers allow more concurrent requests. Use the table below as a starting point:
| OpenAI Tier | RPM Limit | Recommended parallelism |
|---|---|---|
| Free | 500 | 2 |
| Tier 1 | 500 | 2–4 |
| Tier 2 | 5,000 | 8–12 |
| Tier 3 | 5,000 | 8–16 |
| Tier 4 | 10,000 | 16–20 |
| Tier 5 | 10,000 | 16–24 |
Checking your tier and rate limits:
```bash
# Make a test embedding request and inspect rate limit headers
curl -s -D - https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": ["test"]}' \
  -o /dev/null 2>&1 | grep -i x-ratelimit
```
Look for `x-ratelimit-limit-requests` (your RPM cap) and `x-ratelimit-limit-tokens` (your tokens/min cap) to determine your effective tier.
Tip: Start with a conservative value and increase if you see no 429 errors. The adaptive rate limiter will automatically back off if you exceed your limit, but starting lower avoids unnecessary retries during initial indexing.
### Cost Estimation

For a typical codebase:

- 10,000 lines of code ≈ 50,000 tokens
- Initial index: ~$0.001 with `text-embedding-3-small`
- Ongoing updates: negligible
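That initial-index figure is just the pricing table applied to the token count: 50,000 tokens × $0.02 per 1M tokens = $0.001.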
## Changing Embedding Models

You can use any embedding model available on your provider. Two parameters matter:

| Parameter | Description |
|---|---|
| `model` | The exact model name as expected by the provider |
| `dimensions` | The vector size produced by the model |
### Finding Model Dimensions

Each embedding model produces vectors of a fixed size. Using incorrect dimensions causes errors or poor results.

For Ollama:

```bash
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "MODEL_NAME", "prompt": "test"}' | jq '.embedding | length'
```

For LM Studio:

```bash
curl -s http://127.0.0.1:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "MODEL_NAME", "input": ["test"]}' | jq '.data[0].embedding | length'
```
### Re-indexing After Model Change

Important: Embeddings from different models are incompatible. After changing models, you must re-index:

```bash
rm -rf .grepai/index.gob .grepai/symbols.gob
grepai watch
```
## Adding a New Embedder

To add a new embedding provider:

- Implement the `Embedder` interface in `embedder/` (a hypothetical skeleton follows this list):

  ```go
  type Embedder interface {
      Embed(ctx context.Context, text string) ([]float32, error)
      EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
      Dimensions() int
  }
  ```

- Add configuration in `config/config.go`
- Wire it up in the CLI commands
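For orientation, here is a hypothetical skeleton of such a provider. The name `AcmeEmbedder` and its constructor are made up; follow the conventions of the existing implementations in `embedder/` for the real wiring.

```go
package embedder

import (
	"context"
	"fmt"
)

// AcmeEmbedder is a placeholder provider satisfying the Embedder interface.
type AcmeEmbedder struct {
	endpoint   string
	model      string
	dimensions int
}

func NewAcmeEmbedder(endpoint, model string, dimensions int) *AcmeEmbedder {
	return &AcmeEmbedder{endpoint: endpoint, model: model, dimensions: dimensions}
}

func (e *AcmeEmbedder) Embed(ctx context.Context, text string) ([]float32, error) {
	// Call your provider's embedding API here and return one vector.
	return nil, fmt.Errorf("not implemented")
}

func (e *AcmeEmbedder) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error) {
	// Many providers accept batched inputs; if yours does not,
	// looping over Embed is a reasonable fallback.
	out := make([][]float32, 0, len(texts))
	for _, t := range texts {
		vec, err := e.Embed(ctx, t)
		if err != nil {
			return nil, err
		}
		out = append(out, vec)
	}
	return out, nil
}

func (e *AcmeEmbedder) Dimensions() int {
	return e.dimensions
}
```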
See Contributing for more details.