Embedders

Configure embedding providers for grepai

Embedders convert text (code chunks) into vector representations that enable semantic search.

Available Embedders

| Provider | Type | Pros | Cons |
|----------|------|------|------|
| Ollama | Local | Privacy, free, no internet | Requires local resources |
| LM Studio | Local | Privacy, OpenAI-compatible API, GUI | Requires local resources |
| OpenAI | Cloud | High quality, fast | Costs money, sends code to cloud |

Ollama (Local)

Setup

  1. Install Ollama:
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh
  2. Start the server:
ollama serve
  3. Pull an embedding model:
ollama pull nomic-embed-text

Configuration

During grepai init, you will be prompted for the endpoint URL (default: http://localhost:11434). This allows connecting to a remote Ollama server, a Docker container, or an instance running on a custom port.

embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768
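
For example, pointing at an Ollama instance on another machine (the host shown is illustrative):

embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://192.168.1.50:11434
  dimensions: 768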

Available Models

| Model | Dimensions | Speed | Quality | Languages |
|-------|------------|-------|---------|-----------|
| nomic-embed-text | 768 | Fast | Good | English |
| nomic-embed-text-v2-moe | 768 | Fast | Better | ~100 languages |
| bge-m3 | 1024 | Medium | Excellent | ~100 languages |
| mxbai-embed-large | 1024 | Medium | Better | English |
| all-minilm | 384 | Very Fast | Basic | English |

Multilingual Support

The default model nomic-embed-text is optimized for English. For non-English codebases or mixed-language projects (Korean, Chinese, French, etc.), use a multilingual model.

Recommended: nomic-embed-text-v2-moe — same dimensions (768) as the default, making it a drop-in replacement.

ollama pull nomic-embed-text-v2-moe

Then update your config:

embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe
  dimensions: 768

Troubleshooting

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Test embedding
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Hello world"
}'

LM Studio (Local)

LM Studio provides an OpenAI-compatible API for running embedding models locally with a user-friendly GUI.

Setup

  1. Download and install LM Studio

  2. Start LM Studio and load an embedding model (e.g., nomic-embed-text)

  3. Enable the local server (default: http://127.0.0.1:1234)

Configuration

During grepai init, you will be prompted for the endpoint URL (default: http://127.0.0.1:1234). This allows connecting to a remote LM Studio instance or a custom port.

embedder:
  provider: lmstudio
  model: text-embedding-nomic-embed-text-v1.5
  endpoint: http://127.0.0.1:1234

Available Models

Any embedding model supported by LM Studio, including:

| Model | Dimensions | Notes |
|-------|------------|-------|
| nomic-embed-text-v1.5 | 768 | Good general purpose |
| bge-small-en-v1.5 | 384 | Fast, smaller |
| bge-large-en-v1.5 | 1024 | Higher quality |

Troubleshooting

# Check if LM Studio server is running
curl http://127.0.0.1:1234/v1/models

# Test embedding
curl http://127.0.0.1:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
  "model": "text-embedding-nomic-embed-text-v1.5",
  "input": ["Hello world"]
}'

OpenAI (Cloud)

Setup

  1. Get an API key from OpenAI Platform

  2. Set the environment variable:

export OPENAI_API_KEY=sk-...

Configuration

embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  dimensions: 1536

Azure OpenAI / Microsoft Foundry

For Azure OpenAI or other OpenAI-compatible providers, use a custom endpoint:

embedder:
  provider: openai
  model: text-embedding-ada-002
  endpoint: https://YOUR-RESOURCE.openai.azure.com/v1
  api_key: ${AZURE_OPENAI_API_KEY}
  dimensions: 1536

Available Models

| Model | Dimensions | Price (per 1M tokens) |
|-------|------------|-----------------------|
| text-embedding-3-small | 1536 | $0.02 |
| text-embedding-3-large | 3072 | $0.13 |

Parallelism & Rate Limiting

OpenAI embeddings support parallel batch processing with adaptive rate limiting:

embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  parallelism: 4  # Concurrent API requests (default: 4)

How it works:

  • Batches are processed concurrently up to parallelism limit
  • On rate limit (429), parallelism auto-reduces and retries with backoff
  • After successful requests, parallelism gradually restores
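
A minimal Go sketch of this adaptive strategy (illustrative only; adaptiveLimiter and its methods are not grepai's actual API):

package embedder

import (
	"sync"
	"time"
)

// adaptiveLimiter sketches the behavior described above: shrink
// parallelism after an HTTP 429, grow it back after a run of
// successful batches.
type adaptiveLimiter struct {
	mu        sync.Mutex
	max       int // configured parallelism
	current   int // effective parallelism right now
	successes int // consecutive successful batches
}

func newAdaptiveLimiter(parallelism int) *adaptiveLimiter {
	return &adaptiveLimiter{max: parallelism, current: parallelism}
}

// onRateLimited halves effective parallelism and returns a delay the
// caller should sleep before retrying the failed batch.
func (l *adaptiveLimiter) onRateLimited() time.Duration {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.current > 1 {
		l.current /= 2
	}
	l.successes = 0
	return time.Second
}

// onSuccess restores one slot of parallelism after ten consecutive
// successful batches, up to the configured maximum.
func (l *adaptiveLimiter) onSuccess() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.successes++
	if l.successes >= 10 && l.current < l.max {
		l.current++
		l.successes = 0
	}
}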

Higher OpenAI tiers allow more concurrent requests. Use the table below as a starting point:

| OpenAI Tier | RPM Limit | Recommended parallelism |
|-------------|-----------|-------------------------|
| Free | 500 | 2 |
| Tier 1 | 500 | 2–4 |
| Tier 2 | 5,000 | 8–12 |
| Tier 3 | 5,000 | 8–16 |
| Tier 4 | 10,000 | 16–20 |
| Tier 5 | 10,000 | 16–24 |

Checking your tier and rate limits:

# Make a test embedding request and inspect rate limit headers
curl -s -D - https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": ["test"]}' \
  -o /dev/null 2>&1 | grep -i x-ratelimit

Look for x-ratelimit-limit-requests (your RPM cap) and x-ratelimit-limit-tokens (your tokens/min cap) to determine your effective tier.
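
The headers look something like this (values are illustrative):

x-ratelimit-limit-requests: 5000
x-ratelimit-remaining-requests: 4999
x-ratelimit-limit-tokens: 5000000
x-ratelimit-remaining-tokens: 4999994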

Tip: Start with a conservative value and increase if you see no 429 errors. The adaptive rate limiter will automatically back off if you exceed your limit, but starting lower avoids unnecessary retries during initial indexing.

Cost Estimation

For a typical codebase:

  • 10,000 lines of code ≈ 50,000 tokens
  • Initial index: ~$0.001 with text-embedding-3-small
  • Ongoing updates: negligible
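
The arithmetic behind that estimate: 50,000 tokens × ($0.02 / 1,000,000 tokens) = $0.001.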

Changing Embedding Models

You can use any embedding model available on your provider. Two parameters matter:

| Parameter | Description |
|-----------|-------------|
| model | The exact model name as expected by the provider |
| dimensions | The vector size produced by the model |
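
For example, to switch the Ollama setup above to bge-m3 from the model table, update both values together:

embedder:
  provider: ollama
  model: bge-m3
  dimensions: 1024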

Finding Model Dimensions

Each embedding model produces vectors of a fixed size. Using incorrect dimensions causes errors or poor results.

For Ollama:

curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "MODEL_NAME", "prompt": "test"}' | jq '.embedding | length'

For LM Studio:

curl -s http://127.0.0.1:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "MODEL_NAME", "input": ["test"]}' | jq '.data[0].embedding | length'

Re-indexing After Model Change

Important: Embeddings from different models are incompatible. After changing models, you must re-index:

rm -rf .grepai/index.gob .grepai/symbols.gob
grepai watch

Adding a New Embedder

To add a new embedding provider:

  1. Implement the Embedder interface in embedder/ (a provider sketch follows this list):
type Embedder interface {
    Embed(ctx context.Context, text string) ([]float32, error)
    EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
    Dimensions() int
}
  2. Add configuration in config/config.go

  3. Wire it up in the CLI commands
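
A minimal sketch of step 1 (myProvider and its constructor are hypothetical; see the existing providers in embedder/ for the real pattern):

package embedder

import (
	"context"
	"errors"
)

// myProvider is a hypothetical provider used only to show the shape
// of the interface; the HTTP call is left as a stub.
type myProvider struct {
	endpoint   string
	model      string
	dimensions int
}

func NewMyProvider(endpoint, model string, dimensions int) *myProvider {
	return &myProvider{endpoint: endpoint, model: model, dimensions: dimensions}
}

// Embed reuses EmbedBatch so batching logic lives in one place.
func (p *myProvider) Embed(ctx context.Context, text string) ([]float32, error) {
	vecs, err := p.EmbedBatch(ctx, []string{text})
	if err != nil {
		return nil, err
	}
	return vecs[0], nil
}

// EmbedBatch is where the real API call would go: POST the texts to
// p.endpoint and decode one []float32 per input text.
func (p *myProvider) EmbedBatch(ctx context.Context, texts []string) ([][]float32, error) {
	return nil, errors.New("not implemented")
}

// Dimensions reports the vector size so the index can validate config.
func (p *myProvider) Dimensions() int {
	return p.dimensions
}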

See Contributing for more details.