Configuration
Configure grepai for your needs
Config File Location
grepai stores its configuration in .grepai/config.yaml in your project root.
Run grepai init to create a default configuration.
Full Configuration Reference
# Config file version
version: 1

# Embedder configuration
embedder:
  # Provider: "ollama" (local), "lmstudio" (local), or "openai" (cloud)
  provider: ollama
  # Model name (depends on provider)
  model: nomic-embed-text
  # Endpoint URL (depends on provider, supports Azure OpenAI)
  endpoint: http://localhost:11434
  # API key (for OpenAI provider, use environment variable)
  api_key: ${OPENAI_API_KEY}
  # Vector dimensions (depends on model, auto-detected if not set)
  dimensions: 768
  # Concurrent batch requests for OpenAI (default: 4)
  parallelism: 4

# Vector store configuration
store:
  # Backend: "gob" (file-based), "postgres" (PostgreSQL with pgvector), or "qdrant"
  backend: gob
  # PostgreSQL settings (if using postgres backend)
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai
  # Qdrant settings (if using qdrant backend)
  qdrant:
    endpoint: localhost
    port: 6334
    use_tls: false
    collection: "" # Optional, defaults to sanitized project path
    api_key: "" # Optional, for Qdrant Cloud

# Chunking configuration
chunking:
  # Maximum tokens per chunk
  size: 512
  # Overlap between chunks (for context continuity)
  overlap: 50

# File watching configuration
watch:
  # Debounce delay in milliseconds
  debounce_ms: 500

# Call graph tracing configuration
trace:
  # Extraction mode: "fast" (regex) or "precise" (tree-sitter)
  mode: fast
  # File extensions to index for symbols
  enabled_languages:
    - .go
    - .js
    - .ts
    - .jsx
    - .tsx
    - .py
    - .php
    - .c
    - .h
    - .cpp
    - .hpp
    - .cc
    - .cxx
    - .rs
    - .zig
    - .cs
    - .pas
    - .dpr
  # Patterns to exclude from symbol indexing
  exclude_patterns:
    - "*_test.go"
    - "*.spec.ts"

# Patterns to ignore (in addition to .gitignore)
ignore:
  - ".git"
  - ".grepai"
  - "node_modules"
  - "vendor"
  - "target"
  - ".zig-cache"
  - "zig-out"

# Path to an external gitignore file (e.g., global gitignore)
# Supports ~ expansion for home directory
external_gitignore: "~/.config/git/ignore"
Embedder Options
Ollama (Local - Recommended)
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768
Available models:
- nomic-embed-text: Fast, good quality (768 dims, English)
- nomic-embed-text-v2-moe: Multilingual (~100 languages, 768 dims)
- bge-m3: Multilingual, excellent quality (1024 dims)
- mxbai-embed-large: Higher quality, slower (1024 dims)
- all-minilm: Very fast, lower quality (384 dims)
For multilingual codebases (comments in Korean, French, etc.):
embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe # Supports ~100 languages
  endpoint: http://localhost:11434
  dimensions: 768
LM Studio (Local)
embedder:
  provider: lmstudio
  model: text-embedding-nomic-embed-text-v1.5
  endpoint: http://127.0.0.1:1234
Available models (depends on what you load in LM Studio):
- nomic-embed-text-v1.5: Good general purpose (768 dims)
- bge-small-en-v1.5: Fast, smaller (384 dims)
- bge-large-en-v1.5: Higher quality (1024 dims)
OpenAI (Cloud)
embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  dimensions: 1536
  parallelism: 4 # Concurrent requests (auto-adjusts on rate limits)
Available models:
- text-embedding-3-small: 1536 dimensions, fast, cost-effective
- text-embedding-3-large: 3072 dimensions, higher quality
Azure OpenAI / Microsoft Foundry
Use a custom endpoint for Azure OpenAI or other OpenAI-compatible providers:
embedder:
  provider: openai
  model: text-embedding-ada-002
  endpoint: https://YOUR-RESOURCE.openai.azure.com/v1
  api_key: ${AZURE_OPENAI_API_KEY}
  dimensions: 1536
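The dimensions quoted in this section can be collected into a lookup to catch a mismatched `dimensions` value before indexing. The following is a minimal Python sketch and not part of grepai: `KNOWN_DIMS` and `check_dimensions` are hypothetical names, and the table contains only the models and sizes listed above.

```python
# Illustrative only: sizes copied from the model lists in this section.
KNOWN_DIMS = {
    "nomic-embed-text": 768,
    "nomic-embed-text-v2-moe": 768,
    "bge-m3": 1024,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
    "nomic-embed-text-v1.5": 768,
    "bge-small-en-v1.5": 384,
    "bge-large-en-v1.5": 1024,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_dimensions(model, dimensions):
    """Return None if the setting matches the listed size, else a warning string."""
    expected = KNOWN_DIMS.get(model)
    if expected is None:
        return None  # model not in the table: nothing to compare against
    if dimensions != expected:
        return f"{model} is listed with {expected} dims, config says {dimensions}"
    return None


print(check_dimensions("bge-m3", 768))
```

A warning here is only a hint to re-check the config; since dimensions are auto-detected when not set, omitting the key is the simpler option when in doubt.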
Storage Options
GOB (File-based - Default)
store:
  backend: gob
Best for:
- Single-developer projects
- Quick setup
- No external dependencies
The index is stored automatically in .grepai/index.gob.
PostgreSQL with pgvector
store:
  backend: postgres
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai
Best for:
- Team environments
- Large codebases
- Advanced querying needs
Setup:
CREATE EXTENSION vector;
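In a URL-style DSN, credentials that contain characters such as `@` or `/` generally need to be percent-encoded, otherwise the host part is parsed incorrectly. The sketch below uses only the Python standard library to show one way to assemble such a DSN; `build_dsn` is a hypothetical helper, and it is an assumption that grepai's PostgreSQL driver follows the usual connection-URI rules.

```python
from urllib.parse import quote, urlsplit


def build_dsn(user, password, host, port, database):
    """Assemble a postgres:// DSN, percent-encoding the credentials."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


dsn = build_dsn("user", "p@ss/word", "localhost", 5432, "grepai")
parts = urlsplit(dsn)  # parse it back to confirm host and port survive
print(dsn)
print(parts.hostname, parts.port, parts.path.lstrip("/"))
```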
Qdrant
store:
  backend: qdrant
  qdrant:
    endpoint: localhost
    port: 6334
    use_tls: false
Best for:
- High-performance vector search
- Docker-based environments
- Teams already using Qdrant
Setup:
docker compose --profile=qdrant up -d
See Vector Stores for detailed configuration options.
Chunking Tuning
chunking:
  size: 512 # Tokens per chunk
  overlap: 50 # Overlap for context
- Larger chunks: Better context, fewer results, slower
- Smaller chunks: More precise matches, more results, faster
- More overlap: Better continuity, larger index
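To make the size/overlap trade-off concrete, here is a minimal sliding-window sketch in Python. It is an illustration rather than grepai's chunker: tokens are modeled as plain list items, and `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # each new window starts this far after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


chunks = chunk_tokens(list(range(1000)), size=512, overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

With the defaults, each window advances by 462 tokens, so raising `overlap` increases the number of chunks (and the index size) for the same file.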
Automatic Re-chunking
If you configure a chunking.size larger than your embedder’s context limit (e.g., 10000 tokens with a model that only supports 8192), grepai will automatically detect the error and re-chunk the content into smaller pieces.
This happens transparently:
- grepai attempts to embed the chunk
- If the embedder returns a “context length exceeded” error, grepai splits the chunk in half
- The process repeats until chunks fit within the model’s limit (up to 3 attempts)
You’ll see log messages like:
Re-chunking large_file.go chunk 0 (attempt 1/3): context limit exceeded
Split chunk into 4 sub-chunks
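The retry loop described above can be sketched as a recursive halving. This is an illustration under stated assumptions, not grepai's code: `ContextLengthError`, `embed_with_rechunk`, and `fake_embed` are hypothetical names, and the embedder is modeled as a callable that raises when its input is too long.

```python
class ContextLengthError(Exception):
    """Stand-in for the embedder's "context length exceeded" error."""


def embed_with_rechunk(tokens, embed, attempts_left=3):
    """Embed `tokens`; on a context-length error, halve and retry each half."""
    try:
        return [embed(tokens)]
    except ContextLengthError:
        if attempts_left == 0 or len(tokens) < 2:
            raise  # out of attempts, or nothing left to split
        mid = len(tokens) // 2
        return (
            embed_with_rechunk(tokens[:mid], embed, attempts_left - 1)
            + embed_with_rechunk(tokens[mid:], embed, attempts_left - 1)
        )


def fake_embed(piece, limit=8192):
    """Toy embedder: rejects input above `limit`, else returns its length."""
    if len(piece) > limit:
        raise ContextLengthError(f"{len(piece)} tokens exceeds {limit}")
    return len(piece)


print(embed_with_rechunk(list(range(10000)), fake_embed))
```

In this model a 10000-token chunk is split once into two halves that fit, while an input that is still too long after three rounds of halving surfaces the error to the caller.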
Recommended chunk sizes per model:
| Provider | Model | Max Context (tokens) | Recommended Size (tokens) |
|---|---|---|---|
| Ollama | nomic-embed-text | ~8192 | 512-2048 |
| OpenAI | text-embedding-3-small | 8191 | 512-4096 |
| LM Studio | nomic-embed-text-v1.5 | ~8192 | 512-2048 |
Search Options
grepai provides two optional search enhancements:
Search Boost (enabled by default)
Adjusts scores based on file paths: test files are penalized and source directories are boosted.
search:
  boost:
    enabled: true
    penalties:
      - pattern: "_test."
        factor: 0.5
    bonuses:
      - pattern: "/src/"
        factor: 1.1
See Search Boost for full documentation.
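As a rough mental model of the penalty and bonus rules, the sketch below multiplies a result's score by the factor of every rule whose pattern appears in the file path. Both the substring matching and the multiplicative combination are assumptions made for illustration, and `apply_boost` is a hypothetical name; the Search Boost page describes the actual behavior.

```python
def apply_boost(path, score, penalties, bonuses):
    """Multiply `score` by the factor of every rule whose pattern occurs in `path`."""
    for rule in penalties + bonuses:
        if rule["pattern"] in path:  # assumed: plain substring match
            score *= rule["factor"]
    return score


# Rules mirroring the example config above.
penalties = [{"pattern": "_test.", "factor": 0.5}]
bonuses = [{"pattern": "/src/", "factor": 1.1}]

print(apply_boost("pkg/store_test.go", 0.8, penalties, bonuses))
print(apply_boost("app/src/main.go", 0.8, penalties, bonuses))
```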
Hybrid Search (disabled by default)
Combines vector similarity with text matching using Reciprocal Rank Fusion (RRF). The k setting is the RRF smoothing constant.
search:
  hybrid:
    enabled: true
    k: 60
See Hybrid Search for full documentation.
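For readers unfamiliar with RRF, the standard formula gives each result a score of 1 / (k + rank) in every ranked list it appears in and sums those scores. The Python sketch below implements that textbook formula; how grepai wires it into its search pipeline is not shown here, and `rrf_fuse` and the file names are hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_hits = ["auth.go", "login.go", "session.go"]  # from vector similarity
text_hits = ["login.go", "token.go", "auth.go"]      # from text matching
fused = rrf_fuse([vector_hits, text_hits])
print(fused)  # ['login.go', 'auth.go', 'token.go', 'session.go']
```

A larger `k` flattens the difference between adjacent ranks, so results that appear in both lists matter more than a single top position.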
External Gitignore
You can specify an external gitignore file (such as your global Git ignore file) to be respected during indexing:
external_gitignore: "~/.config/git/ignore"
This is useful for ignoring files globally configured in Git (e.g., IDE files, OS-specific files).
Common locations for global gitignore:
- ~/.config/git/ignore (XDG standard)
- ~/.gitignore_global (older convention)
The tilde (~) is automatically expanded to your home directory.
If the file doesn’t exist, grepai will log a warning and continue without it.
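The behavior described above (tilde expansion, then a warning instead of a failure when the file is missing) can be sketched in Python as follows. `load_external_gitignore` is a hypothetical helper, and the pattern parsing is deliberately simplified compared with full gitignore syntax.

```python
import logging
import os


def load_external_gitignore(path):
    """Return the patterns from `path`, or an empty list if the file is missing."""
    expanded = os.path.expanduser(path)  # "~" becomes the home directory
    if not os.path.isfile(expanded):
        logging.warning("external gitignore %s not found, continuing without it", expanded)
        return []
    with open(expanded, encoding="utf-8") as fh:
        lines = [line.strip() for line in fh]
    # Simplified: skip blank lines and comment lines only.
    return [line for line in lines if line and not line.startswith("#")]


print(load_external_gitignore("~/.config/git/ignore"))
```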
Environment Variables
You can use environment variables in config:
embedder:
  api_key: ${OPENAI_API_KEY}
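The `${NAME}` substitution can be pictured with the short Python sketch below. It is an illustration only: `expand_env` is a hypothetical name, and replacing an unset variable with an empty string is an assumption, not documented grepai behavior.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Replace each ${NAME} in `value` with env[NAME]; unset names become ""."""
    env = os.environ if env is None else env
    return _VAR.sub(lambda m: env.get(m.group(1), ""), value)


print(expand_env("${OPENAI_API_KEY}", {"OPENAI_API_KEY": "sk-example"}))
```

Keeping the key in the environment rather than in the file means the config can be committed without exposing secrets.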
Or override config via environment:
export GREPAI_EMBEDDER_PROVIDER=openai
export GREPAI_STORE_BACKEND=postgres
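Only the two variables above are documented here. The sketch below assumes they follow a GREPAI_<SECTION>_<KEY> pattern and shows how such overrides could be layered on top of a loaded config; the pattern, the two-level limit, and the name `apply_env_overrides` are all assumptions for illustration.

```python
def apply_env_overrides(config, env, prefix="GREPAI_"):
    """Overlay GREPAI_<SECTION>_<KEY> variables onto a two-level config dict."""
    # Copy nested sections so the caller's config is left untouched.
    merged = {
        name: dict(value) if isinstance(value, dict) else value
        for name, value in config.items()
    }
    for name, value in env.items():
        if not name.startswith(prefix):
            continue
        # Split on the first underscore only, so keys like api_key stay whole.
        section, _, key = name[len(prefix):].lower().partition("_")
        if key and isinstance(merged.get(section), dict):
            merged[section][key] = value
    return merged


config = {"version": 1, "embedder": {"provider": "ollama"}, "store": {"backend": "gob"}}
env = {
    "GREPAI_EMBEDDER_PROVIDER": "openai",
    "GREPAI_STORE_BACKEND": "postgres",
    "PATH": "/usr/bin",
}
merged = apply_env_overrides(config, env)
print(merged["embedder"]["provider"], merged["store"]["backend"])
```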