Configuration

Configure grepai for your needs

Config File Location

grepai stores its configuration in .grepai/config.yaml in your project root.

Run grepai init to create a default configuration.

Full Configuration Reference

# Config file version
version: 1

# Embedder configuration
embedder:
  # Provider: "ollama" (local), "lmstudio" (local), or "openai" (cloud)
  provider: ollama
  # Model name (depends on provider)
  model: nomic-embed-text
  # Endpoint URL (depends on provider, supports Azure OpenAI)
  endpoint: http://localhost:11434
  # API key (OpenAI provider only; prefer an environment variable reference)
  api_key: ${OPENAI_API_KEY}
  # Vector dimensions (depends on model, auto-detected if not set)
  dimensions: 768
  # Concurrent batch requests for OpenAI (default: 4)
  parallelism: 4

# Vector store configuration
store:
  # Backend: "gob" (file-based), "postgres" (PostgreSQL with pgvector), or "qdrant"
  backend: gob

  # PostgreSQL settings (if using postgres backend)
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai

  # Qdrant settings (if using qdrant backend)
  qdrant:
    endpoint: localhost
    port: 6334
    use_tls: false
    collection: ""  # Optional, defaults to sanitized project path
    api_key: ""     # Optional, for Qdrant Cloud

# Chunking configuration
chunking:
  # Maximum tokens per chunk
  size: 512
  # Overlap between chunks (for context continuity)
  overlap: 50

# File watching configuration
watch:
  # Debounce delay in milliseconds
  debounce_ms: 500

# Call graph tracing configuration
trace:
  # Extraction mode: "fast" (regex) or "precise" (tree-sitter)
  mode: fast
  # File extensions to index for symbols
  enabled_languages:
    - .go
    - .js
    - .ts
    - .jsx
    - .tsx
    - .py
    - .php
    - .c
    - .h
    - .cpp
    - .hpp
    - .cc
    - .cxx
    - .rs
    - .zig
    - .cs
    - .pas
    - .dpr
  # Patterns to exclude from symbol indexing
  exclude_patterns:
    - "*_test.go"
    - "*.spec.ts"

# Patterns to ignore (in addition to .gitignore)
ignore:
  - ".git"
  - ".grepai"
  - "node_modules"
  - "vendor"
  - "target"
  - ".zig-cache"
  - "zig-out"

# Path to an external gitignore file (e.g., global gitignore)
# Supports ~ expansion for home directory
external_gitignore: "~/.config/git/ignore"

Embedder Options

embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768

Available models:

  • nomic-embed-text - Fast, good quality (768 dims, English)
  • nomic-embed-text-v2-moe - Multilingual (~100 languages, 768 dims)
  • bge-m3 - Multilingual, excellent quality (1024 dims)
  • mxbai-embed-large - Higher quality, slower (1024 dims)
  • all-minilm - Very fast, lower quality (384 dims)
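
The dimensions field must match whichever model you pick, and changing models requires re-indexing. As a sketch, switching to bge-m3 (which produces 1024-dimensional vectors) would look like:

```yaml
embedder:
  provider: ollama
  model: bge-m3        # 1024-dim multilingual model
  endpoint: http://localhost:11434
  dimensions: 1024     # must match the model's output size
```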

For multilingual codebases (comments in Korean, French, etc.):

embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe  # Supports ~100 languages
  endpoint: http://localhost:11434
  dimensions: 768

LM Studio (Local)

embedder:
  provider: lmstudio
  model: text-embedding-nomic-embed-text-v1.5
  endpoint: http://127.0.0.1:1234

Available models (depends on what you load in LM Studio):

  • nomic-embed-text-v1.5 - Good general purpose (768 dims)
  • bge-small-en-v1.5 - Fast, smaller (384 dims)
  • bge-large-en-v1.5 - Higher quality (1024 dims)

OpenAI (Cloud)

embedder:
  provider: openai
  model: text-embedding-3-small
  api_key: ${OPENAI_API_KEY}
  dimensions: 1536
  parallelism: 4  # Concurrent requests (auto-adjusts on rate limits)

Available models:

  • text-embedding-3-small - 1536 dimensions, fast, cost-effective
  • text-embedding-3-large - 3072 dimensions, higher quality
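
If quality matters more than cost, the larger model is configured the same way; only the model name and dimensions change (a starting point, not a benchmarked recommendation):

```yaml
embedder:
  provider: openai
  model: text-embedding-3-large
  api_key: ${OPENAI_API_KEY}
  dimensions: 3072
  parallelism: 4
```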

Azure OpenAI / Microsoft Foundry

Use a custom endpoint for Azure OpenAI or other OpenAI-compatible providers:

embedder:
  provider: openai
  model: text-embedding-ada-002
  endpoint: https://YOUR-RESOURCE.openai.azure.com/v1
  api_key: ${AZURE_OPENAI_API_KEY}
  dimensions: 1536

Storage Options

GOB (File-based - Default)

store:
  backend: gob

Best for:

  • Single-developer projects
  • Quick setup
  • No external dependencies

The index is stored automatically in .grepai/index.gob.

PostgreSQL with pgvector

store:
  backend: postgres
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai

Best for:

  • Team environments
  • Large codebases
  • Advanced querying needs

Setup:

CREATE EXTENSION vector;
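
To keep credentials out of version control, the DSN can reference an environment variable, assuming grepai expands ${...} in this field the same way it does for api_key. The variable name GREPAI_PG_DSN below is arbitrary, not a built-in:

```yaml
store:
  backend: postgres
  postgres:
    dsn: ${GREPAI_PG_DSN}   # e.g. postgres://user:pass@localhost:5432/grepai
```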

Qdrant

store:
  backend: qdrant
  qdrant:
    endpoint: localhost
    port: 6334
    use_tls: false

Best for:

  • High-performance vector search
  • Docker-based environments
  • Teams already using Qdrant

Setup:

docker compose --profile=qdrant up -d
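
For Qdrant Cloud, the same backend applies with TLS enabled and an API key, per the reference above. The cluster hostname here is a placeholder:

```yaml
store:
  backend: qdrant
  qdrant:
    endpoint: YOUR-CLUSTER.cloud.qdrant.io
    port: 6334
    use_tls: true
    api_key: ${QDRANT_API_KEY}
```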

See Vector Stores for detailed configuration options.

Chunking Tuning

chunking:
  size: 512    # Tokens per chunk
  overlap: 50  # Overlap for context

  • Larger chunks: Better context, fewer results, slower
  • Smaller chunks: More precise matches, more results, faster
  • More overlap: Better continuity, larger index
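
For example, to bias search toward precise matches, you might halve both values (a starting point to tune, not a universal recommendation):

```yaml
chunking:
  size: 256    # smaller chunks -> more precise matches
  overlap: 25  # keep roughly the same overlap ratio
```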

Automatic Re-chunking

If you configure a chunking.size larger than your embedder’s context limit (e.g., 10000 tokens with a model that only supports 8192), grepai will automatically detect the error and re-chunk the content into smaller pieces.

This happens transparently:

  1. grepai attempts to embed the chunk
  2. If the embedder returns a “context length exceeded” error, grepai splits the chunk in half
  3. The process repeats until chunks fit within the model’s limit (up to 3 attempts)

You’ll see log messages like:

Re-chunking large_file.go chunk 0 (attempt 1/3): context limit exceeded
Split chunk into 4 sub-chunks

Recommended chunk sizes per model:

Provider    Model                    Max Context   Recommended Size
Ollama      nomic-embed-text         ~8192         512-2048
OpenAI      text-embedding-3-small   8191          512-4096
LM Studio   nomic-embed-text-v1.5    ~8192         512-2048

Search Options

grepai provides two optional search enhancements:

Search Boost (enabled by default)

Adjusts scores based on file paths: test files are penalized, while source directories are boosted.

search:
  boost:
    enabled: true
    penalties:
      - pattern: "_test."
        factor: 0.5
    bonuses:
      - pattern: "/src/"
        factor: 1.1

See Search Boost for full documentation.

Hybrid Search (disabled by default)

Combines vector similarity with text matching using Reciprocal Rank Fusion (RRF).

search:
  hybrid:
    enabled: true
    k: 60

See Hybrid Search for full documentation.

External Gitignore

You can specify an external gitignore file (such as your global Git ignore file) to be respected during indexing:

external_gitignore: "~/.config/git/ignore"

This is useful for ignoring files globally configured in Git (e.g., IDE files, OS-specific files).

Common locations for global gitignore:

  • ~/.config/git/ignore (XDG standard)
  • ~/.gitignore_global (older convention)

The tilde (~) is automatically expanded to your home directory.

If the file doesn’t exist, grepai will log a warning and continue without it.
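
To find the right path before configuring it, your shell expands ~ the same way grepai does, and Git can report which global excludes file it is actually using (the second command may print nothing if none is configured):

```shell
# Print the path grepai would expand (~ becomes $HOME)
echo ~/.config/git/ignore

# Ask Git which global excludes file is configured, if any
git config --global --get core.excludesFile || true
```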

Environment Variables

You can use environment variables in config:

embedder:
  api_key: ${OPENAI_API_KEY}

Or override config via environment:

export GREPAI_EMBEDDER_PROVIDER=openai
export GREPAI_STORE_BACKEND=postgres
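
For example, to point a single shell session at OpenAI and Postgres without editing .grepai/config.yaml (variable names follow the pattern shown above):

```shell
export GREPAI_EMBEDDER_PROVIDER=openai
export GREPAI_STORE_BACKEND=postgres

# Verify what grepai will see in this session
env | grep '^GREPAI_' | sort
```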