Hybrid Search
Hybrid search combines vector similarity with text matching using Reciprocal Rank Fusion (RRF). This improves results for queries containing exact identifiers or keywords.
How It Works
Query: "handleAuth authentication" │ ├─► Vector Search ──► [chunk_a, chunk_b, chunk_c] │ rank 0 rank 1 rank 2 │ └─► Text Search ────► [chunk_b, chunk_d, chunk_a] rank 0 rank 1 rank 2 │ ▼ RRF Fusion: score = Σ 1/(k + rank) │ ▼ [chunk_b, chunk_a, chunk_c, chunk_d] (appears in both lists = higher score)- Vector search: Semantic similarity via embeddings (existing behavior)
- Text search: Simple keyword matching in chunk content
- RRF fusion: Combines rankings from both sources
Configuration
Disabled by default. Enable in .grepai/config.yaml:
search: hybrid: enabled: true k: 60 # RRF constant (default: 60)The k parameter
The RRF formula is: score(doc) = Σ 1/(k + rank_i)
- Higher k (e.g., 100): More weight to documents appearing in multiple lists
- Lower k (e.g., 30): More weight to top-ranked documents in each list
- Default (60): Balanced weighting
When to Enable
Enable hybrid search when:
- Queries often include exact function/class names
- You mix natural language with identifiers (e.g., “handleAuth function”)
- Vector-only search misses obvious keyword matches
- You search for specific variable or method names
Examples
| Query | Vector-only | Hybrid |
|---|---|---|
handleUserLogin | May miss if embedding doesn’t capture identifier | Finds exact matches |
authentication flow | Good semantic match | Same quality |
validateEmail function in user module | Partial match | Better: combines semantic + keyword |
Performance Note
Hybrid search loads all chunks into memory for text matching. For very large indexes (100k+ chunks), this may add latency.
Consider keeping it disabled for:
- Very large monorepos
- Purely semantic queries (no identifiers)
- Performance-critical use cases
Technical Details
Text Search
Simple keyword matching:
- Query is tokenized into words (lowercase, min 2 chars)
- Each chunk is scored by:
matches / total_words - Results sorted by score
RRF Fusion
Reciprocal Rank Fusion merges ranked lists:
score(doc) = Σ 1/(k + rank_i) for each sourceBenefits:
- No need to normalize scores between sources
- Robust to outliers
- Simple and effective