Semantic Search

Master natural language code search with grepai

grepai search runs natural language searches across your codebase using vector embeddings. Instead of matching exact text, it ranks code chunks by how close their meaning is to your query.

Features

Natural language queries: Search by describing what the code does
Vector embeddings: Uses AI models to understand semantic meaning
Relevance scoring: Results ranked by similarity (0.0-1.0)
Structural boosting: Prioritizes source over tests automatically
JSON output: Perfect for AI agents and automation

Quick Start

# Ensure watch is running to index your code
grepai watch

# Search for authentication code
grepai search "user authentication flow"

# Limit results
grepai search "error handling" --limit 5

# Filter by path prefix
grepai search "authentication" --path src/handlers/
grepai search "validation" --path src/middleware/ --limit 10

# JSON output for AI agents (--compact saves ~80% tokens)
grepai search "database queries" --json --compact

How It Works

Query embedding: Your search query is converted to a vector using the configured embedder (Ollama, OpenAI, or LM Studio)
Similarity search: Cosine similarity is calculated against indexed code chunks
Boost adjustment: Scores are adjusted based on file paths (tests penalized, source boosted)
Result ranking: Results are sorted by final relevance score
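
The similarity and ranking steps above can be sketched in a few lines of Python. This is an illustration only, not grepai's implementation: the toy 3-dimensional vectors stand in for real embeddings, the function names are hypothetical, and the boost step is left out.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunks):
    """Score each (path, vector) chunk against the query, best match first."""
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in chunks]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy vectors; a real embedder returns hundreds of dimensions.
query = [1.0, 0.0, 1.0]
chunks = [
    ("handlers/payment.go", [0.0, 1.0, 0.1]),
    ("middleware/auth.go", [0.9, 0.1, 0.8]),
]
ranked = rank_chunks(query, chunks)
for path, score in ranked:
    print(f"{score:.2f}  {path}")
```

The chunk whose vector points in nearly the same direction as the query vector ranks first, regardless of which words the code contains.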

Writing Effective Queries

Good Query	Bad Query	Why
"user login validation"	"login"	More context improves matches
"how errors are handled in API"	"error"	Describes intent, not keyword
"where users are saved to database"	"save user"	Natural language works best
"JWT token authentication"	"token"	Specific terminology

Query Tips

Use English for best results (most embedding models are trained mainly on English text)
Describe intent: “handles user login” not “func Login”
Be specific: “JWT token validation” is better than “token”
Think semantically: Describe what the code does, not how it’s named

Understanding Results

Score: 0.89 | middleware/auth.go:15-45
----------------------------------------
func AuthMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        token := c.GetHeader("Authorization")
        // ... validate token
    }
}

Score: Relevance (0.0-1.0), higher is better
File:lines: Location of the matching chunk
Content: Code snippet with context

Structured Output

For AI agents and scripts, use the --json or --toon flag:

# JSON output
grepai search "authentication" --json           # Full JSON output
grepai search "authentication" --json --compact # Minimal JSON (no content field)

# TOON output (~50% fewer tokens than JSON)
grepai search "authentication" --toon           # Full TOON output
grepai search "authentication" --toon --compact # Minimal TOON (no content field)

JSON Format

[
  {
    "file_path": "middleware/auth.go",
    "start_line": 15,
    "end_line": 45,
    "score": 0.89,
    "content": "func AuthMiddleware() gin.HandlerFunc { ... }"
  }
]
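
A script can consume this output with a standard JSON parser. The sketch below uses the sample payload from above; `parse_results`, `location`, and the `min_score` threshold are hypothetical helpers, not part of grepai.

```python
import json

# Sample payload copied from the JSON format shown above.
raw = '''[
  {
    "file_path": "middleware/auth.go",
    "start_line": 15,
    "end_line": 45,
    "score": 0.89,
    "content": "func AuthMiddleware() gin.HandlerFunc { ... }"
  }
]'''

def parse_results(payload, min_score=0.0):
    """Decode search output and keep results at or above min_score."""
    results = json.loads(payload)
    return [r for r in results if r["score"] >= min_score]

def location(result):
    """Format a result as file:start-end, like the plain-text output."""
    return f'{result["file_path"]}:{result["start_line"]}-{result["end_line"]}'

for result in parse_results(raw, min_score=0.5):
    # "content" is absent with --compact, so read it defensively.
    print(location(result), result["score"], result.get("content", ""))
```

Reading `content` with `.get()` lets the same code handle both full and `--compact` output.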

TOON Format

TOON (Token-Oriented Object Notation) is a more compact format designed for AI agents:

[{file_path:middleware/auth.go,start_line:15,end_line:45,score:0.89,content:func AuthMiddleware() gin.HandlerFunc { ... }}]

When to use TOON:

AI agents with token constraints
High-volume search operations
When bandwidth or cost is a concern

When to use JSON:

Human-readable output needed
Integration with existing JSON tooling
Debugging and inspection

Search Enhancements

grepai provides two optional search improvements:

Structural Boosting (enabled by default)

Automatically adjusts scores based on file paths:

Penalized: Tests, mocks, fixtures, generated files, docs
Boosted: Source directories (/src/, /lib/, /app/)

See Search Boost for configuration.
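
To illustrate the idea, here is a minimal sketch of path-based score adjustment. The path patterns loosely follow the categories listed above, but the multipliers and the matching logic are invented for this example; grepai's actual rules and defaults live in its Search Boost configuration and may differ, including in how scores above 1.0 are handled.

```python
# Hypothetical rules: (path substring, multiplier). Values below 1.0
# penalize a match and values above 1.0 boost it. Illustrative only.
BOOST_RULES = [
    ("_test.", 0.5),
    ("/mocks/", 0.5),
    ("/fixtures/", 0.5),
    ("/docs/", 0.6),
    ("/src/", 1.1),
    ("/lib/", 1.1),
    ("/app/", 1.1),
]

def apply_boost(path, score):
    """Multiply the score by every rule whose pattern occurs in the path."""
    normalized = "/" + path.lstrip("/")  # so "src/..." also matches "/src/"
    for pattern, factor in BOOST_RULES:
        if pattern in normalized:
            score *= factor
    return score

# Raw similarity puts the test file first; boosting reverses the order.
results = [("src/auth/login_test.go", 0.90), ("src/auth/login.go", 0.85)]
reranked = sorted(
    ((path, apply_boost(path, score)) for path, score in results),
    key=lambda item: item[1],
    reverse=True,
)
for path, score in reranked:
    print(f"{score:.3f}  {path}")
```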

Hybrid Search (disabled by default)

Combines vector similarity with text matching using Reciprocal Rank Fusion (RRF). Useful when queries contain exact identifiers.

See Hybrid Search for configuration.
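
For readers unfamiliar with RRF, the sketch below shows the general technique: each result earns 1 / (k + rank) from every ranked list it appears in, and the sums decide the final order. The constant k = 60 is a commonly used default for RRF and is an assumption here, as are the file names; grepai's own parameters are documented under Hybrid Search.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)), rank from 1."""
    fused = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            fused[item] = fused.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical rankings from the two retrieval methods.
vector_ranking = ["auth.go", "session.go", "token.go"]
text_ranking = ["token.go", "auth.go", "login.go"]

fused = reciprocal_rank_fusion([vector_ranking, text_ranking])
for name, score in fused:
    print(f"{score:.4f}  {name}")
```

Files that appear in both lists accumulate two contributions, so they rise above files found by only one method. RRF uses ranks rather than raw scores, so the two methods do not need comparable score scales.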

Troubleshooting

Problem	Solution
No results	Ensure `grepai watch` is running and the index is built
Poor relevance	Try more descriptive queries; check the embedder model
Missing files	Check the ignore patterns in `.grepai/config.yaml`
Slow search	Consider the PostgreSQL backend for large codebases

Use Cases

Finding Implementation

# Where is authentication handled?
grepai search "user authentication logic"

# How are errors processed?
grepai search "error handling middleware"

Understanding Codebase

# Find database interactions
grepai search "database connection and queries"

# Locate API endpoints
grepai search "REST API route handlers"

AI Agent Integration

Provide code context to AI agents:

# JSON output (~80% fewer tokens with --compact)
grepai search "payment processing" --json --compact --limit 5

# TOON output (even more compact, ~50% fewer tokens than JSON)
grepai search "payment processing" --toon --compact --limit 5
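
An agent written in Python could wrap the command as in the sketch below. It uses only the flags shown in this document; the function names are hypothetical, and it assumes that compact output still carries `file_path`, `start_line`, `end_line`, and `score`.

```python
import json
import subprocess

def build_search_command(query, limit=5, compact=True):
    """Assemble the argument list for `grepai search`."""
    cmd = ["grepai", "search", query, "--json", "--limit", str(limit)]
    if compact:
        cmd.append("--compact")
    return cmd

def search(query, limit=5):
    """Run grepai and return the decoded results (needs grepai on PATH)."""
    completed = subprocess.run(
        build_search_command(query, limit),
        capture_output=True,
        text=True,
        check=True,  # raise if grepai exits with an error
    )
    return json.loads(completed.stdout)

def to_context(results):
    """Render one location line per result for inclusion in a prompt."""
    return "\n".join(
        f'{r["file_path"]}:{r["start_line"]}-{r["end_line"]} (score {r["score"]})'
        for r in results
    )
```

Calling `to_context(search("payment processing"))` would then yield a short list of file locations that the agent can read selectively instead of loading whole files.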

Commands Reference

grepai search - Full CLI reference