Benchmark: grepai vs grep on Claude Code
A controlled benchmark comparing semantic search with grepai versus traditional grep in Claude Code, showing 27.5% cost savings and 97% input token reduction.
Disclaimer: This benchmark was conducted by the grepai maintainer. While we’ve aimed for methodological rigor (5 runs with consistent results), we encourage users to run their own tests on their codebases.
TL;DR
We ran a controlled benchmark comparing grepai (semantic search) versus traditional grep in Claude Code on the Excalidraw codebase (155,000+ lines of TypeScript code).
Results:
- -27.5% on API billing ($6.78 → $4.92)
- -55% tool calls (139 → 62)
- -97% input tokens (51,147 → 1,326)
- -71% cache creation tokens (563,883 → 162,289)
Methodology
Test Environment
We compared two identical clones of the Excalidraw repository:
- Session 1 (Baseline): Standard Claude Code without grepai
- Session 2 (With grepai): Claude Code + grepai with Ollama embeddings (nomic-embed-text model) on a MacBook Pro M3 Pro
The Five Test Questions
We posed identical questions across both sessions in the same order. These questions were designed to reflect real developer exploration patterns—describing what the code does rather than searching for known function names:
- “Locate the exact mathematical function used to determine if a user’s cursor is hovering inside a ‘diamond’ shape.”
- “Explain how the application calculates the intersection point when an arrow is attached to an ellipse.”
- “Find the algorithm responsible for simplifying or smoothing the points of a ‘freedraw’ line after the user releases the mouse.”
- “Identify the code responsible for snapping dragged elements to the grid.”
- “How does the codebase handle sending an element ‘backward’ in the z-order?”
Metric Collection
Data was extracted directly from Claude Code’s JSON logs located at ~/.claude/projects/<project-hash>/. The analysis included both main session logs and subagent logs found in <session-uuid>/subagents/ directories.
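To reproduce the aggregation, a script along the following lines is enough. The exact log schema is an assumption here (one JSON object per line, with assistant entries carrying a `message.usage` block that holds the four billed token fields); adjust the field access to whatever your Claude Code version actually writes.

```python
import json
import sys
from collections import Counter
from pathlib import Path

# Assumed schema: each .jsonl line is a JSON object; assistant entries carry
# a "message" object whose "usage" block holds the four billed token fields.
FIELDS = (
    "input_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
    "output_tokens",
)

def sum_tokens(session_dir: Path) -> Counter:
    """Sum billed token counts across the main log and any subagents/ logs."""
    totals: Counter = Counter()
    for log_file in session_dir.rglob("*.jsonl"):
        for line in log_file.read_text().splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            if not isinstance(entry, dict):
                continue
            message = entry.get("message")
            usage = message.get("usage") if isinstance(message, dict) else None
            for field in FIELDS:
                totals[field] += (usage or {}).get(field, 0) or 0
    return totals

if __name__ == "__main__":
    totals = sum_tokens(Path(sys.argv[1]).expanduser())
    for field in FIELDS:
        print(f"{field}: {totals[field]:,}")
```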
Understanding Claude Code’s Token Economics
Before diving into results, it’s important to understand how Claude Code bills API usage. The API differentiates between four token categories with distinct costs (Claude Opus 4.5 pricing):
| Token Type | Cost per Million | Description |
|---|---|---|
| `input_tokens` | $5.00 | Fresh tokens processed for the first time |
| `cache_read_input_tokens` | $0.50 | Previously cached tokens being reused (90% discount) |
| `cache_creation_input_tokens` | $6.25 | New tokens being cached for future reuse (25% premium) |
| `output_tokens` | $25.00 | Tokens generated by Claude |
The prompt caching system stores frequently used context (system prompts, conversation history, previously read files). Each new subagent in Claude Code starts with a fresh context that must be cached, incurring the 1.25× premium on cache creation.
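In practice the bill is a weighted sum over these four categories. The helper below is a hypothetical sketch (not part of Claude Code or grepai) that applies the prices above; the two example calls illustrate the asymmetry that makes subagents expensive, since caching a fresh context costs 1.25× the base input rate while re-reading it later costs 0.1×.

```python
# Prices from the table above, in USD per million tokens (Claude Opus 4.5).
PRICE_PER_M = {
    "input_tokens": 5.00,
    "cache_read_input_tokens": 0.50,
    "cache_creation_input_tokens": 6.25,
    "output_tokens": 25.00,
}

def billed_cost(usage: dict) -> float:
    """Total API cost for a session, given token counts per category."""
    return sum(usage.get(k, 0) * price / 1_000_000 for k, price in PRICE_PER_M.items())

# Caching a fresh 100k-token subagent context vs. re-reading those tokens later:
print(billed_cost({"cache_creation_input_tokens": 100_000}))  # $0.625 (1.25x the $5/M input rate)
print(billed_cost({"cache_read_input_tokens": 100_000}))      # $0.05  (0.1x the $5/M input rate)
```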
Results
Token Metrics
| Metric | Baseline | grepai | Change |
|---|---|---|---|
| Subagents launched | 5 | 0 | -100% |
| Tool calls | 139 | 62 | -55% |
| `input_tokens` | 51,147 | 1,326 | -97% |
| `cache_read_input_tokens` | 5,973,161 | 7,775,888 | +30% |
| `cache_creation_input_tokens` | 563,883 | 162,289 | -71% |
| `output_tokens` | 476 | 347 | -27% |
Cost Breakdown
| Category | Baseline | grepai | Difference |
|---|---|---|---|
| `input_tokens` cost | $0.26 | $0.01 | -97% |
| `cache_read` cost | $2.99 | $3.89 | +30% |
| `cache_creation` cost | $3.52 | $1.01 | -71% |
| `output_tokens` cost | $0.01 | $0.01 | -27% |
| Total billed cost | $6.78 | $4.92 | -27.5% |
Tool Usage Breakdown
| Tool | Baseline | grepai |
|---|---|---|
| Bash (including grepai) | 41 | 9 |
| Grep | 37 | 20 |
| Glob | 13 | 0 |
| Read | 43 | 30 |
| Task (subagents) | 5 | 0 |
Why the Difference?
The Subagent Problem
Without grepai, Claude Code’s typical workflow looks like this:
Question → Task(subagent_type: Explore) → Multiple iterations:
├─ Grep("pattern") returns 40+ files
├─ Read files sequentially to filter
├─ Launch additional searches
└─ Each subagent = separate context = cache_creation charges
With grepai, the workflow simplifies to:
Question → Bash: grepai search "semantic query" → Targeted results
├─ Directly identifies relevant files
├─ Minimal iteration needed
└─ No subagent spawning = no new cache contexts
The baseline scenario launched 5 subagents, requiring 563,883 cache_creation tokens. The grepai scenario eliminated subagent launches entirely, reducing this to 162,289 tokens—a savings of $2.51 on cache creation alone.
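That figure follows directly from the cache-creation price of $6.25 per million tokens:

```python
# Difference in cache_creation_input_tokens, priced at $6.25 per million.
saving = (563_883 - 162_289) * 6.25 / 1_000_000
print(f"${saving:.2f}")  # -> $2.51
```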
Why Cost Savings Don’t Match Token Reductions
The -97% reduction in fresh input_tokens translated to only -27.5% cost savings because cache_read_input_tokens dominated total token consumption in both scenarios, and cache reads cost one-tenth as much as fresh input tokens:
- Baseline fresh input tokens: ~$0.26 of $6.78 total (3.8%)
- Majority of cost came from cache operations: ~$6.51 of $6.78
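The totals in the cost table can be reproduced from the token counts above. The snippet below recomputes both sessions and prints the share of each total attributable to fresh input tokens: it is small in both cases, which is why removing 97% of those tokens moves the total by far less.

```python
# Recompute both session totals from the reported token counts and the
# Claude Opus 4.5 prices, then show the fresh-input share of each total.
PRICE_PER_M = {
    "input_tokens": 5.00,
    "cache_read_input_tokens": 0.50,
    "cache_creation_input_tokens": 6.25,
    "output_tokens": 25.00,
}
sessions = {
    "baseline": {
        "input_tokens": 51_147,
        "cache_read_input_tokens": 5_973_161,
        "cache_creation_input_tokens": 563_883,
        "output_tokens": 476,
    },
    "grepai": {
        "input_tokens": 1_326,
        "cache_read_input_tokens": 7_775_888,
        "cache_creation_input_tokens": 162_289,
        "output_tokens": 347,
    },
}
for name, usage in sessions.items():
    parts = {k: v * PRICE_PER_M[k] / 1_000_000 for k, v in usage.items()}
    total = sum(parts.values())
    share = parts["input_tokens"] / total
    print(f"{name}: total ${total:.2f}, fresh input ${parts['input_tokens']:.2f} ({share:.1%})")
# baseline: total $6.78, fresh input $0.26 (3.8%)
# grepai: total $4.92, fresh input $0.01 (0.1%)
```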
The Glob Elimination
A critical insight: grepai reduced Glob tool calls from 13 to 0. The Glob tool finds files matching a path pattern (e.g., `**/*.ts`) but returns dozens of candidates that must then be read sequentially to filter. Semantic search eliminates this trial-and-error phase by returning only the pertinent files immediately.
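For intuition, here is a rough stand-in for that pattern search using Python's pathlib; the repository path and the resulting count are illustrative, not values measured in the benchmark.

```python
from pathlib import Path

# A pattern like **/*.ts matches every TypeScript file under the root, so an
# agent relying on it still has to read candidates one by one to filter them.
candidates = sorted(Path("excalidraw").glob("**/*.ts"))  # path is illustrative
print(f"{len(candidates)} candidate files to triage")
```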
Limitations and Caveats
We want to be transparent about what this benchmark does and doesn’t show:
What We Measured
- Actual tokens billed by the API
- Financial cost differences
- Number of tool invocations
What We Did NOT Measure
- Answer quality (both approaches found correct solutions)
- Real-world execution time (varies by server load)
- Reproducibility across different codebases
- Performance with smaller repositories
Non-Deterministic Behavior
Claude Code can take different paths for identical questions. While the results were reproducible across the five repetitions of this experiment, behavior varies in general. The fact that the baseline launched 5 subagents while the grepai session launched none was a significant cost driver, and different runs might show different subagent counts.
Benchmark conducted on the Excalidraw codebase (155,000+ lines of TypeScript).
Original article (in French) by the grepai maintainer: grepai Benchmark