
Benchmark: grepai vs grep on Claude Code

A controlled benchmark comparing semantic search with grepai versus traditional grep in Claude Code, showing 27.5% cost savings and 97% input token reduction.

Yoan Bernabeu · January 21, 2026

Disclaimer: This benchmark was conducted by the grepai maintainer. While we’ve aimed for methodological rigor (5 runs with consistent results), we encourage users to run their own tests on their codebases.

TL;DR

We ran a controlled benchmark comparing grepai (semantic search) versus traditional grep in Claude Code on the Excalidraw codebase (155,000+ lines of TypeScript code).

Results:

  • -27.5% on API billing ($6.78 → $4.92)
  • -55% tool calls (139 → 62)
  • -97% input tokens (51,147 → 1,326)
  • -71% cache creation tokens (563,883 → 162,289)

Methodology

Test Environment

We compared two identical clones of the Excalidraw repository:

  • Session 1 (Baseline): Standard Claude Code without grepai
  • Session 2 (With grepai): Claude Code + grepai with Ollama embeddings (nomic-embed-text model), running locally on a MacBook Pro M3 Pro

The Five Test Questions

We posed identical questions across both sessions in the same order. These questions were designed to reflect real developer exploration patterns—describing what the code does rather than searching for known function names:

  1. “Locate the exact mathematical function used to determine if a user’s cursor is hovering inside a ‘diamond’ shape.”
  2. “Explain how the application calculates the intersection point when an arrow is attached to an ellipse.”
  3. “Find the algorithm responsible for simplifying or smoothing the points of a ‘freedraw’ line after the user releases the mouse.”
  4. “Identify the code responsible for snapping dragged elements to the grid.”
  5. “How does the codebase handle sending an element ‘backward’ in the z-order?”

Metric Collection

Data was extracted directly from Claude Code’s JSON logs located at ~/.claude/projects/<project-hash>/. The analysis included both main session logs and subagent logs found in <session-uuid>/subagents/ directories.
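For reference, here is a minimal sketch of that aggregation, assuming the log layout described above. The message.usage field names mirror the API's token counters, but the exact schema may differ across Claude Code versions, so treat this as illustrative rather than the exact extraction script:

```typescript
// Sketch: sum token usage across Claude Code JSONL session logs,
// including subagent logs nested under <session-uuid>/subagents/.
// Assumption: assistant entries carry a message.usage object with the
// four token counters used by the API.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

interface Usage {
  input_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
  output_tokens: number;
}

function aggregateUsage(projectDir: string): Usage {
  const totals: Usage = {
    input_tokens: 0,
    cache_read_input_tokens: 0,
    cache_creation_input_tokens: 0,
    output_tokens: 0,
  };

  // Recursively collect every .jsonl file (main session + subagents).
  const walk = (dir: string): string[] =>
    readdirSync(dir, { withFileTypes: true }).flatMap((entry) =>
      entry.isDirectory()
        ? walk(join(dir, entry.name))
        : entry.name.endsWith(".jsonl")
          ? [join(dir, entry.name)]
          : [],
    );

  for (const file of walk(projectDir)) {
    for (const line of readFileSync(file, "utf8").split("\n")) {
      if (!line.trim()) continue;
      let usage;
      try {
        usage = JSON.parse(line)?.message?.usage;
      } catch {
        continue; // skip malformed lines
      }
      if (!usage) continue;
      totals.input_tokens += usage.input_tokens ?? 0;
      totals.cache_read_input_tokens += usage.cache_read_input_tokens ?? 0;
      totals.cache_creation_input_tokens += usage.cache_creation_input_tokens ?? 0;
      totals.output_tokens += usage.output_tokens ?? 0;
    }
  }
  return totals;
}

// Usage: aggregateUsage("~/.claude/projects/<project-hash>") with the path expanded.
```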


Understanding Claude Code’s Token Economics

Before diving into results, it’s important to understand how Claude Code bills API usage. The API differentiates between four token categories with distinct costs (Claude Opus 4.5 pricing):

| Token Type | Cost per Million | Description |
| --- | --- | --- |
| input_tokens | $5.00 | Fresh tokens processed for the first time |
| cache_read_input_tokens | $0.50 | Previously cached tokens being reused (90% discount) |
| cache_creation_input_tokens | $6.25 | New tokens being cached for future reuse (25% premium) |
| output_tokens | $25.00 | Tokens generated by Claude |

The prompt caching system stores frequently used context (system prompts, conversation history, previously read files). Each new subagent in Claude Code starts with a fresh context that must be cached, incurring the 1.25× premium on cache creation.
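To make the billing model concrete, here is a minimal sketch of the cost formula implied by the table above, using the Opus 4.5 list prices quoted in this post. Fed with the token counts reported in the results below, it reproduces the $6.78 and $4.92 totals:

```typescript
// Sketch: billed cost (USD) implied by the pricing table above.
const PRICE_PER_MILLION = {
  input_tokens: 5.0,                 // fresh input
  cache_read_input_tokens: 0.5,      // 90% discount vs. fresh input
  cache_creation_input_tokens: 6.25, // 25% premium vs. fresh input
  output_tokens: 25.0,
} as const;

type TokenCounts = Record<keyof typeof PRICE_PER_MILLION, number>;

function billedCost(tokens: TokenCounts): number {
  return Object.entries(PRICE_PER_MILLION).reduce(
    (sum, [kind, price]) =>
      sum + (tokens[kind as keyof TokenCounts] * price) / 1_000_000,
    0,
  );
}

// Baseline session (token counts from the results below): ≈ $6.78
console.log(billedCost({
  input_tokens: 51_147,
  cache_read_input_tokens: 5_973_161,
  cache_creation_input_tokens: 563_883,
  output_tokens: 476,
}).toFixed(2));

// grepai session: ≈ $4.92
console.log(billedCost({
  input_tokens: 1_326,
  cache_read_input_tokens: 7_775_888,
  cache_creation_input_tokens: 162_289,
  output_tokens: 347,
}).toFixed(2));
```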


Results

Token Metrics

| Metric | Baseline | grepai | Change |
| --- | --- | --- | --- |
| Subagents launched | 5 | 0 | -100% |
| Tool calls | 139 | 62 | -55% |
| input_tokens | 51,147 | 1,326 | -97% |
| cache_read_input_tokens | 5,973,161 | 7,775,888 | +30% |
| cache_creation_input_tokens | 563,883 | 162,289 | -71% |
| output_tokens | 476 | 347 | -27% |

Cost Breakdown

| Category | Baseline | grepai | Difference |
| --- | --- | --- | --- |
| input_tokens cost | $0.26 | $0.01 | -97% |
| cache_read cost | $2.99 | $3.89 | +30% |
| cache_creation cost | $3.52 | $1.01 | -71% |
| output_tokens cost | $0.01 | $0.01 | -27% |
| Total billed cost | $6.78 | $4.92 | -27.5% |

Tool Usage Breakdown

| Tool | Baseline | grepai |
| --- | --- | --- |
| Bash (including grepai) | 41 | 9 |
| Grep | 37 | 20 |
| Glob | 13 | 0 |
| Read | 43 | 30 |
| Task (subagents) | 5 | 0 |

Why the Difference?

The Subagent Problem

Without grepai, Claude Code’s typical workflow looks like this:

Question → Task(subagent_type: Explore) → Multiple iterations:
  ├─ Grep("pattern") returns 40+ files
  ├─ Read files sequentially to filter
  ├─ Launch additional searches
  └─ Each subagent = separate context = cache_creation charges

With grepai, the workflow simplifies to:

Question → Bash: grepai search "semantic query" → Targeted results
  ├─ Directly identifies relevant files
  ├─ Minimal iteration needed
  └─ No subagent spawning = no new cache contexts

The baseline scenario launched 5 subagents, requiring 563,883 cache_creation tokens. The grepai scenario eliminated subagent launches entirely, reducing this to 162,289 tokens—a savings of $2.51 on cache creation alone.

Why Cost Savings Don’t Match Token Reductions

The -97% reduction in fresh input_tokens translated into only -27.5% savings on the total bill because fresh input was a small slice of that bill to begin with: cache operations dominated token consumption in both scenarios, and cache reads cost 10× less than fresh input tokens.

  • Baseline fresh input tokens: ~$0.26 of the $6.78 total (3.8%), so eliminating them entirely could save at most $0.26
  • Cache operations accounted for the rest: ~$6.51 of $6.78
  • Most of the actual saving came from the -71% drop in cache_creation cost ($3.52 → $1.01), partly offset by a +$0.90 increase in cache_read cost

The Glob Elimination

A critical insight: grepai reduced Glob tool calls from 13 to 0. The Glob tool searches for files matching patterns (e.g., **/*.ts) but returns dozens of results requiring sequential reads to filter. Semantic search eliminates this trial-and-error phase by returning only pertinent files immediately.


Limitations and Caveats

We want to be transparent about what this benchmark does and doesn’t show:

What We Measured

  • Actual tokens billed by the API
  • Financial cost differences
  • Number of tool invocations

What We Did NOT Measure

  • Answer quality (both approaches found correct solutions)
  • Real-world execution time (varies by server load)
  • Reproducibility across different codebases
  • Performance with smaller repositories

Non-Deterministic Behavior

Claude Code can take different paths for identical questions. Results were consistent across the five repetitions of this experiment, but behavior varies in general. The baseline's use of 5 subagents versus none for grepai was a significant cost driver, and different runs might show different subagent counts.


Benchmark conducted on the Excalidraw codebase (155,000+ lines of TypeScript).

Original article (in French) by the grepai maintainer: grepai Benchmark