Agent Retrieval Caching for Token Efficiency

One pattern I've been experimenting with in my Claude Code workflows is what I call "agent retrieval caching". It sounds more complex than it is: in essence, it's a mechanism to avoid repeated token-expensive lookups when agents need information from external sources.

The Problem

Agents frequently need to understand things like API contracts, database schemas, or service interfaces that live in sibling repositories. Every time an agent needs this information, it explores files, traces code paths, and extracts the relevant details. This burns tokens and time, especially when the source hasn't changed.

For example, when building a frontend feature that calls a backend API, the agent needs to know the response shape, error codes, and field types. Without caching, this exploration happens every single time. You could use some sort of database-backed check for this, but that's not just overkill, it's the wrong tool: it adds overhead to keep things in sync. However, given that nearly every project is already under version control, we can use that to our advantage.

The Solution

The approach I've landed on uses three components:

  1. A scout agent that specializes in navigating and extracting information from external sources
  2. A structured cache that stores the extracted data as JSON
  3. Git-based cache invalidation that tracks exactly which source files the cache depends on
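
The cache itself can be as simple as a directory of JSON files in the consuming repo, keyed by whatever you query for. A hypothetical layout (the directory name and key scheme here are arbitrary, not something the pattern prescribes):

.agent-cache/
  backend-api/
    GET_items_id.json      # the contract shown in the next section
    POST_items.json

In a layout like this, find-cache.sh would simply map a query key to one of these files.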

The key insight is storing, for each referenced file, the hash of the last commit that touched it. This enables precise invalidation: the cache goes stale only if one of the actual source files changed, not if any file in the repo did.

Implementation

The cache structure looks something like this:

{
  "version": 1,
  "generated_at": "2026-01-20T00:00:00.000Z",
  "endpoint": {
    "method": "GET",
    "path": "/items/:id"
  },
  "success_responses": [...],
  "error_responses": [...],
  "evidence": [
    { "file": "src/routes/items.js", "line": 42 }
  ],
  "referenced_files": {
    "src/routes/items.js": "52add4438edd5b38...",
    "src/serializers/item.js": "3c462416592191..."
  }
}
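
The hashes in referenced_files are the last commits that touched each source file. Here's a minimal sketch of how they could be captured at generation time; the repo path and file list are purely illustrative:

# capture the last-touching commit for each source file the scout relied on
repo="../backend"   # hypothetical path to the sibling repo
for file in src/routes/items.js src/serializers/item.js; do
  printf '%s\t%s\n' "$file" "$(git -C "$repo" log -n 1 --format=%H -- "$file")"
done |
jq -R -s 'split("\n") | map(select(length > 0) | split("\t") | {(.[0]): .[1]}) | add'

Storing the per-file commit rather than the repo HEAD is what keeps invalidation precise.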

The workflow is:

  1. Cache lookup — Before any retrieval, check if cached data exists for this query
  2. Cache validation — For each file in referenced_files, compare the stored hash against the current git hash
  3. If valid — Use the cached data directly, zero tokens spent on exploration
  4. If stale — Invoke the scout agent to regenerate and cache fresh data

The Scripts

Three small shell scripts handle the mechanics:

  • find-cache.sh — Locates cached data by query key
  • validate-cache.sh — Checks all referenced file hashes against current git state
  • validate-schema.sh — Ensures the cached JSON matches the expected structure (this step is critical for trusting cache entries; I still need to explore hook usage for this part, but a sketch of one possible check follows)
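
I haven't settled on the exact shape of that check, but a minimal version could be a jq predicate over the keys in the example above (the required fields here are my assumption):

# exit non-zero if the cache entry is missing the fields the agent relies on
jq -e '
  .version == 1
  and (.endpoint | has("method") and has("path"))
  and (.referenced_files | type == "object" and length > 0)
  and (.evidence | type == "array")
' "$contract" > /dev/null   # $contract is the path to the cached JSON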

The core of validate-cache.sh is particularly simple:

#!/usr/bin/env bash
# usage: validate-cache.sh <cache-json> <source-repo>
contract="$1"   # the cached JSON entry
repo="$2"       # the repo the cache was extracted from

# walk every referenced file (assumes paths without whitespace)
for file in $(jq -r '.referenced_files | keys[]' "$contract"); do
  stored_hash=$(jq -r ".referenced_files[\"$file\"]" "$contract")
  # hash of the last commit that touched this file
  current_hash=$(git -C "$repo" log -n 1 --format=%H -- "$file" 2>/dev/null)
  if [ "$stored_hash" != "$current_hash" ]; then
    exit 1  # Stale: a referenced file has changed since the cache was written
  fi
done
exit 0  # Valid: every referenced file is unchanged
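
Tying the three scripts together, a hypothetical wrapper might look like this (the argument conventions and query-key format are my assumptions, not something the scripts define):

# locate the cache entry for this query key
cache=$(./scripts/find-cache.sh "GET /items/:id") || { echo "no cache entry" >&2; exit 1; }

if ./scripts/validate-schema.sh "$cache" && ./scripts/validate-cache.sh "$cache" ../backend; then
  cat "$cache"   # cache hit: the agent reads this JSON instead of re-exploring
else
  echo "cache stale: dispatch the scout agent and regenerate" >&2
  exit 1
fi

The exit status is all the agent needs: success means read the JSON, failure means invoke the scout.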

Results

For my API contract lookups, my anecdata shows that caching cuts a repeat lookup from ~10-15k tokens of exploration down to ~500 tokens (just reading the cached JSON). The cache hit rate in practice is high because API contracts don't change that frequently compared to how often they're referenced.

When to Use This

This pattern works well when:

  • The same information is needed repeatedly across sessions
  • The source data changes infrequently relative to access frequency
  • The retrieval cost is high (multiple files, cross-repo navigation)
  • The data can be serialized to a structured format

It's likely overkill for simple lookups or rapidly changing sources where the cache would constantly invalidate.

Next Steps

I'd like to extend this to database schemas, configuration files, and even the results of expensive web fetches. The git-hash approach works for any file-based source; for APIs you'd need a different staleness signal (ETags, timestamps, version headers).
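
For web fetches in particular, the check might hinge on an ETag instead of a commit hash. A rough sketch, where source_etag is a hypothetical field stored alongside the cached response and the URL is a placeholder:

# compare the current ETag from a HEAD request against the one stored at cache time
stored_etag=$(jq -r '.source_etag' "$cache")
current_etag=$(curl -sI "https://api.example.com/schema" | awk 'tolower($1) == "etag:" { print $2 }' | tr -d '\r')
[ "$stored_etag" = "$current_etag" ] || exit 1   # stale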
