Claude Code Using Too Many Tokens? 7 Ways to Cut Your Usage

By Akash Rajagopal

You opened your Anthropic billing page and the number was higher than expected. You’re not alone. A 2025 SlashData DevEconomics survey found that 72% of developers using AI coding tools spend more than they budgeted, with Claude Code among the most common sources of overspend because of its rich tool-use capabilities (SlashData DevEconomics, Q4 2025).

The good news: most of that excess token usage is avoidable. This guide covers seven specific techniques that can reduce your Claude Code costs by 30-50% without changing how productively you use the tool.

Why does Claude Code burn through tokens so fast?

Claude Code consumes tokens aggressively because every interaction carries full conversation context, system prompts, tool call results, and file contents — often resending 100,000+ tokens of prior context with each new message. A single coding session that spans 15-20 messages can accumulate over 500,000 input tokens, costing $1.50+ on Sonnet just in context overhead before any new output is generated.

Here’s where the tokens actually go in a typical Claude Code interaction:

| Token category | Typical size | Resent each turn? | Cumulative cost impact |
| --- | --- | --- | --- |
| System prompt | 1,500-2,500 tokens | Yes | Low per turn, adds up |
| Conversation history | 5,000-200,000 tokens | Yes | Major cost driver |
| File read results | 2,000-50,000 tokens per file | Yes (in history) | Major cost driver |
| Tool call context | 500-5,000 tokens per call | Yes (in history) | Moderate |
| Your new message | 50-500 tokens | Sent now, then in history | Negligible |
| Claude’s response | 500-5,000 tokens | In future turns | Moderate |

The compounding effect is the critical insight. Every file Claude reads, every command it runs, every tool call it makes — all of that becomes part of the conversation history. By message 10 of a session, you might be sending 150,000 tokens of context for a 100-token question. At Sonnet’s $3/million input token rate, that’s $0.45 per message in context alone.
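To see the compounding effect concretely, here is a minimal Python sketch that models a session in which each turn adds a fixed amount of new context on top of a fixed system prompt, and the full history is resent as input on every turn. The per-turn sizes are illustrative assumptions, the $3/million rate is the Sonnet input price quoted above, and the model ignores prompt caching (which Anthropic bills at a discount for cache reads), so treat the results as an upper bound.

```python
"""Model how resent context compounds over a Claude Code session.

Illustrative assumptions: a fixed system prompt, the same number of new
context tokens added every turn, and no prompt caching.
"""

SONNET_INPUT_PER_MILLION = 3.00  # USD per million input tokens, as quoted above


def input_tokens_per_turn(turns: int, system_prompt: int, added_per_turn: int) -> list[int]:
    """Return the input tokens sent on each turn of the session."""
    sent = []
    history = 0
    for _ in range(turns):
        # New context from this turn (message, tool results) joins the history...
        history += added_per_turn
        # ...and the system prompt plus the whole history is sent as input.
        sent.append(system_prompt + history)
    return sent


def session_input_cost(turns: int, system_prompt: int, added_per_turn: int,
                       price_per_million: float = SONNET_INPUT_PER_MILLION) -> tuple[int, float]:
    """Return (total input tokens, total input cost in USD) for the session."""
    total = sum(input_tokens_per_turn(turns, system_prompt, added_per_turn))
    return total, total * price_per_million / 1_000_000


if __name__ == "__main__":
    per_turn = input_tokens_per_turn(10, system_prompt=2_000, added_per_turn=15_000)
    last_cost = per_turn[-1] * SONNET_INPUT_PER_MILLION / 1_000_000
    print(f"Turn 10 input: {per_turn[-1]:,} tokens (${last_cost:.2f})")
    total, cost = session_input_cost(10, system_prompt=2_000, added_per_turn=15_000)
    print(f"Ten-turn session: {total:,} input tokens (${cost:.2f})")
```

With these assumed numbers, turn 10 alone sends 152,000 input tokens (about $0.46), in line with the estimate above, and the ten turns together send 845,000. Because each turn resends everything before it, total input grows roughly with the square of session length: twenty turns at the same rate send 3,190,000 tokens, nearly four times as much.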

Understanding this cost structure is the foundation for all seven optimization techniques below.

Tip 1: Use /clear between unrelated tasks

The /clear command resets your conversation context to zero, eliminating accumulated history that inflates every subsequent message. Use it whenever you switch tasks, finish a debugging session, or notice your session has exceeded 10-15 turns. This single habit can reduce daily token usage by 20-30% for developers who work on multiple tasks per session.

Most developers treat a Claude Code session like a continuous conversation, asking about authentication, then switching to database queries, then moving to frontend styling — all in one session. By message 20, every new question carries the full weight of all previous topics.

The rule of thumb: if your next question has nothing to do with your previous conversation, type /clear first. You lose the conversational continuity, but you save thousands of tokens per message for the rest of that task.

A related command, /compact, compresses conversation history into a summary rather than deleting it entirely. This is useful when you’re mid-task but the accumulated context has grown unwieldy. It preserves the key decisions and context while dramatically reducing token count.
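Here is a rough Python model of what /clear and /compact save when one session covers two unrelated tasks. It is a sketch under stated assumptions, not a measurement: every turn adds 10,000 tokens of context, the system prompt is 2,000 tokens, prompt caching is ignored, and the 5,000-token summary left by /compact is a guess, since real summaries vary in size.

```python
"""Compare input tokens for two unrelated tasks in one session.

Illustrative assumptions: every turn adds the same amount of context, no
prompt caching, and /compact leaves a summary of a fixed (guessed) size.
"""


def two_task_input_tokens(strategy: str, turns_per_task: int = 10,
                          added_per_turn: int = 10_000, system_prompt: int = 2_000,
                          summary_tokens: int = 5_000) -> int:
    """Total input tokens when you run nothing, /clear, or /compact between tasks."""
    if strategy not in {"none", "clear", "compact"}:
        raise ValueError(f"unknown strategy: {strategy!r}")
    history = 0
    total = 0
    for task in range(2):
        if task == 1 and strategy == "clear":
            history = 0  # /clear: the second task starts with an empty history
        elif task == 1 and strategy == "compact":
            history = summary_tokens  # /compact: only a summary carries over
        for _ in range(turns_per_task):
            history += added_per_turn
            total += system_prompt + history
    return total


if __name__ == "__main__":
    for strategy in ("none", "clear", "compact"):
        print(f"{strategy:>8}: {two_task_input_tokens(strategy):,} input tokens")
```

Under these assumptions, /clear cuts the session's input tokens by almost half (1,140,000 versus 2,140,000), and /compact lands close behind (1,190,000) because only the summary carries over into the second task.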

Tip 2: Create a .claudeignore file to exclude large directories

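A .claudeignore file prevents Claude Code from reading files that add bulk without value: node_modules, build outputs, generated code, test fixtures, and binary assets. Place it in your project root and list patterns exactly as you would in a .gitignore. This reduces the tokens consumed by file-read tool calls, which are among the largest cost contributors in codebase-aware sessions. Check that your Claude Code version honors the file, though: Anthropic’s settings documentation describes permission deny rules in .claude/settings.json, such as Read(./node_modules/**), as the way to keep files out of Claude’s reach.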

Example .claudeignore for a typical web project:

node_modules/
dist/
build/
.next/
coverage/
*.min.js
*.map
*.lock
package-lock.json
*.generated.ts
__snapshots__/

Without this file, Claude Code may read package-lock.json (often 50,000+ lines) or crawl into node_modules when searching for patterns. A single accidental read of a lockfile can add 200,000+ tokens to your session — costing $0.60 in one tool call on Sonnet.

For monorepos or large projects, the impact is even more dramatic. Teams that add comprehensive .claudeignore files report 30-40% reductions in per-session token usage.
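To gauge what your ignore patterns might keep out of context, the Python sketch below walks a project and splits its approximate token count between files your patterns match and files they don't. The pattern matching is deliberately simplified (it is not full .gitignore semantics), and tokens are approximated as file size divided by 4, a common rule of thumb rather than Claude's actual tokenizer. Because Claude Code never reads every file, the ignored total is a ceiling on what the patterns can save, not a forecast.

```python
"""Estimate how many tokens a set of ignore patterns keeps out of reach.

Simplified matching, not full .gitignore semantics: a pattern ending in "/"
matches any directory with that name; other patterns are matched against the
file name with fnmatch. Tokens are approximated as bytes / 4.
"""
import fnmatch
import sys
from pathlib import Path

BYTES_PER_TOKEN = 4  # rough heuristic, not Claude's tokenizer


def load_patterns(ignore_file: Path) -> list[str]:
    """Read non-empty, non-comment lines from an ignore file, if it exists."""
    if not ignore_file.exists():
        return []
    lines = (line.strip() for line in ignore_file.read_text().splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(rel_path: Path, patterns: list[str]) -> bool:
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: does any parent directory have this name?
            if pattern.rstrip("/") in rel_path.parts[:-1]:
                return True
        elif fnmatch.fnmatch(rel_path.name, pattern):
            return True
    return False


def estimate_tokens(root: Path, patterns: list[str]) -> dict[str, int]:
    """Approximate tokens in ignored vs. kept files under root."""
    totals = {"ignored_tokens": 0, "kept_tokens": 0}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        tokens = path.stat().st_size // BYTES_PER_TOKEN
        key = "ignored_tokens" if is_ignored(path.relative_to(root), patterns) else "kept_tokens"
        totals[key] += tokens
    return totals


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    print(estimate_tokens(root, load_patterns(root / ".claudeignore")))
```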

Tip 3: Write a focused CLAUDE.md file for your project

A well-written CLAUDE.md file gives Claude Code the architectural context it needs upfront, reducing the number of file reads and exploratory tool calls it makes to understand your codebase. Instead of Claude reading 15 files to figure out your project structure, it reads your CLAUDE.md and knows immediately where things are and how they fit together.

An effective CLAUDE.md includes:

  • Project structure: Key directories and what they contain
  • Tech stack: Frameworks, libraries, and versions
  • Coding conventions: Naming patterns, file organization rules
  • Common tasks: How to run tests, build, deploy
  • Architecture decisions: Why things are structured the way they are

The token savings come from reduced exploration. Without a CLAUDE.md, Claude Code might make 5-10 tool calls reading various files to understand your project before it can answer a question about where to add a new feature. With a good CLAUDE.md, it often needs just 1-2 targeted file reads.

This technique pairs perfectly with .claudeignore: together, they ensure Claude reads the right files and skips the wrong ones.
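The trade-off behind this tip is simple arithmetic, sketched below in Python. The numbers are illustrative assumptions drawn from the ranges in this post: eight exploratory reads versus two targeted ones, 5,000 tokens per read, a 1,500-token CLAUDE.md, and a ten-turn session. The model also assumes that both the reads and CLAUDE.md stay in context for every turn.

```python
"""Compare setup-context overhead with and without a CLAUDE.md.

Illustrative assumptions: file reads happen on the first turn and stay in
history, each read costs the same number of tokens, and CLAUDE.md is part of
the context on every turn.
"""


def setup_overhead(turns: int, claude_md_tokens: int = 0, reads: int = 0,
                   tokens_per_read: int = 5_000) -> int:
    """Input tokens spent resending setup context over the whole session."""
    per_turn = claude_md_tokens + reads * tokens_per_read
    return per_turn * turns


if __name__ == "__main__":
    exploring = setup_overhead(10, reads=8)
    guided = setup_overhead(10, claude_md_tokens=1_500, reads=2)
    print(f"Exploring: {exploring:,} tokens, with CLAUDE.md: {guided:,} tokens")
```

Under these assumptions the exploring session spends 400,000 input tokens on setup context and the guided one 115,000. The same arithmetic is a reason to keep CLAUDE.md short: its tokens are paid on every turn, so a 30,000-token CLAUDE.md would cost as much as the six reads it replaces.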

Tip 4: Switch models mid-session with /model (or the --model flag)

Not every task requires Sonnet-level reasoning. Use the /model command inside a session, or start Claude Code with the --model flag, to switch to Haiku for simple tasks like generating boilerplate, writing documentation, formatting code, or answering quick factual questions. Haiku costs $0.80/$4.00 per million input/output tokens compared to Sonnet’s $3.00/$15.00, about a 73% reduction in per-token cost on both input and output.

Tasks well-suited for Haiku:

  • Generating repetitive code patterns (CRUD endpoints, test boilerplate)
  • Writing or updating documentation strings
  • Simple refactors (renaming, extracting functions)
  • Answering questions about syntax or API usage

Tasks that need Sonnet or Opus:

  • Complex debugging across multiple files
  • Architecture design decisions
  • Large-scale refactoring with interdependencies
  • Security review and vulnerability analysis

The cost difference is substantial. Assuming interactions of similar size, a day of mixed coding where you use Haiku for 40% of interactions and Sonnet for 60% costs roughly 29% less than using Sonnet exclusively (0.6 + 0.4 × 0.27 ≈ 0.71 of the all-Sonnet cost). That translates to $30-60/month in savings for an active developer.
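The blended figure is easy to check. This Python sketch uses the per-million-token prices quoted in this tip and assumes a Haiku interaction uses as many tokens as a Sonnet one; the default 50,000-input, 2,000-output interaction size is a made-up example.

```python
"""Blend Haiku and Sonnet pricing for a day of mixed work.

Prices are the per-million-token figures quoted above (input, output). The
sketch assumes Haiku and Sonnet interactions use the same number of tokens.
"""

HAIKU = (0.80, 4.00)
SONNET = (3.00, 15.00)


def interaction_cost(prices: tuple[float, float], input_tokens: int, output_tokens: int) -> float:
    """USD cost of one interaction at the given (input, output) prices."""
    in_price, out_price = prices
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def savings_vs_sonnet(haiku_share: float, input_tokens: int = 50_000,
                      output_tokens: int = 2_000) -> float:
    """Fraction saved versus routing every interaction to Sonnet."""
    sonnet = interaction_cost(SONNET, input_tokens, output_tokens)
    haiku = interaction_cost(HAIKU, input_tokens, output_tokens)
    blended = haiku_share * haiku + (1 - haiku_share) * sonnet
    return 1 - blended / sonnet


if __name__ == "__main__":
    for share in (0.2, 0.4, 0.6):
        print(f"Haiku for {share:.0%} of interactions: {savings_vs_sonnet(share):.1%} saved")
```

At these prices Haiku costs about 27% of Sonnet for both input and output, so the token mix does not change the result: moving 40% of interactions to Haiku saves about 29%, and you would need to move roughly 55% to reach a 40% saving.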

You can also set model preferences in your CLAUDE.md to remind yourself which model to use for which types of tasks.

Tip 5: Write compact, specific instructions

Vague prompts cause Claude Code to do more work — reading more files, running more exploratory commands, and generating longer responses. A specific, constrained prompt produces a targeted response with fewer tool calls and less output. The difference between “fix the authentication” and “fix the JWT expiry check in src/auth/middleware.ts line 45” can be 50,000+ tokens in tool call overhead.

Token-efficient prompting patterns:

  • Name the file: “In src/api/routes.ts” instead of “in the routes file”
  • Specify the function: “Update the validateToken function” instead of “update the validation logic”
  • Constrain the scope: “Only change the return type” instead of “refactor this”
  • Set output limits: “Show me just the changed function” instead of “show me the updated file”

Each unnecessary tool call (file read, directory listing, command execution) adds 1,000-10,000 tokens to your session. A well-scoped prompt that requires zero exploration saves those tokens entirely.
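If you want a nudge before sending a prompt, the patterns listed above can be approximated with a few regular expressions. The scope_hints helper below is hypothetical, not part of Claude Code, and its regexes are crude heuristics (a fixed list of file extensions, a word followed by "function", "method", or "class", or a "line N" reference), so expect false positives and misses.

```python
"""A crude, hypothetical pre-flight check for prompt scope.

scope_hints() is not part of Claude Code; it only looks for a file path,
a named function/method/class, or a line number in the prompt text.
"""
import re

FILE_RE = re.compile(r"\b[\w./-]+\.(?:ts|tsx|js|jsx|py|go|rs|java|rb|css|md)\b")
# A word followed by function/method/class, skipping generic words like "the".
SYMBOL_RE = re.compile(r"\b(?!(?:the|this|that|a)\b)[A-Za-z_]\w*\s+(?:function|method|class)\b",
                       re.IGNORECASE)
LINE_RE = re.compile(r"\blines?\s+\d+", re.IGNORECASE)


def scope_hints(prompt: str) -> list[str]:
    """Return suggestions for making a prompt more specific (empty if none)."""
    hints = []
    if not FILE_RE.search(prompt):
        hints.append("name the file, e.g. src/api/routes.ts")
    if not (SYMBOL_RE.search(prompt) or LINE_RE.search(prompt)):
        hints.append("name the function or the line range")
    return hints


if __name__ == "__main__":
    for prompt in ("fix the authentication",
                   "fix the JWT expiry check in src/auth/middleware.ts line 45"):
        print(f"{prompt!r}: {scope_hints(prompt) or 'looks specific'}")
```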

Tip 6: Avoid reading large files when you know the specific section

When you ask Claude Code to work on a specific part of a large file, tell it exactly which lines or section you mean. Otherwise, it reads the entire file into context. For a 2,000-line file, that’s approximately 8,000-10,000 tokens per read — and it stays in your conversation history for every subsequent message.

Instead of: “Look at the UserService class in services.ts”

Try: “Look at the createUser method around lines 150-180 in src/services/user-service.ts”

For very large files (1,000+ lines), consider whether Claude Code needs the file contents at all. Often, you can paste just the relevant 20-30 lines directly into your message, avoiding a tool call entirely. This is especially useful for quick questions about specific logic — paste the function, ask your question, get the answer, no file read required.
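Pasting a snippet is easier with a small script that prints just the lines you care about. The Python below is a hypothetical helper, not a Claude Code feature; it prints a 1-indexed, inclusive line range with line numbers to stdout and a rough size comparison (about 4 characters per token, not Claude's tokenizer) to stderr.

```python
"""Print a numbered line range from a file so you can paste it into a prompt.

A hypothetical helper, not a Claude Code feature. Token counts use a rough
4-characters-per-token heuristic, not Claude's tokenizer.
"""
import sys
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic


def extract_lines(path: Path, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive), each prefixed with its number."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    lines = path.read_text(encoding="utf-8").splitlines()
    selected = lines[start - 1:end]
    return "\n".join(f"{n}: {text}" for n, text in enumerate(selected, start=start))


def rough_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("usage: extract_lines.py FILE START END", file=sys.stderr)
    else:
        file_path, start, end = Path(sys.argv[1]), int(sys.argv[2]), int(sys.argv[3])
        snippet = extract_lines(file_path, start, end)
        print(snippet)  # stdout: just the snippet, so it can be piped elsewhere
        whole = rough_tokens(file_path.read_text(encoding="utf-8"))
        print(f"~{rough_tokens(snippet):,} tokens vs ~{whole:,} for the whole file",
              file=sys.stderr)
```

For example, running python extract_lines.py src/services/user-service.ts 150 180 | pbcopy would copy those lines to the macOS clipboard while the size estimate still shows in the terminal. The script name is illustrative.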

If you find yourself frequently working with large files, that’s also a signal to refactor. Smaller, focused files are better for both human readability and AI token efficiency.

Tip 7: Monitor token usage in real time and set session budgets

You can’t optimize what you can’t measure. FavTray tracks your Claude Code spending directly in your macOS menu bar by reading local log files, showing per-session and daily costs in real time. When you see a session crossing $5, that’s your cue to apply the techniques above — clear context, switch models, or scope your next prompt more tightly.

Effective session budgets follow a simple framework:

  1. Set a daily target: Monthly budget divided by 22 working days
  2. Set a session warning: One-third of your daily target per session
  3. Review sessions that exceed the warning: Were the extra tokens productive or wasteful?
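The framework above translates directly into a few lines of Python. This is a sketch: the 22 working days and the one-third session warning come from the list, while the $132 budget and the session costs are made-up examples chosen so the numbers divide evenly.

```python
"""Turn a monthly budget into the daily and per-session thresholds above.

The 22 working days and one-third session warning come from the framework;
the example budget and session costs are made up for illustration.
"""

WORKING_DAYS_PER_MONTH = 22


def budget_thresholds(monthly_budget: float) -> dict[str, float]:
    """Step 1 and step 2: the daily target and the per-session warning."""
    daily_target = monthly_budget / WORKING_DAYS_PER_MONTH
    return {"daily_target": daily_target, "session_warning": daily_target / 3}


def sessions_to_review(session_costs: list[float], monthly_budget: float) -> list[float]:
    """Step 3: the sessions whose cost exceeded the warning threshold."""
    warning = budget_thresholds(monthly_budget)["session_warning"]
    return [cost for cost in session_costs if cost > warning]


if __name__ == "__main__":
    print(budget_thresholds(132.0))
    print(sessions_to_review([1.50, 2.40, 0.80, 3.10], monthly_budget=132.0))
```

A $132 monthly budget gives a $6.00 daily target and a $2.00 session warning, so the $2.40 and $3.10 sessions would be flagged for review.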

According to research from GitHub on Copilot usage patterns, developers who receive real-time cost feedback reduce their AI tool spending by 25-35% in the first month without reporting any decrease in productivity (GitHub, “AI Pair Programming Economics,” 2025). The savings come from eliminating waste, not reducing useful work.

FavTray’s local-first approach means your usage data never leaves your machine. It reads the ~/.claude/ log files that Claude Code already creates, calculates costs based on current token pricing, and displays the running total in your menu bar. No accounts, no cloud sync, no privacy concerns.
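FavTray's own code isn't shown in this post, but the general idea of reading local logs can be sketched in Python. This is not FavTray's implementation, and it rests on an assumption about an undocumented format that may change between versions: that Claude Code writes JSONL transcripts under ~/.claude/projects/ whose assistant entries carry a message.usage object with input_tokens, output_tokens, cache_creation_input_tokens, and cache_read_input_tokens counts. Prices are Sonnet's $3/$15 per million, with cache writes at 1.25x and cache reads at 0.1x the input price, following Anthropic's prompt-caching pricing. The sketch does not de-duplicate entries that may be logged more than once, so treat its totals as estimates.

```python
"""Estimate per-session Claude Code costs from local JSONL transcripts.

ASSUMPTION: transcripts live under ~/.claude/projects/ and assistant entries
contain message.usage token counts; this format is undocumented and may change.
"""
import json
from pathlib import Path

# Sonnet prices in USD per million tokens (cache write 1.25x, cache read 0.1x input).
PRICES = {
    "input_tokens": 3.00,
    "output_tokens": 15.00,
    "cache_creation_input_tokens": 3.75,
    "cache_read_input_tokens": 0.30,
}


def usage_cost(usage: dict) -> float:
    """USD cost of one usage record; missing fields count as zero."""
    return sum((usage.get(field) or 0) * price for field, price in PRICES.items()) / 1_000_000


def session_costs(log_dir: Path) -> dict[str, float]:
    """Sum estimated cost per JSONL transcript found under log_dir."""
    costs: dict[str, float] = {}
    if not log_dir.is_dir():
        return costs
    for path in sorted(log_dir.rglob("*.jsonl")):
        total = 0.0
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip blank, partial, or corrupt lines
                if not isinstance(entry, dict):
                    continue
                message = entry.get("message")
                if isinstance(message, dict) and isinstance(message.get("usage"), dict):
                    total += usage_cost(message["usage"])
        costs[str(path.relative_to(log_dir))] = total
    return costs


if __name__ == "__main__":
    for session, cost in session_costs(Path.home() / ".claude" / "projects").items():
        print(f"{session}: ${cost:.2f}")
```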

Putting it all together: a token-efficient Claude Code workflow

Combining all seven techniques creates a workflow that’s both productive and cost-conscious:

  1. Start each task fresh with /clear
  2. Let CLAUDE.md provide context so Claude doesn’t need to explore
  3. Use .claudeignore to keep junk out of file reads
  4. Scope your prompts tightly with specific file and function references
  5. Use Haiku for simple tasks, Sonnet for complex ones
  6. Paste small code snippets instead of triggering file reads for quick questions
  7. Watch your costs in FavTray and adjust when a session runs hot

Developers who adopt this workflow consistently report 30-50% reductions in monthly Claude Code spending while maintaining or improving their output. The key insight is that most token waste comes from accumulated context and unnecessary exploration — problems that are easy to fix once you understand where the tokens go.

Frequently Asked Questions

Why does Claude Code use so many tokens?

Claude Code uses more tokens than expected because every interaction includes system prompts (~2,000 tokens), full conversation history, file contents read during the session, and tool call results. A 10-turn conversation can accumulate 200,000+ input tokens as prior context is resent with each message.

Does clearing conversation history reduce Claude Code costs?

Yes. Using /clear or /compact resets or compresses the conversation context, dramatically reducing input tokens for subsequent messages. On Sonnet, a conversation with 150K tokens of accumulated context costs roughly $0.45 in input tokens per message. After clearing, the next message starts fresh at a fraction of that cost.

What is a .claudeignore file and how does it save tokens?

A .claudeignore file works like .gitignore but for Claude Code. It tells the tool to skip specified files and directories when reading your project, preventing large generated files, node_modules, build artifacts, and test fixtures from being loaded into context unnecessarily.

How much can I save by switching models mid-session in Claude Code?

Switching from Sonnet to Haiku mid-session for simple tasks saves about 73% on those interactions at the per-token prices quoted above. A boilerplate generation task that costs $0.30 on Sonnet costs approximately $0.08 on Haiku. Over a full day of mixed-complexity coding, strategic model switching can cut total costs by 25-40%, depending on how much of your work can move to Haiku.
