How to Track AI API Costs Without a Cloud Service

By Akash Rajagopal

If you are spending $50-300 per month on AI APIs — Claude, OpenAI, Gemini, Cursor, Copilot — you probably want to know where that money goes. The default approach is to check billing dashboards after the fact or route traffic through a cloud proxy like Helicone or LangSmith. But both approaches have problems: dashboards update hours late, and proxies send your prompts through third-party servers.

There is a better way. You can track AI API costs entirely on your local machine, in real time, without any data leaving your Mac.

The Problem with Cloud-Based AI Cost Tracking

Cloud-based AI observability tools like Helicone, LangSmith, and Portkey work by proxying your API calls. Instead of sending requests directly to Anthropic or OpenAI, you change your API base URL to route through the tracking service’s servers. They log every request and response, calculate costs, and display analytics on a web dashboard.

This approach has three significant drawbacks:

Privacy exposure. Every prompt you send and every response you receive passes through the proxy’s servers. For developers working on proprietary code, confidential projects, or sensitive data, this is a non-starter. Even if the proxy service has strong privacy policies, the data still transits and temporarily resides on their infrastructure.

Latency overhead. Proxying adds 1-5 milliseconds per API call. For interactive coding sessions where you make dozens of calls per hour, this overhead is negligible. For production systems making thousands of calls, it compounds. For latency-sensitive applications, any added hop is unacceptable.

Ongoing cost. Cloud proxies charge based on request volume. Helicone’s free tier covers 10,000 requests per month. Beyond that, pricing ranges from $20 to $150+ per month. You are paying a tracking service to track the costs of another service — the irony is not lost on developers.

What AI Tools Store Locally on Your Mac

Most AI coding tools already store detailed usage data on your machine. The information is there — it just needs to be read and summarized.

Claude Code stores session logs in ~/.claude/. Each session creates a JSONL file containing prompts, responses, token counts, model identifiers, and timestamps. These files are plain text and machine-readable.
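As a rough illustration of reading such files, here is a minimal Python sketch that sums token counts per model from JSONL lines. The field layout (`message.model`, `message.usage.input_tokens`, `message.usage.output_tokens`) is an assumption for illustration and may not match what your installed version writes; the helper names `sum_tokens` and `scan_log_dir` are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def sum_tokens(lines):
    """Sum input/output tokens per model from an iterable of JSONL lines."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or non-JSON lines
        if not isinstance(entry, dict):
            continue
        # Assumed layout: {"message": {"model": ..., "usage": {...}}}
        message = entry.get("message")
        if not isinstance(message, dict):
            continue
        usage = message.get("usage")
        if not isinstance(usage, dict):
            continue  # entry carries no token counts
        model = message.get("model", "unknown")
        totals[model]["input"] += usage.get("input_tokens") or 0
        totals[model]["output"] += usage.get("output_tokens") or 0
    return dict(totals)


def scan_log_dir(root=Path.home() / ".claude"):
    """Aggregate every *.jsonl file found under root."""
    combined = defaultdict(lambda: {"input": 0, "output": 0})
    for path in root.rglob("*.jsonl"):
        with path.open(encoding="utf-8", errors="replace") as f:
            for model, counts in sum_tokens(f).items():
                combined[model]["input"] += counts["input"]
                combined[model]["output"] += counts["output"]
    return dict(combined)
```

Calling `scan_log_dir()` with no arguments walks `~/.claude/` and returns a dictionary keyed by model name. Nothing is sent over the network; the function only reads files.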

OpenAI API responses include token usage in the response body, in the usage object. If you use a local logging library or tool wrapper, these counts are captured on disk.
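A local logging wrapper can be very small. The sketch below assumes you already have a Chat Completions response decoded into a Python dict, and appends its `usage` counts (`prompt_tokens`, `completion_tokens`) to a JSONL file on disk. The function name `log_usage`, the record layout, and the log path are hypothetical choices for illustration.

```python
import json
import time
from pathlib import Path


def log_usage(response: dict, log_path: Path) -> dict:
    """Append the token counts from one response to a local JSONL file."""
    usage = response.get("usage") or {}
    record = {
        "ts": time.time(),  # when the record was written, not when the call ran
        "model": response.get("model", "unknown"),
        "input_tokens": usage.get("prompt_tokens", 0),
        "output_tokens": usage.get("completion_tokens", 0),
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Only the counts and the model name are stored, so the log can be summarized later without keeping prompt or response text on disk.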

Cursor, Copilot, and Windsurf maintain local state and activity logs that include model usage patterns, though the format varies by tool.

The key insight is that you do not need a cloud proxy to count tokens. The data already exists on your filesystem.

The Local-First Approach

Local-first AI cost tracking works by reading these log files directly from your Mac, applying current token pricing, and displaying running totals. Here is how it differs from cloud proxies:

| Aspect | Cloud Proxy (Helicone, LangSmith) | Local-First (FavTray) |
| --- | --- | --- |
| Data flow | Prompts routed through proxy servers | Reads local log files after the fact |
| Privacy | Proxy sees all prompts and responses | Zero data leaves your Mac |
| Latency | Adds 1-5ms per API call | Zero impact |
| Setup | Change API base URL, add API key | Install app, auto-detects logs |
| Cost | $0-150+/month | ₹49/month (or free core version) |
| Real-time | ~1 minute dashboard delay | Menu bar updates per-session |
| Team features | Multi-user dashboards, roles | Per-device only |
| Offline | Requires internet | Works fully offline |
OfflineRequires internetWorks fully offline

How FavTray Tracks AI Costs Locally

FavTray is a macOS menu bar app that reads AI provider log files on your Mac and calculates costs in real time. Here is how it works under the hood:

  1. Log discovery. FavTray scans known paths (~/.claude/, OpenAI log directories, Cursor state files) for session data. It uses filesystem watchers to detect new sessions as they start.

  2. Token extraction. For each session, FavTray parses token counts from log entries — input tokens, output tokens, and the model used (Claude 3.5 Sonnet, GPT-4o, etc.). Different providers use different log formats; FavTray handles the parsing for 23+ providers.

  3. Cost calculation. Token counts are multiplied by current per-token pricing for each model. FavTray maintains an embedded pricing table that updates with the app. For example, Claude 3.5 Sonnet costs $3 per million input tokens and $15 per million output tokens.

  4. Menu bar display. The running daily total appears directly in your Mac menu bar. Click for per-session breakdowns, per-model splits, and weekly trends.

  5. Budget alerts. Set daily or weekly spending thresholds (Pro feature). FavTray sends a macOS notification when you approach your limit, so you can adjust behavior mid-session rather than discovering the overage at month end.
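The cost calculation in step 3 is simple arithmetic. The following Python sketch is not FavTray's implementation; it only illustrates the idea of multiplying token counts by a per-million-token pricing table. The Claude 3.5 Sonnet rates are the ones quoted above, while the table key `claude-3-5-sonnet` and the function name `session_cost` are assumptions made for this example.

```python
# USD per million tokens. The rates are the Claude 3.5 Sonnet figures
# quoted in the article; the key name is a hypothetical label.
PRICING = {
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
}


def session_cost(model, input_tokens, output_tokens, pricing=PRICING):
    """Return the session cost in USD, or None if the model is not priced."""
    rates = pricing.get(model)
    if rates is None:
        return None  # unknown model: better to report nothing than guess
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000
```

For example, a session with 200,000 input tokens and 50,000 output tokens comes to (200,000 × $3 + 50,000 × $15) / 1,000,000 = $1.35.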

At no point does any data leave your machine. FavTray reads files, does math, and displays numbers. No network calls are made for tracking.

Setting Up Local Cost Tracking

Getting started takes under a minute:

  1. Install FavTray from the website and open it. It appears in your menu bar.
  2. Open the AI Usage panel. FavTray auto-detects installed AI tools and their log locations.
  3. Start using your AI tools normally. Costs appear in the menu bar as sessions progress.

The free core version shows real-time costs for Claude and OpenAI. The Pro version (₹49/month) adds 30/60/90-day history, spending alerts, per-model breakdowns, and support for all 23+ providers.

Budget Management Without Cloud Tools

Tracking costs is step one. Managing them is step two. Here are practical strategies that work with local-first tracking:

Set daily budgets based on task type. A refactoring session with large context windows might cost $8-15. A quick debugging session is usually $1-3. Set your daily alert at the level where you want to pause and evaluate whether the AI assistance is still the most efficient path.
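A daily threshold check can be expressed in a few lines. This Python sketch is illustrative only; the function name `budget_status` and the 80% warning ratio are assumptions, not settings taken from any particular tool.

```python
def budget_status(spent_today, daily_limit, warn_ratio=0.8):
    """Classify today's spend against a daily limit (both in USD)."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    ratio = spent_today / daily_limit
    if ratio >= 1.0:
        return "over"      # limit reached or exceeded
    if ratio >= warn_ratio:
        return "warning"   # approaching the limit: time to pause and evaluate
    return "ok"
```

With a $15 daily limit, $3 of spend is "ok", $13 triggers "warning", and $16 is "over".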

Use the right model for each task. Claude 3.5 Haiku costs roughly 4x less than Claude 3.5 Sonnet per token, which makes it a better fit for simple tasks. When FavTray shows your daily spend climbing, switch to a cheaper model for straightforward completions and save the expensive model for complex reasoning.

Review weekly trends, not daily spikes. A $20 day feels alarming in isolation but may average out to $8 per day over a week. FavTray’s weekly view (Pro) helps you distinguish between a one-time spike and a sustained overspend pattern.
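One way to separate a spike from a sustained pattern is to compare the weekly average with the daily budget, as in this Python sketch. The function name `classify_week` and the three labels are hypothetical, and the rule used (average over budget means sustained overspend) is just one reasonable choice.

```python
from statistics import mean


def classify_week(daily_spend, daily_budget):
    """Summarize a week of daily totals (USD) against a daily budget."""
    average = mean(daily_spend)
    days_over = sum(1 for amount in daily_spend if amount > daily_budget)
    if average > daily_budget:
        pattern = "sustained overspend"
    elif days_over:
        pattern = "one-off spike"
    else:
        pattern = "within budget"
    return {"average": average, "days_over": days_over, "pattern": pattern}
```

A week of `[20, 6, 6, 6, 6, 6, 6]` against a $10 daily budget averages $8 per day with one day over, so it is classified as a one-off spike.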

Compare your AI spend against time saved. If Claude Code saves you 2 hours on a task that costs $12 in API calls, and your hourly rate is $75+, the ROI is obvious. Local tracking gives you the data to make this calculation per-session rather than guessing at month end.
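The comparison above is easy to compute per session. This Python sketch uses the article's example figures (2 hours saved, $75 per hour, $12 in API calls); the function name `session_roi` and the returned field names are hypothetical.

```python
def session_roi(hours_saved, hourly_rate, api_cost):
    """Compare the value of time saved with the API cost of a session (USD)."""
    if api_cost <= 0:
        raise ValueError("api_cost must be positive")
    value_saved = hours_saved * hourly_rate
    return {
        "value_saved": value_saved,
        "net_benefit": value_saved - api_cost,
        "return_multiple": value_saved / api_cost,
    }
```

For the example session, the time saved is worth $150, the net benefit is $138, and the return is 12.5 times the API cost. The hours-saved input is your own estimate, so treat the result as a rough guide rather than a measurement.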

When Cloud-Based Tracking Makes More Sense

Local-first tracking is not always the best choice. Cloud-based tools like Helicone are genuinely better when:

  • You need team-wide analytics. Helicone shows per-developer breakdowns, role-based access, and team dashboards. FavTray tracks per-device only.
  • You want response caching. Helicone can cache identical API responses, reducing costs for repeated queries. Local tracking does not intercept requests.
  • You need request-level debugging. Proxy tools log the full request/response cycle for debugging API issues. Local log files may not capture all metadata.

For individual developers or small teams where privacy and simplicity matter more than team analytics, local-first tracking is the better default.

The Privacy Calculus

Every cloud service you route API traffic through is another entity that can see your code, your prompts, and your work patterns. For developers working on:

  • Proprietary source code
  • Client projects under NDA
  • Security-sensitive applications
  • Personal projects you prefer to keep private

Local-first tracking eliminates this exposure entirely. Your AI costs, usage patterns, and the content of your sessions stay on your Mac. This is not about distrust — it is about minimizing your attack surface and respecting the principle of least privilege for your data.

Any AI provider credentials FavTray uses are held in memory only and never written to disk. Log files are read in place without copying. No telemetry, no analytics, no phone-home behavior.

Getting Started

If you want to try local-first AI cost tracking, FavTray’s core AI Usage Tracker is free with no trial expiration. It covers Claude and OpenAI cost tracking in the menu bar. The Pro version extends to 23+ providers, adds history and alerts, and bundles 6 other developer tools — all for ₹49 per month.

Download FavTray, open it, and start a Claude Code session. Watch the cost counter tick up in your menu bar. That awareness alone tends to reduce monthly spending by 25-40%, because you start making conscious decisions about when expensive AI assistance is worth the cost.

Frequently Asked Questions

Can I track AI API costs without sending data to a cloud service?

Yes. Tools like FavTray read local log files on your Mac to calculate AI API costs without sending any data to external servers. Claude Code stores session logs in ~/.claude/, and OpenAI token usage can be captured on disk by a local logging library or tool wrapper. FavTray reads these files on-device, applies current token pricing, and shows running totals in your menu bar.

What is local-first AI cost tracking?

Local-first AI cost tracking means all cost calculation and display happens on your device, not on a remote server. Your prompts, responses, token counts, and spending patterns never leave your Mac. This contrasts with cloud-based tools like Helicone or LangSmith that route API traffic through their servers for analytics.

How accurate is local AI cost tracking compared to cloud proxies?

Local cost tracking based on log files is typically accurate to within 2-5% of actual billing. The small discrepancy comes from timing differences between log writes and API metering. Cloud proxies like Helicone can be more precise because they intercept the actual API response with token counts, but the difference rarely matters for budgeting purposes.

Does tracking AI costs locally affect API performance?

No. Local-first tracking reads log files after API calls complete — it never intercepts or delays your API requests. Cloud-based proxies like Helicone add 1-5ms of latency per request because they sit between you and the AI provider. For latency-sensitive applications, local tracking has zero performance impact.
