What actually costs money when you run Claude — output tokens, model tier, and whether your prompt cache is still warm — and how to spot the overspend hiding in your own scheduled jobs.
Every morning a stack of your LaunchAgents fire `claude -p` cold: plan-day, classify, the language + learning lessons. Each one re-reads your entire CLAUDE.md context from scratch — a guaranteed cache miss — and some of them run Opus on work a much cheaper model would nail identically. That's real money leaking on a timer.
You pay per token, both directions: tokens going in (your CLAUDE.md, the skill file, the input data) and tokens coming out (the response). The catch most people miss is that output is roughly 5× the price of input. So a tight prompt that produces a rambling answer costs more than a long prompt that produces a crisp one — the lever is the response length, not the instructions.
The second lever is the prompt cache. Anthropic will cache your big static context (the CLAUDE.md, the skill) and a cache hit costs about a tenth of a cache miss. But the cache lives for only ~5 minutes. Back-to-back turns in a live session ride a warm cache and stay cheap; a job that sleeps longer than 5 minutes between steps — or a cron that spawns a brand-new process every morning — re-reads everything at full price. This is the exact reason the `/loop` and ScheduleWakeup tooling warns you off 300-second sleeps: just past the cache window is the worst place to wait.
The third lever is model tier. Opus is your most capable and most expensive model; Haiku is a fraction of the cost. For judgment-heavy work (writing your day's narrative, composing a lesson) Opus earns its keep. For deterministic sorting — like the classify step that buckets a dropped file into S/M/L tier — a smaller model produces the same answer for a fifteenth of the price. Running Opus on a sort is the most common quiet overspend in your whole system.
Find out what your scheduled jobs actually cost to load, before they do a single useful thing:
# Which model does each scheduled claude -p job run?
grep -ri "claude-\|--model\|opus\|haiku" /Users/tom/Claude/PDB/scripts/run_plan_day.sh
# Rough card rates (per million tokens):
# Opus ~ $15 in / $75 out
# Haiku ~ $1 in / $5 out
#
# A plan-day run loads ~40k tokens of context (CLAUDE.md + skill + inputs)
# every morning, cold cache, before writing anything:
# 40,000 / 1,000,000 * $15 = $0.60 just to read the room
#
# The same 40k on a warm cache (a follow-up turn within 5 min):
# ~ $0.06 — one tenth.
The two cheapest wins are picking the smallest model that still does the job and keeping the cache warm — both beat any amount of prompt-trimming.