I wanted to track how much my Claude API usage was actually costing me. Not the billing page estimate - the real cost. Per request. Per task. Per tool call.
So I built Langley: an intercepting proxy that captures every Claude API request, extracts token usage, calculates costs, and shows it all in real-time. In one coding session.
The Problem
Claude's billing shows monthly totals. Helpful, but useless for:
- Debugging - "Why did this task cost $5?"
- Optimization - "Which tool is eating my context?"
- Accountability - "What's this project actually costing?"
I needed request-level visibility. Something that sits between my code and Claude, captures everything, and gives me analytics.
The Architecture
Langley is a TLS-intercepting proxy. Traffic flows through it transparently:
Your App -> HTTPS -> Langley -> HTTPS -> Claude API
|
v
SQLite DB
|
v
Dashboard
It generates certificates on-the-fly, captures request/response pairs, parses Claude's SSE streams, extracts token counts, and calculates costs using a pricing table.
The dashboard shows:
- Real-time flow list (WebSocket updates)
- Token counts and costs per request
- Analytics by task, by tool, by day
- Anomaly detection (large contexts, slow responses, retries)
What Made It Work
1. Security From the Start
Before writing code, we did a security analysis. Matt (our auditor persona) found 10 issues to address:
- Credential redaction on write (never store API keys)
- Upstream TLS validation (no self-signed upstream)
- CA key permissions (0600, not world-readable)
- Random certificate serials (not predictable)
- LRU cert cache (prevent memory exhaustion)
These weren't afterthoughts - they shaped the design.
2. Phased Implementation
We broke the work into phases:
| Phase | Deliverable |
|---|---|
| 0 | Basic HTTP proxy that forwards requests |
| 1 | TLS interception, SQLite persistence |
| 2 | REST API, WebSocket server, basic UI |
| 3 | Token extraction, cost calculation, analytics |
| 4 | Full dashboard with filtering and charts |
| 5 | Polish, documentation, blog |
Each phase built on the last. Each had a clear deliverable.
3. Right-Sized Technology
- Go - Single binary, easy deployment, great TLS libraries
- SQLite - No server needed, WAL mode for concurrent reads
- React - Just works, Vite for fast builds
- WebSocket - Real-time without polling
No Kubernetes. No Postgres. No microservices. Just the minimum to solve the problem.
The Tricky Parts
SSE Parsing
Claude's streaming API uses Server-Sent Events. Token counts come in message_start and message_delta events, scattered across the stream. The parser accumulates them correctly:
case "message_start":
// Extract input tokens from initial message
if usage := msg["usage"]; usage != nil {
flow.InputTokens = usage["input_tokens"]
}
case "message_delta":
// Extract output tokens from final delta
if usage := event["usage"]; usage != nil {
flow.OutputTokens = usage["output_tokens"]
}
Task Grouping
Requests don't come with "task" labels. We infer them:
- Explicit
X-Langley-Taskheader (if you add it) - User ID from the request body's metadata
- Same host with 5-minute gap (new task starts)
This groups related requests together for per-task analytics.
Anomaly Detection
The system flags:
- Large contexts (>100k input tokens)
- Slow responses (>30 seconds)
- Rapid repeats (same endpoint, short window = likely retries)
- High single-request cost (>$1)
- Tool failures
These help catch runaway loops and inefficient prompts.
The Result
Langley is about 2,000 lines of Go and 600 lines of React. It:
- Intercepts HTTPS traffic transparently
- Redacts credentials before storage
- Extracts token usage from SSE streams
- Calculates costs using model-specific pricing
- Shows real-time analytics in a dashboard
- Detects anomalies automatically
All without requiring any changes to your Claude client code. Just set HTTPS_PROXY and you're capturing.
What I Learned
Plan before code. We spent time on a security analysis and phased plan before writing implementation code. The plan survived contact with reality - the phases worked as scoped.
Simple architecture wins. SQLite handles everything. No external dependencies. Deploys as a single binary (once built with embedded frontend).
Real-time matters. The WebSocket updates make debugging feel immediate. Polling would have worked but felt sluggish.
Try It
The code is at github.com/HakAl/langley.
# Build
go build -o langley ./cmd/langley
# Trust the CA (see langley -show-ca)
# Run
./langley
# Set proxy
export HTTPS_PROXY=http://localhost:9090
# Open dashboard
# http://localhost:9091
Now you can see exactly what Claude is doing with your tokens.