I’ve been running Claude Code heavily for the past few months — and at some point I started wondering: what exactly is it doing all day?
Not out of distrust. More out of curiosity and professional habit. When you deploy any autonomous system, you want visibility into what it’s doing — what commands it ran, what files it touched, whether it made any outbound network calls. For AI agents, that visibility is almost entirely missing.
So I built a /audit skill for Claude Code.
What It Does
/audit generates a structured report of everything Claude Code did in a session or across a full day. It parses Claude’s local transcript files and surfaces:
- Every bash command executed, with risk classification
- All file operations (reads, writes, edits) — with sensitive path detection
- MCP tool calls and their parameters
- Agent delegations (subagents spawned, background vs foreground, worktree isolation)
- Skills invoked and web requests made
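Under the hood, the parsing step can be sketched roughly like this. The transcript shape assumed here (JSONL lines carrying `tool_use` blocks under `message.content`) is an informal reading of the local files, not a documented format:

```python
import json

def extract_tool_calls(jsonl_lines):
    """Collect (tool_name, input) pairs from transcript JSONL lines.

    ASSUMPTION: each line is a JSON object whose assistant turns carry a
    message.content list containing tool_use blocks. That shape is inferred
    from inspecting local files, not from a published spec.
    """
    calls = []
    for line in jsonl_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the audit
        msg = entry.get("message")
        content = msg.get("content") if isinstance(msg, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                calls.append((block.get("name"), block.get("input", {})))
    return calls

# Tiny synthetic transcript, two lines:
sample = [
    json.dumps({"message": {"content": [
        {"type": "tool_use", "name": "Bash", "input": {"command": "git status"}}
    ]}}),
    json.dumps({"message": {"content": [{"type": "text", "text": "done"}]}}),
]
print(extract_tool_calls(sample))  # [('Bash', {'command': 'git status'})]
```

Skipping unparseable lines (rather than failing) matters here: transcript files can contain entry types the auditor doesn't care about.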
Two modes:
| Command | What it does |
|---|---|
| `/audit session` | Audit the current conversation from history |
| `/audit` (or "daily audit") | Parse all transcript JSONL files from `~/.claude/projects/` for today |
Risk Classification
The most useful part is the risk tagging. Every bash command is classified against a table of 17 risk patterns. A sampling:
| Risk Tag | Example |
|---|---|
| `DESTRUCTIVE` | `rm -rf`, `git clean -f` |
| `EXTERNAL_NETWORK` | `curl` to non-localhost URLs |
| `DATA_EXFILTRATION` | `curl -X POST -d @file`, `scp` outbound with file args |
| `SECRET_LITERAL` | Literal `sk-`, `ghp_`, `AKIA` tokens in args |
| `GIT_PUSH` | `git push`, `git push --force` |
| `PKG_INSTALL` | `npm install`, `pip3 install`, `brew install` |
| `SAFETY_BYPASS` | `--dangerously`, `--no-verify`, `--force` |
| `DYNAMIC_EXEC` | `eval`, `bash -c` with piped input, `curl \| bash` |
The classifier only looks at top-level shell commands — content inside heredocs, `python3 -c "..."` scripts, or string arguments doesn't trigger false positives. Known-safe patterns (like `source ~/.bashrc` or `> /dev/null`) are explicitly excluded.
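As a rough illustration of the approach (the patterns below are a simplified subset I wrote for this post, not the skill's actual 17-row table), classification boils down to regex matching on the top-level command, with quoted payloads stripped first so inline scripts don't trip shell-oriented patterns:

```python
import re

# Illustrative subset of risk patterns -- NOT the skill's actual table.
RISK_PATTERNS = [
    ("DESTRUCTIVE",   re.compile(r"\brm\s+-[a-z]*r[a-z]*f|\bgit\s+clean\s+-f")),
    ("GIT_PUSH",      re.compile(r"\bgit\s+push\b")),
    ("SAFETY_BYPASS", re.compile(r"--dangerously|--no-verify|--force\b")),
    ("DYNAMIC_EXEC",  re.compile(r"\beval\b|curl[^|]*\|\s*(ba)?sh")),
]
# Known-safe commands that should never be flagged.
SAFE_PATTERNS = [re.compile(r"^source\s+~/\.bashrc")]

QUOTED = re.compile(r"\"[^\"]*\"|'[^']*'")

def top_level(command):
    # Crude stand-in for real shell parsing: drop quoted string arguments
    # so payloads inside e.g. python3 -c "..." don't trigger shell patterns.
    return QUOTED.sub("", command)

def classify(command):
    if any(p.search(command) for p in SAFE_PATTERNS):
        return []
    return [tag for tag, p in RISK_PATTERNS if p.search(command)]

print(classify(top_level("git push --force")))
# ['GIT_PUSH', 'SAFETY_BYPASS']
```

Note how `classify(top_level('python3 -c "eval(input())"'))` comes back empty: the `eval` lives inside a quoted Python payload, not at the shell's top level.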
Secrets are redacted before display — API keys, JWTs, GitHub tokens, and AWS keys are replaced with `[REDACTED]` in every output line.
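A minimal redaction pass might look like the sketch below; the patterns are my own hedged approximations of the listed token formats, and the skill's actual regexes may differ:

```python
import re

# Approximate credential shapes -- illustrative, not the skill's exact list.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{10,}"),             # sk- style API keys
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),              # GitHub personal tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS access key IDs
    re.compile(r"eyJ[\w-]+\.[\w-]+\.[\w-]+"),         # JWTs (header.payload.sig)
]

def redact(line):
    """Replace anything that looks like a credential with [REDACTED]."""
    for pattern in SECRET_PATTERNS:
        line = pattern.sub("[REDACTED]", line)
    return line

print(redact("export AWS_KEY=AKIAIOSFODNN7EXAMPLE"))
# export AWS_KEY=[REDACTED]
```

Running redaction before any line is rendered (rather than only on flagged commands) is the safer design: a secret can appear in an otherwise boring command.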
What a Real Report Looks Like
Here’s an excerpt from running /audit on today’s sessions:
```
# Daily Audit — 2026-03-25
Sessions: 2 | Total tool calls: 52

## Session Overview
| # | Directory        | Time Range  | Bash | Risky | Files | Agents | Flags |
|---|------------------|-------------|------|-------|-------|--------|-------|
| 1 | ~/.claude/skills | 04:15–05:11 | 32   | 5     | 0     | 0      | 5     |
| 2 | ~/Desktop/skills | 07:24–07:40 | 8    | 1     | 9     | 0      | 1     |

## Risky Commands
| # | Time  | Risk Tags        | Command                                              |
|---|-------|------------------|------------------------------------------------------|
| 1 | 04:18 | EXTERNAL_NETWORK | `curl -X POST "https://api.algolia.com/..."`         |
| 2 | 07:38 | —                | `mkdir -p ~/.claude/skills/audit && cp SKILL.md ...` |
```
The EXTERNAL_NETWORK flags on the Algolia calls are accurate — those were real outbound API calls from a YC company search task. The risk classification is there to inform, not to alarm.
Why This Matters for AI Agent Security
Visibility is the foundation of trust. You can’t have a meaningful security posture for an autonomous agent if you have no record of what it did.
This is especially relevant as Claude Code gets used for more consequential work — long-running agents, production deployments, access to sensitive codebases. A daily audit habit gives you:
- Accountability — a record of what ran and when
- Anomaly detection — unusual commands stand out in the risk table
- Incident context — if something goes wrong, the transcript gives you a timeline
It’s not enterprise-grade audit logging (the skill says so explicitly). But it’s a meaningful first step for personal security hygiene.
Install It
The skill is available as a GitHub Gist. One-liner install:
```bash
mkdir -p ~/.claude/skills/audit && curl -sL \
  https://gist.githubusercontent.com/lghupan/46d65f4035481ef6058d0e895bdeb73a/raw/SKILL.md \
  -o ~/.claude/skills/audit/SKILL.md
```
Then use it with `/audit` in Claude Code.
A PR is open to merge it into the official Anthropic skills repo. Once that lands, it’ll be available via the Claude Code plugin marketplace.
The source is in my fork of anthropics/skills.
What’s Next
A few things I want to add:
- Date range support — `audit 2026-03-20 to 2026-03-25` for a week-in-review
- Export to JSON — for piping into other tools or dashboards
- Threshold alerts — flag sessions with unusually high risky command counts
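The threshold-alert idea could be as small as this sketch; the `flag_sessions` helper and its `(directory, risky_count)` input shape are hypothetical, not part of the skill today:

```python
# Hypothetical sketch of threshold alerts: surface any session whose
# risky-command count exceeds a configurable limit.
def flag_sessions(sessions, threshold=3):
    """sessions: list of (directory, risky_count) pairs (illustrative shape)."""
    return [directory for directory, risky in sessions if risky > threshold]

# Risky counts taken from the sample report above.
sessions = [("~/.claude/skills", 5), ("~/Desktop/skills", 1)]
print(flag_sessions(sessions))  # ['~/.claude/skills']
```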
If you try it out and hit something broken or missing, open an issue on the PR or reach out.