Building an AI Pentest Coach
How I taught Claude to help me get better at hacking
HackYSU 2026
Who am I?
- Brad Theodore
- Junior Operator at Black Lantern Security
- Bachelor's from YSU
- Master's in IT with a concentration in Cybersecurity from the University of Cincinnati
- Started learning pentesting the hard way: overthewire.org Bandit, too many tabs, scattered notes, forgotten techniques
- I live in the terminal and Obsidian
Black Lantern Security
The Problem
Learning pentesting is overwhelming
- Hundreds of tools, thousands of flags
- Every machine is different, with a different combination of software
- Gaps in my IT knowledge
- Getting unstuck can take time - which is fine
- Writeups can be confusing, with missing explanations
- Your notes are a graveyard of missing techniques and half-finished writeups
- You solve a machine, and forget how a month later
- Thought: what if my AI assistant could actually coach me?
What I actually needed
A coach who, when I get stuck:
- Has visibility into what I've tried
- Suggests paths or things to poke at
- Remembers what I've learned across sessions
- Explains every command, not just gives answers
- Explains connections between steps
- Asks questions instead of handing me solutions
- Gets smarter as I complete more machines
- Keeps my notes clean while I work
The Journey
v0 — Just asking Claude for help
Me: "I have port 445 open, what do I do?"
Claude: "Try smbclient -L //target -N"
Me: *runs it, gets output*
Me: "What does this mean?"
Claude: "..." (no context, generic answer)
Problems:
- Doing HTB machines, losing track of what was run, copy-pasting output into notes manually
- No memory. No structure. No teaching methodology.
Every session started from zero.
The idea: Claude Code Custom Skills
Claude Code lets you define skills — persistent instruction files that shape how the AI behaves.
~/.claude/skills/pentest-coach/
├── SKILL.md # Coaching behavior (~150 lines)
└── tools-reference.md # Tool preferences (~90 lines)
When I type /pentest-coach, Claude becomes my coach.
Not a chatbot. A coach with a playbook.
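A minimal sketch of what the skill file's rules might look like (illustrative contents, not the actual SKILL.md):

```
# Pentest Coach

## Mission
Coach, don't solve. Make the user think first.

## Rules (sketch)
- End every response with "Your move:"
- Break down the flags of every command you suggest
- Ask "what do you think?" before revealing an answer
- On startup, read logs/, attack-chain.md, and loot/credentials.md
```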
v1 — The first coaching session
What changed:
- "Your move:" — Every response ends by asking ME what to do next
- Command → Output → Analysis → Next format
- Flag breakdowns for every command suggested
- No auto-answers — asks "what do you think?" first
nxc smb 10.10.11.51 -u oscar -p 'pass' --shares

Flags: smb = protocol, -u/-p = creds, --shares = enumerate shares
Output: [+] = success, [-] = failure, (Pwn3d!) = admin
v2 — Context awareness
Problem: Coach had no idea what I'd already tried.
Solution: Painful-Pentest-Logger — automatic tmux logging + session initialization.
~/CTF/MachineName/
├── logs/cmd.log # Every command I type (timestamped)
├── logs/out.log # Filtered terminal output
├── loot/credentials.md # Credential table
├── attack-chain.md # Path taken so far
└── MachineName writeup.md
Coach reads these on startup. Picks up where I left off.
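Under the hood, initialization is just file reads. A hypothetical sketch of the startup pass (paths follow the layout above):

```bash
# Hypothetical startup reads; the coach rebuilds context from the session folder
tail -n 50  ~/CTF/Vintage/logs/cmd.log    # recent commands, timestamped
tail -n 200 ~/CTF/Vintage/logs/out.log    # filtered terminal output
cat ~/CTF/Vintage/attack-chain.md         # path taken so far
cat ~/CTF/Vintage/loot/credentials.md     # known creds
```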
The session startup
**Status Summary:**
- Machine: Vintage (10.10.11.45)
- Current Stage: privilege escalation
- Current User: c.neri
- Flags: User ✓ | Root ✗
- Last Action: bloodhound collection
- Credentials: 3 users (see loot/credentials.md)
Where are you stuck? What have you tried?
No guessing. No repeating myself. Context from line one.
Battle-tested — real machines broke it repeatedly
| Machine | Date | What broke / What changed |
|---|---|---|
| EscapeTwo | Jan 9 | First PPL use — 4 days after building logger |
| theFrizz | Jan 16 | Exposed gaps in coach initialization |
| certified | Jan 24 | Added post-completion workflow, flag explanations |
| Puppy | Jan 28 | v2.1: credential tables, attack chain tracker |
| Jeeves | Feb 4–6 | Windows, Jenkins, alternate data streams |
| Baby | Feb 6–7 | Backup Operators privesc → unix2dos lesson |
| Sea | Feb 11 | Linux, WonderCMS XSS-to-RCE, pivoting |
Every machine exposed a gap. Every gap got fixed.
The Knowledge Base
v2.5 — Remembering techniques (JSON era)
First attempt: one big JSON file of techniques.
{
"kerberoasting": {
"tools": ["impacket-GetUserSPNs"],
"prerequisites": ["domain creds"],
"commands": ["GetUserSPNs.py ..."]
}
}
Worked but: Hard to edit. Hard to read. Hard to grep. Grew unwieldy fast.
v3 — The markdown migration
21 topic-specific markdown files. 190 techniques. Grep-friendly.
| Category | Files | Techniques |
|---|---|---|
| Active Directory | 7 | 46 |
| Linux | 2 | 38 |
| Windows | 2 | 34 |
| Web | 7 | 63 |
| Cross-cutting | 3 | 9 |
| **Total** | **21** | **190** |
grep -ril 'kerberoast' techniques/ # instant lookup
## Kerberoasting
**Prerequisites:** Valid domain credentials
**Tools:** impacket-GetUserSPNs, hashcat
```bash
impacket-GetUserSPNs domain/user:pass -dc-ip 10.10.11.1 -request
hashcat -m 13100 hash.txt rockyou.txt
```
**Obtains:** Service account passwords (often over-privileged)
**If it fails:** No SPNs registered → try AS-REP roasting instead
Every technique: what you need, what to run, what you get, what to try if it fails.
Decision trees — teaching Claude HOW to think
Not just "here are techniques" but "here's the order to try them"
## I Have One Set of AD Creds — How Do I Get More?
1. Can you reach LDAP? → BloodHound collection first
2. Any Kerberoastable SPNs? → GetUserSPNs
3. AS-REP roastable users? → GetNPUsers
4. Can you read SYSVOL? → grep for passwords in scripts
5. LAPS enabled? → check ms-Mcs-AdmPwd attribute
6. None of the above? → password spray with seasons/year
15 decision trees across 13 files. Extracted from real machines.
Where decision trees come from
After completing a machine, I study expert writeups (0xdf's blog) and ask:
- Where did the expert pivot? (the moment they knew what to try)
- What dead ends did they avoid? (and how?)
- What's the broader pattern? (not just this machine)
20 writeups extracted so far — each one makes the coach smarter.
The Two-Layer System
Technique files vs. Methodology
| | Technique Files | Methodology.md |
|---|---|---|
| Purpose | Reference catalog | Battle-tested playbook |
| Size | 190 techniques | Curated subset |
| When used | Claude looks up when stuck | My quick reference during a machine |
| Entry bar | Any useful technique | Must be proven in practice |
| Style | Structured, grep-friendly | Coaching notes, flag breakdowns |
Flow: Learn technique → add to KB → prove it works → graduate to Methodology
The post-machine workflow
After every completed machine (both flags):
- Writeup — complete the narrative with branch points
- Technique files — add any NEW techniques learned
- Methodology.md — graduate battle-tested techniques
- tools-reference.md — add new tool breakdowns
- Evidence organization — screenshots, loot, creds
The coach does this review with me. It's part of the skill.
Architecture
System overview
┌─────────────────────────────────────────────────────┐
│ Claude Code CLI │
│ │
│ /pentest-coach ──→ SKILL.md (coaching rules) │
│ ──→ tools-reference.md (tools) │
│ ──→ techniques/ (190 techniques) │
│ ──→ machine logs (cmd.log, out.log) │
│ ──→ writeup, attack-chain, creds │
└─────────────────────────────────────────────────────┘
↕ ↕
Terminal (tmux) Obsidian (notes)
auto-logging writeups + KB
The logging pipeline (Painful-Pentest-Logger)
Terminal input ──→ zsh preexec hook ──→ cmd.log (commands only)
Terminal output ──→ tmux pipe-pane ──→ Python filter ──→ out.log
- strips ANSI codes
- filters prompts
- marks [CMD_START] boundaries
- debounces autocomplete
Result: Clean, parseable logs that Claude can read directly.
Session name = machine folder: tmux new -s Puppy → logs to Puppy/logs/
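A sketch of the two capture hooks, using the snippet names from the repo; the exact contents here are assumptions:

```bash
# Sketch of the capture wiring; file names from the repo, contents assumed
# zshrc.snippet: zsh's preexec hook fires before each command ($1 = command as typed)
preexec() {
  [[ -n "$TMUX" ]] || return                      # only log inside tmux sessions
  local session; session=$(tmux display-message -p '#S')
  print -r -- "$(date '+%F %T') $1" >> ~/CTF/"$session"/logs/cmd.log
}

# tmux.conf.snippet: pipe every new session's output through the Python filter
set-hook -g session-created \
  'pipe-pane -o "ctf-tmux-prefix >> ~/CTF/#{session_name}/logs/out.log"'
```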
| | Painful-Pentest-Logger | ctf-pentest-coach |
|---|---|---|
| Built with | Codex (Jan 5) | Claude (Jan 14+) |
| Job | Infrastructure plumbing | Intelligence layer |
| What it does | Capture everything, filter noise | Coach, track, analyze, teach |
| Aware of the other? | No — just writes logs | Yes — reads PPL logs on startup |
Codex: tmux hooks, Python filter, zsh preexec — raw plumbing
Claude: coaching skill, credential tracking, attack chains, KB, methodology
And critically: Claude is both the builder AND the user — it reads those logs during every coaching session.
What's in the repo (public!)
github.com/theo2612/ctf-pentest-coach
├── skill/ # The coaching brain
│ ├── SKILL.md
│ └── tools-reference.md
├── logging/ # Terminal capture system
│ ├── ctf-tmux-prefix (Python filter)
│ ├── zshrc.snippet (command logger)
│ └── tmux.conf.snippet (auto-start logging)
├── research/ # Knowledge base
│ └── techniques/ (21 files, 190 techniques)
├── install.sh # One-command setup
└── README.md
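Setup is meant to be one command. A hypothetical sketch of the flow (the real steps live in the README):

```bash
# Hypothetical install flow; see the repo README for the actual steps
git clone https://github.com/theo2612/ctf-pentest-coach
cd ctf-pentest-coach && ./install.sh
# Per the layout above, install.sh would wire up:
#   skill/            -> ~/.claude/skills/pentest-coach/
#   logging/*.snippet -> appended to ~/.zshrc and ~/.tmux.conf
```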
Lessons learned
What surprised me
- Markdown > everything — JSON was over-engineered. Plain text wins.
- "If it fails" is the most valuable field — pentesting is mostly failure. Capture the pivots.
- Decision trees > technique lists — knowing WHAT ORDER to try things matters more than knowing things exist.
- The coach improves because I improve — the feedback loop is the product.
What I got wrong (at first)
- Too much automation — SKILL.md grew to 595 lines with auto-suggestions, note generation, Python integrations. Cut to 203. Coaching works better when the user has to think.
- Monolithic KB — one big JSON file doesn't scale. Topic-based markdown files do.
- Speculative techniques — added things "just in case." Noise. Only add what you've used.
- Skipping the post-machine review — the reflection is where 50% of my learning happens.
The unix2dos moment
Trying to exploit Backup Operators on a Windows box:
# Wrote a diskshadow script. Ran it. Silent failure. No error.
# 45 minutes of debugging. The problem?
# Windows needed CRLF line endings. Linux wrote LF.
# One command: unix2dos script.txt
# Everything worked.
That lesson is now in my KB, my Methodology, AND both my memory and Claude's.
I will never lose those 45 minutes again.
Building your own
You don't need pentesting to use this pattern
The architecture works for any learning domain:
- Competitive programming — technique files by algorithm type, decision trees for problem patterns
- Web development — framework-specific KB, debugging decision trees
- System design — pattern catalog, trade-off decision trees
- Any skill with tools + decisions + failure modes
How to build your own (in a hackathon weekend)
Day 1:
1. Install Claude Code
2. Create ~/.claude/skills/your-coach/SKILL.md
3. Define: mission, coaching style, file layout, guardrails (see the skeleton after this list)
4. Start using it on a real problem — take notes on what's missing
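Steps 2 and 3 as a starter skeleton (hypothetical; the headings are the contract, the contents are yours):

```bash
# Starter skeleton for a new coach skill, headings only
mkdir -p ~/.claude/skills/your-coach
cat > ~/.claude/skills/your-coach/SKILL.md <<'EOF'
# Your Coach

## Mission
## Coaching style
## File layout
## Guardrails
EOF
```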
Day 2:
5. Add a knowledge base (start with 10 techniques, grow from there)
6. Add decision trees from your experience
7. Set up the feedback loop — every session improves the system
The key insight
An AI coach isn't about the AI knowing everything.
It's about the AI knowing how to ask the right questions
and remembering what you've learned together.
The skill file is small. The knowledge base grows.
The decision trees capture how to think, not just what to know.
Get the code
Public repo: github.com/theo2612/ctf-pentest-coach
- Full skill definition
- Logging system (tmux + Python)
- 190 techniques in 21 topic files
- 15 decision trees
- Install script (Linux / macOS)
Claude Code: claude.com/claude-code
- What about Codex and Gemini?
Questions?
github.com/theo2612/ctf-pentest-coach