Building an AI Pentest Coach
How I taught Claude to help me get better at hacking
HackYSU 2026
Who am I?
- Brad Theodore
- Junior Operator at Black Lantern Security
- Bachelor's from YSU
- Master's in IT with a concentration in Cybersecurity from the University of Cincinnati
- Started learning pentesting the hard way: overthewire.org Bandit, too many tabs, scattered notes, forgotten techniques
- I live in the terminal and Obsidian
Black Lantern Security
The Problem
Learning pentesting is overwhelming
- Hundreds of tools, thousands of flags
- Every machine is different, with a different combination of software
- Gaps in my IT knowledge
- Getting unstuck can take time - which is fine
- Writeups can be confusing, with missing explanations
- Your notes are a graveyard of missing techniques and half-finished writeups
- You solve a machine, and forget how a month later
- Thought: what if my AI assistant could actually coach me?
What I actually needed
A coach who, when I get stuck:
- Has visibility into what I've tried
- Suggests paths or things to poke at
- Remembers what I've learned across sessions
- Explains every command, not just gives answers
- Explains connections between steps
- Asks questions instead of handing me solutions
- Gets smarter as I complete more machines
- Keeps my notes clean while I work
The Journey
v0 — Just asking Claude for help
Me: "I have port 445 open, what do I do?"
Claude: "Try smbclient -L //target -N"
Me: *runs it, gets output*
Me: "What does this mean?"
Claude: "..." (no context, generic answer)
Problems:
- Doing HTB machines, losing track of what was run, copy-pasting output into notes manually
- No memory. No structure. No teaching methodology.
Every session started from zero.
The idea: Claude Code Custom Skills
Claude Code lets you define skills — persistent instruction files that shape how the AI behaves.
~/.claude/skills/pentest-coach/
├── SKILL.md # Coaching behavior (~150 lines)
└── tools-reference.md # Tool preferences (~90 lines)
When I type /pentest-coach, Claude becomes my coach.
Not a chatbot. A coach with a playbook.
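A minimal sketch of what the skill file's rules might look like (illustrative contents, not the actual SKILL.md):

```
# Pentest Coach

## Mission
Coach, don't solve. Make the user think first.

## Rules (sketch)
- End every response with "Your move:"
- Break down the flags of every command you suggest
- Ask "what do you think?" before revealing an answer
- On startup, read logs/, attack-chain.md, and loot/credentials.md
```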
v1 — The first coaching session
What changed:
- "Your move:" — Every response ends by asking ME what to do next
- Command → Output → Analysis → Next format
- Flag breakdowns for every command suggested
- No auto-answers — asks "what do you think?" first
nxc smb 10.10.11.51 -u oscar -p 'pass' --shares

Flags: smb = protocol, -u/-p = creds, --shares = enumerate shares
Output: [+] = success, [-] = failure, (Pwn3d!) = admin
v2 — Context awareness
Problem: Coach had no idea what I'd already tried.
Solution: Painful-Pentest-Logger — automatic tmux logging + session initialization.
~/CTF/MachineName/
├── logs/cmd.log # Every command I type (timestamped)
├── logs/out.log # Filtered terminal output
├── loot/credentials.md # Credential table
├── attack-chain.md # Path taken so far
└── MachineName writeup.md
Coach reads these on startup. Picks up where I left off.
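Under the hood, initialization is just file reads. A hypothetical sketch of the startup pass (paths follow the layout above):

```bash
# Hypothetical startup reads; the coach rebuilds context from the session folder
tail -n 50  ~/CTF/Vintage/logs/cmd.log    # recent commands, timestamped
tail -n 200 ~/CTF/Vintage/logs/out.log    # filtered terminal output
cat ~/CTF/Vintage/attack-chain.md         # path taken so far
cat ~/CTF/Vintage/loot/credentials.md     # known creds
```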
The session startup
**Status Summary:**
- Machine: Vintage (10.10.11.45)
- Current Stage: privilege escalation
- Current User: c.neri
- Flags: User ✓ | Root ✗
- Last Action: bloodhound collection
- Credentials: 3 users (see loot/credentials.md)
Where are you stuck? What have you tried?
No guessing. No repeating myself. Context from line one.
Battle-tested — real machines broke it repeatedly
| Machine | Date | What broke / What changed |
|---|---|---|
| EscapeTwo | Jan 9 | First PPL use — 4 days after building logger |
| theFrizz | Jan 16 | Exposed gaps in coach initialization |
| certified | Jan 24 | Added post-completion workflow, flag explanations |
| Puppy | Jan 28 | v2.1: credential tables, attack chain tracker |
| Jeeves | Feb 4–6 | Windows, Jenkins, alternate data streams |
| Baby | Feb 6–7 | Backup Operators privesc → unix2dos lesson |
| Sea | Feb 11 | Linux, WonderCMS XSS-to-RCE, pivoting |
Every machine exposed a gap. Every gap got fixed.
The Knowledge Base
v2.5 — Remembering techniques (JSON era)
First attempt: one big JSON file of techniques.
{
"kerberoasting": {
"tools": ["impacket-GetUserSPNs"],
"prerequisites": ["domain creds"],
"commands": ["GetUserSPNs.py ..."]
}
}
Worked but: Hard to edit. Hard to read. Hard to grep. Grew unwieldy fast.
v3 — The markdown migration
21 topic-specific markdown files. 190 techniques. Grep-friendly.
| Category | Files | Techniques |
|---|---|---|
| Active Directory | 7 | 46 |
| Linux | 2 | 38 |
| Windows | 2 | 34 |
| Web | 7 | 63 |
| Cross-cutting | 3 | 9 |
| **Total** | **21** | **190** |
grep -ril 'kerberoast' techniques/ # instant lookup
## Kerberoasting
**Prerequisites:** Valid domain credentials
**Tools:** impacket-GetUserSPNs, hashcat
```bash
impacket-GetUserSPNs domain/user:pass -dc-ip 10.10.11.1 -request
hashcat -m 13100 hash.txt rockyou.txt
```
**Obtains:** Service account passwords (often over-privileged)
**If it fails:** No SPNs registered → try AS-REP roasting instead
Every technique: what you need, what to run, what you get, what to try if it fails.
Decision trees — teaching Claude HOW to think
Not just "here are techniques" but "here's the order to try them"
## I Have One Set of AD Creds — How Do I Get More?
1. Can you reach LDAP? → BloodHound collection first
2. Any Kerberoastable SPNs? → GetUserSPNs
3. AS-REP roastable users? → GetNPUsers
4. Can you read SYSVOL? → grep for passwords in scripts
5. LAPS enabled? → check ms-Mcs-AdmPwd attribute
6. None of the above? → password spray with seasons/year
15 decision trees across 13 files. Extracted from real machines.
Where decision trees come from
After completing a machine, I study expert writeups (0xdf's blog) and ask:
- Where did the expert pivot? (the moment they knew what to try)
- What dead ends did they avoid? (and how?)
- What's the broader pattern? (not just this machine)
20 writeups extracted so far — each one makes the coach smarter.
The Two-Layer System
Technique files vs. Methodology
| | Technique Files | Methodology.md |
|---|---|---|
| Purpose | Reference catalog | Battle-tested playbook |
| Size | 190 techniques | Curated subset |
| When used | Claude looks up when stuck | My quick reference during a machine |
| Entry bar | Any useful technique | Must be proven in practice |
| Style | Structured, grep-friendly | Coaching notes, flag breakdowns |
Flow: Learn technique → add to KB → prove it works → graduate to Methodology
The post-machine workflow
After every completed machine (both flags):
- Writeup — complete the narrative with branch points
- Technique files — add any NEW techniques learned
- Methodology.md — graduate battle-tested techniques
- tools-reference.md — add new tool breakdowns
- Evidence organization — screenshots, loot, creds
The coach does this review with me. It's part of the skill.
Architecture
System overview
┌─────────────────────────────────────────────────────┐
│ Claude Code CLI │
│ │
│ /pentest-coach ──→ SKILL.md (coaching rules) │
│ ──→ tools-reference.md (tools) │
│ ──→ techniques/ (190 techniques) │
│ ──→ machine logs (cmd.log, out.log) │
│ ──→ writeup, attack-chain, creds │
└─────────────────────────────────────────────────────┘
↕ ↕
Terminal (tmux) Obsidian (notes)
auto-logging writeups + KB
The logging pipeline (Painful-Pentest-Logger)
Terminal input ──→ zsh preexec hook ──→ cmd.log (commands only)
Terminal output ──→ tmux pipe-pane ──→ Python filter ──→ out.log
- strips ANSI codes
- filters prompts
- marks [CMD_START] boundaries
- debounces autocomplete
Result: Clean, parseable logs that Claude can read directly.
Session name = machine folder: tmux new -s Puppy → logs to Puppy/logs/
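A sketch of the two capture hooks, using the snippet names from the repo; the exact contents here are assumptions:

```bash
# Sketch of the capture wiring; file names from the repo, contents assumed
# zshrc.snippet: zsh's preexec hook fires before each command ($1 = command as typed)
preexec() {
  [[ -n "$TMUX" ]] || return                      # only log inside tmux sessions
  local session; session=$(tmux display-message -p '#S')
  print -r -- "$(date '+%F %T') $1" >> ~/CTF/"$session"/logs/cmd.log
}

# tmux.conf.snippet: pipe every new session's output through the Python filter
set-hook -g session-created \
  'pipe-pane -o "ctf-tmux-prefix >> ~/CTF/#{session_name}/logs/out.log"'
```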
| | Painful-Pentest-Logger | ctf-pentest-coach |
|---|---|---|
| Built with | Codex (Jan 5) | Claude (Jan 14+) |
| Job | Infrastructure plumbing | Intelligence layer |
| What it does | Capture everything, filter noise | Coach, track, analyze, teach |
| Aware of the other? | No — just writes logs | Yes — reads PPL logs on startup |
Codex: tmux hooks, Python filter, zsh preexec — raw plumbing
Claude: coaching skill, credential tracking, attack chains, KB, methodology
And critically: Claude is both the builder AND the user — it reads those logs during every coaching session.
What's in the repo (public!)
github.com/theo2612/ctf-pentest-coach
├── skill/ # The coaching brain
│ ├── SKILL.md
│ └── tools-reference.md
├── logging/ # Terminal capture system
│ ├── ctf-tmux-prefix (Python filter)
│ ├── zshrc.snippet (command logger)
│ └── tmux.conf.snippet (auto-start logging)
├── research/ # Knowledge base
│ └── techniques/ (21 files, 190 techniques)
├── install.sh # One-command setup
└── README.md
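Setup is meant to be one command. A hypothetical sketch of the flow (the real steps live in the README):

```bash
# Hypothetical install flow; see the repo README for the actual steps
git clone https://github.com/theo2612/ctf-pentest-coach
cd ctf-pentest-coach && ./install.sh
# Per the layout above, install.sh would wire up:
#   skill/            -> ~/.claude/skills/pentest-coach/
#   logging/*.snippet -> appended to ~/.zshrc and ~/.tmux.conf
```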
Lessons learned
What surprised me
- Markdown > everything — JSON was over-engineered. Plain text wins.
- "If it fails" is the most valuable field — pentesting is mostly failure. Capture the pivots.
- Decision trees > technique lists — knowing WHAT ORDER to try things matters more than knowing things exist.
- The coach improves because I improve — the feedback loop is the product.
What I got wrong (at first)
- Too much automation — SKILL.md grew to 595 lines with auto-suggestions, note generation, Python integrations. Cut to 203. Coaching works better when the user has to think.
- Monolithic KB — one big JSON file doesn't scale. Topic-based markdown files do.
- Speculative techniques — added things "just in case." Noise. Only add what you've used.
- Skipping the post-machine review — the reflection is where 50% of my learning happens.
The unix2dos moment
Trying to exploit Backup Operators on a Windows box:
# Wrote a diskshadow script. Ran it. Silent failure. No error.
# 45 minutes of debugging. The problem?
# Windows needed CRLF line endings. Linux wrote LF.
# One command: unix2dos script.txt
# Everything worked.
That lesson is now in my KB, my Methodology, AND both my memory and Claude's.
I will never lose those 45 minutes again.
Building your own
You don't need pentesting to use this pattern
The architecture works for any learning domain:
- Competitive programming — technique files by algorithm type, decision trees for problem patterns
- Web development — framework-specific KB, debugging decision trees
- System design — pattern catalog, trade-off decision trees
- Any skill with tools + decisions + failure modes
How to build your own (in a hackathon weekend)
Day 1:
1. Install Claude Code
2. Create ~/.claude/skills/your-coach/SKILL.md
3. Define: mission, coaching style, file layout, guardrails (see the skeleton after this list)
4. Start using it on a real problem — take notes on what's missing
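Steps 2 and 3 as a starter skeleton (hypothetical; the headings are the contract, the contents are yours):

```bash
# Starter skeleton for a new coach skill, headings only
mkdir -p ~/.claude/skills/your-coach
cat > ~/.claude/skills/your-coach/SKILL.md <<'EOF'
# Your Coach

## Mission
## Coaching style
## File layout
## Guardrails
EOF
```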
Day 2:
5. Add a knowledge base (start with 10 techniques, grow from there)
6. Add decision trees from your experience
7. Set up the feedback loop — every session improves the system
The key insight
An AI coach isn't about the AI knowing everything.
It's about the AI knowing how to ask the right questions
and remembering what you've learned together.
The skill file is small. The knowledge base grows.
The decision trees capture how to think, not just what to know.
Get the code
Public repo: github.com/theo2612/ctf-pentest-coach
- Full skill definition
- Logging system (tmux + Python)
- 190 techniques in 21 topic files
- 15 decision trees
- Install script (Linux / macOS)
Claude Code: claude.com/claude-code
- What about Codex and Gemini?
Questions?
github.com/theo2612/ctf-pentest-coach