
Building an AI Pentest Coach

How I taught Claude to help me get better at hacking

HackYSU 2026


Who am I?

  • Brad Theodore
  • Junior Operator at Black Lantern Security
  • Bachelor's from YSU
  • Master's in IT with a concentration in Cybersecurity from the University of Cincinnati
  • Started learning pentesting the hard way: overthewire.org BANDIT, too many tabs, scattered notes, forgotten techniques
  • I live in the terminal and Obsidian

Black Lantern Security

  • Founded 2013
  • Headquartered in Charleston, SC
  • Red/Blue/Purple Cybersecurity Services and Software
  • Attack Surface Management (ASM)
  • Application, Network, and Wireless Penetration Testing
  • Risk Assessment
  • Purple Teaming and Detection Engineering
  • Free and Open Source (FOSS) Tool Development
  • Design and build custom Command and Control (C2) frameworks

Operations across industries:

  • Energy
  • Finance
  • Retail
  • Healthcare
  • Hospitality
  • Shipping

The Problem


Learning pentesting is overwhelming

  • Hundreds of tools, thousands of flags
  • Every machine is different, with a different combination of software
  • Gaps in my IT knowledge
  • Getting unstuck can take time - which is fine
  • Writeups can be confusing, with missing explanations
  • Your notes become a graveyard of missing techniques and half-finished writeups
  • You solve a machine, then forget how a month later
  • Thought: what if my AI assistant could actually coach me?

What I actually needed

A coach who, when I get stuck:

  1. Has visibility into what I've tried
  2. Suggests paths or things to poke at
  3. Remembers what I've learned across sessions
  4. Explains every command, not just gives answers
  5. Explains connections between steps
  6. Asks questions instead of handing me solutions
  7. Gets smarter as I complete more machines
  8. Keeps my notes clean while I work

The Journey


v0 — Just asking Claude for help

Me: "I have port 445 open, what do I do?"
Claude: "Try smbclient -L //target -N"
Me: *runs it, gets output*
Me: "What does this mean?"
Claude: "..." (no context, generic answer)

Problems:

  • Doing HTB machines, losing track of what was run, copy-pasting output into notes manually
  • No memory. No structure. No teaching methodology.
  • Every session started from zero.


The idea: Claude Code Custom Skills

Claude Code lets you define skills — persistent instruction files that shape how the AI behaves.

~/.claude/skills/pentest-coach/
├── SKILL.md            # Coaching behavior (~150 lines)
└── tools-reference.md  # Tool preferences (~90 lines)

When I type /pentest-coach, Claude becomes my coach. Not a chatbot. A coach with a playbook.
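
A skill is just a markdown file (plus optional YAML frontmatter naming it). Here is a stripped-down sketch of the kind of rules mine encodes - the real SKILL.md is ~150 lines, and these bullets are paraphrased from the behaviors described in this talk, not copied from the file:

```markdown
---
name: pentest-coach
description: Socratic pentest coaching with session context and flag breakdowns
---

# Pentest Coach

- Read logs/cmd.log, loot/credentials.md, and attack-chain.md before suggesting anything.
- Break down every flag of every command you suggest.
- Ask "what do you think?" before giving an answer.
- End every response with "Your move:".
```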


v1 — The first coaching session

What changed:

  • "Your move:" — Every response ends by asking ME what to do next
  • Command → Output → Analysis → Next format
  • Flag breakdowns for every command suggested
  • No auto-answers — asks "what do you think?" first

nxc smb 10.10.11.51 -u oscar -p 'pass' --shares
Flags: smb = protocol, -u/-p = creds, --shares = enumerate shares
Output: [+] = success, [-] = failure, (Pwn3d!) = admin


v2 — Context awareness

Problem: Coach had no idea what I'd already tried.

Solution: Painful-Pentest-Logger — automatic tmux logging + session initialization.

~/CTF/MachineName/
├── logs/cmd.log          # Every command I type (timestamped)
├── logs/out.log          # Filtered terminal output
├── loot/credentials.md   # Credential table
├── attack-chain.md       # Path taken so far
└── MachineName writeup.md

Coach reads these on startup. Picks up where I left off.


The session startup

**Status Summary:**
- Machine: Vintage (10.10.11.45)
- Current Stage: privilege escalation
- Current User: c.neri
- Flags: User ✓ | Root ✗
- Last Action: bloodhound collection
- Credentials: 3 users (see loot/credentials.md)

Where are you stuck? What have you tried?

No guessing. No repeating myself. Context from line one.


Battle-tested — real machines broke it repeatedly

| Machine   | Date    | What broke / what changed                        |
|-----------|---------|--------------------------------------------------|
| EscapeTwo | Jan 9   | First PPL use — 4 days after building logger     |
| theFrizz  | Jan 16  | Exposed gaps in coach initialization             |
| certified | Jan 24  | Added post-completion workflow, flag explanations |
| Puppy     | Jan 28  | v2.1: credential tables, attack chain tracker    |
| Jeeves    | Feb 4–6 | Windows, Jenkins, alternate data streams         |
| Baby      | Feb 6–7 | Backup Operators privesc → unix2dos lesson       |
| Sea       | Feb 11  | Linux, WonderCMS XSS-to-RCE, pivoting            |

Every machine exposed a gap. Every gap got fixed.


The Knowledge Base


v2.5 — Remembering techniques (JSON era)

First attempt: one big JSON file of techniques.

{
  "kerberoasting": {
    "tools": ["impacket-GetUserSPNs"],
    "prerequisites": ["domain creds"],
    "commands": ["GetUserSPNs.py ..."]
  }
}

Worked but: Hard to edit. Hard to read. Hard to grep. Grew unwieldy fast.


v3 — The markdown migration

21 topic-specific markdown files. 190 techniques. Grep-friendly.

| Category         | Files | Techniques |
|------------------|-------|------------|
| Active Directory | 7     | 46         |
| Linux            | 2     | 38         |
| Windows          | 2     | 34         |
| Web              | 7     | 63         |
| Cross-cutting    | 3     | 9          |
| Total            | 21    | 190        |

grep -ril 'kerberoast' techniques/  # instant lookup

Technique format

## Kerberoasting

**Prerequisites:** Valid domain credentials
**Tools:** impacket-GetUserSPNs, hashcat

```bash
impacket-GetUserSPNs domain/user:pass -dc-ip 10.10.11.1 -request
hashcat -m 13100 hash.txt rockyou.txt
```

**Obtains:** Service account passwords (often over-privileged)
**If it fails:** No SPNs registered → try AS-REP roasting instead

Every technique: what you need, what to run, what you get, what to try if it fails.
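
Because every file follows that heading format, a tiny helper can pull one technique's full section out of the KB. This is a hypothetical convenience on top of the grep workflow - the awk matching and the techniques/ path are my assumptions, not part of the repo:

```shell
# Print one technique's "## ..." section from the markdown KB.
# Assumes each technique starts at a "## Name" heading, as shown above.
technique() {
  awk -v pat="$1" '
    /^## / { show = (tolower($0) ~ tolower(pat)) }  # toggle at each heading
    show                                            # print lines while inside a match
  ' techniques/*.md
}
```

`technique kerberoast` then prints the Kerberoasting block - prerequisites, commands, and the "if it fails" pivot included.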


Decision trees — teaching Claude HOW to think

Not just "here are techniques" but "here's the order to try them"

## I Have One Set of AD Creds — How Do I Get More?

1. Can you reach LDAP? → BloodHound collection first
2. Any Kerberoastable SPNs? → GetUserSPNs
3. AS-REP roastable users? → GetNPUsers
4. Can you read SYSVOL? → grep for passwords in scripts
5. LAPS enabled? → check ms-Mcs-AdmPwd attribute
6. None of the above? → password spray with seasons/year

15 decision trees across 13 files. Extracted from real machines.


Where decision trees come from

After completing a machine, I study expert writeups (0xdf's blog) and ask:

  • Where did the expert pivot? (the moment they knew what to try)
  • What dead ends did they avoid? (and how?)
  • What's the broader pattern? (not just this machine)

20 writeups extracted so far — each one makes the coach smarter.


The Two-Layer System


Technique files vs. Methodology

|            | Technique Files            | Methodology.md                     |
|------------|----------------------------|------------------------------------|
| Purpose    | Reference catalog          | Battle-tested playbook             |
| Size       | 190 techniques             | Curated subset                     |
| When used  | Claude looks up when stuck | My quick-reference during a machine |
| Entry bar  | Any useful technique       | Must be proven in practice         |
| Style      | Structured, grep-friendly  | Coaching notes, flag breakdowns    |

Flow: Learn technique → add to KB → prove it works → graduate to Methodology


The post-machine workflow

After every completed machine (both flags):

  1. Writeup — complete the narrative with branch points
  2. Technique files — add any NEW techniques learned
  3. Methodology.md — graduate battle-tested techniques
  4. tools-reference.md — add new tool breakdowns
  5. Evidence organization — screenshots, loot, creds

The coach does this review with me. It's part of the skill.


Architecture


System overview

┌─────────────────────────────────────────────────────┐
│                Claude Code CLI                      │
│                                                     │
│  /pentest-coach ──→ SKILL.md (coaching rules)       │
│                 ──→ tools-reference.md (tools)      │
│                 ──→ techniques/ (190 techniques)    │
│                 ──→ machine logs (cmd.log, out.log) │
│                 ──→ writeup, attack-chain, creds    │
└─────────────────────────────────────────────────────┘
        ↕                              ↕
   Terminal (tmux)              Obsidian (notes)
   auto-logging                 writeups + KB

The logging pipeline (Painful-Pentest-Logger)

Terminal input ──→ zsh preexec hook ──→ cmd.log (commands only)

Terminal output ──→ tmux pipe-pane ──→ Python filter ──→ out.log
                                      - strips ANSI codes
                                      - filters prompts
                                      - marks [CMD_START] boundaries
                                      - debounces autocomplete

Result: Clean, parseable logs that Claude can read directly.

Session name = machine folder: tmux new -s Puppy → logs to Puppy/logs/
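
The two capture paths above can be sketched in a few lines of shell. The paths are illustrative, and the sed one-liner stands in for the real Python filter (which also filters prompts, marks command boundaries, and debounces autocomplete):

```shell
# Illustrative sketch of the capture path; paths and names are assumptions.
CTF_LOG_DIR="${CTF_LOG_DIR:-$PWD/logs}"
mkdir -p "$CTF_LOG_DIR"

# zsh calls preexec() right before each command runs; $1 is the raw
# command line as typed. Append it, timestamped, to cmd.log.
preexec() {
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$CTF_LOG_DIR/cmd.log"
}

# Inside tmux, mirror everything the pane prints through a filter into
# out.log. Here sed strips ANSI escape codes so the log is plain text.
if [ -n "${TMUX:-}" ]; then
  tmux pipe-pane -o "sed -u 's/\x1b\[[0-9;?]*[A-Za-z]//g' >> $CTF_LOG_DIR/out.log"
fi
```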


Two AI tools — each doing what they're best at

|                     | Painful-Pentest-Logger          | ctf-pentest-coach               |
|---------------------|---------------------------------|---------------------------------|
| Built with          | Codex (Jan 5)                   | Claude (Jan 14+)                |
| Job                 | Infrastructure plumbing         | Intelligence layer              |
| What it does        | Capture everything, filter noise | Coach, track, analyze, teach   |
| Aware of the other? | No — just writes logs           | Yes — reads PPL logs on startup |
Codex: tmux hooks, Python filter, zsh preexec — raw plumbing
Claude: coaching skill, credential tracking, attack chains, KB, methodology

And critically: Claude is both the builder AND the user — it reads those logs during every coaching session.


What's in the repo (public!)

github.com/theo2612/ctf-pentest-coach

├── skill/           # The coaching brain
│   ├── SKILL.md
│   └── tools-reference.md
├── logging/         # Terminal capture system
│   ├── ctf-tmux-prefix     (Python filter)
│   ├── zshrc.snippet        (command logger)
│   └── tmux.conf.snippet    (auto-start logging)
├── research/        # Knowledge base
│   └── techniques/  (21 files, 190 techniques)
├── install.sh       # One-command setup
└── README.md
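
install.sh's job can be sketched as a small function. This is a guess at its shape rather than the actual script - the paths mirror the repo tree above, but the marker string and copy logic are my assumptions:

```shell
# Hypothetical sketch of what install.sh does: copy the skill into
# Claude Code's skills directory and append the logging snippets once.
install_coach() {
  repo="$1"   # path to a ctf-pentest-coach checkout
  skill_dir="$HOME/.claude/skills/pentest-coach"

  mkdir -p "$skill_dir"
  cp "$repo/skill/SKILL.md" "$repo/skill/tools-reference.md" "$skill_dir/"

  # guard so re-running install doesn't duplicate the zsh hook
  grep -q 'pentest-logger' "$HOME/.zshrc" 2>/dev/null ||
    cat "$repo/logging/zshrc.snippet" >> "$HOME/.zshrc"
}
```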

Lessons learned


What surprised me

  1. Markdown > everything — JSON was over-engineered. Plain text wins.

  2. "If it fails" is the most valuable field — pentesting is mostly failure. Capture the pivots.

  3. Decision trees > technique lists — knowing WHAT ORDER to try things matters more than knowing things exist.

  4. The coach improves because I improve — the feedback loop is the product.


What I got wrong (at first)

  • Too much automation — SKILL.md grew to 595 lines with auto-suggestions, note generation, Python integrations. Cut to 203. Coaching works better when the user has to think.
  • Monolithic KB — one big JSON file doesn't scale. Topic-based markdown files do.
  • Speculative techniques — added things "just in case." Noise. Only add what you've used.
  • Skipping the post-machine review — the reflection is where 50% of my learning happens.

The unix2dos moment

Trying to exploit Backup Operators on a Windows box:

# Wrote a diskshadow script. Ran it. Silent failure. No error.
# 45 minutes of debugging. The problem?
# Windows needed CRLF line endings. Linux wrote LF.
# One command: unix2dos script.txt
# Everything worked.

That lesson is now in my KB, my Methodology, AND both my memory and Claude's. I will never lose those 45 minutes again.


Building your own


You don't need pentesting to use this pattern

The architecture works for any learning domain:

  • Competitive programming — technique files by algorithm type, decision trees for problem patterns
  • Web development — framework-specific KB, debugging decision trees
  • System design — pattern catalog, trade-off decision trees
  • Any skill with tools + decisions + failure modes

How to build your own (in a hackathon weekend)

Day 1:

  1. Install Claude Code
  2. Create ~/.claude/skills/your-coach/SKILL.md
  3. Define: mission, coaching style, file layout, guardrails
  4. Start using it on a real problem — take notes on what's missing

Day 2:

  5. Add a knowledge base (start with 10 techniques, grow from there)
  6. Add decision trees from your experience
  7. Set up the feedback loop — every session improves the system


The key insight

An AI coach isn't about the AI knowing everything. It's about the AI knowing how to ask the right questions and remembering what you've learned together.

The skill file is small. The knowledge base grows. The decision trees capture how to think, not just what to know.


Get the code

Public repo: github.com/theo2612/ctf-pentest-coach

  • Full skill definition
  • Logging system (tmux + Python)
  • 190 techniques in 21 topic files
  • 15 decision trees
  • Install script (Linux / macOS)

Claude Code: claude.com/claude-code — the same pattern should port to Codex and Gemini


Questions?

github.com/theo2612/ctf-pentest-coach