Duration: 60 minutes
Difficulty: Beginner–Intermediate
Prerequisites: AWS CLI configured (aws configure), Claude Code installed, GitHub personal access token, Node.js 18+
Deliverable: Completed diagnosis report, GitHub issue filed, permission mode journal
You will learn to use Claude Code as an AI-augmented DevOps co-pilot for real AWS infrastructure work — scanning EC2 instances, diagnosing CloudWatch alarms, auditing VPC and security groups, and correlating findings with recent GitHub commits.
The central skill is not prompting. It is permission mode discipline — knowing when to let Claude run freely, when to review before acting, and when to demand explicit confirmation before anything destructive happens.
Key principle: The verb tells you the tier.
describe / list / get → FREE (run without asking)
create / modify / restart → ASK (review first)
delete / terminate / purge → DANGEROUS (type CONFIRMED)
This principle works identically across AWS CLI, kubectl, Terraform, Docker, and Git. You will internalize it through practice today.
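To make the rule concrete, here is a minimal sketch of the verb lookup as shell functions. `classify_tier` and `verb_in` are hypothetical helpers written for illustration, not part of Claude Code or any CLI. The verb lists are copied from the CLAUDE.md you create below, and any verb not on a list falls back to ASK.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Claude Code or any CLI) that apply
# "the verb tells you the tier" to a command string.
# Verb lists copied from the CLAUDE.md permission model used in this lab.
DANGEROUS_VERBS="delete remove destroy terminate purge drop rm prune reset force wipe revoke flush truncate"
ASK_VERBS="create apply update set modify patch restart scale deploy push commit merge sync enable disable attach detach add install build run tag publish reboot start stop"
FREE_VERBS="describe get list show status logs diff output validate explain inspect top events history version info check verify audit whoami cat ls"

# verb_in LIST CMD: succeed if any word of CMD begins with a verb from LIST.
verb_in() {
  local list="$1" word verb
  set -f                      # keep arguments such as '*' from glob-expanding
  for word in $2; do
    verb="${word%%-*}"        # describe-instances -> describe; --flag -> ""
    case " $list " in
      *" $verb "*) set +f; return 0 ;;
    esac
  done
  set +f
  return 1
}

classify_tier() {
  # A --force flag escalates any command to DANGEROUS.
  case " $1 " in *" --force "*) echo "DANGEROUS"; return 0 ;; esac
  if verb_in "$DANGEROUS_VERBS" "$1"; then echo "DANGEROUS"
  elif verb_in "$ASK_VERBS" "$1"; then echo "ASK"
  elif verb_in "$FREE_VERBS" "$1"; then echo "FREE"
  else echo "ASK"             # unknown verb: fail safe, treat as a mutation
  fi
}
```

For example, classify_tier "aws s3 rm s3://my-bucket --recursive" prints DANGEROUS. Matching on the first hyphen-separated piece of each word is what lets describe-instances and terminate-instances inherit the tier of their verb.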
By the end of this lab you will have:
- Connected AWS CLI and GitHub MCP server to Claude Code
- Used Plan Mode to audit your AWS environment before touching anything
- Used Default Mode to safely diagnose EC2, CloudWatch alarms, VPC, and security groups
- Used Accept Edits Mode to let Claude write remediation notes and a GitHub issue
- Practiced the CONFIRMED keyword for any DANGEROUS-tier action
- Filed a structured incident/findings report
Before starting any exercises, create this file in the directory where you run Claude Code. This is your standing brief — Claude reads it at session start.
cat > CLAUDE.md << 'EOF'
# CLAUDE.md — DevOps/SRE
Role: DevOps/SRE on AWS, GitHub. You are my co-pilot. I own every write decision.
## PERMISSION MODEL
### FREE — run without asking
Verbs: describe, get, list, show, status, logs, diff, output, validate,
explain, inspect, top, events, history, version, info, check,
verify, audit, whoami, cat, ls
### ASK — show command first, wait for my go-ahead
Verbs: create, apply, update, set, modify, patch, restart, scale, deploy,
push, commit, merge, sync, enable, disable, attach, detach, add,
install, build, run, tag, publish, reboot, start, stop
### DANGEROUS — full stop. State what will be affected. Wait for CONFIRMED
Verbs: delete, remove, destroy, terminate, purge, drop, rm, prune,
reset, force, wipe, revoke, flush, truncate
## HARD RULES
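1. Dry-run first: add --dry-run to any mutating aws ec2 command (aws s3 cp/mv/rm/sync use --dryrun); for services with no dry-run flag, show the exact command instead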
2. Show blast radius: if >1 resource affected, list all before proceeding
3. Rollback ready: pair every ASK/DANGEROUS step with its rollback command
4. Prod banner: if account/region is prod, prefix responses with PROD
5. No chaining: never chain ASK or DANGEROUS commands
6. No secrets: never commit credentials, tokens, or keys
## CONFIRMATION KEYWORDS
DANGEROUS tier needs me to type: CONFIRMED DELETE / CONFIRMED TERMINATE / CONFIRMED DESTROY / CONFIRMED PURGE / CONFIRMED PUBLIC
## PLAN MODE
Trigger: I say "plan:" or /plan
Format:
1. [FREE] <cmd>
2. [ASK] <cmd> | rollback: <cmd>
3. [DANGEROUS] <cmd> | rollback: <cmd> | blast: <what dies>
## ENVIRONMENTS
prod → PROD banner + max caution
staging → ASK before DANGEROUS
dev → permissive, still flag DANGEROUS
## INCIDENT MODE
Trigger: INCIDENT or P0 — terse + fast, still gate CONFIRMED for DANGEROUS
EOF
Verify it is there:
cat CLAUDE.md | head -5
# Should show: # CLAUDE.md — DevOps/SRE
Before connecting Claude Code, confirm your AWS credentials work:
# Check identity
aws sts get-caller-identity
# Expected output:
# {
# "UserId": "AIDA...",
# "Account": "123456789012",
# "Arn": "arn:aws:iam::123456789012:user/yourname"
# }
If this fails, run aws configure and enter your Access Key ID, Secret Access Key, and default region.
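If you repeat this check often, the identity call above and the region check just below can be combined into one function. A minimal sketch, assuming AWS CLI v2 is on your PATH; `preflight_check` is a hypothetical helper name, and both calls it makes are FREE-tier reads.

```bash
#!/usr/bin/env bash
# preflight_check (hypothetical helper): confirm the AWS CLI can authenticate
# and that a default region is configured. Both calls are FREE-tier reads.
preflight_check() {
  local account region
  # --query/--output text extract just the Account field.
  if ! account=$(aws sts get-caller-identity --query Account --output text 2>/dev/null); then
    echo "FAIL: no working AWS credentials (run aws configure)"
    return 1
  fi
  region=$(aws configure get region 2>/dev/null)
  if [ -z "$region" ]; then
    echo "FAIL: no default region configured (run aws configure)"
    return 1
  fi
  echo "OK: account=$account region=$region"
}
```

On success it prints a single line such as OK: account=123456789012 region=us-east-1.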
Check which region you are in:
aws configure get region
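# Expected: us-east-1 (or your configured region)
You need a GitHub personal access token (classic) with the repo scope. Read access covers the commit queries; filing the findings issue near the end of the lab also requires permission to create issues in your target repository.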
If you do not have one:
- Go to GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
- Generate a new token and name it agentic-devops-lab
- Check only the repo scope
- Copy it; you will not see it again
Test it:
export GITHUB_TOKEN=ghp_YOUR_TOKEN_HERE
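curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user | grep login
# Should show your GitHub username
Install the GitHub MCP server:
npm install -g @modelcontextprotocol/server-github
Verify:
npx @modelcontextprotocol/server-github --help 2>&1 | head -3
(The server talks over stdin/stdout, so it may keep waiting for input after printing a startup line; press Ctrl+C to return to the shell.)
Create this file in the same directory as your CLAUDE.md: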
cat > .mcp.json << 'EOF'
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_YOUR_TOKEN_HERE"
      }
    }
  }
}
EOF
Replace ghp_YOUR_TOKEN_HERE with your actual token. Because the file now holds a secret, add .mcp.json to .gitignore so it is never committed (HARD RULE 6 in your CLAUDE.md).
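Before starting Claude Code, a quick check catches the two most common setup mistakes: malformed JSON and a placeholder token left in place. A minimal sketch; `check_mcp_json` is a hypothetical helper, and it assumes python3 is available (the troubleshooting section relies on the same json.tool check).

```bash
#!/usr/bin/env bash
# check_mcp_json [FILE] (hypothetical helper): validate .mcp.json before launching Claude Code.
check_mcp_json() {
  local file="${1:-.mcp.json}"
  # json.tool exits non-zero on a syntax error.
  if ! python3 -m json.tool "$file" >/dev/null 2>&1; then
    echo "FAIL: $file is not valid JSON"
    return 1
  fi
  # The placeholder from the template must be replaced with a real token.
  if grep -q 'ghp_YOUR_TOKEN_HERE' "$file"; then
    echo "FAIL: placeholder token still present in $file"
    return 1
  fi
  # Inside a git repo, warn if the file (which now holds a secret) is not ignored.
  if git rev-parse --is-inside-work-tree >/dev/null 2>&1 && ! git check-ignore -q "$file"; then
    echo "WARN: $file is not git-ignored; do not commit your token"
  fi
  echo "OK: $file"
}
```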
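Note: AWS CLI is already available as a shell tool in Claude Code, so it does not need an MCP server. Claude Code calls it directly through its Bash tool. The GitHub MCP server adds structured access to GitHub repos, issues, and commits.
Start Claude Code from this directory:
claude
If Claude Code asks whether to use the project MCP servers defined in .mcp.json, approve the github server.
Inside Claude Code, test both connections: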
Test 1 — AWS: Run aws sts get-caller-identity and tell me my account ID and region.
Test 2 — GitHub: List the 3 most recent commits in the anthropics/anthropic-sdk-python repository.
Both should return real data. If AWS fails, check your credentials. If GitHub fails, check your token in .mcp.json.
Checkpoint: You should see your AWS account ID and GitHub commits before proceeding.
Plan Mode is your pre-flight checklist. Claude reads your infrastructure, maps what exists, identifies risks, and hands you a numbered plan. Nothing is changed.
In Claude Code, press Shift+Tab twice from Default Mode (the first press selects Accept Edits, the second Plan Mode), or prefix your prompt with /plan.
You will see plan mode appear in the status bar.
Run this prompt in Plan Mode:
plan: Give me a complete audit of my AWS environment. I want to understand the current state before we do anything. Please:
1. List all running EC2 instances (name, instance ID, type, state, public IP if any)
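2. List any stopped instances, plus any instances terminated in the last 7 days (terminated instances drop out of describe-instances after about an hour, so check CloudTrail event history for TerminateInstances events)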
3. List all CloudWatch alarms and their current states (OK, ALARM, INSUFFICIENT_DATA)
4. List all VPCs and their CIDR blocks
5. List all security groups — flag any that have 0.0.0.0/0 inbound rules on sensitive ports (22, 3389, 5432, 3306)
6. List all S3 buckets — flag any with public access
For each finding, classify it as:
[OK] — looks healthy
[WARN] — worth investigating
[RISK] — needs attention
Do not change anything. Just read and report.
What Claude does: It runs a series of FREE-tier commands:
- aws ec2 describe-instances
- aws cloudwatch describe-alarms
- aws ec2 describe-vpcs
- aws ec2 describe-security-groups
- aws s3api list-buckets, then aws s3api get-public-access-block for each bucket
All reads. Nothing changes.
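What you observe: Notice that Claude does not pause to ask whether these reads are safe. That is correct behavior: these are FREE-tier verbs. (Claude Code may still show its own approval prompt for shell commands you have not pre-approved; that prompt comes from its permission settings, not from CLAUDE.md.)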
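To see what these reads return without Claude in the loop, here is a minimal sketch that runs the same FREE-tier commands with --query filters to keep the output short. `audit_reads` is a hypothetical helper, and the JMESPath expressions are one reasonable choice, not the only one. Security groups are left out because the sensitive-port check is part of the VPC and security group audit later in the lab.

```bash
#!/usr/bin/env bash
# audit_reads (hypothetical helper): the Plan Mode reads, trimmed with --query.
# Every call here is a FREE-tier describe/list/get.
audit_reads() {
  local bucket
  aws ec2 describe-instances \
    --query 'Reservations[].Instances[].[InstanceId, InstanceType, State.Name, PublicIpAddress, Tags[?Key==`Name`] | [0].Value]' \
    --output table
  aws cloudwatch describe-alarms \
    --query 'MetricAlarms[].[AlarmName, StateValue]' --output table
  aws ec2 describe-vpcs \
    --query 'Vpcs[].[VpcId, CidrBlock, IsDefault]' --output table
  # Public access block settings are per bucket. A bucket with no bucket-level
  # block returns an error, printed here for review (account-level settings may still apply).
  for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    echo "== $bucket"
    aws s3api get-public-access-block --bucket "$bucket" \
      --query 'PublicAccessBlockConfiguration' --output table 2>&1
  done
}
```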
Claude should return a structured audit with [OK], [WARN], and [RISK] tags. Record what it finds:
My environment audit notes:
- EC2 instances found: ___
- CloudWatch alarms in ALARM state: ___
- Security groups with 0.0.0.0/0 on sensitive ports: ___
- S3 buckets with public access: ___
Still in Plan Mode:
Based on the audit above, create a numbered remediation plan. For each finding marked [WARN] or [RISK]:
- Classify each step as [FREE], [ASK], or [DANGEROUS]
- Provide the exact AWS CLI command
- Provide the rollback command
- Estimate the blast radius
Do not execute anything yet. I will approve each step.
What you observe: Claude produces a plan with tagged steps. DANGEROUS-tier actions (like terminating an instance or deleting a security group rule) are explicitly flagged. You have not done anything yet.
Exit Plan Mode by pressing Shift+Tab again.
Default Mode is your everyday working mode. Claude asks before each ASK or DANGEROUS action. FREE actions run without interruption.
Press Shift+Tab until the status bar shows no mode indicator; no indicator means Default Mode.
Run this prompt:
I want to diagnose my EC2 instances. For each running instance:
1. Show me its launch time, uptime, and instance type
2. Check if it has a Name tag — flag any untagged instances
3. Check if it is in a public subnet (has public IP) — if so, verify the security group only allows necessary ports
4. Check if detailed monitoring is enabled
5. Show me the last 5 CloudWatch metric data points for CPUUtilization
Run the read commands freely. For anything that would change configuration, show me the command and ask first.
Watch how Claude behaves:
- aws ec2 describe-instances → runs immediately (FREE)
- aws cloudwatch get-metric-statistics → runs immediately (FREE)
- If it suggests enabling detailed monitoring, it should show the command and ask (ASK tier)
If Claude proposes enabling detailed monitoring, it should say something like:
I found that instance i-0abc123 does not have detailed monitoring enabled.
To enable it, I would run:
aws ec2 monitor-instances --instance-ids i-0abc123 --dry-run
Should I proceed? (This enables 1-minute metric granularity and adds a per-instance CloudWatch charge; check current CloudWatch pricing for the exact cost.)
This is correct behavior. Review the command, then say yes or no.
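It helps to know what --dry-run actually does. For EC2, the CLI never makes the change: it reports DryRunOperation if the real call would be allowed and UnauthorizedOperation if your credentials lack permission, both as errors. A minimal sketch of a wrapper that turns that into a readable verdict; `ec2_dry_run` is a hypothetical helper and only applies to aws ec2 subcommands.

```bash
#!/usr/bin/env bash
# ec2_dry_run SUBCOMMAND [ARGS...] (hypothetical helper): run an aws ec2
# command with --dry-run and translate the error code into a verdict.
ec2_dry_run() {
  local out
  # EC2 reports the dry-run verdict as an error, so capture stderr too.
  out=$(aws ec2 "$@" --dry-run 2>&1)
  if printf '%s\n' "$out" | grep -q 'DryRunOperation'; then
    echo "WOULD SUCCEED: aws ec2 $*"
  elif printf '%s\n' "$out" | grep -q 'UnauthorizedOperation'; then
    echo "NOT PERMITTED: aws ec2 $*"
    return 1
  else
    echo "UNEXPECTED: $out"
    return 1
  fi
}
```

Usage: ec2_dry_run monitor-instances --instance-ids i-0abc123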
If your audit found stopped instances, investigate:
Show me the stopped instance i-XXXX (replace with actual ID from your audit).
Tell me:
1. When was it last stopped?
2. What is the reason (check the StateTransitionReason and StateReason fields)?
3. What is the cost implication of keeping a stopped instance vs terminating it?
Read only — do not start or terminate it yet.
Note what Claude finds. If it suggests starting or terminating the instance, it must ask you first (start is an ASK-tier verb; terminate is DANGEROUS).
Now test that the gate works. Type this:
Terminate the stopped instance i-XXXX.
Claude should respond with something like:
⚠️ DANGEROUS action requested: terminate-instances
This will permanently terminate instance i-XXXX.
Instance: i-XXXX
Name: [tag name]
State: stopped
Blast radius: 1 instance; instance store data is lost, and EBS volumes with DeleteOnTermination=true (the default for root volumes) are deleted
Rollback: Not possible — termination is irreversible. If you need this instance again,
you would need to launch a new one from the same AMI.
Command I would run:
aws ec2 terminate-instances --instance-ids i-XXXX
To proceed, type: CONFIRMED TERMINATE
Do not type CONFIRMED TERMINATE unless you actually want to terminate that instance. The point is to observe that Claude stopped and described the blast radius before acting.
If you want to test the gate without real consequences, type cancel and Claude should stand down.
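The same gate can be reproduced outside Claude Code, for example in scripts you run by hand. A minimal sketch; `confirm_dangerous` is a hypothetical helper that prints the blast radius and proceeds only on an exact, case-sensitive keyword match, so cancel, yes, or a lowercase keyword all stop it.

```bash
#!/usr/bin/env bash
# confirm_dangerous KEYWORD BLAST (hypothetical helper): describe the blast
# radius, then succeed only if the user types KEYWORD exactly.
confirm_dangerous() {
  local keyword="$1" blast="$2" reply
  echo "DANGEROUS action. Blast radius: $blast"
  echo "To proceed, type: $keyword"
  IFS= read -r reply
  if [ "$reply" = "$keyword" ]; then
    return 0
  fi
  echo "Cancelled."
  return 1
}
```

Usage: confirm_dangerous "CONFIRMED TERMINATE" "1 instance: i-XXXX" && aws ec2 terminate-instances --instance-ids i-XXXX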
Show me all CloudWatch alarms that are currently in ALARM or INSUFFICIENT_DATA state.
For each one:
1. Alarm name and description
2. The metric it monitors (namespace, metric name, dimensions)
3. The threshold and comparison operator
4. How long it has been in this state
5. The last 10 data points for the underlying metric
This is read-only investigation. Run all of it freely.
What you observe: All of this is FREE-tier. No asks. Claude should gather all this data in one pass.
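For reference, here is a minimal sketch of the reads behind this prompt. `list_unhealthy_alarms` is a hypothetical helper. The --state-value option takes a single state per call, hence the loop, and StateUpdatedTimestamp is the field that answers how long an alarm has been in its current state.

```bash
#!/usr/bin/env bash
# list_unhealthy_alarms (hypothetical helper): alarms in ALARM or INSUFFICIENT_DATA,
# with the metric, threshold, and when the alarm entered its current state.
list_unhealthy_alarms() {
  local state
  # --state-value accepts a single state, so query each one in turn.
  for state in ALARM INSUFFICIENT_DATA; do
    aws cloudwatch describe-alarms --state-value "$state" \
      --query 'MetricAlarms[].[AlarmName, StateValue, StateUpdatedTimestamp, Namespace, MetricName, ComparisonOperator, Threshold]' \
      --output table
  done
}
```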
Pick one alarm from the output (or use a hypothetical CPUUtilization alarm if your account is clean):
I have a CPUUtilization alarm for instance i-XXXX. Help me diagnose it:
1. Show me the CPU metrics for this instance over the last 4 hours (5-minute intervals)
2. Check if there were any instance status check failures during that time
3. Check the instance's network metrics (NetworkIn, NetworkOut) over the same period
4. Check if there are any Systems Manager (SSM) run command invocations on this instance in the last 4 hours
Correlate the findings. Is the high CPU caused by network traffic, a scheduled job, or something else?
Read only.
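The metric pulls in steps 1 and 3 map onto get-metric-statistics. A minimal sketch, assuming the AWS/EC2 namespace; `metric_last_4h` and `utc_hours_ago` are hypothetical helpers. The 300-second period matches the 5-minute interval requested, which is also the resolution of basic monitoring.

```bash
#!/usr/bin/env bash
# utc_hours_ago N (hypothetical helper): ISO-8601 UTC timestamp N hours ago.
# Tries GNU date first, then BSD/macOS date.
utc_hours_ago() {
  date -u -d "$1 hours ago" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
    || date -u -v-"$1"H +%Y-%m-%dT%H:%M:%SZ
}

# metric_last_4h INSTANCE_ID [METRIC] (hypothetical helper): 5-minute datapoints
# for the last 4 hours. METRIC defaults to CPUUtilization; pass NetworkIn or NetworkOut for step 3.
metric_last_4h() {
  local instance_id="$1" metric="${2:-CPUUtilization}"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 --metric-name "$metric" \
    --dimensions Name=InstanceId,Value="$instance_id" \
    --start-time "$(utc_hours_ago 4)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 --statistics Average Maximum \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average, Maximum]' \
    --output table
}
```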
Based on the CloudWatch investigation, propose a remediation for the alarm.
Give me options ranked by risk:
Option A: low-risk first step
Option B: medium-risk (requires ASK)
Option C: higher-risk (requires DANGEROUS confirmation)
For each option, show the exact command and rollback.
If Claude proposes something in the ASK tier (like restarting the instance), review the command before saying yes.
Run a deep audit of my VPCs and networking:
1. For each VPC, list:
- CIDR block
- Attached internet gateways (public VPC or private?)
- Number of subnets, separated by public vs private
- Route tables — any routes to 0.0.0.0/0?
2. For security groups, find and flag:
- Any group allowing SSH (port 22) from 0.0.0.0/0
- Any group allowing RDP (port 3389) from 0.0.0.0/0
- Any group allowing database ports (3306, 5432, 1433, 27017) from 0.0.0.0/0
- Any group with no inbound rules (orphaned?)
- Any group with no attached resources (attached to nothing)
3. For each flagged security group, show:
- Which instances or services are using it
- The specific offending rule
Read only. Flag everything but change nothing.
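A minimal sketch of the underlying reads for steps 2 and 3. `open_to_world` and `sg_users` are hypothetical helpers built on the describe-security-groups and describe-network-interfaces filters. Treat hits as leads, not verdicts: the two filters may match different rules within the same group, rules written as port ranges (for example 0-65535) are missed, and IPv6 ::/0 rules are not checked.

```bash
#!/usr/bin/env bash
# open_to_world (hypothetical helper): groups with a rule starting at a
# sensitive port and some rule open to 0.0.0.0/0 (not necessarily the same rule).
open_to_world() {
  local port
  for port in 22 3389 3306 5432 1433 27017; do
    aws ec2 describe-security-groups \
      --filters Name=ip-permission.from-port,Values="$port" \
                Name=ip-permission.cidr,Values=0.0.0.0/0 \
      --query "SecurityGroups[].[GroupId, GroupName, 'port $port']" \
      --output text
  done
}

# sg_users SG_ID (hypothetical helper): network interfaces using the group.
# Empty output suggests the group is attached to nothing.
sg_users() {
  aws ec2 describe-network-interfaces \
    --filters Name=group-id,Values="$1" \
    --query 'NetworkInterfaces[].[NetworkInterfaceId, Attachment.InstanceId, Description]' \
    --output text
}
```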
If you found a security group with 0.0.0.0/0 on port 22, practice the ASK workflow:
I want to restrict SSH access on security group sg-XXXX.
Instead of 0.0.0.0/0, I want to allow only my current IP address.
Show me:
1. What my current public IP is
2. The exact command to revoke the current rule
3. The exact command to add the new restricted rule
Do not run either command yet. Show me both and I will approve.
Claude should respond with the commands and ask for approval before running each one.
If it runs either command without asking first, that is a gate failure: authorize is ASK tier, and revoke is DANGEROUS tier in your CLAUDE.md because removing the wrong rule can lock you out. Remind it:
You ran that command without asking. Per our CLAUDE.md rules, authorize is ASK tier and revoke is DANGEROUS tier. Always show the command and ask first.
This is a real debugging exercise in permission discipline.
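For comparison, here is a minimal sketch of what a good answer to that prompt looks like, written so it only prints commands and never runs them. `plan_ssh_restrict` is a hypothetical helper; checkip.amazonaws.com is an AWS endpoint that returns your public IP. It orders the steps authorize-then-revoke so the open rule is never removed before your own /32 rule exists, and tags each step with its tier from your CLAUDE.md.

```bash
#!/usr/bin/env bash
# plan_ssh_restrict SG_ID (hypothetical helper): print, never run, the two
# commands that replace an SSH rule open to 0.0.0.0/0 with your IP only.
plan_ssh_restrict() {
  local sg_id="$1" my_ip
  my_ip=$(curl -s https://checkip.amazonaws.com | tr -d '[:space:]')
  if [ -z "$my_ip" ]; then
    echo "Could not determine your public IP" >&2
    return 1
  fi
  echo "# 1 [ASK] add your /32 rule first, so you keep access | rollback: revoke the same rule"
  echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port 22 --cidr $my_ip/32"
  echo "# 2 [DANGEROUS] then remove the open rule | rollback: authorize --cidr 0.0.0.0/0 again"
  echo "aws ec2 revoke-security-group-ingress --group-id $sg_id --protocol tcp --port 22 --cidr 0.0.0.0/0"
}
```

Both printed commands also accept --dry-run, which lets you confirm permissions before approving the real call.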
Now bring GitHub into the picture. You need a GitHub repository to inspect — use your own repo, your company's infrastructure repo, or any public repo you have access to.
I want to correlate recent infrastructure findings with code changes.
In GitHub, for the repository [YOUR_ORG/YOUR_REPO]:
1. Show me the 10 most recent commits
2. Filter for commits that touched infrastructure files (.tf, .yaml, .yml, Dockerfile, .json config files)
3. For each infrastructure commit, show:
- Commit hash (short)
- Author
- Timestamp
- Changed files
- A one-line summary of what changed
Then correlate: do any of these commit timestamps match the CloudWatch alarm start time or the EC2 event timestamps from our earlier investigation?
What you observe: Claude now reaches across two systems — AWS (via CLI) and GitHub (via MCP server) — and synthesizes an integrated answer. You did not have to switch tools, copy-paste timestamps, or manually compare logs.
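If the repository is also cloned locally, you can cross-check Claude's filtering with git itself. A minimal sketch; `is_infra_file` and `filter_infra_commits` are hypothetical helpers, and the file patterns mirror the prompt's list (.tf, .yaml, .yml, Dockerfile, .json), so adjust them to your repo layout.

```bash
#!/usr/bin/env bash
# is_infra_file PATH (hypothetical helper): succeed if PATH looks like an infrastructure file.
is_infra_file() {
  case "$1" in
    *.tf|*.yaml|*.yml|*.json|Dockerfile|*/Dockerfile) return 0 ;;
    *) return 1 ;;
  esac
}

# filter_infra_commits (hypothetical helper): read
# `git log --name-only --format='@%h %an %aI %s'` on stdin and print the
# header of each commit that touched at least one infrastructure file.
filter_infra_commits() {
  local line header="" hit=0
  while IFS= read -r line; do
    case "$line" in
      @*)
        # New commit header: emit the previous commit if it matched.
        if [ "$hit" -eq 1 ]; then echo "$header"; fi
        header="${line#@}"; hit=0 ;;
      "") ;;   # blank separator line
      *) if is_infra_file "$line"; then hit=1; fi ;;
    esac
  done
  if [ "$hit" -eq 1 ]; then echo "$header"; fi
  return 0
}
```

Usage: git log -n 10 --name-only --format='@%h %an %aI %s' | filter_infra_commits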
Now switch to Accept Edits Mode by pressing Shift+Tab once from Default Mode. You will see accept edits in the status bar.
Accept Edits Mode means: Claude can write and edit files freely without asking, but shell commands still require confirmation.
Based on everything we found in this lab, write a GitHub issue to document the findings.
The issue should be in this repository: [YOUR_ORG/YOUR_REPO]
Title: "Infrastructure Audit Findings — [today's date]"
Body:
## Summary
[2-3 sentence summary of what we found]
## Findings
| Category | Finding | Severity | Action Required |
[fill from our audit]
## CloudWatch Alarms
[alarms found and their status]
## Security Group Risks
[any overly permissive rules found]
## Recommended Next Steps
[numbered list from the remediation plan]
## Audit Details
- Audited by: Claude Code (AI-augmented DevOps session)
- AWS Account: [account ID]
- Region: [region]
- Date: [today]
Write the issue body to a local file called findings-issue.md first.
Then ask me before posting it to GitHub.
Claude will:
- Write findings-issue.md without asking (file edits are free in Accept Edits Mode)
- Then ask before creating the GitHub issue (posting is ASK tier)
Review the file:
cat findings-issue.md
If it looks good, tell Claude to proceed. If you want to edit it first, do so in your editor, then tell Claude to post.
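Before you tell Claude to post, a mechanical check complements the read-through. A minimal sketch; `check_issue_sections` is a hypothetical helper that confirms every heading from the prompt's template is present and, in the spirit of HARD RULE 6, flags strings shaped like a classic GitHub token (ghp_ prefix) or an AWS access key ID (AKIA prefix). It is a coarse pattern match, not a secret scanner.

```bash
#!/usr/bin/env bash
# check_issue_sections [FILE] (hypothetical helper): confirm the issue draft has
# every section from the template and contains nothing shaped like a credential.
check_issue_sections() {
  local file="${1:-findings-issue.md}" section status=0
  for section in "## Summary" "## Findings" "## CloudWatch Alarms" \
                 "## Security Group Risks" "## Recommended Next Steps" "## Audit Details"; do
    if ! grep -qxF "$section" "$file"; then
      echo "MISSING: $section"
      status=1
    fi
  done
  # Coarse patterns: classic GitHub tokens start with ghp_, AWS access key IDs with AKIA.
  if grep -qE 'ghp_[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16}' "$file"; then
    echo "SECRET-LIKE STRING FOUND: remove it before posting"
    status=1
  fi
  return "$status"
}
```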
When Claude posts the issue via GitHub MCP, verify it appeared in your repository.
Complete this template. Save it as permission-mode-journal.md.
# Permission Mode Journal — AWS AI-Augmented DevOps Lab
## Date: [today]
## AWS Account: [your account ID]
## Region: [your region]
---
## Part 2: Plan Mode Audit Findings
**Instances found:** ___
**Alarms in ALARM state:** ___
**Security groups with 0.0.0.0/0 risks:** ___
**S3 buckets with public access:** ___
**Did Plan Mode change anything in my account?**
[ ] Yes [ ] No (should be No — Plan Mode is read-only)
**What was the value of seeing the plan before executing?**
[write 1-2 sentences]
---
## Part 3: EC2 DANGEROUS Gate Test
**Which instance did I test the DANGEROUS gate on?** ___
**Did Claude stop and describe the blast radius?**
[ ] Yes [ ] No
**Did Claude require me to type CONFIRMED TERMINATE?**
[ ] Yes [ ] No
**What would have happened if I had typed CONFIRMED TERMINATE?**
[write 1 sentence]
---
## Part 4: CloudWatch Alarm Correlation
**Alarm investigated:** ___
**Root cause hypothesis:** ___
**Did AWS CLI reads run without asking?**
[ ] Yes [ ] No (should be Yes — describe/get/list are FREE)
---
## Part 5: Security Group Findings
**Any 0.0.0.0/0 rules found?**
[ ] Yes — on ports: ___
[ ] No — all rules are properly scoped
**Did Claude ask before running a security group change?**
[ ] Yes [ ] No (should be Yes: authorize is ASK tier, revoke is DANGEROUS)
---
## Part 6: GitHub + AWS Correlation
**Did you find a code change that correlates with an infrastructure event?**
[ ] Yes [ ] No [ ] Inconclusive
**What was the experience of having one prompt reach AWS + GitHub?**
[write 1-2 sentences]
**Was the GitHub issue filed?**
[ ] Yes — URL: ___
[ ] No
---
## Overall Reflection
### Manual workflow (without Claude Code):
Steps I would have taken manually:
1. ___
2. ___
3. ___
Estimated time: ___ minutes
### With Claude Code + permission modes:
Time taken in this lab: ___ minutes
### The verb model in practice:
The most important thing I learned about FREE / ASK / DANGEROUS:
[write 2-3 sentences]
### One thing I would add to my CLAUDE.md based on today:
[write 1 sentence]
Keep this visible during the lab.
| Mode | How to Enter | Claude's Behavior |
|---|---|---|
| Plan Mode | Shift+Tab (2nd press from Default) or /plan | Reads freely, writes nothing, produces a plan |
| Default Mode | No modifier (startup default) | FREE runs freely, ASK requires confirmation |
| Accept Edits | Shift+Tab (1st press from Default) | File writes are free, shell commands still ask |
| Don't Ask | Settings only | Denies all unless pre-approved |
| Tier | Verbs | Behavior |
|---|---|---|
| FREE | describe, get, list, show, status, logs, diff, inspect | Runs immediately |
| ASK | create, apply, update, modify, restart, scale, start, stop | Shows command, waits |
| DANGEROUS | delete, terminate, destroy, purge, rm, reset, force | Stops. Describes blast radius. Waits for CONFIRMED |
| Command | Tier | Why |
|---|---|---|
| aws ec2 describe-instances | FREE | describe = read |
| aws cloudwatch describe-alarms | FREE | describe = read |
| aws ec2 stop-instances | ASK | stop = mutation |
| aws ec2 modify-instance-attribute | ASK | modify = mutation |
| aws ec2 authorize-security-group-ingress | ASK | authorize = mutation |
| aws ec2 terminate-instances | DANGEROUS | terminate = irreversible |
| aws ec2 revoke-security-group-ingress | DANGEROUS | revoke = potentially locks you out |
| aws rds delete-db-instance | DANGEROUS | delete = data loss |
| aws s3 rm --recursive | DANGEROUS | rm = data loss |
| Keyword | Use for |
|---|---|
| CONFIRMED TERMINATE | EC2 instance termination |
| CONFIRMED DELETE | Any delete operation |
| CONFIRMED DESTROY | Terraform destroy, stack deletion |
| CONFIRMED PURGE | Bulk deletions, data wipes |
| CONFIRMED PUBLIC | Making S3 buckets or resources public |
aws: command not found
# Install AWS CLI v2 (Linux x86_64 shown; macOS and Windows have their own installers)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
Unable to locate credentials
aws configure
# Enter: Access Key ID, Secret Access Key, Region, Output format (json)
An error occurred (AccessDenied)
Your IAM identity lacks the required permission (EC2 calls report UnauthorizedOperation instead). For this lab, you need read access to EC2, CloudWatch, VPC, and S3. Ask your AWS account admin to attach the AWS managed ReadOnlyAccess policy.
GitHub MCP server fails to start
npm install -g @modelcontextprotocol/server-github
npx @modelcontextprotocol/server-github --help
Bad credentials from GitHub API
Your token has expired or lacks repo scope. Generate a new token at GitHub → Settings → Developer settings → Personal access tokens.
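To see which scopes a classic token actually carries, GitHub returns them in the X-OAuth-Scopes response header. A minimal sketch; `github_token_scopes` and `has_repo_scope` are hypothetical helpers that assume GITHUB_TOKEN is exported as in the setup step. Fine-grained tokens do not report scopes this way.

```bash
#!/usr/bin/env bash
# github_token_scopes (hypothetical helper): print the scopes GitHub reports
# for $GITHUB_TOKEN via the X-OAuth-Scopes header (classic tokens only).
github_token_scopes() {
  curl -sI -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user \
    | tr -d '\r' \
    | awk -F': ' 'tolower($1) == "x-oauth-scopes" { print $2 }'
}

# has_repo_scope (hypothetical helper): succeed only if "repo" is one of the reported scopes.
has_repo_scope() {
  github_token_scopes | tr ',' '\n' | tr -d ' ' | grep -qx 'repo'
}
```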
GitHub rate limiting
Authenticated requests to the GitHub REST API are limited to 5,000 per hour for a personal access token. You are unlikely to hit this in a single lab session.
MCP server not connecting
- Validate your .mcp.json syntax:
cat .mcp.json | python3 -m json.tool
# Should print formatted JSON with no errors
- Restart Claude Code completely
- Run /mcp inside Claude Code to see whether the github server is connected
- Run a simple test prompt
Claude Code ignoring CLAUDE.md rules
Verify the file is in the same directory where you launched claude:
ls -la CLAUDE.md
If it is in the wrong directory, move it and restart Claude Code.
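Claude runs a DANGEROUS command without asking
This should not happen, but CLAUDE.md is guidance the model follows, not a hard block. If it does:
- Type undo that immediately. Claude can attempt to reverse the change, but irreversible actions such as terminate-instances cannot be undone
- After the session, add a more explicit rule to your CLAUDE.md, and consider a deny rule in Claude Code's permission settings for commands that must never run unattended
- Report the behavior using the thumbs-down button in Claude Code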
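Claude keeps asking for permission on FREE-tier reads
There are two kinds of asking. If Claude asks in the conversation before running a read, your CLAUDE.md verb list needs adjusting: add the over-blocked verb to the FREE section. If Claude Code shows a tool-approval prompt for a shell command, that comes from its permission settings, which CLAUDE.md cannot change; approve the command and choose not to be asked again for it, or add an allow rule to your settings.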
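If Claude Code's own tool-approval prompt is what keeps appearing, you can pre-approve specific read commands in its settings. A minimal sketch, written in the same heredoc style as the setup files. It assumes Claude Code's permissions format (a permissions.allow list of Bash(prefix:*) rules) as documented at the time of writing, so check the permissions docs for your version. It writes .claude/settings.local.json, intended for personal, uncommitted settings, and overwrites any existing copy. Only specific read commands are listed; a matching "deny" list can hard-block commands that must never run.

```bash
#!/usr/bin/env bash
mkdir -p .claude
# Pre-approve FREE-tier AWS reads so Claude Code stops prompting for them.
cat > .claude/settings.local.json << 'EOF'
{
  "permissions": {
    "allow": [
      "Bash(aws sts get-caller-identity:*)",
      "Bash(aws configure get:*)",
      "Bash(aws ec2 describe-instances:*)",
      "Bash(aws ec2 describe-vpcs:*)",
      "Bash(aws ec2 describe-security-groups:*)",
      "Bash(aws cloudwatch describe-alarms:*)",
      "Bash(aws cloudwatch get-metric-statistics:*)",
      "Bash(aws s3api list-buckets:*)"
    ]
  }
}
EOF
```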
Once you have completed the core lab, try these extensions:
If you have a Terraform-managed infrastructure:
plan: Scan my Terraform state file. What resources does it manage?
List all resources and their types. Do not run any apply or plan that makes changes.
Then practice terraform plan (FREE) vs terraform apply (ASK) vs terraform destroy (DANGEROUS).
Run the same EC2 and security group audit across us-east-1, us-west-2, and eu-west-1.
Aggregate the findings. Are there any resources in unexpected regions?
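A minimal sketch of the same sweep from the shell, using the --region flag that every aws ec2 command accepts. `multi_region_audit` is a hypothetical helper, and its security group filter is deliberately broader than the sensitive-port check: it lists any group with a 0.0.0.0/0 rule.

```bash
#!/usr/bin/env bash
# multi_region_audit REGION... (hypothetical helper): per-region instance list
# plus security groups that have any rule open to 0.0.0.0/0. FREE-tier reads only.
multi_region_audit() {
  local region
  for region in "$@"; do
    echo "=== $region"
    aws ec2 describe-instances --region "$region" \
      --query 'Reservations[].Instances[].[InstanceId, State.Name, InstanceType]' \
      --output text
    aws ec2 describe-security-groups --region "$region" \
      --filters Name=ip-permission.cidr,Values=0.0.0.0/0 \
      --query 'SecurityGroups[].[GroupId, GroupName]' \
      --output text
  done
}
```

Usage: multi_region_audit us-east-1 us-west-2 eu-west-1, or sweep every region enabled for the account with multi_region_audit $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text).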
Using the AWS Cost Explorer API, show me:
1. My top 5 most expensive services this month
2. Any resources that have been running for more than 30 days with zero traffic
3. Recommendations for cost savings based on current utilization
Read only.
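For reference, step 1 maps onto the Cost Explorer GetCostAndUsage API. A minimal sketch; `top_services_this_month` is a hypothetical helper. Caveats: Cost Explorer must be enabled on the account, each Cost Explorer API request is billed, and on the first day of a month the start and end dates are equal, so the call fails.

```bash
#!/usr/bin/env bash
# top_services_this_month (hypothetical helper): month-to-date unblended cost
# per service, highest first, top 5. A FREE-tier get.
top_services_this_month() {
  local start end tab
  start=$(date -u +%Y-%m-01)
  end=$(date -u +%Y-%m-%d)      # End is exclusive; equals start on the 1st
  tab=$(printf '\t')
  aws ce get-cost-and-usage \
    --time-period Start="$start",End="$end" \
    --granularity MONTHLY --metrics UnblendedCost \
    --group-by Type=DIMENSION,Key=SERVICE \
    --query 'ResultsByTime[0].Groups[].[Keys[0], Metrics.UnblendedCost.Amount]' \
    --output text \
    | LC_ALL=C sort -t "$tab" -k2,2 -rn \
    | head -n 5
}
```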
Based on everything we found in today's lab, write me a runbook in Markdown format.
The runbook should document:
1. How to run this audit again in 30 days
2. The exact prompts to use (in Plan Mode and Default Mode)
3. The decision tree for when to use CONFIRMED
4. The escalation path for DANGEROUS findings
Save it as aws-audit-runbook.md.
You have experienced AI-augmented DevOps through three lenses:
Plan Mode — the pre-flight checklist. Read everything, change nothing, produce a structured plan. Use this at the start of any complex session.
Default Mode — everyday working mode. FREE-tier reads happen without friction. ASK-tier mutations require your review. DANGEROUS-tier actions are fully gated.
Accept Edits Mode — write freely, shell commands still ask. Use this when you trust Claude's direction and want to speed up file generation (runbooks, reports, GitHub issues) without removing the gate on shell execution.
The verb model is the portable mental model. describe = safe everywhere. delete = dangerous everywhere. It does not matter whether you are on AWS CLI, kubectl, Terraform, or Git. The verb tells you the tier.
In your next session, bring this CLAUDE.md into any AWS or infrastructure context. The rules apply immediately, with zero reconfiguration.