Lab: AI-Augmented DevOps with AWS, GitHub MCP & Claude Permission Modes

Duration: 60 minutes
Difficulty: Beginner–Intermediate
Prerequisites: AWS CLI configured (aws configure), Claude Code installed, GitHub personal access token, Node.js 18+
Deliverable: Completed diagnosis report, GitHub issue filed, permission mode journal

Lab Objective

You will learn to use Claude Code as an AI-augmented DevOps co-pilot for real AWS infrastructure work — scanning EC2 instances, diagnosing CloudWatch alarms, auditing VPC and security groups, and correlating findings with recent GitHub commits.

The central skill is not prompting. It is permission mode discipline — knowing when to let Claude run freely, when to review before acting, and when to demand explicit confirmation before anything destructive happens.

Key principle: The verb tells you the tier.

describe / list / get → FREE (run without asking)
create / modify / restart → ASK (review first)
delete / terminate / purge → DANGEROUS (type CONFIRMED)

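This principle carries over to AWS CLI, kubectl, Terraform, Docker, and Git, although each tool has a few verbs you must place in a tier yourself (for example, authorize and revoke in the EC2 security group commands). You will internalize it through practice today.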
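The verb-to-tier mapping can be sketched as a small shell function. This is only an illustration: the function name tier_of is hypothetical, the verb lists are abbreviated from the ones in this lab, and sending unknown verbs to ASK is an assumption of the sketch rather than a rule stated here.

```shell
# tier_of: map a command verb to the lab's FREE / ASK / DANGEROUS tiers.
# The verb is taken as the text before the first hyphen, so
# "describe-instances" is classified by "describe".
tier_of() {
  verb=${1%%-*}
  case "$verb" in
    describe|get|list|show|status|logs|diff) echo FREE ;;
    create|apply|update|modify|restart|scale|start|stop|authorize) echo ASK ;;
    delete|terminate|destroy|purge|rm|revoke|reset) echo DANGEROUS ;;
    *) echo ASK ;;  # assumption: anything unrecognized gets reviewed first
  esac
}
```

For example, tier_of terminate-instances prints DANGEROUS and tier_of describe-alarms prints FREE.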

What You Will Build

By the end of this lab you will have:

Connected AWS CLI and GitHub MCP server to Claude Code
Used Plan Mode to audit your AWS environment before touching anything
Used Default Mode to safely diagnose EC2, CloudWatch alarms, VPC, and security groups
Used Accept Edits Mode to let Claude write remediation notes and a GitHub issue
Practiced the CONFIRMED keyword for any DANGEROUS-tier action
Filed a structured incident/findings report

Your CLAUDE.md (set this up first)

Before starting any exercises, create this file in the directory where you run Claude Code. This is your standing brief — Claude reads it at session start.

cat > CLAUDE.md << 'EOF'
# CLAUDE.md — DevOps/SRE

Role: DevOps/SRE on AWS, GitHub. You are my co-pilot. I own every write decision.

## PERMISSION MODEL

### FREE — run without asking
Verbs: describe, get, list, show, status, logs, diff, output, validate,
       explain, inspect, top, events, history, version, info, check,
       verify, audit, whoami, cat, ls

### ASK — show command first, wait for my go-ahead
Verbs: create, apply, update, set, modify, patch, restart, scale, deploy,
       push, commit, merge, sync, enable, disable, attach, detach, add,
       install, build, run, tag, publish, reboot, start, stop

### DANGEROUS — full stop. State what will be affected. Wait for CONFIRMED
Verbs: delete, remove, destroy, terminate, purge, drop, rm, prune,
       reset, force, wipe, revoke, flush, truncate

## HARD RULES
1. Dry-run first: add --dry-run to any mutating AWS CLI command that supports it (EC2 does; many other services do not)
2. Show blast radius: if >1 resource affected, list all before proceeding
3. Rollback ready: pair every ASK/DANGEROUS step with its rollback command
4. Prod banner: if account/region is prod, prefix responses with PROD
5. No chaining: never chain ASK or DANGEROUS commands
6. No secrets: never commit credentials, tokens, or keys

## CONFIRMATION KEYWORDS
DANGEROUS tier needs me to type: CONFIRMED DELETE / CONFIRMED TERMINATE / CONFIRMED DESTROY / CONFIRMED PURGE / CONFIRMED PUBLIC

## PLAN MODE
Trigger: I say "plan:" or /plan
Format:
  1. [FREE]      <cmd>
  2. [ASK]       <cmd>  | rollback: <cmd>
  3. [DANGEROUS] <cmd>  | rollback: <cmd>  | blast: <what dies>

## ENVIRONMENTS
prod    → PROD banner + max caution
staging → ASK before DANGEROUS
dev     → permissive, still flag DANGEROUS

## INCIDENT MODE
Trigger: INCIDENT or P0 — terse + fast, still gate CONFIRMED for DANGEROUS
EOF

Verify it is there:

head -5 CLAUDE.md
# Should show: # CLAUDE.md — DevOps/SRE
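Beyond eyeballing the first lines, a quick check that all three tier headings made it into the file might look like the sketch below. The function name check_brief is hypothetical, and the check assumes the "### FREE", "### ASK", and "### DANGEROUS" heading text used in the CLAUDE.md above.

```shell
# check_brief FILE: succeed only if FILE defines all three permission tiers.
check_brief() {
  for tier in FREE ASK DANGEROUS; do
    if ! grep -q "^### $tier" "$1"; then
      echo "missing tier heading: $tier" >&2
      return 1
    fi
  done
  echo "all three tiers present"
}
```

Run it as check_brief CLAUDE.md from the directory where you created the file.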

Part 1: Connect Your Tools (10 minutes)

Step 1.1: Verify AWS CLI Access

Before connecting Claude Code, confirm your AWS credentials work:

# Check identity
aws sts get-caller-identity

# Expected output:
# {
#     "UserId": "AIDA...",
#     "Account": "123456789012",
#     "Arn": "arn:aws:iam::123456789012:user/yourname"
# }

If this fails, run aws configure and enter your Access Key ID, Secret Access Key, and default region.

Check which region you are in:

aws configure get region
# Expected: us-east-1 (or your configured region)
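The two checks above can be combined into one read-only helper, which is handy to rerun whenever you are unsure which account or region a session points at. A minimal sketch; the function name whereami is hypothetical.

```shell
# whereami: print the account ID and configured region on one line.
# Both calls are read-only (FREE tier).
whereami() {
  account=$(aws sts get-caller-identity --query Account --output text) || return 1
  region=$(aws configure get region) || region="(no default region set)"
  echo "account=$account region=$region"
}
```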

Step 1.2: Verify GitHub Token

You need a GitHub personal access token with repo scope. The classic repo scope covers both reading commits and creating the issue you file in Part 6.

If you do not have one:

1. Go to GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
2. Generate a new token and name it agentic-devops-lab
3. Check only the repo scope
4. Copy it now; GitHub will not show it again

Test it:

export GITHUB_TOKEN=ghp_YOUR_TOKEN_HERE
curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user | grep login
# Should show your GitHub username

Step 1.3: Install GitHub MCP Server

npm install -g @modelcontextprotocol/server-github

Verify:

npx @modelcontextprotocol/server-github --help 2>&1 | head -3

Step 1.4: Configure .mcp.json

Create this file in the same directory as your CLAUDE.md:

cat > .mcp.json << 'EOF'
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_YOUR_TOKEN_HERE"
      }
    }
  }
}
EOF

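Replace ghp_YOUR_TOKEN_HERE with your actual token. The file now contains a secret, so keep it out of version control (for example, add .mcp.json to .gitignore), in line with the "No secrets" rule in your CLAUDE.md.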

Note: Claude Code can already call the AWS CLI directly through its Bash tool, so AWS needs no MCP server. The GitHub MCP server adds structured access to GitHub repos, issues, and commits, including creating the issue you file in Part 6.
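An alternative that keeps the token out of the file is to reference the GITHUB_TOKEN variable exported in Step 1.2. The sketch below assumes your Claude Code version expands ${VAR} references in .mcp.json; check its MCP documentation before relying on this. The function name write_mcp_config is hypothetical.

```shell
# write_mcp_config PATH: write an .mcp.json that refers to the token by
# environment variable name instead of embedding the secret itself.
# The quoted 'EOF' stops the shell from expanding ${GITHUB_TOKEN} here,
# so the placeholder is written to the file literally.
write_mcp_config() {
  cat > "$1" << 'EOF'
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
EOF
}
```

Call it as write_mcp_config .mcp.json, then start Claude Code from a shell where GITHUB_TOKEN is exported.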

Step 1.5: Start Claude Code and Verify

claude

Inside Claude Code, test both connections:

Test 1 — AWS: Run aws sts get-caller-identity and tell me my account ID and region.

Test 2 — GitHub: List the 3 most recent commits in the anthropics/anthropic-sdk-python repository.

Both should return real data. If AWS fails, check your credentials. If GitHub fails, check your token in .mcp.json.

Checkpoint: You should see your AWS account ID and GitHub commits before proceeding.

Part 2: Plan Mode — Audit Before You Touch (10 minutes)

Plan Mode is your pre-flight checklist. Claude reads your infrastructure, maps what exists, identifies risks, and hands you a numbered plan. Nothing is changed.

Step 2.1: Enter Plan Mode

In Claude Code, press Shift+Tab until plan mode appears in the status bar (the key cycles through the permission modes), or prefix your prompt with /plan.

Step 2.2: Full Environment Audit

Run this prompt in Plan Mode:

plan: Give me a complete audit of my AWS environment. I want to understand the current state before we do anything. Please:

1. List all running EC2 instances (name, instance ID, type, state, public IP if any)
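2. List any instances stopped in the last 7 days, plus any terminated instances that are still visible (AWS only lists terminated instances for a short time after termination)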
3. List all CloudWatch alarms and their current states (OK, ALARM, INSUFFICIENT_DATA)
4. List all VPCs and their CIDR blocks
5. List all security groups — flag any that have 0.0.0.0/0 inbound rules on sensitive ports (22, 3389, 5432, 3306)
6. List all S3 buckets — flag any with public access

For each finding, classify it as:
  [OK] — looks healthy
  [WARN] — worth investigating
  [RISK] — needs attention

Do not change anything. Just read and report.

What Claude does: It runs a series of FREE-tier commands:

aws ec2 describe-instances
aws cloudwatch describe-alarms
aws ec2 describe-vpcs
aws ec2 describe-security-groups
aws s3api list-buckets
aws s3api get-public-access-block --bucket <name>   (once per bucket)

All reads. Nothing changes.

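What you observe: Claude does not stop to ask whether it should run these, which is correct behavior for FREE-tier verbs. Claude Code may still show its own approval prompt for a Bash command you have not yet allowed; that prompt comes from the tool's permission settings, not from your CLAUDE.md.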
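To make the reads concrete, here is roughly what two of them look like when run by hand. This is a sketch, not necessarily the exact commands Claude will choose; the function names are hypothetical and the --query expressions are one possible way to shape the output.

```shell
# list_running_instances: ID, type, state, public IP and Name tag of every
# running instance in the current region. Read-only (FREE tier).
list_running_instances() {
  aws ec2 describe-instances \
    --filters Name=instance-state-name,Values=running \
    --query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name,PublicIpAddress,Tags[?Key==`Name`]|[0].Value]' \
    --output table
}

# bucket_public_access_blocks: print each bucket's public access block
# settings. The call fails for a bucket that has no block configured,
# which is itself worth a [WARN].
bucket_public_access_blocks() {
  for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    echo "== $bucket"
    aws s3api get-public-access-block --bucket "$bucket" \
      || echo "no public access block configured"
  done
}
```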

Step 2.3: Read the Plan Output

Claude should return a structured audit with [OK], [WARN], and [RISK] tags. Record what it finds:

My environment audit notes:
- EC2 instances found: ___
- CloudWatch alarms in ALARM state: ___
- Security groups with 0.0.0.0/0 on sensitive ports: ___
- S3 buckets with public access: ___

Step 2.4: Request a Remediation Plan

Still in Plan Mode:

Based on the audit above, create a numbered remediation plan. For each finding marked [WARN] or [RISK]:
  - Classify each step as [FREE], [ASK], or [DANGEROUS]
  - Provide the exact AWS CLI command
  - Provide the rollback command
  - Estimate the blast radius

Do not execute anything yet. I will approve each step.

What you observe: Claude produces a plan with tagged steps. DANGEROUS-tier actions (like terminating an instance or deleting a security group rule) are explicitly flagged. You have not done anything yet.

Exit Plan Mode by pressing Shift+Tab again.

Part 3: Default Mode — EC2 Diagnosis (12 minutes)

Default Mode is your everyday working mode. Claude asks before each ASK or DANGEROUS action. FREE actions run without interruption.

Step 3.1: Confirm Default Mode

Press Shift+Tab until the mode indicator disappears from the status bar. No mode indicator means Default Mode.

Step 3.2: Diagnose EC2 Instances

Run this prompt:

I want to diagnose my EC2 instances. For each running instance:

1. Show me its launch time, uptime, and instance type
2. Check if it has a Name tag — flag any untagged instances
3. Check if it is in a public subnet (has public IP) — if so, verify the security group only allows necessary ports
4. Check if detailed monitoring is enabled
5. Show me the last 5 CloudWatch metric data points for CPUUtilization

Run the read commands freely. For anything that would change configuration, show me the command and ask first.

Watch how Claude behaves:

aws ec2 describe-instances → runs immediately (FREE)
aws cloudwatch get-metric-statistics → runs immediately (FREE)
If it suggests enabling detailed monitoring: it should show the command and ask (ASK tier)

If Claude proposes enabling detailed monitoring, it should say something like:

I found that instance i-0abc123 does not have detailed monitoring enabled.
To enable it, I would run:

  aws ec2 monitor-instances --instance-ids i-0abc123 --dry-run

Should I proceed? (This will enable 1-minute metric granularity and adds a per-instance CloudWatch charge.)

This is correct behavior. Review the command, then say yes or no.
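One detail worth knowing when reviewing a --dry-run: EC2 reports a permitted request as an error named DryRunOperation, so a non-zero exit status is expected even when the real call would work. A minimal sketch of interpreting that result; the function name would_succeed is hypothetical.

```shell
# would_succeed CMD...: run an EC2 command with --dry-run appended and report
# whether the real call would have been allowed. EC2 signals "permitted" with
# a DryRunOperation error, so the command exits non-zero either way.
would_succeed() {
  out=$("$@" --dry-run 2>&1) || true
  case "$out" in
    *DryRunOperation*) echo "would succeed"; return 0 ;;
    *) echo "would fail: $out"; return 1 ;;
  esac
}
```

For example: would_succeed aws ec2 monitor-instances --instance-ids i-0abc123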

Step 3.3: Check a Stopped Instance

If your audit found stopped instances, investigate:

Show me the stopped instance i-XXXX (replace with actual ID from your audit).
Tell me:
1. When was it last stopped?
2. What is the reason (check instance status events)?
3. What is the cost implication of keeping a stopped instance vs terminating it?

Read only — do not start or terminate it yet.

Note what Claude finds. If it suggests starting or terminating the instance, it must ask you first: start is an ASK-tier verb and terminate is DANGEROUS.

Step 3.4: Practice the DANGEROUS Gate

Now test that the gate works. Type this:

Terminate the stopped instance i-XXXX.

Claude should respond with something like:

⚠️  DANGEROUS action requested: terminate-instances

This will permanently terminate instance i-XXXX.
  Instance: i-XXXX
  Name: [tag name]
  State: stopped
  Blast radius: 1 instance, plus any attached EBS volumes set to delete on termination

Rollback: Not possible — termination is irreversible. If you need this instance again,
you would need to launch a new one from the same AMI.

Command I would run:
  aws ec2 terminate-instances --instance-ids i-XXXX

To proceed, type: CONFIRMED TERMINATE

Do not type CONFIRMED TERMINATE unless you actually want to terminate that instance. The point is to observe that Claude stopped and described the blast radius before acting.

If you want to test the gate without real consequences, type cancel and Claude should stand down.
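The same gate can be sketched in plain shell, which may help if you later wrap destructive commands in your own scripts. This is an illustration of the idea, not how Claude Code implements it; the function name confirm_gate is hypothetical.

```shell
# confirm_gate KEYWORD: read one line from stdin and succeed only if it is
# exactly "CONFIRMED <KEYWORD>". Anything else, including "yes", cancels.
confirm_gate() {
  printf 'To proceed, type: CONFIRMED %s\n' "$1" >&2
  IFS= read -r reply || reply=""
  if [ "$reply" = "CONFIRMED $1" ]; then
    return 0
  fi
  echo "cancelled" >&2
  return 1
}
```

A script would place the destructive command after the gate, for example confirm_gate TERMINATE && echo "gate passed", so that nothing runs on a casual "yes".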

Part 4: Default Mode — CloudWatch Alarm Diagnosis (10 minutes)

Step 4.1: List Alarms in Detail

Show me all CloudWatch alarms that are currently in ALARM or INSUFFICIENT_DATA state.
For each one:
1. Alarm name and description
2. The metric it monitors (namespace, metric name, dimensions)
3. The threshold and comparison operator
4. How long it has been in this state
5. The last 10 data points for the underlying metric

This is read-only investigation. Run all of it freely.

What you observe: All of this is FREE-tier. No asks. Claude should gather all this data in one pass.
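For reference, the core read behind this prompt can be sketched as a single describe-alarms call filtered by state. The function name alarms_in_state is hypothetical, and the selected columns are one reasonable choice rather than what Claude will necessarily pick.

```shell
# alarms_in_state STATE: name, metric, threshold, operator and last state
# change for every metric alarm in STATE (OK, ALARM or INSUFFICIENT_DATA).
alarms_in_state() {
  aws cloudwatch describe-alarms \
    --state-value "$1" \
    --query 'MetricAlarms[].[AlarmName,Namespace,MetricName,Threshold,ComparisonOperator,StateUpdatedTimestamp]' \
    --output table
}
```

Run alarms_in_state ALARM and alarms_in_state INSUFFICIENT_DATA to cover both states from the prompt.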

Step 4.2: Correlate an Alarm with EC2

Pick one alarm from the output (or use a hypothetical CPUUtilization alarm if your account is clean):

I have a CPUUtilization alarm for instance i-XXXX. Help me diagnose it:

1. Show me the CPU metrics for this instance over the last 4 hours (5-minute intervals)
2. Check if there were any instance status check failures during that time
3. Check the instance's network metrics (NetworkIn, NetworkOut) over the same period
4. Check if there are any Systems Manager (SSM) run command invocations on this instance in the last 4 hours

Correlate the findings. Is the high CPU caused by network traffic, a scheduled job, or something else?

Read only.
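Step 1 of this prompt corresponds to a get-metric-statistics call with a 300-second period. A sketch under two assumptions: the function name cpu_last_4h is hypothetical, and the start time is computed with GNU date, falling back to the BSD/macOS form.

```shell
# cpu_last_4h INSTANCE_ID: average CPUUtilization in 5-minute buckets
# (--period 300) over the last four hours, oldest data point first.
cpu_last_4h() {
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  # GNU date first, BSD/macOS date as a fallback.
  start=$(date -u -d '4 hours ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
          || date -u -v-4H +%Y-%m-%dT%H:%M:%SZ)
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value="$1" \
    --start-time "$start" --end-time "$end" \
    --period 300 --statistics Average \
    --query 'sort_by(Datapoints,&Timestamp)[].[Timestamp,Average]' \
    --output table
}
```

Swapping the metric name for NetworkIn or NetworkOut gives the network series for step 3 over the same window.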

Step 4.3: Propose and Review a Remediation

Based on the CloudWatch investigation, propose a remediation for the alarm.
Give me options ranked by risk:
  Option A: low-risk first step
  Option B: medium-risk (requires ASK)
  Option C: higher-risk (requires DANGEROUS confirmation)

For each option, show the exact command and rollback.

If Claude proposes something in the ASK tier (like restarting the instance), review the command before saying yes.

Part 5: Default Mode — VPC and Security Group Audit (8 minutes)

Step 5.1: Deep VPC Audit

Run a deep audit of my VPCs and networking:

1. For each VPC, list:
   - CIDR block
   - Attached internet gateways (public VPC or private?)
   - Number of subnets, separated by public vs private
   - Route tables — any routes to 0.0.0.0/0?

2. For security groups, find and flag:
   - Any group allowing SSH (port 22) from 0.0.0.0/0
   - Any group allowing RDP (port 3389) from 0.0.0.0/0
   - Any group allowing database ports (3306, 5432, 1433, 27017) from 0.0.0.0/0
   - Any group with no inbound rules (possibly unused)
   - Any group attached to no resources (orphaned)

3. For each flagged security group, show:
   - Which instances or services are using it
   - The specific offending rule

Read only. Flag everything but change nothing.
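A first pass at the security group checks can be done with server-side filters, as in the sketch below. Treat the results as candidates to inspect, not a verdict: the filters may be satisfied by different rules within the same group, and a rule whose port range merely includes the port (for example 0-65535) is not caught by a from-port match. The function name open_groups_on_port is hypothetical.

```shell
# open_groups_on_port PORT: security groups that have a rule starting at PORT
# and a rule open to 0.0.0.0/0. Read-only (FREE tier).
open_groups_on_port() {
  aws ec2 describe-security-groups \
    --filters Name=ip-permission.cidr,Values=0.0.0.0/0 \
              Name=ip-permission.from-port,Values="$1" \
    --query 'SecurityGroups[].[GroupId,GroupName]' \
    --output text
}
```

Looping over the ports from the prompt (22, 3389, 3306, 5432, 1433, 27017) gives a candidate list per port.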

Step 5.2: Fix an Overly Permissive Rule

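If you found a security group with 0.0.0.0/0 on port 22, practice the approval workflow. Adding the restricted rule (authorize) is ASK tier; removing the open rule (revoke) is DANGEROUS tier under your CLAUDE.md.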

I want to restrict SSH access on security group sg-XXXX.
Instead of 0.0.0.0/0, I want to allow only my current IP address.

Show me:
1. What my current public IP is
2. The exact command to revoke the current rule
3. The exact command to add the new restricted rule

Do not run either command yet. Show me both and I will approve.

Claude should respond with both commands and wait for your approval before running either one.

Under your CLAUDE.md, authorize is an ASK-tier action and revoke is a DANGEROUS-tier verb, because revoking the wrong rule can lock you out. If Claude runs either command without stopping first, remind it:

You ran that command without asking. Per our CLAUDE.md rules, authorize is ASK tier and revoke is DANGEROUS tier. Always show the command and wait for me first.

This is a real debugging exercise in permission discipline.
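For comparison with what Claude proposes, the two commands can be sketched as below. The helper only prints them. It orders the narrow authorize before the wide revoke so that SSH is never closed entirely, which is a choice made in this sketch rather than something the prompt above dictates. The function name plan_ssh_restrict is hypothetical.

```shell
# plan_ssh_restrict GROUP_ID MY_IP: print (do not run) the two commands,
# adding the narrow rule before removing the wide one.
plan_ssh_restrict() {
  echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port 22 --cidr $2/32"
  echo "aws ec2 revoke-security-group-ingress --group-id $1 --protocol tcp --port 22 --cidr 0.0.0.0/0"
}
```

One way to look up your public IP is curl -s https://checkip.amazonaws.com. Each printed command is rolled back by the opposite verb with the same arguments.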

Part 6: GitHub MCP — Correlate with Code (8 minutes)

Step 6.1: Connect Code Changes to Infrastructure Events

Now bring GitHub into the picture. You need a GitHub repository to inspect — use your own repo, your company's infrastructure repo, or any public repo you have access to.

I want to correlate recent infrastructure findings with code changes.

In GitHub, for the repository [YOUR_ORG/YOUR_REPO]:
1. Show me the 10 most recent commits
2. Filter for commits that touched infrastructure files (.tf, .yaml, .yml, Dockerfile, .json config files)
3. For each infrastructure commit, show:
   - Commit hash (short)
   - Author
   - Timestamp
   - Changed files
   - A one-line summary of what changed

Then correlate: do any of these commit timestamps match the CloudWatch alarm start time or the EC2 event timestamps from our earlier investigation?

What you observe: Claude now reaches across two systems — AWS (via CLI) and GitHub (via MCP server) — and synthesizes an integrated answer. You did not have to switch tools, copy-paste timestamps, or manually compare logs.
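The correlation itself is simple enough to sanity-check by hand. The sketch below filters "timestamp sha" lines to a time window; it assumes every timestamp is ISO 8601 in UTC with the same layout (for example 2024-05-01T09:30:00Z), in which case plain string comparison matches chronological order. The function name commits_in_window and the sample values in the usage note are hypothetical.

```shell
# commits_in_window START END: read "timestamp sha" lines on stdin and print
# those whose timestamp lies between START and END inclusive. Appending ""
# forces awk to compare the values as strings.
commits_in_window() {
  awk -v start="$1" -v end="$2" '($1 "") >= (start "") && ($1 "") <= (end "")'
}
```

For instance, piping the commit list into commits_in_window 2024-05-01T09:00:00Z 2024-05-01T10:00:00Z keeps only commits made in the hour before a 10:00 UTC alarm.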

Step 6.2: File a GitHub Issue with Findings

Now switch to Accept Edits Mode by pressing Shift+Tab once from Default Mode. You will see accept edits in the status bar.

Accept Edits Mode means: Claude can write and edit files freely without asking, but shell commands still require confirmation.

Based on everything we found in this lab, write a GitHub issue to document the findings.
The issue should be in this repository: [YOUR_ORG/YOUR_REPO]

Title: "Infrastructure Audit Findings — [today's date]"

Body:
  ## Summary
  [2-3 sentence summary of what we found]

  ## Findings
  | Category | Finding | Severity | Action Required |
  [fill from our audit]

  ## CloudWatch Alarms
  [alarms found and their status]

  ## Security Group Risks
  [any overly permissive rules found]

  ## Recommended Next Steps
  [numbered list from the remediation plan]

  ## Audit Details
  - Audited by: Claude Code (AI-augmented DevOps session)
  - AWS Account: [account ID]
  - Region: [region]
  - Date: [today]

Write the issue body to a local file called findings-issue.md first.
Then ask me before posting it to GitHub.

Claude will:

Write findings-issue.md without asking (file edit = Accept Edits mode, free)
Then ask before creating the GitHub issue (posting = ASK tier)

Review the file:

cat findings-issue.md

If it looks good, tell Claude to proceed. If you want to edit it first, do so in your editor, then tell Claude to post.
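Before approving the post, a quick structural check of the draft can catch a missing section. A minimal sketch that looks for the headings from the template above; the function name check_issue_sections is hypothetical.

```shell
# check_issue_sections FILE: list any required heading missing from the
# draft and return non-zero if at least one is absent.
check_issue_sections() {
  missing=0
  for heading in "## Summary" "## Findings" "## CloudWatch Alarms" \
                 "## Security Group Risks" "## Recommended Next Steps" \
                 "## Audit Details"; do
    if ! grep -qF "$heading" "$1"; then
      echo "missing: $heading"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as check_issue_sections findings-issue.md; no output means all six sections are present.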

When Claude posts the issue via GitHub MCP, verify it appeared in your repository.

Part 7: Reflection — Your Permission Mode Journal (5 minutes)

Complete this template. Save it as permission-mode-journal.md.

# Permission Mode Journal — AWS AI-Augmented DevOps Lab

## Date: [today]
## AWS Account: [your account ID]
## Region: [your region]

---

## Part 2: Plan Mode Audit Findings

**Instances found:** ___
**Alarms in ALARM state:** ___
**Security groups with 0.0.0.0/0 risks:** ___
**S3 buckets with public access:** ___

**Did Plan Mode change anything in my account?**
[ ] Yes  [ ] No  (should be No — Plan Mode is read-only)

**What was the value of seeing the plan before executing?**
[write 1-2 sentences]

---

## Part 3: EC2 DANGEROUS Gate Test

**Which instance did I test the DANGEROUS gate on?** ___
**Did Claude stop and describe the blast radius?**
[ ] Yes  [ ] No

**Did Claude require me to type CONFIRMED TERMINATE?**
[ ] Yes  [ ] No

**What would have happened if I had typed CONFIRMED TERMINATE?**
[write 1 sentence]

---

## Part 4: CloudWatch Alarm Correlation

**Alarm investigated:** ___
**Root cause hypothesis:** ___
**Did AWS CLI reads run without asking?**
[ ] Yes  [ ] No  (should be Yes — describe/get/list are FREE)

---

## Part 5: Security Group Findings

**Any 0.0.0.0/0 rules found?**
[ ] Yes — on ports: ___
[ ] No — all rules are properly scoped

**Did Claude ask before modifying a security group rule?**
[ ] Yes  [ ] No  (should be Yes: authorize is ASK tier, revoke is DANGEROUS tier)

---

## Part 6: GitHub + AWS Correlation

**Did you find a code change that correlates with an infrastructure event?**
[ ] Yes  [ ] No  [ ] Inconclusive

**What was the experience of having one prompt reach AWS + GitHub?**
[write 1-2 sentences]

**Was the GitHub issue filed?**
[ ] Yes — URL: ___
[ ] No

---

## Overall Reflection

### Manual workflow (without Claude Code):
Steps I would have taken manually:
1. ___
2. ___
3. ___

Estimated time: ___ minutes

### With Claude Code + permission modes:
Time taken in this lab: ___ minutes

### The verb model in practice:
The most important thing I learned about FREE / ASK / DANGEROUS:
[write 2-3 sentences]

### One thing I would add to my CLAUDE.md based on today:
[write 1 sentence]

Quick Reference Card

Keep this visible during the lab.

Permission Mode Cheat Sheet

| Mode | How to Enter | Claude's Behavior |
| --- | --- | --- |
| Plan Mode | `Shift+Tab` (cycle until plan mode shows) or `/plan` | Reads freely, writes nothing, produces a plan |
| Default Mode | No modifier (startup default) | FREE runs freely, ASK requires confirmation, DANGEROUS requires CONFIRMED |
| Accept Edits | `Shift+Tab` (cycle until accept edits shows) | File writes are free, shell commands still ask |
| Don't Ask | Settings only | Denies all unless pre-approved |

The Verb Tiers

| Tier | Verbs | Behavior |
| --- | --- | --- |
| FREE | describe, get, list, show, status, logs, diff, inspect | Runs immediately |
| ASK | create, apply, update, modify, restart, scale, start, stop | Shows command, waits |
| DANGEROUS | delete, terminate, destroy, purge, rm, reset, force | Stops. Describes blast radius. Waits for CONFIRMED |

AWS CLI Tier Examples

| Command | Tier | Why |
| --- | --- | --- |
| `aws ec2 describe-instances` | FREE | describe = read |
| `aws cloudwatch describe-alarms` | FREE | describe = read |
| `aws ec2 stop-instances` | ASK | stop = mutation |
| `aws ec2 modify-instance-attribute` | ASK | modify = mutation |
| `aws ec2 authorize-security-group-ingress` | ASK | authorize = mutation |
| `aws ec2 terminate-instances` | DANGEROUS | terminate = irreversible |
| `aws ec2 revoke-security-group-ingress` | DANGEROUS | revoke = potentially locks you out |
| `aws rds delete-db-instance` | DANGEROUS | delete = data loss |
| `aws s3 rm --recursive` | DANGEROUS | rm = data loss |

CONFIRMED Keywords

| Keyword | Use for |
| --- | --- |
| `CONFIRMED TERMINATE` | EC2 instance termination |
| `CONFIRMED DELETE` | Any delete operation |
| `CONFIRMED DESTROY` | Terraform destroy, stack deletion |
| `CONFIRMED PURGE` | Bulk deletions, data wipes |
| `CONFIRMED PUBLIC` | Making S3 buckets or resources public |

Appendix A: Troubleshooting

AWS CLI Issues

aws: command not found

# Install AWS CLI v2 (Linux x86_64; other platforms use a different installer)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

Unable to locate credentials

aws configure
# Enter: Access Key ID, Secret Access Key, Region, Output format (json)

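An error occurred (AccessDenied)

Your IAM user lacks the required permission. For the read-only parts of this lab, you need read access to EC2, CloudWatch, VPC, and S3. Ask your AWS account admin to attach the ReadOnlyAccess managed policy. The optional write steps (enabling detailed monitoring, changing security group rules) need additional permissions.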

GitHub MCP Issues

GitHub MCP server fails to start

npm install -g @modelcontextprotocol/server-github
npx @modelcontextprotocol/server-github --help

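Bad credentials from GitHub API

Your token is invalid, has expired, or was pasted incorrectly. Generate a new token at GitHub → Settings → Developer settings → Personal access tokens. If only private repositories or issue creation fail, check that the token has repo scope.

GitHub rate limiting

The GitHub REST API allows 5,000 requests per hour for requests authenticated with a personal access token. You are unlikely to hit this in a single lab session.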

Claude Code MCP Issues

MCP server not connecting

1. Validate your .mcp.json syntax:

python3 -m json.tool .mcp.json
# Should print formatted JSON with no errors

2. Restart Claude Code completely
3. Run a simple test prompt

Claude Code ignoring CLAUDE.md rules

Verify the file is in the same directory where you launched claude:

ls -la CLAUDE.md

If it is in the wrong directory, move it and restart Claude Code.

Permission Mode Issues

Claude runs a DANGEROUS command without asking

This should not happen. If it does:

1. Type undo that immediately. Claude should attempt to reverse it, but some actions (such as terminating an instance) cannot be reversed
2. After the session, add a more explicit rule to your CLAUDE.md
3. Report the behavior using the feedback option in Claude Code (for example, the /bug command)

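Claude keeps asking for permission on FREE-tier reads

First check where the prompt comes from. If Claude itself is asking in its reply, your CLAUDE.md verb list needs adjusting: add the specific verb that is being over-blocked to the FREE section. If it is Claude Code's own tool-approval prompt, CLAUDE.md has no effect on it; allow the read-only command in Claude Code's permission settings instead (see the /permissions command).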

Appendix B: Extending the Lab

Once you have completed the core lab, try these extensions:

Extension 1: Add Terraform

If you have a Terraform-managed infrastructure:

plan: Scan my Terraform state file. What resources does it manage?
List all resources and their types. Do not run any apply or plan that makes changes.

Then practice terraform plan (FREE) vs terraform apply (ASK) vs terraform destroy (DANGEROUS).

Extension 2: Multi-Region Audit

Run the same EC2 and security group audit across us-east-1, us-west-2, and eu-west-1.
Aggregate the findings. Are there any resources in unexpected regions?
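Run by hand, the multi-region pass is a loop over --region. A minimal sketch that only counts running instances per region; the function name audit_regions is hypothetical.

```shell
# audit_regions REGION...: count running instances in each region given.
# Read-only (FREE tier).
audit_regions() {
  for region in "$@"; do
    count=$(aws ec2 describe-instances \
      --region "$region" \
      --filters Name=instance-state-name,Values=running \
      --query 'length(Reservations[].Instances[])' \
      --output text)
    echo "$region running=$count"
  done
}
```

For the regions named above: audit_regions us-east-1 us-west-2 eu-west-1. A non-zero count in a region you do not normally use is the kind of finding this extension is looking for.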

Extension 3: Cost Audit

Using the AWS Cost Explorer API, show me:
1. My top 5 most expensive services this month
2. Any resources that have been running for more than 30 days with zero traffic
3. Recommendations for cost savings based on current utilization

Read only.

Extension 4: Auto-generate a Runbook

Based on everything we found in today's lab, write me a runbook in Markdown format.
The runbook should document:
1. How to run this audit again in 30 days
2. The exact prompts to use (in Plan Mode and Default Mode)
3. The decision tree for when to use CONFIRMED
4. The escalation path for DANGEROUS findings

Save it as aws-audit-runbook.md.

Summary

You have experienced AI-augmented DevOps through three lenses:

Plan Mode — the pre-flight checklist. Read everything, change nothing, produce a structured plan. Use this at the start of any complex session.

Default Mode — everyday working mode. FREE-tier reads happen without friction. ASK-tier mutations require your review. DANGEROUS-tier actions are fully gated.

Accept Edits Mode — write freely, shell commands still ask. Use this when you trust Claude's direction and want to speed up file generation (runbooks, reports, GitHub issues) without removing the gate on shell execution.

The verb model is the portable mental model. describe = safe everywhere. delete = dangerous everywhere. It does not matter whether you are on AWS CLI, kubectl, Terraform, or Git. The verb tells you the tier.

In your next session, bring this CLAUDE.md into any AWS or infrastructure context. The rules apply immediately, with zero reconfiguration.
