Use this prompt when investigating failing Terraform Cloud (TFC) workspaces that provision EKS clusters with Takeda Building Blocks. The methodology was developed while resolving TFC workspace failures caused by VPC CNI resource adoption issues.
Prerequisites:
- Access to the Terraform Cloud API
- GitHub CLI (gh) installed
- Git repository access
- Terraform CLI installed
Step 1: Retrieve TFC Workspace Run Details
Using the TFC API, fetch the details of the failing run for the provided workspace and run ID. Extract:
- Run status and error messages
- Plan/apply logs
- Workspace configuration
- Module versions being used
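As a rough illustration of Step 1, the run record can be fetched from the TFC runs endpoint with curl. This is a minimal sketch, not a definitive implementation: the variable names `TFC_TOKEN` and `TFC_HOST`, the default host `app.terraform.io`, and the helper names are assumptions, and the run ID shown in the usage comment is a placeholder.

```shell
# Sketch only: TFC_TOKEN, TFC_HOST, and both helper names are assumptions.

# Build the API URL for a single run from its ID (run-...).
tfc_run_url() {
  printf 'https://%s/api/v2/runs/%s\n' "${TFC_HOST:-app.terraform.io}" "$1"
}

# Perform an authenticated GET against the TFC API and print the JSON body.
tfc_get() {
  curl --silent --show-error --fail \
    --header "Authorization: Bearer ${TFC_TOKEN:?set TFC_TOKEN first}" \
    --header "Content-Type: application/vnd.api+json" \
    "$1"
}

# Usage (placeholder run ID):
#   tfc_get "$(tfc_run_url run-XXXXXXXXXXXX)" > run.json
```

The saved JSON carries the run status under `data.attributes.status` and links to the plan, apply, and workspace under `data.relationships`.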
Step 2: Extract Complete Apply Logs
Retrieve the full apply logs from the TFC run, focusing on:
- Error messages containing "does not exist"
- Resource adoption failures
- Kubernetes provider errors
- Module-specific failures
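One possible way to pull the full apply log is sketched below. It assumes the run document from the runs endpoint has already been saved to a file, that `jq` is installed, and that the token lives in `TFC_TOKEN`; those names and the helper functions are illustrative assumptions. The apply record exposes a temporary `log-read-url` attribute that points at the raw log.

```shell
# Sketch only: TFC_TOKEN, TFC_HOST, jq, and the helper names are assumptions.

# Build the URL of an apply record from its ID (apply-...).
tfc_apply_url() {
  printf 'https://%s/api/v2/applies/%s\n' "${TFC_HOST:-app.terraform.io}" "$1"
}

# Read the apply ID out of a saved run document.
apply_id_from_run() {
  jq -r '.data.relationships.apply.data.id' "$1"
}

# Resolve the apply's temporary log URL, then download the raw log to a file.
fetch_apply_log() {
  apply_id="$1"
  out_file="$2"
  log_url="$(curl --silent --show-error --fail \
    --header "Authorization: Bearer ${TFC_TOKEN:?set TFC_TOKEN first}" \
    --header "Content-Type: application/vnd.api+json" \
    "$(tfc_apply_url "$apply_id")" | jq -r '.data.attributes["log-read-url"]')"
  # Stop if the API call failed or the attribute was missing.
  if [ -z "$log_url" ] || [ "$log_url" = "null" ]; then
    echo "no log URL for $apply_id" >&2
    return 1
  fi
  curl --silent --show-error --fail --location --output "$out_file" "$log_url"
}

# Usage:
#   fetch_apply_log "$(apply_id_from_run run.json)" apply.log
```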
Step 3: Identify Root Error Patterns
Analyze the logs for patterns such as:
- "The resource 'aws-node' does not exist"
- "The resource 'amazon-vpc-cni' does not exist"
- Resource import/adoption failures
- Version compatibility issues
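The two quoted error strings above can be searched for mechanically. The sketch below assumes the apply log has been saved to a local file; the function and variable names are hypothetical. Import failures and version conflicts have no fixed wording in this guide, so they still need a manual read of the log.

```shell
# Sketch only: function and variable names are illustrative.

# Regular expression for the two VPC CNI signatures quoted in Step 3.
VPC_CNI_PATTERN="The resource '(aws-node|amazon-vpc-cni)' does not exist"

# Print every matching log line, prefixed with its line number.
scan_apply_log() {
  grep -n -E "$VPC_CNI_PATTERN|does not exist" "$1"
}

# Count only the VPC CNI specific errors.
count_vpc_cni_errors() {
  grep -c -E "$VPC_CNI_PATTERN" "$1"
}

# Usage:
#   scan_apply_log apply.log
#   count_vpc_cni_errors apply.log
```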
Step 4: Determine Building Block Versions
From the TFC logs, identify:
- Terraform module versions in use
- Building block version (e.g., EKSClusterResources v4.1.2)
- Provider versions (AWS, Kubernetes, Helm)
- Any recent version upgrades
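Because this guide does not fix the exact layout of TFC log lines, a cautious first pass is to list every line that mentions a semantic version and read the hits in context. The sketch below assumes the log is saved locally; the function name is hypothetical, and the match is deliberately loose, so unrelated dotted numbers will also appear.

```shell
# Sketch only: a loose filter, expect some unrelated matches.

# Print each log line containing something shaped like 4.1.2 or v4.1.2.
list_version_lines() {
  grep -n -E 'v?[0-9]+\.[0-9]+\.[0-9]+' "$1"
}

# Usage:
#   list_version_lines apply.log
```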
Step 5: Clone Building Block Repository
Clone the relevant building block repository:
git clone https://github.com/oneTakeda/terraform-aws-EKSClusterResources.git
Then check out the specific version used by the failing workspace.
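Checking out a release can be sketched as follows. It assumes that building block releases are published as git tags named like `v4.1.2`, which this guide does not confirm; the helper names are hypothetical.

```shell
# Sketch only: assumes releases are git tags such as v4.1.2.

# List the tags in a cloned repository, newest version first.
list_versions() {
  git -C "$1" tag --list --sort=-v:refname
}

# Check out one tag in a cloned repository (leaves HEAD detached).
checkout_version() {
  git -C "$1" checkout --quiet "tags/$2"
}

# Usage:
#   list_versions terraform-aws-EKSClusterResources
#   checkout_version terraform-aws-EKSClusterResources v4.1.2
```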
Step 6: Examine Building Block Variables
Review variables.tf in the building block to understand:
- Available configuration options
- Default values for key parameters
- Recent changes in variable defaults
- Resource adoption settings
Step 7: Analyze Version History
Check git history and releases to identify:
- Changes in default values between versions
- Breaking changes or behavioral modifications
- Migration guides or upgrade notes
- Backward compatibility issues
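Changes in defaults between two releases can be surfaced by diffing `variables.tf` across tags. This is a sketch under the assumption that releases are tagged in git; the tag names in the usage comment are placeholders and the helper names are hypothetical.

```shell
# Sketch only: tag names are placeholders, helper names are illustrative.

# Show how variables.tf changed between two tags.
diff_variables() {
  git -C "$1" diff "$2" "$3" -- variables.tf
}

# List the commits that touched variables.tf between two tags.
log_variables() {
  git -C "$1" log --oneline "$2..$3" -- variables.tf
}

# Usage:
#   diff_variables terraform-aws-EKSClusterResources <old-tag> <new-tag>
#   log_variables terraform-aws-EKSClusterResources <old-tag> <new-tag>
```

Lines starting with `+` or `-` inside a `default = ...` assignment are the ones most likely to explain a behavior change after an upgrade.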
Step 8: Research Resource Adoption Behavior
Investigate how the building block handles:
- Existing EKS-created resources
- Resource adoption vs. creation
- Helm chart management
- Kubernetes resource management
Step 9: Clone Target Infrastructure Repository
Clone the infrastructure repository behind the failing workspace:
git clone [infrastructure-repo-url]
cd [repo-name]
git checkout [appropriate-branch]
Step 10: Examine Current Configuration
Review the module configuration in main.tf:
- Building block version being used
- Parameters passed to the module
- Missing or default parameters
- Resource adoption settings
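A quick way to see the relevant module settings is to grep `main.tf` for the assignments named above. This is a sketch with a hypothetical function name; it matches on text only, so `version` lines from other blocks in the same file will also show up.

```shell
# Sketch only: text matching, not an HCL parser.

# Print source, version, and resource_adoption assignments with line numbers.
show_module_settings() {
  grep -n -E '^[[:space:]]*(source|version|resource_adoption)[[:space:]]*=' "$1"
}

# Usage:
#   show_module_settings main.tf
```

If no `resource_adoption` line is printed, the module call is relying on the building block's default for that setting.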
Step 11: Check Variable Definitions
Review parameters.auto.tfvars and variables.tf:
- Shared tags configuration
- Environment-specific settings
- Cluster configuration
- Any version-specific requirements
Step 12: Validate Configuration Structure
Ensure proper configuration format:
- Module source and version
- Required parameters present
- Terraform backend configuration
- Provider version compatibility
Step 13: Correlate Building Block Changes with Failures
Compare:
- Building block default changes between versions
- Current configuration parameters
- Missing parameters that could resolve the issue
- Version upgrade impact on existing resources
Step 14: Develop Targeted Fix
Based on the analysis, implement a fix such as:
- Adding the resource_adoption = false parameter to the module call
- Updating configuration for new version requirements
- Adjusting provider settings
- Modifying resource management approach
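Before editing, it can help to confirm what the module call currently sets. The sketch below reports the value assigned to `resource_adoption` in a file, or `unset` when the parameter is absent; the function name and the module label in the comment are hypothetical.

```shell
# Sketch only: reads the first resource_adoption assignment found in the file.

# Print the assigned value, or "unset" when the parameter is missing.
resource_adoption_value() {
  value="$(sed -n -E 's/^[[:space:]]*resource_adoption[[:space:]]*=[[:space:]]*([^[:space:]#]+).*$/\1/p' "$1" | head -n 1)"
  echo "${value:-unset}"
}

# Usage:
#   resource_adoption_value main.tf
#
# When the result is "unset", the fix from the first bullet is a one-line
# addition inside the module block, for example:
#   module "eks_cluster_resources" {   # hypothetical module label
#     resource_adoption = false
#   }
```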
Step 15: Local Validation
Run the Terraform workflow locally:
- terraform init (validate module download)
- terraform validate (check syntax)
- terraform plan (verify fix effectiveness)
- Ensure original errors are resolved
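The local workflow can be scripted roughly as below. This is a sketch: the output file name `plan.log` and both function names are assumptions, and it presumes credentials for the workspace backend are already configured in the shell.

```shell
# Sketch only: plan.log and the function names are illustrative.

# Run init, validate, and plan in order; stop at the first failure.
run_local_checks() {
  terraform init -input=false &&
  terraform validate &&
  terraform plan -input=false -no-color > plan.log 2>&1
}

# Succeed only when the saved plan output exists and no longer
# contains the original "does not exist" error text.
plan_is_clean() {
  [ -f "$1" ] && ! grep -q "does not exist" "$1"
}

# Usage:
#   run_local_checks && plan_is_clean plan.log && echo "original error resolved"
```

A clean plan is necessary but not sufficient here, since the original failure surfaced during apply; the remote run remains the final confirmation.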
Step 16: Handle Local Testing Limitations
Address local vs. remote workspace differences:
- Modify locals.tf for local testing if needed
- Handle workspace name parsing issues
- Ensure validation works in both contexts
- Document temporary modifications
Step 17: Create KEDB Issue
Create a comprehensive GitHub issue in the terraform-Takeda-KEDB repository (the known error database) with:
- Clear problem description and symptoms
- Root cause explanation
- Step-by-step resolution
- Code examples and configuration changes
- Version compatibility notes
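The issue can be drafted from a small template and filed with the GitHub CLI. In this sketch the section headings, the function name, and the `KEDB_REPO` variable are assumptions; the repository owner is not stated in this guide, so it is left as a placeholder.

```shell
# Sketch only: headings and names are illustrative, not a required format.

# Print an issue body with one section per item listed in Step 17.
kedb_issue_body() {
  cat <<EOF
## Problem and symptoms
$1

## Root cause
$2

## Resolution steps
$3

## Configuration changes
$4

## Version compatibility
$5
EOF
}

# Usage (KEDB_REPO is a placeholder such as <owner>/terraform-Takeda-KEDB):
#   kedb_issue_body "..." "..." "..." "..." "..." > issue.md
#   gh issue create --repo "$KEDB_REPO" --title "..." --body-file issue.md
```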
Step 18: Document Fix Validation
Add detailed validation results to the KEDB issue:
- Terraform workflow execution results
- Before/after error comparison
- Plan output confirmation
- Link to original failing TFC run
Step 19: Commit and Prepare Deployment
Commit the changes with a descriptive message:
- Reference KEDB issue
- Explain root cause briefly
- Note TFC run being fixed
- Stage for deployment (if permissions allow)
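A commit message covering the three points above might be assembled like this. The wording of the subject line and the function name are assumptions, and the arguments in the usage comment are placeholders.

```shell
# Sketch only: message wording is illustrative.

# Print a commit message: subject with the KEDB reference, then root cause and run ID.
commit_message() {
  printf 'Fix EKS workspace failure (%s)\n\nRoot cause: %s\nFixes TFC run: %s\n' "$1" "$2" "$3"
}

# Usage (placeholders for the issue reference, cause, and run ID):
#   git add main.tf
#   commit_message "<kedb-issue-ref>" "<root cause summary>" "<run-id>" | git commit -F -
```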
Step 20: Create Reusable Methodology
Document the troubleshooting process:
- Create step-by-step guide
- Include common patterns and solutions
- Share methodology for similar issues
- Update team knowledge base
After following these steps, you should have:
- ✅ Identified the root cause of the TFC workspace failure
- ✅ Developed and validated a targeted fix
- ✅ Documented the issue and resolution in KEDB
- ✅ Created a deployable solution
- ✅ Established a reusable troubleshooting methodology
Common failure patterns:
- Version Upgrade Issues: Building blocks changing default behavior
- Resource Adoption Conflicts: New versions trying to manage existing resources
- Provider Compatibility: Version mismatches between Terraform providers
- Kubernetes Resource Management: Conflicts between Terraform and native EKS resources
Success criteria:
- Original error messages no longer appear in terraform plan
- Plan shows expected resource creation/modification
- No resource adoption or import conflicts
- Clean terraform validate and plan execution
- Comprehensive documentation created for future reference
Usage: Copy this prompt and provide it to any AI agent along with the specific TFC workspace details and failure symptoms. The agent should be able to systematically work through the investigation and resolution process.