@bpmct
Last active April 3, 2026 18:48
Reproduction of coder/coder#21719: Terraform Unexpected Identity Change with kubernetes_pod

Issue #21719: Terraform Identity Change Error — Reproduction & Findings

Reproduction Confirmed

Successfully reproduced the Unexpected Identity Change error.

Environment: k3s v1.34.6, Terraform 1.13.0, kubernetes provider 2.38.0

Root Cause

This is a bug in the hashicorp/kubernetes Terraform provider (tracked upstream as hashicorp/terraform-provider-kubernetes#2779), triggered by the resource identity feature introduced in Terraform 1.12.

The sequence of events:

  1. Coder v2.28.7 ships with Terraform 1.13.0 (which supports resource identity)
  2. The kubernetes provider v2.38.0 defines an identity schema for resources (api_version, kind, name, namespace)
  3. When kubernetes_pod is created, the provider creates the pod in k8s then waits for it to become ready
  4. If the pod fails to become ready (timeout, image pull failure, node scaling, etc.), the provider returns an error after the resource ID has already been recorded. Terraform saves that ID to state and marks the instance tainted. The Read call that would populate the identity fields never runs, so the identity is stored as all nulls
  5. On the next terraform plan/apply, the provider's Read operation returns the actual identity values from the live resource
  6. Terraform detects the mismatch (null → populated) and throws Unexpected Identity Change

Reproduction Steps

Terraform config used

resource "kubernetes_pod" "main" {
  metadata {
    name      = "coder-workspace-test"
    namespace = "default"
  }
  spec {
    container {
      name    = "dev"
      image   = "busybox:latest"
      command = ["sleep", "infinity"]
    }
  }
  timeouts {
    create = "30s"
  }
}

Steps

  1. terraform apply — pod is created in k8s but times out waiting for readiness
  2. State file now contains:
    "identity": {
      "api_version": null,
      "kind": null,
      "name": null,
      "namespace": null
    }
    with "status": "tainted" and "id": "default/coder-workspace-test"
  3. terraform plan fails with Error: Unexpected Identity Change

Exact Error Output

Error: Unexpected Identity Change: During the read operation, the Terraform Provider
unexpectedly returned a different identity then the previously stored one.

Current Identity: cty.ObjectVal(map[string]cty.Value{
  "api_version":cty.NullVal(cty.String),
  "kind":cty.NullVal(cty.String),
  "name":cty.NullVal(cty.String),
  "namespace":cty.NullVal(cty.String)
})

New Identity: cty.ObjectVal(map[string]cty.Value{
  "api_version":cty.StringVal("v1"),
  "kind":cty.StringVal("Pod"),
  "name":cty.StringVal("coder-workspace-test"),
  "namespace":cty.StringVal("default")
})
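In a workspace build log, the New Identity block is the quickest way to see which pod and namespace are affected. The sketch below pulls both identity blocks out of the message. It assumes the exact `cty.ObjectVal` formatting shown above, and other Terraform versions may format it differently. `parse_identity_error` and `parse_identity_block` are hypothetical helpers, not part of Terraform or Coder.

```python
import re

# One attribute line of a cty object dump, e.g.
#   "name":cty.StringVal("coder-workspace-test"),
#   "kind":cty.NullVal(cty.String),
_ATTR = re.compile(
    r'"(\w+)":cty\.(?:StringVal\("((?:[^"\\]|\\.)*)"\)|NullVal\(cty\.String\))'
)


def parse_identity_block(text: str) -> dict:
    """Map attribute name -> string value (None for cty.NullVal)."""
    return {m.group(1): m.group(2) for m in _ATTR.finditer(text)}


def parse_identity_error(message: str) -> tuple:
    """Split an Unexpected Identity Change message into (current, new) dicts."""
    current_part, _, new_part = message.partition("New Identity:")
    return parse_identity_block(current_part), parse_identity_block(new_part)


message = """Current Identity: cty.ObjectVal(map[string]cty.Value{
  "api_version":cty.NullVal(cty.String),
  "kind":cty.NullVal(cty.String),
  "name":cty.NullVal(cty.String),
  "namespace":cty.NullVal(cty.String)
})

New Identity: cty.ObjectVal(map[string]cty.Value{
  "api_version":cty.StringVal("v1"),
  "kind":cty.StringVal("Pod"),
  "name":cty.StringVal("coder-workspace-test"),
  "namespace":cty.StringVal("default")
})"""

current, new = parse_identity_error(message)
print(new["namespace"], new["name"])  # default coder-workspace-test
```

An all-`None` `current` dict together with a populated `new` dict is the signature of this bug. A mismatch between two populated identities would point to a different problem.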

Impact on Coder

This affects any Coder deployment using Terraform >= 1.12 where workspace pod creation times out or fails during the wait phase. Common triggers:

  • Node autoscaling delays (pod pending while node provisions)
  • Image pull failures/timeouts
  • Resource quota limits
  • Any transient failure during pod startup

Once a workspace hits this state, it stays broken until one of the workarounds below is applied: users cannot start, stop, or rebuild the workspace.

Workarounds

  1. terraform state rm the affected resource and re-apply
  2. Pin Terraform to 1.11.x (before identity feature)
  3. Delete the pod from k8s and re-apply from Coder UI
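Workaround 1 needs the address of each affected instance. The sketch below assumes a state v4 JSON document like the ones in this gist. It builds a `terraform state rm` command for every instance whose identity is all null, and it prints the commands rather than running them so they can be reviewed first. `instance_address` and `state_rm_commands` are hypothetical names. The address rules cover only the common cases: root or nested module, plus `count`/`for_each` index keys.

```python
import json
import shlex


def instance_address(resource: dict, instance: dict) -> str:
    """Build a resource address such as module.ws.kubernetes_pod.main[0]."""
    addr = f'{resource["type"]}.{resource["name"]}'
    if resource.get("module"):
        # Resources outside the root module record their module path here.
        addr = f'{resource["module"]}.{addr}'
    if "index_key" in instance:
        key = instance["index_key"]
        # count produces integer keys, for_each produces string keys.
        addr += f"[{key}]" if isinstance(key, int) else f"[{json.dumps(key)}]"
    return addr


def state_rm_commands(state: dict) -> list:
    """Return one `terraform state rm` command per instance with an all-null identity."""
    commands = []
    for resource in state.get("resources", []):
        for instance in resource.get("instances", []):
            identity = instance.get("identity")
            if identity and all(v is None for v in identity.values()):
                address = instance_address(resource, instance)
                commands.append("terraform state rm " + shlex.quote(address))
    return commands


state = {"resources": [{
    "mode": "managed", "type": "kubernetes_pod", "name": "main",
    "instances": [{"status": "tainted", "identity": {
        "api_version": None, "kind": None, "name": None, "namespace": None}}],
}]}
for command in state_rm_commands(state):
    print(command)  # terraform state rm kubernetes_pod.main
```

`shlex.quote` keeps `for_each` addresses such as `kubernetes_pod.main["a"]` intact when pasted into a shell.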

Upstream Fix

Tracked at hashicorp/terraform-provider-kubernetes#2779.

Two open PRs exist to fix this:

  1. PR #2841 — Bumps terraform-plugin-sdk from v2.37.0 to v2.38.2. This is the core fix. No provider code changes needed; the SDK itself handles null→real identity transitions:

    • SDK v2.38.1 (terraform-plugin-sdk#1527): Skips identity change validation when stored identity is all-null.
    • SDK v2.38.2 (terraform-plugin-sdk#1544): Prevents "Missing Resource Identity" error when the resource create returns an error (e.g., rollout timeout).
  2. PR #2859 — Builds on #2841. Adds a code fix for kubernetes_secret_v1 where setResourceIdentityNamespaced was unreachable due to early returns in the write-only attribute code path.

Both PRs are still open (unmerged) as of 2026-04-03.
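The behavioral difference the SDK bump makes can be sketched as a toy model. This is a conceptual Python approximation, not the SDK's Go code. `identity_check_fails` is a hypothetical name, and `sdk_fixed=True` stands in for running with terraform-plugin-sdk v2.38.1 or later. The `stored is None` branch reflects the observation in this write-up that an absent identity becomes populated without error.

```python
from typing import Optional


def is_all_null(identity: Optional[dict]) -> bool:
    """True when an identity object is present but every attribute is null."""
    return bool(identity) and all(v is None for v in identity.values())


def identity_check_fails(stored: Optional[dict], returned: dict, sdk_fixed: bool) -> bool:
    """Toy model of the identity-change validation run after Read."""
    if stored is None:
        # No identity stored (e.g. pre-1.12 state): Read's identity is adopted.
        return False
    if sdk_fixed and is_all_null(stored):
        # terraform-plugin-sdk v2.38.1 (#1527) skips the check for all-null identities.
        return False
    # Otherwise any difference is reported as "Unexpected Identity Change".
    return stored != returned


null_identity = {"api_version": None, "kind": None, "name": None, "namespace": None}
live_identity = {"api_version": "v1", "kind": "Pod",
                 "name": "coder-workspace-test", "namespace": "default"}

print(identity_check_fails(null_identity, live_identity, sdk_fixed=False))  # True
print(identity_check_fails(null_identity, live_identity, sdk_fixed=True))   # False
print(identity_check_fails(None, live_identity, sdk_fixed=False))           # False
```

The first and third calls show why the Coder-side mitigation works even without the SDK fix. Turning an all-null identity into an absent one moves the instance from the failing branch to the adopting branch.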

Coder could mitigate this by either:

  • Pinning to Terraform < 1.12 until the provider ships a release with these fixes
  • Adding error handling/recovery logic for this specific failure mode
  • Documenting the workaround for affected users

Coder-Side Mitigation (Verified)

Coder can fix this without waiting for the upstream provider release.

Approach: Sanitize the Terraform state before writing it to disk. If any resource instance has an identity object where every value is null, strip the identity key. Terraform then treats it as "no identity stored" and the Read populates it cleanly on the next refresh.

Verified: Stripping the all-null identity from the broken state file makes terraform plan succeed (it plans to replace the tainted resource instead of crashing).

Intervention point: provisioner/terraform/provision.go, around line 194, where the state is written to disk before terraform plan:

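if len(request.GetState()) > 0 {
    // Sanitize null identities before writing: workaround for
    // hashicorp/terraform-provider-kubernetes#2779.
    // sanitizeNullIdentities is the proposed (not yet existing) helper
    // whose logic is outlined below.
    sanitized := sanitizeNullIdentities(request.GetState())
    err := os.WriteFile(statefilePath, sanitized, 0o600)
    if err != nil {
        // ... existing error handling in provision.go ...
    }
}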

The sanitization logic:

  1. Parse the state JSON
  2. Walk resources[].instances[].identity
  3. If the identity object exists and all values are null, delete the key
  4. Re-serialize
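The four steps above can be sketched in Python, the language this gist already uses for state surgery. The Go helper `sanitizeNullIdentities` is only proposed, so this illustrates its intended behavior and is not Coder code. Two choices here are assumptions. Unparseable input is returned unchanged, so Terraform reports the real problem. A state with nothing to fix is returned byte-for-byte untouched. `identity_schema_version` is left in place, matching the fixed state file in this gist that was verified to plan cleanly.

```python
import json


def sanitize_null_identities(state_bytes: bytes) -> bytes:
    """Strip all-null `identity` objects from a Terraform state JSON document.

    Illustrates the proposed Go helper sanitizeNullIdentities; not Coder code.
    """
    try:
        state = json.loads(state_bytes)  # 1. parse the state JSON
    except ValueError:  # includes json.JSONDecodeError and UnicodeDecodeError
        return state_bytes  # leave unparseable state for Terraform to report
    if not isinstance(state, dict):
        return state_bytes

    changed = False
    for resource in state.get("resources", []):        # 2. walk resources[]
        for instance in resource.get("instances", []):  #    .instances[]
            identity = instance.get("identity")
            if isinstance(identity, dict) and identity and all(
                v is None for v in identity.values()
            ):
                del instance["identity"]                # 3. drop all-null identity
                changed = True

    if not changed:
        return state_bytes  # healthy state passes through byte-for-byte
    return (json.dumps(state, indent=2) + "\n").encode()  # 4. re-serialize


broken = json.dumps({"version": 4, "resources": [{
    "type": "kubernetes_pod", "name": "main",
    "instances": [{"identity_schema_version": 1, "identity": {
        "api_version": None, "kind": None, "name": None, "namespace": None}}],
}]}).encode()
fixed = json.loads(sanitize_null_identities(broken))
print("identity" in fixed["resources"][0]["instances"][0])  # False
```

Returning the original bytes when nothing changed means healthy workspaces never have their state re-serialized. This keeps the workaround's blast radius limited to already-broken instances.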

This is safe because:

  • Absent identity is the normal state for resources created before TF 1.12
  • Terraform handles absent → populated identity transitions gracefully (verified)
  • It only modifies state that is already broken (all-null identity = the provider never set it)
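The safety argument can also be checked mechanically. A sanitized state should equal the original with only the all-null `identity` keys removed. `only_null_identities_removed` below is a hypothetical name for such a check. Applied to the two state files in this gist (loaded with `json.load`), it should return True, since they differ only in the identity block.

```python
import copy


def only_null_identities_removed(before: dict, after: dict) -> bool:
    """True if `after` is exactly `before` minus its all-null identity objects."""
    expected = copy.deepcopy(before)  # work on a copy; never mutate the input
    for resource in expected.get("resources", []):
        for instance in resource.get("instances", []):
            identity = instance.get("identity")
            if identity and all(v is None for v in identity.values()):
                del instance["identity"]
    return expected == after


null_id = {"api_version": None, "kind": None, "name": None, "namespace": None}
live_id = {"api_version": "v1", "kind": "Pod", "name": "ok-pod", "namespace": "default"}
before = {"resources": [{"instances": [
    {"attributes": {"id": "default/broken"}, "identity": null_id},  # broken
    {"attributes": {"id": "default/ok-pod"}, "identity": live_id},  # healthy
    {"attributes": {"id": "default/legacy"}},                       # pre-1.12
]}]}
after = {"resources": [{"instances": [
    {"attributes": {"id": "default/broken"}},
    {"attributes": {"id": "default/ok-pod"}, "identity": live_id},
    {"attributes": {"id": "default/legacy"}},
]}]}

print(only_null_identities_removed(before, after))   # True
print(only_null_identities_removed(before, before))  # False
```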

Key Observations

  • Upgrading state from Terraform 1.11 → 1.13 with successfully created resources works fine (identity goes from absent → populated without error).
  • The bug only triggers when the initial Create partially fails (resource exists in k8s but Terraform errored before Read populated identity).
  • The resource is marked "tainted" in state, so Terraform tries to destroy+recreate it, but the Read during refresh fails before it gets that far.
  • Stripping the all-null identity from state is equivalent to reverting to pre-1.12 state format for that resource, which Terraform handles gracefully.

terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.38.0"
    }
  }
}

provider "kubernetes" {
  config_path = "~/.kube/config"
}

# This simulates a Coder workspace pod.
# The pod will be created in k8s but won't be able to start,
# causing the Terraform create to time out. This leaves the
# resource ID in state but identity values as null.
resource "kubernetes_pod" "main" {
  metadata {
    name      = "coder-workspace-test"
    namespace = "default"
    labels = {
      "app.coder.com" = "workspace"
    }
  }
  spec {
    container {
      name    = "dev"
      image   = "busybox:latest"
      command = ["sleep", "infinity"]
    }
  }
  timeouts {
    create = "30s"
  }
}

Reproduce: Terraform Identity Change Error (#21719)

Reproduces the Unexpected Identity Change error when Terraform ≥ 1.12 is used with the hashicorp/kubernetes provider and a resource create times out.

Prerequisites

Follow .claude/skills/k8s-repro.md to get k3s + Terraform running.

Steps

1. Create the Terraform config

# main.tf
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.38.0"
    }
  }
}

provider "kubernetes" {
  config_path = "~/.kube/config"
}

resource "kubernetes_pod" "main" {
  metadata {
    name      = "coder-workspace-test"
    namespace = "default"
  }
  spec {
    container {
      name    = "dev"
      image   = "busybox:latest"
      command = ["sleep", "infinity"]
    }
  }
  timeouts {
    create = "30s"
  }
}

2. Create broken state (Terraform 1.13.0)

terraform init
terraform apply -auto-approve
# Pod create times out (~30s). State now has null identity.

3. Trigger the error

terraform plan
# Error: Unexpected Identity Change

4. Verify the state

python3 -c "
import json
with open('terraform.tfstate') as f:
    state = json.load(f)
for r in state['resources']:
    for inst in r['instances']:
        print('identity:', json.dumps(inst.get('identity'), indent=2))
"
# Shows: all null values

5. Test the Coder-side fix (strip null identity)

python3 -c "
import json
with open('terraform.tfstate') as f:
    state = json.load(f)
for r in state['resources']:
    for inst in r['instances']:
        identity = inst.get('identity')
        if identity and all(v is None for v in identity.values()):
            del inst['identity']
with open('terraform.tfstate', 'w') as f:
    json.dump(state, f, indent=2)
    f.write('\n')
"
terraform plan  # Succeeds: plans to replace the tainted pod instead of erroring

Root cause

  1. Terraform 1.12+ added resource identity tracking.
  2. kubernetes provider defines identity (api_version, kind, name, namespace) but only populates it in Read, not Create.
  3. If Create succeeds in k8s but the wait-for-ready times out, the resource ID is saved to state with all-null identity.
  4. Next plan → Read returns real identity → null vs populated mismatch.


State file after the timed-out apply (instance tainted, identity all null):

{
  "version": 4,
  "terraform_version": "1.13.0",
  "serial": 1,
  "lineage": "b719cf4b-27dd-dc07-3565-e6b17766d84b",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "kubernetes_pod",
      "name": "main",
      "provider": "provider[\"registry.terraform.io/hashicorp/kubernetes\"]",
      "instances": [
        {
          "status": "tainted",
          "schema_version": 1,
          "attributes": {
            "id": "default/coder-workspace-test",
            "metadata": [
              {
                "annotations": null,
                "generate_name": "",
                "generation": 0,
                "labels": {
                  "app.coder.com": "workspace"
                },
                "name": "coder-workspace-test",
                "namespace": "default",
                "resource_version": "",
                "uid": ""
              }
            ],
            "spec": [
              {
                "active_deadline_seconds": 0,
                "affinity": [],
                "automount_service_account_token": true,
                "container": [
                  {
                    "args": null,
                    "command": [
                      "sleep",
                      "infinity"
                    ],
                    "env": [],
                    "env_from": [],
                    "image": "busybox:latest",
                    "image_pull_policy": "",
                    "lifecycle": [],
                    "liveness_probe": [],
                    "name": "dev",
                    "port": [],
                    "readiness_probe": [],
                    "resources": [],
                    "security_context": [],
                    "startup_probe": [],
                    "stdin": false,
                    "stdin_once": false,
                    "termination_message_path": "/dev/termination-log",
                    "termination_message_policy": "",
                    "tty": false,
                    "volume_device": [],
                    "volume_mount": [],
                    "working_dir": ""
                  }
                ],
                "dns_config": [],
                "dns_policy": "ClusterFirst",
                "enable_service_links": true,
                "host_aliases": [],
                "host_ipc": false,
                "host_network": false,
                "host_pid": false,
                "hostname": "",
                "image_pull_secrets": [],
                "init_container": [],
                "node_name": "",
                "node_selector": null,
                "os": [],
                "priority_class_name": "",
                "readiness_gate": [],
                "restart_policy": "Always",
                "runtime_class_name": "",
                "scheduler_name": "",
                "security_context": [],
                "service_account_name": "",
                "share_process_namespace": false,
                "subdomain": "",
                "termination_grace_period_seconds": 30,
                "toleration": [],
                "topology_spread_constraint": [],
                "volume": []
              }
            ],
            "target_state": null,
            "timeouts": {
              "create": "30s",
              "delete": null
            }
          },
          "sensitive_attributes": [],
          "identity_schema_version": 1,
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjozMDAwMDAwMDAwMCwiZGVsZXRlIjozMDAwMDAwMDAwMDB9LCJzY2hlbWFfdmVyc2lvbiI6IjEifQ==",
          "identity": {
            "api_version": null,
            "kind": null,
            "name": null,
            "namespace": null
          }
        }
      ]
    }
  ],
  "check_results": null
}

The same state with the all-null identity stripped (terraform plan then succeeds):

{
  "version": 4,
  "terraform_version": "1.13.0",
  "serial": 1,
  "lineage": "b719cf4b-27dd-dc07-3565-e6b17766d84b",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "kubernetes_pod",
      "name": "main",
      "provider": "provider[\"registry.terraform.io/hashicorp/kubernetes\"]",
      "instances": [
        {
          "status": "tainted",
          "schema_version": 1,
          "attributes": {
            "id": "default/coder-workspace-test",
            "metadata": [
              {
                "annotations": null,
                "generate_name": "",
                "generation": 0,
                "labels": {
                  "app.coder.com": "workspace"
                },
                "name": "coder-workspace-test",
                "namespace": "default",
                "resource_version": "",
                "uid": ""
              }
            ],
            "spec": [
              {
                "active_deadline_seconds": 0,
                "affinity": [],
                "automount_service_account_token": true,
                "container": [
                  {
                    "args": null,
                    "command": [
                      "sleep",
                      "infinity"
                    ],
                    "env": [],
                    "env_from": [],
                    "image": "busybox:latest",
                    "image_pull_policy": "",
                    "lifecycle": [],
                    "liveness_probe": [],
                    "name": "dev",
                    "port": [],
                    "readiness_probe": [],
                    "resources": [],
                    "security_context": [],
                    "startup_probe": [],
                    "stdin": false,
                    "stdin_once": false,
                    "termination_message_path": "/dev/termination-log",
                    "termination_message_policy": "",
                    "tty": false,
                    "volume_device": [],
                    "volume_mount": [],
                    "working_dir": ""
                  }
                ],
                "dns_config": [],
                "dns_policy": "ClusterFirst",
                "enable_service_links": true,
                "host_aliases": [],
                "host_ipc": false,
                "host_network": false,
                "host_pid": false,
                "hostname": "",
                "image_pull_secrets": [],
                "init_container": [],
                "node_name": "",
                "node_selector": null,
                "os": [],
                "priority_class_name": "",
                "readiness_gate": [],
                "restart_policy": "Always",
                "runtime_class_name": "",
                "scheduler_name": "",
                "security_context": [],
                "service_account_name": "",
                "share_process_namespace": false,
                "subdomain": "",
                "termination_grace_period_seconds": 30,
                "toleration": [],
                "topology_spread_constraint": [],
                "volume": []
              }
            ],
            "target_state": null,
            "timeouts": {
              "create": "30s",
              "delete": null
            }
          },
          "sensitive_attributes": [],
          "identity_schema_version": 1,
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjozMDAwMDAwMDAwMCwiZGVsZXRlIjozMDAwMDAwMDAwMDB9LCJzY2hlbWFfdmVyc2lvbiI6IjEifQ=="
        }
      ]
    }
  ],
  "check_results": null
}