Successfully reproduced the Unexpected Identity Change error.
Environment: k3s v1.34.6, Terraform 1.13.0, kubernetes provider 2.38.0
This is a bug in the hashicorp/kubernetes Terraform provider (tracked upstream as hashicorp/terraform-provider-kubernetes#2779), triggered by the resource identity feature introduced in Terraform 1.12.
The sequence of events:
- Coder v2.28.7 ships with Terraform 1.13.0 (which supports resource identity)
- The kubernetes provider v2.38.0 defines an identity schema for resources (`api_version`, `kind`, `name`, `namespace`)
- When `kubernetes_pod` is created, the provider creates the pod in k8s then waits for it to become ready
- If the pod fails to become ready (timeout, image pull failure, node scaling, etc.), Terraform saves the resource ID to state but returns an error before the Read call populates the identity fields, so the identity is stored as all nulls
- On the next `terraform plan`/`apply`, the provider's Read operation returns the actual identity values from the live resource
- Terraform detects the mismatch (null → populated) and throws `Unexpected Identity Change`
resource "kubernetes_pod" "main" {
  metadata {
    name      = "coder-workspace-test"
    namespace = "default"
  }
  spec {
    container {
      name    = "dev"
      image   = "busybox:latest"
      command = ["sleep", "infinity"]
    }
  }
  timeouts {
    create = "30s"
  }
}

- `terraform apply`: pod is created in k8s but times out waiting for readiness
- State file now contains `"identity": { "api_version": null, "kind": null, "name": null, "namespace": null }` with `"status": "tainted"` and `"id": "default/coder-workspace-test"`
- `terraform plan` → Error: Unexpected Identity Change
Error: Unexpected Identity Change: During the read operation, the Terraform Provider
unexpectedly returned a different identity then the previously stored one.
Current Identity: cty.ObjectVal(map[string]cty.Value{
"api_version":cty.NullVal(cty.String),
"kind":cty.NullVal(cty.String),
"name":cty.NullVal(cty.String),
"namespace":cty.NullVal(cty.String)
})
New Identity: cty.ObjectVal(map[string]cty.Value{
"api_version":cty.StringVal("v1"),
"kind":cty.StringVal("Pod"),
"name":cty.StringVal("coder-workspace-test"),
"namespace":cty.StringVal("default")
})
This affects any Coder deployment using Terraform >= 1.12 where workspace pod creation times out or fails during the wait phase. Common triggers:
- Node autoscaling delays (pod pending while node provisions)
- Image pull failures/timeouts
- Resource quota limits
- Any transient failure during pod startup
Once a workspace hits this state, it stays broken until the state is repaired manually: users cannot start, stop, or rebuild the workspace.
- `terraform state rm` the affected resource and re-apply
- Pin Terraform to 1.11.x (before identity feature)
- Delete the pod from k8s and re-apply from Coder UI
Tracked at hashicorp/terraform-provider-kubernetes#2779.
Two open PRs exist to fix this:
- PR #2841: Bumps `terraform-plugin-sdk` from v2.37.0 to v2.38.2. This is the core fix. No provider code changes needed; the SDK itself handles null→real identity transitions:
  - SDK v2.38.1 (terraform-plugin-sdk#1527): Skips identity change validation when stored identity is all-null.
  - SDK v2.38.2 (terraform-plugin-sdk#1544): Prevents "Missing Resource Identity" error when the resource create returns an error (e.g., rollout timeout).
- PR #2859: Builds on #2841. Adds a code fix for `kubernetes_secret_v1` where `setResourceIdentityNamespaced` was unreachable due to early returns in the write-only attribute code path.
Both PRs are still open / unmerged as of 2026-04-03.
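To make the effect of the SDK v2.38.1 change concrete, here is a toy model of the identity check in Go. It is not the SDK's code: the `identity` type and the functions `validateStrict` and `validateSkippingAllNull` are hypothetical names that only illustrate the behavior described above (a strict comparison versus one that is skipped when the stored identity is all-null).

```go
package main

import "errors"

// identity models a stored resource identity; a nil pointer stands for null.
type identity map[string]*string

// str is a small helper to take the address of a string literal.
func str(s string) *string { return &s }

// allNull reports whether every attribute of the identity is null.
func (id identity) allNull() bool {
	for _, v := range id {
		if v != nil {
			return false
		}
	}
	return true
}

// equal compares two identities attribute by attribute.
func (id identity) equal(other identity) bool {
	if len(id) != len(other) {
		return false
	}
	for k, v := range id {
		o, ok := other[k]
		if !ok || (v == nil) != (o == nil) {
			return false
		}
		if v != nil && *v != *o {
			return false
		}
	}
	return true
}

var errIdentityChange = errors.New("unexpected identity change")

// validateStrict models the failing behavior: any difference between the
// stored identity and the one returned by Read is an error.
func validateStrict(stored, read identity) error {
	if !stored.equal(read) {
		return errIdentityChange
	}
	return nil
}

// validateSkippingAllNull models the behavior attributed to SDK v2.38.1:
// an all-null stored identity is treated as "never set" and not compared.
func validateSkippingAllNull(stored, read identity) error {
	if stored.allNull() {
		return nil
	}
	return validateStrict(stored, read)
}
```

With the identities from the error output above, the strict check rejects the null → populated transition while the relaxed check accepts it. A genuine change between two populated identities is still rejected by both.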
Coder could mitigate this by any of the following:
- Pinning to Terraform < 1.12 until the provider ships a release with these fixes
- Adding error handling/recovery logic for this specific failure mode
- Documenting the workaround for affected users
Coder can fix this without waiting for the upstream provider release.
Approach: Sanitize the Terraform state before writing it to disk. If any resource instance has an identity object where every value is null, strip the identity key. Terraform then treats it as "no identity stored" and the Read populates it cleanly on the next refresh.
Verified: Stripping the all-null identity from the broken state file makes terraform plan succeed (it plans to replace the tainted resource instead of crashing).
Intervention point: provisioner/terraform/provision.go, around line 194, where the state is written to disk before terraform plan:
if len(request.GetState()) > 0 {
	// Sanitize null identities before writing; workaround for
	// hashicorp/terraform-provider-kubernetes#2779.
	sanitized := sanitizeNullIdentities(request.GetState())
	err := os.WriteFile(statefilePath, sanitized, 0o600)

The sanitization logic:
- Parse the state JSON
- Walk `resources[].instances[].identity`
- If the identity object exists and all values are null, delete the key
- Re-serialize
This is safe because:
- Absent identity is the normal state for resources created before TF 1.12
- Terraform handles absent → populated identity transitions gracefully (verified)
- It only modifies state that is already broken (all-null identity = the provider never set it)
- Upgrading state from Terraform 1.11 → 1.13 with successfully created resources works fine (identity goes from absent → populated without error).
- The bug only triggers when the initial Create partially fails (resource exists in k8s but Terraform errored before Read populated identity).
- The resource is marked `"tainted"` in state, so Terraform tries to destroy+recreate it, but the Read during refresh fails before it gets that far.
- Stripping the all-null identity from state is equivalent to reverting to pre-1.12 state format for that resource, which Terraform handles gracefully.