@acsulli
Last active February 14, 2019 19:06
Container Image Registry

The Registry is managed by the Image Registry Operator, deployed to the openshift-image-registry project.

As of this writing (installer 0.12.0), the deployment follows the steps outlined in the manifest directory:

  1. Create the CRD
  2. Create the namespace
  3. Request credentials from the Cloud Credential Operator
  4. RBAC and ServiceAccount(s)
  5. Request certificate(s) from the Cert Signing Operator
  6. Create the cluster-image-registry-operator deployment
  7. Create the cluster operator object

Each of these depends on the previous being successful. A failure at one step may not prevent later steps from creating their objects, but it will leave the deployment broken, with symptoms that vary depending on which step failed.

Registry Info

  • Viewing objects related to the registry instance. All of the commands below are expected to be executed from within the project: oc project openshift-image-registry.

    • Secrets:
      # the secret for the custom storage S3 credentials:
      oc get secret image-registry-private-configuration-user -o yaml
      
    • Deployments:
      # the operator
      oc describe deployment cluster-image-registry-operator
      
      # the registry instance
      oc describe deployment image-registry
      
    • CRD:
      oc describe crd configs.imageregistry.operator.openshift.io
      
    • Configuration:
      # this is where the operator configuration is applied
      oc get config instance -o yaml
      
    • ConfigMap(s):
      # only two config maps exist; they hold the certificates used
      oc get configmap
      
    • Route(s):
      oc describe route default-route
      
  • Checking the health of the registry deployment

    For the most up to date troubleshooting information, check the GitHub page for the operator.

    To verify if the registry has been deployed, check the status of the deployment:

    # check for the deployment
    oc get deployment image-registry -n openshift-image-registry
    
    # check the pods for the deployment
    oc get pods -n openshift-image-registry
    

    If you do not see the pod for the registry deployed, then you'll need to find out why deployment failed. Follow the instructions at the operator GitHub page linked above.
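When scanning pod output, it helps to separate the registry pods from the operator pod. A minimal sketch of the filter used throughout these notes (wrapped in a hypothetical helper function for reuse):

```shell
# Hypothetical helper: given `oc get pods` output on stdin, print only the
# registry pods. Registry pods are named image-registry-*, while the operator
# pod is cluster-image-registry-operator-*, so the prefix match excludes it;
# the second grep is belt and braces.
registry_pods() {
  grep '^image-registry' | grep -v operator
}
```

Usage: oc get pods -n openshift-image-registry | registry_pods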

Modifying the registry configuration

The config/instance object in the openshift-image-registry project controls the configuration that is pushed to the deployment. To change the configuration, edit the object (oc edit config instance -n openshift-image-registry) and set values in the spec section.

By default, with 0.12.0, the operator will configure these defaults:

  • One (1) replica
  • S3 storage in the same region which the cluster is deployed
  • The S3 bucket will be encrypted
  • Multipart uploads which fail will be cleaned after one (1) day

Some common values to edit/change:

  • Un-deploying the registry - Set the value of spec.managementState to Removed (set to Managed to resume default behavior)
  • Changing the number of replicas - Set the value of spec.replicas to your desired value, e.g. 3
  • Increase/decrease the log level - Set the value of spec.logging to your desired value: 0 = error, 1 = warn, 2 and 3 = info, all other values = debug
  • Managing storage configuration - Edit the values in spec.storage.s3. Other storage types are not supported at this time. The following values can be modified:
    • bucket - the name of the bucket to use if the default generated name is undesired
    • region - the name of the region to use, e.g. us-east-1
    • regionEndpoint - the region endpoint; if unset, the default endpoint for the region specified above is used
    • encrypt - whether to encrypt the bucket or not, defaults to false, but the managed deployment (spec.managementState = Managed) will enable encryption for the bucket
  • Create a route so that the registry is publicly accessible - Set spec.defaultRoute to true
  • Specify a node selector for the pods - Set spec.nodeSelector to the desired value, e.g. node-role.kubernetes.io/infra
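Several of the settings above can be combined in one edit. An illustrative spec fragment for oc edit config instance, using the field names as listed (the bucket name is hypothetical; verify the exact schema against your operator version):

```yaml
spec:
  managementState: Managed
  replicas: 3
  logging: 2            # 0 = error, 1 = warn, 2-3 = info, all other values = debug
  defaultRoute: true
  nodeSelector:
    node-role.kubernetes.io/infra: ""
  storage:
    s3:
      bucket: my-registry-bucket   # hypothetical name
      region: us-east-1
      encrypt: true
```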

Workarounds

  • When using the 0.12.0 installer, it was discovered that the registry was not being successfully deployed. The reason for the failure was that the credentials object did not get created in the project by the Cloud Credential Operator (which had failed with an error about insufficient permissions).

    • Symptoms

      1. Verify that the image-registry pods (not just image-registry-operator!) are not deployed:

        oc get pod -n openshift-image-registry | grep image-registry | grep -v operator
        
      2. Check the status of the registry deployment

        oc get deployment image-registry -o yaml -n openshift-image-registry
        
      3. If the deployment doesn't exist, check errors at the operator

        oc get configs.imageregistry.operator.openshift.io/instance -o yaml
        

        At this point the errors revealed that the secret with the credentials did not exist.

    • Fixing the problem. As documented on the operator page, providing custom credentials is supported by creating a separate secret:

      cat << EOL > registry-user.yaml
      apiVersion: v1
      kind: Secret
      metadata:
        name: image-registry-private-configuration-user
        namespace: openshift-image-registry
      data:
        # note: values under data: must be base64 encoded
        REGISTRY_STORAGE_S3_ACCESSKEY: <base64-encoded access key>
        REGISTRY_STORAGE_S3_SECRETKEY: <base64-encoded secret key>
      type: Opaque
      EOL
      
      # create the object
      oc create -f registry-user.yaml
      

      Or, create it from the CLI:

      export ACCESSKEY=<your AWS access key>
      export SECRETKEY=<your AWS secret key>
      oc create secret generic image-registry-private-configuration-user \
        -n openshift-image-registry \
        --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=${ACCESSKEY} \
        --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=${SECRETKEY}
      
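Note the difference between the two approaches: the data: fields of a Kubernetes Secret must be base64 encoded, while oc create secret generic --from-literal encodes the raw values for you. A quick sketch of encoding the keys before pasting them into the YAML above (the key values shown are made up):

```shell
# Encode raw credentials for use in a Secret's data: fields.
# printf avoids the trailing newline that echo would include in the encoding.
ACCESSKEY='AKIAEXAMPLE'      # hypothetical value
SECRETKEY='examplesecret'    # hypothetical value
printf '%s' "$ACCESSKEY" | base64
printf '%s' "$SECRETKEY" | base64
```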
  • When using the 0.12.0 installer, after applying the above credential fix, the registry deployment failed due to no S3 region being supplied.

    • Symptoms:
      1. Check the status of the deployment:

        # the logs for the pods reported the error
        oc logs image-registry-XXX-XXX -n openshift-image-registry
        

        The above output reflected the error:

        time="2019-02-12T18:33:25.716442598Z" level=info msg="start registry" distribution_version=v2.6.0+unknown go.version=go1.9.4 openshift_version=v4.0.0-0.149.0
        time="2019-02-12T18:33:25.716923859Z" level=info msg="caching project quota objects with TTL 1m0s" go.version=go1.9.4
        panic: No region parameter provided
        
        [...]
        
      2. Describing the pod showed that the parameter had not been defined:

        oc describe pod image-registry-XXX-XXX
        

        The describe output looks like this; note the missing value for the environment variable REGISTRY_STORAGE_S3_REGION.

        Name:               image-registry-54db8bd7c4-9vpgt
        Namespace:          openshift-image-registry
        Priority:           0
        PriorityClassName:  <none>
        Node:               ip-10-0-31-5.ec2.internal/10.0.31.5
        Start Time:         Tue, 12 Feb 2019 18:27:19 +0000
        Labels:             docker-registry=default
                            pod-template-hash=54db8bd7c4
        [...]
        Status:             Running
        IP:                 10.128.0.31
        Controlled By:      ReplicaSet/image-registry-54db8bd7c4
        Containers:
          registry:
            Container ID:   cri-o://0bb77f94907ff7e6becf52c6e210ff547009bab893b356fc1f80931d4dfa4206
            Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0adacc84709af069e234d5567c70d0cbdf0b8d51ecb1b37e63160a15658fae13
            Image ID:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0adacc84709af069e234d5567c70d0cbdf0b8d51ecb1b37e63160a15658fae13
            Port:           5000/TCP
            Host Port:      0/TCP
            State:          Waiting
              Reason:       CrashLoopBackOff
            Last State:     Terminated
              Reason:       Error
              Exit Code:    2
              Started:      Tue, 12 Feb 2019 18:33:24 +0000
              Finished:     Tue, 12 Feb 2019 18:33:25 +0000
            Ready:          False
            Restart Count:  6
            Requests:
              cpu:      100m
              memory:   256Mi
            Liveness:   http-get https://:5000/healthz delay=10s timeout=5s period=10s #success=1 #failure=3
            Readiness:  http-get https://:5000/healthz delay=0s timeout=5s period=10s #success=1 #failure=3
            Environment:
              REGISTRY_STORAGE:                       s3
              REGISTRY_STORAGE_S3_BUCKET:             image-registry--91d67b3b845f4231a21e8cb912c13dbc-d45f0ab82ef31
              REGISTRY_STORAGE_S3_REGION:
              REGISTRY_STORAGE_S3_REGIONENDPOINT:
              REGISTRY_STORAGE_S3_ENCRYPT:            false
              REGISTRY_STORAGE_S3_ACCESSKEY:          <set to the key 'REGISTRY_STORAGE_S3_ACCESSKEY' in secret 'image-registry-private-configuration'>  Optional: false
              REGISTRY_STORAGE_S3_SECRETKEY:          <set to the key 'REGISTRY_STORAGE_S3_SECRETKEY' in secret 'image-registry-private-configuration'>  Optional: false
              REGISTRY_HTTP_ADDR:                     :5000
              REGISTRY_HTTP_NET:                      tcp
              REGISTRY_HTTP_SECRET:                   <data>
              REGISTRY_LOG_LEVEL:                     info
              REGISTRY_OPENSHIFT_QUOTA_ENABLED:       true
              REGISTRY_STORAGE_CACHE_BLOBDESCRIPTOR:  inmemory
              REGISTRY_STORAGE_DELETE_ENABLED:        true
              REGISTRY_OPENSHIFT_METRICS_ENABLED:     true
              REGISTRY_OPENSHIFT_SERVER_ADDR:         image-registry.openshift-image-registry.svc:5000
              REGISTRY_HTTP_TLS_CERTIFICATE:          /etc/secrets/tls.crt
              REGISTRY_HTTP_TLS_KEY:                  /etc/secrets/tls.key
            Mounts:
              /etc/pki/ca-trust/source/anchors from registry-certificates (rw)
              /etc/secrets from registry-tls (rw)
              /var/run/secrets/kubernetes.io/serviceaccount from registry-token-dv64s (ro)
        Conditions:
          Type              Status
          Initialized       True
          Ready             False
          ContainersReady   False
          PodScheduled      True
        [...]
        
      3. Lastly, the config reflects the error:

        oc get -o yaml config instance -n openshift-image-registry
        

        Look for the current status in the status.conditions section of the output.

      4. The fix is to edit the registry operator configuration and set the value of spec.storage.s3.region to the region the cluster is deployed in, e.g. us-east-1:

        oc edit config instance -n openshift-image-registry
        

        Or, patch the config:

        oc patch config instance -n openshift-image-registry --type merge --patch '{"spec": { "storage": { "s3": { "region":"us-east-1"}}}}'
        
      5. After editing, OpenShift will redeploy with the updated configuration. Monitor the deployment using the same commands:

        oc get deployment
        oc describe deployment image-registry
        oc get pod | grep image-registry | grep -v operator
        oc describe pod image-registry-XXX-XXX
        oc logs image-registry-XXX-XXX
        