Skip to content

Instantly share code, notes, and snippets.

@sajayantony
Last active April 20, 2019 01:45
Show Gist options
  • Save sajayantony/699ecfc299293220bb533f036e9b845f to your computer and use it in GitHub Desktop.
Save sajayantony/699ecfc299293220bb533f036e9b845f to your computer and use it in GitHub Desktop.
ACR-Diagnostics

ACR Diagnostics

The basic idea here is to enable customer to self diagnose and give us some data.

  • Validate DNS query works through docker.
  • Potentially provide and image that they can run so that the script is actually run inside a docker environment rather than on the host itself.
  • Give a way for them to report back the correlation ID.
  • Run a token service command
  • Run a dataplane command like a test upload a 0 byte blob with a correlation id.
$ az acr health check -n myregistry
DNS lookup to myregistry.azurecr.io 123.123.23.23. : OK
AAD Token acquisition : OK
Upload test : OK
Correlation Id : 1234123012341231234

Customer could potentially even run this in a container to actually have docker do the network calls

$ az acr health docker run-diagnostics
$ az acr health check --with-docker-image ## this could run the same commands by pulling a docker image and pass in an accesstoken etc. 
$ az acr health check --operation push # we can decide if we want to use docker to create an image and push it after we do the DNS validations 

more thoughts coming soon.

--Yu

Get client and server region info, client IP The roles of the user? Upload test to registry (should be auto-purged later): correlationId, status, latency, speed Upload test to a test storage in the region directly Download test from registry (may need to do multiple times): correlationId, status, latency, speed Download test from a test storage in the region directly.

--Yihuang

  1. We've seen storage cert expiration before. This blocked customer from pulling image. It would be nice if we can provide a tool to let them verify blob store healthy information. 2. Storage connection may fail at any time during pulling. We can think about giving customer more useful info during pulling.
@msyihchen
Copy link

Azure storage now has connectivity check blade in portal. This is helpful for VNet enabled registry user.

Copy link

ghost commented Apr 19, 2019

az acr health check

Flags overview

Flag Full command Command description Parameters Needs authentication
connectivity az acr health check --connectivity Get general connectivity status for the registry: check that registry is online and maybe VNET status. Also make a speed test to ensure latency is fine. Registry name, Subscription Maybe (due to VNET config, which is private)
docker-env az acr health check --docker-env Ensure that the environment is proper to the failing operation. This can include: checking docker version, azure cli version, ensure docker daemon is running and connectable. None No
docker-op az acr health-check --docker-op Ensure that docker operations are running as expected None No
all az acr health check --all Perform both checks Registry Name, Subscription Maybe (due to connectivity)

az acr health check --connectivity -r myRegistry

Priority: 1

Steps to be implemented/checked in this command:

  1. nslookup <LOGIN SERVER>: ensure registry is online
  2. az acr network-rule list -n {registryName}: get ip rules for registry and ensure user can connect to it.
    • how can we check the user external ip to ensure it is allowed?
  3. Validate DNS through docker
    • I didn't find anything that allow us to validate DNS through docker client. The most close that I found was docker network subgroup
  4. docker pull mcr.microsoft.com/mcr/hello-world: download mcr hello world to ensure latency is acceptable.
  5. AAD Token acquisition: ensure credentials are validated.

az acr health check --docker-env

Priority: 1

Steps to be implemented/checked in this command:

  1. az version: get versions for az module commands including az acr
    • verify if there is an easier way to get just az acr version
  2. docker version (or docker info): get docker version (both client and server), and additional info for these versions
  3. Ensure docker daemon is running
    • I didn't find any ways to doing this ...

az acr health check --docker-op

Priority: 2

Steps to be implemented/checked in this command:

  1. Validate docker login
  2. Validate docker pull
  3. Validate docker push

az acr health check --all

Priority: 1

This flag runs the full command.

Future flags overview

Subgroup name Full path Description Priority
helm az acr health check --helm Validate helm operations 3

az acr self-healing (future planning)

Priority: 4

Command name Full path Description Parameters Needs authentication
environment az acr self-healing environment Restart environment setting: reconnect to docker, docker daemon None No

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment