Bits of knowledge to share

Setup EtherPad instance with TLS for local network "copy/paste buffer" between machines

Introduction and purpose

As a consultant, I am sometimes required to use customer-provided hardware without admin access or the ability to install any unapproved software to satisfy their intellectual property or security requirements. Nevertheless, I still occasionally require the ability to copy small snippets of text (mainly links or bits of configuration) from my primary work machine to or from the customer-provided hardware. EtherPad provides a friendly "Google Docs"-style interface that allows shared editing of a document, and does not generate huge, difficult-to-remember URLs like some of the self-hosted PasteBin-style solutions that would defeat the purpose of easily sharing small bits of information. Hosting this service on my work laptop (with TLS) enables me to respect our customer agreements by keeping any customer data off of my personal hardware or any cloud services. Using a Podman pod (very similar to a Kubernetes pod, if you are familiar) permits us to expose only the TLS proxy to the machine, preventing accidental (or intentional) direct access to the EtherPad service without TLS encryption.

Prerequisites and system requirements

  • This procedure was written and tested on an x86-64 Fedora 37 Workstation with Podman installed. It should also work on MacOS or Windows with minor modifications using podman remote and podman machine, although I have not tested any other Linux distribution or operating system.
  • The following UNIX/POSIX utilities are required (minor modifications may be required if using a different shell or the MacOS/BSD versions of these utilities).
    • podman
    • GNU coreutils (chmod, cat, mkdir, mktemp, rm)
    • GNU bash 4.0 or later
    • GNU sed
    • GNU awk
    • bind-utils 9.x (dig)
    • OpenSSL 3.0
    • cURL 7.x
    • firewalld (firewall-cmd) - Used to manage firewall configuration on Fedora Linux and some other linux distributions

The installation process

  1. Set variables for later use and pre-pull our container images (versions current at time of writing). The important variables to be used are:

    • EP_TAG: The version tag used for the EtherPad container image
    • HAP_TAG: The version tag used for the HAProxy container image
    • EP_DATA: Local read/write path to store EtherPad data
    • FEP_DATA: Local read-only path to store the HAProxy (front-end proxy) configuration
    • POD_NAME: The name of the Podman pod that will hold both containers
    EP_TAG='1.8' HAP_TAG='2.7' POD_NAME='secure-share' EP_DATA="${HOME}/work/etherpad" FEP_DATA="${HOME}/work/haproxy-fe"
    podman pull "docker.io/etherpad/etherpad:${EP_TAG}" "docker.io/haproxy:${HAP_TAG}"
    mkdir -p "${EP_DATA}" "${FEP_DATA}/ssl/certs"
    chmod 777 "${EP_DATA}" # Needs to be world-writable because EtherPad needs to create its "database" file
    chmod -R 755 "${FEP_DATA}" # HAProxy only needs read access
  2. Create pod to host containers (publish only the TLS proxy port).

    podman pod create --publish 8443:8443/tcp --label "EtherPad=${EP_TAG}" --label "HAProxy=${HAP_TAG}" --name "${POD_NAME}"
  3. Run EtherPad container in pod - uses "DirtyDB" since SQLite isn't included by default. Modify SKIN_VARIANTS to your personal tastes as documented here (I like dark mode).

    podman run --pod "${POD_NAME}" --name work_etherpad --detach --security-opt=label=disable --env TITLE="Secure ShareSpace" --env ADMIN_PASSWORD=<ADMIN_PASSWORD> --env USER_PASSWORD=<USER_PASSWORD> --env REQUIRE_AUTHENTICATION=true --env TRUST_PROXY=true --env SKIN_VARIANTS="dark-toolbar super-dark-editor full-width-editor dark-background" --volume "${EP_DATA}:/opt/etherpad-lite/var:rw" "docker.io/etherpad/etherpad:${EP_TAG}"
  4. Create a self-signed EC (P-521) cert with SANs for the current DNS hostname, the current active IP, and several localhost variants. Feel free to use an RSA cert if you prefer; I just felt like being fancy.

    PRI_IP=$(ip route | awk '/^default via/ {print $9}')
    DNS_NAME=$(dig +short -x ${PRI_IP} | sed 's/\.$//')
    TLS_TMP=$(mktemp --directory)
    openssl req -new -x509 -days 3653 -subj "/CN=${DNS_NAME}/O=self-signed" \
      -newkey ec -pkeyopt ec_paramgen_curve:secp521r1 -sha384 \
      -addext 'basicConstraints=critical,CA:true,pathlen:1' \
      -addext "subjectAltName=DNS:${DNS_NAME},IP:${PRI_IP},DNS:localhost,DNS:localhost.localdomain,DNS:localhost4,DNS:localhost4.localdomain4,DNS:localhost6,DNS:localhost6.localdomain6,IP:127.0.0.1,IP:::1" \
      -addext 'keyUsage=critical,nonRepudiation,digitalSignature,keyEncipherment,keyAgreement,keyCertSign' \
      -addext 'extendedKeyUsage=critical,serverAuth' \
      -keyout ${TLS_TMP}/key.pem -out ${TLS_TMP}/cert.pem -noenc
    cat ${TLS_TMP}/key.pem ${TLS_TMP}/cert.pem > ${FEP_DATA}/ssl/certs/01-default.pem
    openssl x509 -in ${TLS_TMP}/cert.pem -text -noout
    rm -rf "${TLS_TMP}" && unset TLS_TMP
  5. Create HAProxy configuration for TLS termination and proxying to the internal-facing EtherPad container. Modify connection limits and timeouts if desired.

    Note that specifying a directory in the ssl crt directive causes it to choose a cert based on the SNI (Server Name Indication) value in the TLS negotiation (sorta like a Host header for TLS) or choose the first cert file sorted alphanumerically. A quick way to check which certificate is selected is sketched after this procedure.

    cat > "${FEP_DATA}/haproxy.cfg" << 'EOF'
    global
      maxconn 10
      ssl-default-bind-options ssl-min-ver TLSv1.2
    
    defaults
      timeout connect 5s
      timeout client 5s
      timeout server 5s
      mode http
      maxconn 10
    
    listen etherpad
      bind *:8443 ssl crt /usr/local/etc/haproxy/ssl/certs/
      server local-pod 127.0.0.1:9001
    EOF
  6. Run HAProxy container in pre-created pod.

    podman run --pod "${POD_NAME}" --name tls_proxy --detach --security-opt=label=disable --volume "${FEP_DATA}:/usr/local/etc/haproxy:ro" "docker.io/haproxy:${HAP_TAG}"
  7. Check that TLS works using both cURL and the OpenSSL s_client utility to ensure certs are being presented as expected.

    Note that we disable cert trust validation because we know we are using a self-signed cert. The cURL command should output "Authentication Required", which is expected as we configured EtherPad to require authentication.

    curl --silent --insecure --location "https://127.0.0.1:8443/" && openssl s_client -connect 127.0.0.1:8443 -servername 127.0.0.1 < /dev/null
  8. Check the link with a local browser. You will need the USER_PASSWORD you supplied when starting EtherPad; the username is "user".

  9. Configure the local firewall to permit inbound tcp/8443 access. This procedure is for Fedora 37 with firewalld; other distros may vary and other operating systems certainly will.

    Note that the foreman-proxy service is used for simplicity's sake, as it is already defined on Fedora 37 and uses tcp/8443.

    sudo firewall-cmd --add-service=foreman-proxy
    sudo firewall-cmd --add-service=foreman-proxy --permanent
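
A quick way to confirm which certificate HAProxy selects for a given SNI value (referenced from the note in step 5). This is only a sketch reusing the DNS_NAME variable set in step 4: with only the single default cert both commands print the same subject, but if you later drop additional cert files into ${FEP_DATA}/ssl/certs/, this shows which one HAProxy serves for each name.

    openssl s_client -connect 127.0.0.1:8443 -servername localhost < /dev/null 2> /dev/null | openssl x509 -noout -subject
    openssl s_client -connect 127.0.0.1:8443 -servername "${DNS_NAME}" < /dev/null 2> /dev/null | openssl x509 -noout -subject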

Extra credit

Podman can generate systemd unit files to configure the system to start and stop the pod and its associated containers automatically (obviously this requires a Linux distro with systemd init like Fedora 37; other OSes or distros using another init method will require a different procedure).

podman generate systemd --name "${POD_NAME}" --after network.target
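
To actually use the generated units, they can be installed as rootless user services. A minimal sketch, assuming the rootless setup and pod name used above (the generated files follow podman's pod-<name>.service / container-<name>.service naming convention):

    podman generate systemd --files --name "${POD_NAME}" --after network.target # writes unit files to the current directory
    mkdir -p ~/.config/systemd/user && mv pod-*.service container-*.service ~/.config/systemd/user/
    systemctl --user daemon-reload
    systemctl --user enable --now "pod-${POD_NAME}.service"
    loginctl enable-linger "${USER}" # optional: allow the user services to start at boot without an interactive login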

Various potentially helpful commands

  • Restart the HAProxy container and view its logs (for troubleshooting issues with TLS negotiation or cert file recognition).

    podman restart tls_proxy
    podman logs tls_proxy
  • Remove the HAProxy container entirely (re-run the podman run command from step 6 to recreate it).

    podman stop tls_proxy && podman rm tls_proxy
  • Remove pod and all containers within

    podman pod stop "${POD_NAME}" && podman pod rm "${POD_NAME}"

Standalone OIDC/OAuth 2.0 authorization code flow tester (Python)

This standalone Python script exercises an OIDC authorization code flow with PKCE against the issuer configured at the top of the file: it prints an authorization URL to paste into a browser, then serves a local HTTPS redirect endpoint that exchanges the returned authorization code for tokens and displays the raw token response.

#!/bin/python3
import http.server, ssl, pprint, os, sys, subprocess, tempfile, shutil, hashlib, base64, json, urllib, urllib.parse, urllib.request

# Embedded HTTPS server configuration
openssl_bin = '/usr/bin/openssl'
svr_keypair = 'oauth-keypair.pem'
svr_port = 8443

# OIDC client configuration (fill these in for your identity provider)
issuer_url = ''
client_id = ''
client_secret = ''
redir_url = 'https://localhost:8443/oidc_test'
scopes = [
    'openid',
    'email',
    'profile'
]

# Generate state value for CSRF prevention
state = base64.b64encode(hashlib.sha256(os.getrandom(2)).digest()).decode()

# Generate PKCE verifier and challenge
pkce_verifier = b''
while len(pkce_verifier) < 96:
    pkce_verifier += hashlib.sha256(os.getrandom(2)).digest()
pkce_verifier = base64.urlsafe_b64encode(pkce_verifier).rstrip(b'=')[:128].decode()
pkce_challenge = base64.urlsafe_b64encode(hashlib.sha256(pkce_verifier.encode()).digest()).rstrip(b'=').decode()

# TLS client setup
cli_ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
cli_ctx.minimum_version = ssl.TLSVersion.TLSv1_2
cli_ctx.check_hostname = True
cli_ctx.verify_mode = ssl.VerifyMode.CERT_REQUIRED

# Fetch OIDC endpoints or fall back to hardcoded list
oidc_metadata = {
    'authorization_endpoint': None,
    'token_endpoint': None,
    'userinfo_endpoint': None,
    'revocation_endpoint': None
}
try:
    resp = urllib.request.urlopen(f'{issuer_url.rstrip("/")}/.well-known/openid-configuration', context=cli_ctx)
    oidc_metadata = json.loads(resp.read())
except Exception as ex:
    print('Error fetching OIDC metadata, falling back to hardcoded values.')

# Set authorize parameters
auth_params = {
    'response_type': 'code',
    'client_id': client_id,
    'redirect_uri': redir_url,
    'scope': ' '.join(scopes),
    'state': state,
}
if pkce_verifier:
    auth_params.update({
        'code_challenge': pkce_challenge,
        'code_challenge_method': 'S256'
    })

# Generate authorization URL
print('Paste the following URL into your browser: %s' % '?'.join([oidc_metadata['authorization_endpoint'], urllib.parse.urlencode(auth_params)]))

# Check for TLS keypair in the current directory, generate self-signed keypair if not present
if not os.path.exists(svr_keypair):
    print('NOTICE: No keypair found, generating self-signed keypair')
    ssl_tmpdir = tempfile.mkdtemp()
    keygen_cmd = [
        openssl_bin, 'req', '-new', '-x509', '-days', '30', '-subj', '/CN=OAuth Tester/O=self-signed', '-newkey', 'rsa:2048', '-sha256',
        '-addext', 'basicConstraints=critical,CA:true,pathlen:1',
        '-addext', 'subjectAltName=DNS:localhost,DNS:localhost.localdomain,DNS:localhost4,DNS:localhost4.localdomain4,DNS:localhost6,DNS:localhost6.localdomain6,IP:127.0.0.1,IP:::1',
        '-addext', 'keyUsage=critical,nonRepudiation,digitalSignature,keyEncipherment,keyAgreement,keyCertSign',
        '-addext', 'extendedKeyUsage=critical,serverAuth',
        '-out', os.path.join(ssl_tmpdir, 'cert.pem'), '-keyout', os.path.join(ssl_tmpdir, 'key.pem'), '-noenc'
    ]
    subprocess.run(keygen_cmd, stdout=sys.stdout, stderr=subprocess.STDOUT)
    with open(os.path.join(ssl_tmpdir, 'cert.pem'), 'r') as cert_file:
        cert = cert_file.read()
    with open(os.path.join(ssl_tmpdir, 'key.pem'), 'r') as key_file:
        key = key_file.read()
    with open(svr_keypair, 'w') as keypair_file:
        keypair_file.write(os.linesep.join([cert, key]))
    shutil.rmtree(ssl_tmpdir)

def get_userinfo(endpoint_url, token):
    # Placeholder: userinfo lookup is not implemented yet
    pass

def get_token(token_url, auth_code, redirect_url, client_id, client_secret, pkce_verifier=None, ssl_context=None, creds_in_params=False):
    auth_val = None
    token_params = {
        'grant_type': 'authorization_code',
        'code': auth_code,
        'redirect_uri': redirect_url,
    }
    if pkce_verifier:
        token_params.update({'code_verifier': pkce_verifier})
    if creds_in_params:
        token_params.update({'client_id': client_id, 'client_secret': client_secret})
    else:
        auth_val = 'Basic ' + base64.b64encode(f'{client_id}:{client_secret}'.encode()).decode()
    print('Fetching Token!')
    req = urllib.request.Request(token_url, data=urllib.parse.urlencode(token_params).encode(), headers={}, method='POST')
    if auth_val:
        req.add_header('Authorization', auth_val)
    req.add_header('Content-Type', 'application/x-www-form-urlencoded')
    req.add_header('User-Agent', 'OAuth Tester/1.0')
    resp = urllib.request.urlopen(req, context=ssl_context)
    output = resp.read()
    return output

class redir_handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.lower().startswith('/oidc_test'):
            query = urllib.parse.parse_qs(urllib.parse.urlparse(self.path).query)
            output = get_token(oidc_metadata['token_endpoint'], query.get('code', [None])[0], redir_url, client_id, client_secret, pkce_verifier, ssl_context=cli_ctx, creds_in_params=False)
        else:
            output = b'Nothing to see here.'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(output)))
        self.end_headers()
        self.wfile.write(output)

# Configure TLS-enabled simple web server
svr_ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
svr_ctx.minimum_version = ssl.TLSVersion.TLSv1_2
svr_ctx.check_hostname = False
svr_ctx.verify_mode = ssl.VerifyMode.CERT_NONE
svr_ctx.load_cert_chain(certfile=svr_keypair)
server = http.server.HTTPServer(('127.0.0.1', svr_port), redir_handler)
server.socket = svr_ctx.wrap_socket(server.socket, server_side=True, do_handshake_on_connect=True, suppress_ragged_eofs=True)
print(f"Webserver listening on https://127.0.0.1:{svr_port}")
server.serve_forever()
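
A minimal usage sketch, assuming the script has been saved as oidc_tester.py (hypothetical filename) and the issuer_url, client_id, and client_secret values at the top have been filled in:

    python3 ./oidc_tester.py
    # Paste the printed authorization URL into a browser and authenticate; the redirect back to
    # https://localhost:8443/oidc_test is served by the script itself and displays the raw token response.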

Create a client certificate for login to an OpenShift cluster

Introduction and purpose

The OpenShift installer creates a client certificate for administrative use as part of the installation process. End users may desire to create an X.509 client certificate for other users for a myriad of reasons, for example:

  • A replacement for the kubeconfig file created during installation that was lost or discarded
  • An emergency admin credential to troubleshoot IdP issues
  • An alternative to a Bearer token for external access from a service account
  • An authentication token that is valid only for a specific time period

Prerequisites

  • An OpenShift 4.x cluster (it's right there in the title)! A similar process may work for upstream Kubernetes or other Kubernetes distributions, but this procedure was written for and tested on OpenShift 4.12.3. Contributions are welcome!
  • An x86-64 workstation (Fedora Linux 37 was used for these instructions) in order to execute the instructions as written. A similar enough environment (say Ubuntu on Windows with WSL) should work with minor changes.
  • An existing OpenShift user with the ability to create and approve CertificateSigningRequests (usually this means a user with the cluster-admin role bound, but you can find a complete list by running oc adm policy who-can create CertificateSigningRequest.certificates.k8s.io).
  • The following UNIX/POSIX utilities are required (minor modifications may be required if using a different shell or the MacOS/BSD versions of these utilities).
    • OpenSSL 3.0
    • bash 4.00 or greater
    • GNU tar
    • wget
    • GNU awk
    • rm, mkdir, mktemp, base64, cat, and similar utilities from the GNU coreutils package (or compatible equivalents)
    • jq (suggested to make some output prettier, but not technically required)

Background

If you examine the client certificate in the kubeconfig file created by the OpenShift installer, you will see that the Subject is O = system:masters, CN = system:admin, i.e. the O value maps to a Kubernetes/OpenShift Group and the CN value maps to a User. You can see this by querying OpenShift for the current user's details while using the installer-created kubeconfig file (obviously, your local file path will vary):

oc --kubeconfig /path/to/installer/kubeconfig get --raw '/apis/user.openshift.io/v1/users/~' | jq .

The output should look like this:

{
  "kind": "User",
  "apiVersion": "user.openshift.io/v1",
  "metadata": {
    "name": "system:admin",
    "creationTimestamp": null
  },
  "groups": [
    "system:authenticated",
    "system:masters"
  ]
}

Note that the system:authenticated virtual group is automatically granted to all authenticated users.
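
If you want to confirm the Subject mapping yourself, the embedded client certificate can be pulled out of the kubeconfig and inspected directly. A minimal sketch, assuming the installer-created kubeconfig path used above (adjust the path to your environment):

    awk '/client-certificate-data:/ {print $2}' /path/to/installer/kubeconfig | base64 -d | openssl x509 -noout -subject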

Create a certificate request and sign it with the API server CA (Certificate Authority)

For this example, we'll be creating a cert associated with the kubeadmin built-in user that is auto-created during OpenShift installation. Technically, the user and group values can be anything, even if they don't actually exist in the cluster (although that doesn't seem very useful unless you're pre-creating the certs before you create the relevant objects). The cluster will respect both the user and group specified in the certificate.

  1. Create a working directory and set some environment variables. The important variables are:

    • WORKDIR: The path where we will store files and ultimately generate the desired output
    • CERT_USER: The OpenShift User name to be associated with this certificate
    • CERT_GROUP: The OpenShift Group name to be associated with this certificate
    • CERT_DURATION: The validity period of the certificate after signing (in seconds)
    • CLUSTER_API: The API URL for the cluster, e.g. https://api.mycluster.mydomain.com:6443
    WORKDIR=~/openshift_client_cert
    CERT_USER='kube:admin'
    CERT_GROUP='system:cluster-admins'
    CLUSTER_API='https://api.mycluster.mydomain.com:6443'
    CERT_DURATION=$((3650*24*60*60)) # Approximately 10 years, same as the installer-created client certificate
    mkdir -p "${WORKDIR}"/{dl,bin} && cd "${WORKDIR}"
    eval $(awk -F: '{ host = $2 ; port = $3 ; sub (/^\/*/, "", host) ; print "API_HOSTNAME=" host " API_PORT=" port }' <<< ${CLUSTER_API})
    NEW_KUBECONFIG="${WORKDIR}/kubeconfig"
  2. Download the latest stable version of the OpenShift CLI and add the bin directory to the PATH environment variable.

    This step can be skipped if you already have the latest version of oc installed and available in your PATH.

    wget -P "${WORKDIR}/dl/" 'https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/stable/openshift-client-linux.tar.gz' && \
    tar -C "${WORKDIR}/bin/" -zxf dl/openshift-client-linux.tar.gz oc kubectl && \
    export PATH="${WORKDIR}/bin:${PATH}"
  3. Generate a 2048-bit RSA keypair and CSR (Certificate Signing Request) in PEM format. Store the generated key in a new kubeconfig file.

    The generated kubeconfig file mimics the naming scheme for clusters, users and contexts created by oc login, so it should be safe to merge this file with others, as the names should not overlap.

    TLS_TMP=$(mktemp --directory)
    openssl req -new -subj "/O=${CERT_GROUP}/CN=${CERT_USER}" \
      -newkey rsa:2048 -sha256 \
      -addext 'basicConstraints=critical,CA:false' \
      -addext 'keyUsage=critical,digitalSignature' \
      -addext 'extendedKeyUsage=critical,clientAuth' \
      -keyout ${TLS_TMP}/key.pem -out ${TLS_TMP}/csr.pem -noenc && \
    CTX_CLUSTER="${API_HOSTNAME//./-}:${API_PORT}" && \
    CTX_USER="${CERT_USER}/${CTX_CLUSTER}" && \
    NEW_CONTEXT="default/${CTX_CLUSTER}/${CERT_USER}" && \
    oc --kubeconfig "${NEW_KUBECONFIG}" config set-credentials ${CTX_USER} --embed-certs --client-key=${TLS_TMP}/key.pem && \
    oc --kubeconfig "${NEW_KUBECONFIG}" config set-cluster "${CTX_CLUSTER}" --server "${CLUSTER_API}" && \
    oc --kubeconfig "${NEW_KUBECONFIG}" config set-context "${NEW_CONTEXT}" --cluster "${CTX_CLUSTER}" --user "${CTX_USER}" --namespace default && \
    oc --kubeconfig "${NEW_KUBECONFIG}" config use-context "${NEW_CONTEXT}"
  4. Login to your cluster with the CLI as a user with appropriate privileges to create and approve a CertificateSigningRequest (likely a user with the cluster-admin role).

    The exact login method will vary based on your identity provider configuration, but this is an example of logging in to a cluster configured to present a trusted certificate for the API server, using the installer-created kubeadmin account:

    oc login --user kubeadmin --server "${CLUSTER_API}"
  5. Create the CertificateSigningRequest object, approve it and wait for it to be issued.

    REQ_NAME=$(oc create --filename - --output jsonpath='{.metadata.name}' << EOF
    apiVersion: certificates.k8s.io/v1
    kind: CertificateSigningRequest
    metadata:
      generateName: client-cert-${CERT_USER//:/-}-
    spec:
      signerName: kubernetes.io/kube-apiserver-client
      expirationSeconds: ${CERT_DURATION}
      usages:
      - digital signature
      - client auth
      request: $(base64 --wrap 0 < "${TLS_TMP}/csr.pem")
    EOF
    ) && \
    oc adm certificate approve "${REQ_NAME}" && \
    unset CERT_SIGNED ; until [ -n "${CERT_SIGNED}" ]; do
      CERT_SIGNED=$(oc get CertificateSigningRequest "${REQ_NAME}" -o jsonpath='{.status.certificate}')
      if [ -z "${CERT_SIGNED}" ] ; then sleep 5 ; fi
    done && \
    base64 -d <<< "${CERT_SIGNED}" > ${TLS_TMP}/cert.pem
  6. Add the signed certificate to the new kubeconfig file and test.

    oc --kubeconfig "${NEW_KUBECONFIG}" config set-credentials "${CTX_USER}" --embed-certs --client-certificate="${TLS_TMP}/cert.pem" && \
    oc --kubeconfig "${NEW_KUBECONFIG}" --insecure-skip-tls-verify whoami && \
    oc --kubeconfig "${NEW_KUBECONFIG}" --insecure-skip-tls-verify get --raw '/apis/user.openshift.io/v1/users/~' | jq .

    The output of the final command should look similar to this (the specific user name and groups may differ depending on your choices):

    {
      "kind": "User",
      "apiVersion": "user.openshift.io/v1",
      "metadata": {
        "name": "kube:admin",
        "creationTimestamp": null
      },
      "groups": [
        "system:authenticated",
        "system:cluster-admins"
      ]
    }
  7. Clean up the temporary certificate directory as it is no longer needed.

    rm -rf "${TLS_TMP}" && unset TLS_TMP
  8. OPTIONAL: If your cluster API is using a self-signed certificate, you can add it to your kubeconfig file to avoid needing the --insecure-skip-tls-verify flag.

    API_CERT=$(mktemp) && \
    openssl s_client -connect "${API_HOSTNAME}:${API_PORT}" -servername "${API_HOSTNAME}" -showcerts < /dev/null | openssl x509 > "${API_CERT}" && \
    oc --kubeconfig "${NEW_KUBECONFIG}" config set-cluster "${CTX_CLUSTER}" --embed-certs --certificate-authority "${API_CERT}" && \
    echo 'Testing querying the cluster API server without the "--insecure-skip-tls-verify" flag:'
    oc --kubeconfig "${NEW_KUBECONFIG}" whoami && \
    oc --kubeconfig "${NEW_KUBECONFIG}" get --raw '/apis/user.openshift.io/v1/users/~' | jq .
  9. OPTIONAL BUT HIGHLY RECOMMENDED: Move the generated kubeconfig file somewhere secure for safekeeping

    The generated file is an access credential (similar to a password) and should be protected as such.
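
    A minimal sketch of locking the file down and moving it out of the working directory (the destination path is just an example):

    chmod 600 "${NEW_KUBECONFIG}"
    mkdir -p ~/.kube && mv "${NEW_KUBECONFIG}" ~/.kube/client-cert-kubeconfig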

Configuring oc to use the created kubeconfig file when it is used in the future

  1. Set the KUBECONFIG environment variable to the path to the created kubeconfig file.

    export KUBECONFIG=/path/to/generated/kubeconfig
  2. Test with oc whoami

    oc whoami

Deploying Single-Node OpenShift on a spare laptop without a dedicated DNS or DHCP server

Purpose/motivation

I wanted a "real" (as in fully functional from an API point of view, definitely not production-grade) OpenShift cluster using a couple of spare laptops I already owned, without creating a dedicated lab network or purchasing and maintaining additional hardware. My home network has a simple commodity router without advanced DNS or DHCP management features, so the goal here is to get as much functionality as possible out of the commercial OpenShift offering, as simply and with as small a hardware footprint as possible. This is definitely not intended to be a production-grade cluster and is only suitable for education and experimentation.

If you are looking for a less "converged" deployment or you'd prefer to use the open-source upstream project (OKD) rather than the commercial product (OpenShift), I recommend checking out this blog from my colleague Charro.

Prerequisites

  • You will require a Red Hat account to download OpenShift pull secrets. It can be a personal account; it doesn't have to be a corporate account or associated with an existing organization. If you do not have an account, you can create one by going here and clicking "Register for a Red Hat account".

  • You will need an x86-64 host meeting the minimum requirements to run RHCOS and host OpenShift. This may be a laptop or desktop. It must have a wired network adapter, as CoreOS does not currently support wireless networking. Depending on your specific configuration, you may have to disable the wireless adapter in the system firmware or blacklist the driver module in the kernel boot parameters, as some drivers crash when required support files are missing. As an example, I used the following old laptops I had lying around:

    • Lenovo ThinkPad T530 (control-plane)
      • 4C/8T Intel(R) Core(TM) i7-3610QM CPU @ 2.30GHz
      • 16GiB RAM
      • 1TB SSD
    • Dell Latitude E5530 (OPTIONAL worker)
      • 2C/4T Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz
      • 12GiB RAM
      • 500GB HDD
  • You must have DHCP available and your DHCP server must be capable of "reserving" an IP address for your control-plane node and any other nodes that will participate in your cluster. This is configurable on most commodity routers. Your DNS server must be able to resolve the name(s) of the control-plane node and any worker node(s) that may be added later (a quick pre-flight DNS check is sketched after this list).

  • You must have a connection to the public Internet and a DNS server capable of resolving the various services required to install a cluster, primarily the container registries. In OpenShift parlance, this is a "connected" install. Internet access with working DNS is an ongoing requirement to receive software updates for the cluster.

  • You will need an x86-64 administrative workstation (Fedora Linux 37 was used for these instructions) in order to prepare the CoreOS installation media and serve the generated installation manifest to the control-plane node during the first phase of installation. A similar enough environment (say Ubuntu on Windows with WSL) should work with minor changes.

  • This guide assumes a fairly simple network environment, with no network address translation (NAT) between the administrative workstation and the host(s) being deployed and a shared DNS server. A more complex network environment may require adjustments to detected IPs and/or host names and additional configuration of the destination host(s), the administrative workstation, and/or key network services, such as firewalls.

  • This guide was written for and tested against OpenShift 4.12.3. Some changes may be required to adapt the instructions to OKD, the open-source upstream of OpenShift, or to future OpenShift versions, which will likely employ newer versions of the Ignition configuration specification.

  • You will need an SSH public/private keypair (the instructions assume an RSA key, but anything supported by OpenSSH 8.0p1 is usable)

  • Required packages/commands to be installed

    • bash 4.00 or greater
    • GNU tar
    • wget
    • dd (or another utility capable of writing an image to a USB drive)
    • GNU awk
    • nmcli (for distributions using NetworkManager)
    • ip (from the iproute package)
    • cp, rm, mkdir, mktemp, base64, cat, and similar utilities from the GNU coreutils package (or compatible equivalents)
    • firewall-cmd (for distributions using firewalld to manage the local firewall)
    • Python 3.0 or greater (or another way to trivially host some files at an HTTP location)
    • OpenSSH or another SSH client
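
A quick pre-flight check of the DNS/DHCP prerequisite above. This is only a sketch using the example control-plane hostname from later in this guide; substitute your own hostname and reserved IP:

    dig +short lab-t530.lan      # should return the IP address you reserved for the control-plane node
    dig +short -x <RESERVED_IP>  # reverse lookup should return the hostname, if your DNS server supports it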

Setup working directory and download prerequisite binaries

  1. Define key variables and create working directory. The important variables are:

    • WORKDIR: The path where we will store files and ultimately generate the desired output
    • CLUSTER_NAME: The name of the openshift cluster, e.g. "my-cluster"
    • OCP_DOMAIN: The DNS base domain for the cluster, e.g. "acme.com"
    • CTRL_NAME: The DNS name of the control-plane node for your SNO cluster (must be resolvable via DNS)
    • DEFAULT_DNS: IP address for your default DNS server (usually assigned by your router) - we attempt to detect the correct value
    • CTRL_IP: IP address of the control-plane node for your SNO cluster - we attempt to detect the correct value
    export WORKDIR=~/sno \
           CLUSTER_NAME=snocone \
           OCP_DOMAIN=lan \
           CTRL_NAME=lab-t530.lan \
           IGN_URL=http://$(ip route | awk '/^default via/ && $8 == "src" {print $9}'):8000/bootstrap-in-place-for-live-iso.ign && \
           DEFAULT_DNS=$(ip route | awk '/^default via/ && $4 == "dev" {print $3}')
    export CTRL_IP=$(dig +short ${CTRL_NAME}) && \
    if [ -z "${CTRL_IP}" ] ; then read -p 'Enter controller IP: ' CTRL_IP ; fi && \
    read -p 'Default DNS IP: ' -e -i "${DEFAULT_DNS}" DEFAULT_DNS && \
    printf 'Controller hostname: %s\nController IP: %s\nDefault DNS: %s\nConfigure your workstation to use %s as your DNS server to access cluster resources.\n' ${CTRL_NAME} ${CTRL_IP} ${DEFAULT_DNS} ${CTRL_IP}
    
    mkdir -p ${WORKDIR}/{dl,bin,install}
    cd ${WORKDIR}
  2. Download pull secrets from the Red Hat Hybrid Cloud Console and save them to ${WORKDIR}/ocp-pull-secrets.json (a quick way to sanity-check the saved file is sketched after this list).

  3. Download the latest stable version of the OpenShift CLI and installer, and add the bin directory to the PATH environment variable.

    cd ${WORKDIR} && \
    wget -P dl/ 'https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/stable/openshift-client-linux.tar.gz' 'https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/stable/openshift-install-linux.tar.gz' 'https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/stable/release.txt' && \
    tar -C bin/ -zxf dl/openshift-client-linux.tar.gz oc kubectl && \
    tar -C bin/ -zxf dl/openshift-install-linux.tar.gz openshift-install
    export PATH="${WORKDIR}/bin:${PATH}"
  4. Download the latest version of Red Hat CoreOS for the selected OpenShift release.

    cd ${WORKDIR} && \
    export OCP_VERSION=$(./bin/oc version | awk -F': ' '$1 == "Client Version" {print $2}') && \
    export OCP_XY=$(grep -Eoe '^[[:digit:]]{0,3}\.[[:digit:]]{0,3}' <<< ${OCP_VERSION}) && \
    wget -P dl/ "https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/${OCP_XY}/latest/rhcos-live.x86_64.iso"
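
A quick sanity check of the pull secret saved in step 2. This is only a sketch; it assumes jq is installed and simply confirms the file parses as JSON and contains an auths section:

    jq -e '.auths | keys' "${WORKDIR}/ocp-pull-secrets.json" && echo 'Pull secret looks usable'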

Prepare installation environment

  1. Detect network settings (if on same network), otherwise set OCP_DOMAIN and OCP_MACHINENET manually.

    ACTIVE_NETDEV=$(ip route | awk '/^default via/ && $4 == "dev" {print $5}')
    CONN_UUID=$(nmcli conn show --active | awk "/${ACTIVE_NETDEV}/ { print \$2 }")
    if [ -z ${OCP_DOMAIN} ] ; then
      OCP_DOMAIN=$(nmcli --fields IP4.DOMAIN conn show uuid ${CONN_UUID} | awk 'NR == 1 && $1 ~ /IP4\.DOMAIN/ {print $2}')
    fi
    if [ -z ${OCP_MACHINENET} ] ; then
      OCP_MACHINENET=$(ip route | awk "\$2 == \"dev\" && \$3 == \"${ACTIVE_NETDEV}\" {print \$1}")
    fi
  2. Generate install-config.yaml from detected values.

    cat > install-config.yaml << EOF
    apiVersion: v1
    metadata:
      name: ${CLUSTER_NAME}
    baseDomain: ${OCP_DOMAIN}
    networking:
      networkType: OVNKubernetes
      machineNetwork:
      - cidr: ${OCP_MACHINENET}
    bootstrapInPlace:
      installationDisk: /dev/sda
    compute:
    - architecture: amd64
      name: worker
      replicas: 0
    controlPlane:
      architecture: amd64
      hyperthreading: Enabled
      name: master
      platform: {}
      replicas: 1
    platform:
      none: {}
    pullSecret: |
    $(sed 's/^/  /' < ocp-pull-secrets.json)
    sshKey: |
    $(awk '{print $1,$2}' ~/.ssh/id_rsa.pub | sed 's/^/  /')
    EOF
  3. Create YAML manifests for self-hosted external DNS.

    This allows us to serve the required wildcard DNS entries (*.apps.<CLUSTER_NAME>.<YOUR_DOMAIN>) for cluster services from within the cluster itself. We run a custom deployment of CoreDNS for this purpose. We pre-create the manifests because the cluster requires these DNS records to initialize completely without outside interaction. The deployment of CoreDNS is configured to forward requests that don't match the cluster domain or cluster.local to ${DEFAULT_DNS} (probably your home router), which we set above. So things should "just work" unless you have a very complicated DNS configuration and/or specific security requirements you need to address. A quick way to verify the wildcard records once the node is up is sketched at the end of the control-plane installation section below.

    DNS_REGEX=.${CLUSTER_NAME}.${OCP_DOMAIN}.
    DNS_REGEX=${DNS_REGEX//./\\.}
    
    cat > 99-dns-ns.yaml << DNSEOF
    apiVersion: v1
    kind: Namespace
    metadata:
      name: snodns
    DNSEOF
    
    cat > 99-dns-configmap.yaml << DNSEOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: snodns
      name: dnsconfig
    data:
      Corefile: |
        (stdconf) {
          errors
          log
          cache
          prometheus
          reload 60s
          health
          ready
        }
        dns://.:5353 {
          forward . dns://${DEFAULT_DNS} {
            except cluster.local
          }
        }
        dns://${CLUSTER_NAME}.${OCP_DOMAIN}:5353 {
          import stdconf
          template IN A ${CLUSTER_NAME}.${OCP_DOMAIN} {
            match "^(api|api-int|dns)${DNS_REGEX}$"
            answer "{{ .Name }} 3600 IN A {\$HOST_IP}"
            fallthrough
            }
          template ANY ANY ${CLUSTER_NAME}.${OCP_DOMAIN} {
            rcode "NXDOMAIN"
            }
          }
        dns://apps.${CLUSTER_NAME}.${OCP_DOMAIN}:5353 {
          import stdconf
          file /etc/coredns/zone.txt {
            reload 60s
            }
          }
      zone.txt: |
        \$ORIGIN apps.${CLUSTER_NAME}.${OCP_DOMAIN}.
        @ 3600 IN SOA dns.${CLUSTER_NAME}.${OCP_DOMAIN}. noreply.${CLUSTER_NAME}.${OCP_DOMAIN}. $(date '+%y%m%d%H%M') 7200 3600 1209600 3600
        @ 3600 IN NS dns.${CLUSTER_NAME}.${OCP_DOMAIN}.
        *      IN CNAME api.${CLUSTER_NAME}.${OCP_DOMAIN}.
    DNSEOF
    
    cat > 99-dns-daemonset.yaml << DNSEOF
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      labels:
        app: snodns
      name: snodns
      namespace: snodns
    spec:
      selector:
        matchLabels:
          app: snodns
      template:
        metadata:
          labels:
            app: snodns
        spec:
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/master
            operator: Exists
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
          - key: node.kubernetes.io/unreachable
            effect: NoSchedule
          containers:
          - name: coredns
            image: docker.io/coredns/coredns:1.10.1
            args:
            - -conf
            - /etc/coredns/Corefile
            securityContext:
              allowPrivilegeEscalation: false
              runAsNonRoot: true
              seccompProfile:
                type: RuntimeDefault
              capabilities:
                drop:
                - ALL
            livenessProbe:
              httpGet:
                path: /health
                port: 8080
            readinessProbe:
              httpGet:
                path: /ready
                port: 8181
            env:
              - name: HOST_IP
                valueFrom:
                  fieldRef:
                    fieldPath: status.hostIP
            volumeMounts:
            - name: config
              mountPath: /etc/coredns
          volumes:
          - name: config
            configMap:
              name: dnsconfig
    DNSEOF
    
    cat > 99-dns-service.yaml << DNSEOF
    apiVersion: v1
    kind: Service
    metadata:
      namespace: snodns
      labels:
        app: snodns
      name: snodns
    spec:
      ports:
      - name: dns-udp
        targetPort: 5353
        port: 53
        protocol: UDP
      - name: dns-tcp
        targetPort: 5353
        port: 53
        protocol: TCP
      selector:
        app: snodns
      type: LoadBalancer
      # Hardcoded ClusterIP because CoreDNS doesn't accept hostnames for forwarding
      clusterIP: 172.30.0.53
      externalIPs:
      - ${CTRL_IP}
    DNSEOF
    
    cat > 99-dns-forward.yaml << DNSEOF
    apiVersion: operator.openshift.io/v1
    kind: DNS
    metadata:
      name: default
    spec:
      servers:
      - name: clusterdomain
        zones:
        - ${CLUSTER_NAME}.${OCP_DOMAIN}
        forwardPlugin:
          upstreams:
          - 172.30.0.53
    DNSEOF
  4. Generate Ignition customizations and custom MachineConfigs.

    We apply customizations here to add the control-plane node to the local hosts file as api and api-int (these must resolve for cluster initialization to complete) and disable suspend on lid close (to allow the laptops to run normally with their lids closed). The /etc/hosts modification is done by a one-shot systemd unit, because trying to directly set the contents of /etc/hosts via MachineConfig would conflict with the internal image registry, which also publishes an entry in /etc/hosts.

    CTRL_HOSTNAME=$(base64 --wrap 0 <<< ${CTRL_NAME/.*/})
    
    HOSTS_ENTRY=$(printf '%s\tapi.%s api-int.%s\n' "${CTRL_IP}" "${CLUSTER_NAME}.${OCP_DOMAIN}" "${CLUSTER_NAME}.${OCP_DOMAIN}" | base64 --wrap 0 )
    
    LOGIND_CONFIG=$(base64 --wrap 0 << 'EOF'
    [Login]
    HandleLidSwitch=ignore
    HandleLidSwitchExternalPower=ignore
    HandleLidSwitchDocked=ignore
    EOF
    )
    
    API_SCRIPT=$(base64 --wrap 0 << SCRIPT_EOF
    #!/usr/bin/bash
    CTRL_IP=${CTRL_IP}
    CLUSTER_DOMAIN="${CLUSTER_NAME}.${OCP_DOMAIN}"
    if ! grep --silent '# Added by api-resolver.service' /etc/hosts ; then
      printf '%s\\tapi.%s api-int.%s # Added by api-resolver.service\n' "\${CTRL_IP}" "\${CLUSTER_DOMAIN}" "\${CLUSTER_DOMAIN}" >> /etc/hosts
    fi
    SCRIPT_EOF
    )
    
    cat > custom-sno.ign << IGN_EOF
    {
      "ignition": {
        "version": "3.3.0",
        "config": {
          "merge": [
            {
              "source": "${IGN_URL}"
            }
          ]
        }
      },
      "storage": {
        "files": [
          {
            "overwrite": true,
            "path": "/etc/hostname",
            "contents": {
              "source": "data:text/plain;charset=utf-8;base64,${CTRL_HOSTNAME}"
            }
          },
          {
            "overwrite": true,
            "path": "/etc/systemd/logind.conf.d/99-ignore_lid_switch.conf",
            "contents": {
              "source": "data:text/plain;charset=utf-8;base64,${LOGIND_CONFIG}"
            }
          },
          {
            "overwrite": false,
            "path": "/etc/hosts",
            "append": [
              {
              "source": "data:text/plain;charset=utf-8;base64,${HOSTS_ENTRY}"
              }
            ]
          }
        ]
      }
    }
    IGN_EOF
    
    cat > 01-mc-control-plane.yaml << MC_EOF
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "master"
      name: 01-sno-control-plane-customization
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - path: /etc/systemd/logind.conf.d/99-ignore_lid_switch.conf
            overwrite: true
            contents:
              source: 'data:text/plain;charset=utf-8;base64,${LOGIND_CONFIG}'
          - path: /var/usrlocal/bin/api-resolver.sh
            overwrite: true
            mode: 488 # Decimal value of octal 750
            user:
              name: root
            group:
              name: root
            contents:
              source: 'data:text/plain;charset=utf-8;base64,${API_SCRIPT}'
        systemd:
          units:
          - name: local-storage-pv.service
            enabled: true
            mask: false
            contents: |
              [Unit]
              Description=Create /var/local-storage dir(s) for "local" PVs
    
              [Service]
              Type=oneshot
              ExecStart=/usr/bin/mkdir --mode 750 --parents /var/local-storage/registry
              ExecStart=/usr/bin/chmod 750 /var/local-storage /var/local-storage/registry
              ExecStart=/usr/sbin/restorecon -F /var/local-storage /var/local-storage/registry
              RemainAfterExit=yes
    
              [Install]
              WantedBy=multi-user.target
          - name: api-resolver.service
            enabled: true
            mask: false
            contents: |
              [Unit]
              Description=Add OpenShift API hostnames to /etc/hosts
    
              [Service]
              Type=oneshot
              ExecStart=/var/usrlocal/bin/api-resolver.sh
              RemainAfterExit=yes
    
              [Install]
              WantedBy=multi-user.target
    MC_EOF
    
    cat > 01-mc-worker.yaml << MC_EOF
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "worker"
      name: 01-sno-worker-customization
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - path: /etc/systemd/logind.conf.d/99-ignore_lid_switch.conf
            overwrite: true
            contents:
              source: 'data:text/plain;charset=utf-8;base64,${LOGIND_CONFIG}'
          - path: /var/usrlocal/bin/api-resolver.sh
            overwrite: true
            mode: 488 # Decimal value of octal 750
            user:
              name: root
            group:
              name: root
            contents:
              source: 'data:text/plain;charset=utf-8;base64,${API_SCRIPT}'
        systemd:
          units:
          - name: api-resolver.service
            enabled: true
            mask: false
            contents: |
              [Unit]
              Description=Add OpenShift API hostnames to /etc/hosts
    
              [Service]
              Type=oneshot
              ExecStart=/var/usrlocal/bin/api-resolver.sh
              RemainAfterExit=yes
    
              [Install]
              WantedBy=multi-user.target
    MC_EOF
  5. Enable internal registry and configure it to use local storage with a single replica.

    This is very much not a recommended configuration, but should be fine for a lab as we have no cloud platform or object storage available.

    cat > 10-local-sc.yaml << EOF
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-storage
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
    EOF
    
    cat > 10-registry-pv.yaml << EOF
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: image-registry
    spec:
      capacity:
        storage: 100Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Delete
      storageClassName: local-storage
      local:
        path: /var/local-storage/registry
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - ${CTRL_NAME}
    EOF
    
    cat > 10-registry-pvc.yaml << EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: image-registry
      namespace: openshift-image-registry
    spec:
      storageClassName: local-storage
      volumeName: image-registry
      accessModes:
      - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 100Gi
    EOF
    
    cat > 10-registry-config.yaml << EOF
    apiVersion: imageregistry.operator.openshift.io/v1
    kind: Config
    metadata:
      name: cluster
    spec:
      managementState: Managed
      replicas: 1
      rolloutStrategy: Recreate
      storage:
        managementState: Managed
        pvc:
          claim: image-registry
    EOF
  6. Generate default installation manifests and then copy custom manifests into the created folder and generate the Ignition file for single-node OpenShift. Point the OpenShift client to the installer-created kubeconfig file for authentication.

    cp install-config.yaml "${WORKDIR}/install"
    ./bin/openshift-install create manifests --dir="${WORKDIR}/install"
    cp -v 99-dns-*.yaml 01-mc-*.yaml 10-local-sc.yaml 10-registry-*.yaml "${WORKDIR}/install/manifests"
    ./bin/openshift-install create single-node-ignition-config --dir="${WORKDIR}/install"
    export KUBECONFIG="${WORKDIR}/install/auth/kubeconfig"

Install OpenShift on the control-plane node

  1. Copy the CoreOS Live ISO image to a flash drive (the example below assumes the destination flash drive is attached as /dev/sdb).

    sudo dd if=dl/rhcos-live.x86_64.iso bs=1MiB oflag=direct of=/dev/sdb
  2. Enable local firewall access to Python WebServer (if not already enabled)

    INT=$(ip route | awk '/^default via/ && $4 == "dev" {print $5}')
    ZONE=$(sudo firewall-cmd --get-zone-of-interface=${INT})
    sudo firewall-cmd --zone ${ZONE} --add-port 8000/tcp
  3. Copy the Ignition files to a new temporary directory and start a Python web server (Python 3.x). Boot the CoreOS Live USB image on the intended control-plane node and add the kernel argument shown below. You may stop the web server with CTRL-C after the custom-sno.ign and bootstrap-in-place-for-live-iso.ign files have been served to Ignition on the control-plane node. If installing on a laptop or other system with a WiFi adapter installed, you may need to blacklist the WiFi driver module to prevent it from crashing due to missing firmware (CoreOS does not currently support WiFi adapters) by additionally adding module_blacklist=mac80211,cfg80211,<WIFI_DRIVER_MODULE_NAME> to the kernel boot parameters.

    IGNTMPDIR=$(mktemp --directory) && \
    cp custom-sno.ign install/bootstrap-in-place-for-live-iso.ign ${IGNTMPDIR}/ && \
    echo -e "--------------\nStarting Python web server, press CTRL-C to stop. After booting the CoreOS Live USB image, add the following kernel boot arguments:\nignition.config.url=http://$(ip route | awk '/^default via/ && $8 == "src" {print $9}'):8000/custom-sno.ign\nAdd kernel arguments with --append-karg <KERNEL_ARGUMENT>\n--------------" && \
    (pushd ${IGNTMPDIR} && python3 -m http.server)
    rm -rf ${IGNTMPDIR} && \
    unset IGNTMPDIR
  4. The installation process can take 30-45 minutes or more, depending on the hardware involved. If you are able to resolve the API hostname of the cluster (api.<CLUSTER_NAME>.<CLUSTER_DOMAIN>), you can follow the progress of the installation with the following command (it will complain if it cannot resolve the API hostname):

    ./bin/openshift-install wait-for install-complete --dir "${WORKDIR}/install" --log-level debug
  5. To follow the progress of the install in detail (or if you cannot resolve the API endpoint hostname), you may SSH to the control-plane node and watch the system journal. The system will reboot at least twice during the installation and you will be disconnected each time this occurs. The SSH host key will change after the first reboot, as the system reboots into a freshly installed copy of Red Hat CoreOS.

    ssh-keygen -R lab-t530.lan && \
    ssh -o StrictHostKeyChecking=no core@lab-t530.lan 'journalctl --follow'
  6. After the first reboot has completed, you are free to remove the Live USB drive from the control-plane node.
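
Once the installation has completed, the self-hosted CoreDNS deployment created during preparation should be answering for the cluster domains. A minimal verification sketch, querying the control-plane IP directly (any name under apps.<CLUSTER_NAME>.<OCP_DOMAIN> should resolve, possibly via a CNAME to the api record):

    dig +short "api.${CLUSTER_NAME}.${OCP_DOMAIN}" "@${CTRL_IP}"
    dig +short "console-openshift-console.apps.${CLUSTER_NAME}.${OCP_DOMAIN}" "@${CTRL_IP}"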

OPTIONAL: Add a worker node to a "single-node" OpenShift cluster

  1. Run the "Define key variables and create working directory" step from installation section above.

  2. Define future worker node hostname.

    export WORKER_NAME=lab-e5530.lan
  3. Log into the cluster as a user with cluster-admin privileges.

  4. Create a custom Ignition file for the new worker node.

    This is a "stub" configuration that will be merged with the configuration served by the cluster machine-config-server.

    WORKER_IGN_URL=http://$(ip route | awk '/^default via/ && $8 == "src" {print $9}'):8000/mcs-worker.ign
    
    WORKER_HOSTNAME=$(base64 --wrap 0 <<< ${WORKER_NAME/.*/})
    
    HOSTS_ENTRY=$(printf '%s\tapi.%s api-int.%s\n' "${CTRL_IP}" "${CLUSTER_NAME}.${OCP_DOMAIN}" "${CLUSTER_NAME}.${OCP_DOMAIN}" | base64 --wrap 0 )
    
    LOGIND_CONFIG=$(base64 --wrap 0 << 'EOF'
    [Login]
    HandleLidSwitch=ignore
    HandleLidSwitchExternalPower=ignore
    HandleLidSwitchDocked=ignore
    EOF
    )
    
    cat > custom-sno-worker.ign << IGN_EOF
    {
      "ignition": {
        "version": "3.3.0",
        "config": {
          "merge": [
            {
              "source": "${WORKER_IGN_URL}"
            }
          ]
        }
      },
      "storage": {
        "files": [
          {
            "overwrite": true,
            "path": "/etc/hostname",
            "contents": {
              "source": "data:text/plain;charset=utf-8;base64,${WORKER_HOSTNAME}"
            }
          },
          {
            "overwrite": true,
            "path": "/etc/systemd/logind.conf.d/99-ignore_lid_switch.conf",
            "contents": {
              "source": "data:text/plain;charset=utf-8;base64,${LOGIND_CONFIG}"
            }
          },
          {
            "overwrite": false,
            "path": "/etc/hosts",
            "append": [
              {
              "source": "data:text/plain;charset=utf-8;base64,${HOSTS_ENTRY}"
              }
            ]
          }
        ]
      }
    }
    IGN_EOF
  5. Copy the Ignition files to a new temporary directory and start a Python web server (Python 3.x). Boot the CoreOS Live USB image on the intended worker node and add the kernel arguments shown below. You may stop the web server with CTRL-C after the custom-sno-worker.ign and mcs-worker.ign files have been served to Ignition on the future worker node.

    IGNTMPDIR=$(mktemp --directory) && \
    curl -skLo "${IGNTMPDIR}/mcs-worker.ign" --header 'Accept: application/vnd.coreos.ignition+json;version=3.3.0' "https://${CTRL_IP}:22623/config/worker" && \
    cp custom-sno-worker.ign ${IGNTMPDIR}/ && \
    echo -e "--------------\nStarting Python web server, press CTRL-C to stop. After booting the CoreOS Live USB image on the worker node, add the following kernel boot arguments:\ncoreos.inst.ignition_url=http://$(ip route | awk '/^default via/ && $8 == "src" {print $9}'):8000/custom-sno-worker.ign\ncoreos.inst.install_dev=<INSTALL_DISK>\n--------------" && \
    (pushd ${IGNTMPDIR} && python3 -m http.server)
    rm -rf ${IGNTMPDIR} && \
    unset IGNTMPDIR
  6. After the installation has completed, you are free to remove the CoreOS Live USB drive from the worker node.

  7. Check the cluster for outstanding certificate signing requests (CSRs). It may take 7-10 minutes or more (depending on hardware configuration) for the first CSR to be created by the worker node. Approve all node-bootstrapper or system:node requests.

    unset CSR
    while [ -z "${CSR}" ]; do
      sleep 1m
      CSR=$(./bin/oc get CertificateSigningRequest -o go-template='{{range .items}}{{if eq .spec.username "system:serviceaccount:openshift-machine-config-operator:node-bootstrapper"}}{{ print .metadata.name " "}}{{end}}{{end}}{{print "\n"}}')
    done
    for req in ${CSR} ; do
      oc adm certificate approve ${req}
    done
    unset CSR
    while [ -z "${CSR}" ]; do
      sleep 1m
      CSR=$(./bin/oc get CertificateSigningRequest -o go-template='{{range .items}}{{if eq .spec.username "system:node:'"$(base64 -d <<< ${WORKER_HOSTNAME})"'"}}{{ print .metadata.name " "}}{{end}}{{end}}{{print "\n"}}')
    done
    for req in ${CSR} ; do
      oc adm certificate approve ${req}
    done
  8. Get node status. The new worker should transition to Ready status after a few minutes.

    ./bin/oc get nodes

Assorted utility tasks

  • Configure oc/kubectl clients to use the auto-created cluster-admin credential (assuming you are in the ${WORKDIR} directory)

    export KUBECONFIG=$(pwd)/install/auth/kubeconfig
  • Temporarily configure Fedora or another systemd-resolved client to resolve names via the in-cluster DNS using the IP address of the control-plane node.

    INT=$(ip route | awk '/^default via/ && $4 == "dev" {print $5}')
    sudo resolvectl dns ${INT} ${CTRL_IP}
  • Add an alias to temporarily configure Fedora or another systemd-resolved client to resolve names via the in-cluster DNS using the IP address of the control-plane node.

    INT=$(ip route | awk '/^default via/ && $4 == "dev" {print $5}')
    alias cluster-dns="sudo resolvectl dns ${INT} ${CTRL_IP}"
  • Create ISO file with embedded Ignition config

    Process details are from the CoreOS installer documentation

    IGN_FILE=combined-sno.ign
    mkdir -p tmpign && \
    pushd tmpign && \
    cp ../${IGN_FILE} config.ign && \
    touch -d @0 config.ign && chmod 0600 config.ign && \
    echo config.ign | cpio -oH newc -R 0:0 --renumber-inodes --ignore-devno | xz -zc -F xz -C crc32 > ignition.img && \
    popd && \
    IGN_DEST=sno.iso
    IGN_SIZE=$(isoinfo -l -s -i dl/rhcos-live.x86_64.iso | awk '/IGNITION\.IMG;1[[:space:]]*$/ {print $5}') && \
    IGN_OFFSET=$(isoinfo -l -i dl/rhcos-live.x86_64.iso | awk '/IGNITION\.IMG;1[[:space:]]*$/ {print $10}') && \
    cp dl/rhcos-live.x86_64.iso ${IGN_DEST} && \
    dd if=/dev/zero of=${IGN_DEST} bs=2048c seek=${IGN_OFFSET} count=${IGN_SIZE} conv=notrunc && \
    dd if=tmpign/ignition.img of=${IGN_DEST} bs=2048c seek=${IGN_OFFSET} conv=notrunc && \
    rm -rf tmpign
  • Validate an Ignition config file

    podman run --security-opt label=disable --pull always --rm -i quay.io/coreos/ignition-validate:release - < sno-lab-ignition.json
  • Fetch the worker Ignition config from the machine-config-server URL and install a worker manually

    curl -ko worker.ign https://lab-t530.lan:22623/config/worker
    sudo coreos-installer install --ignition-file worker.ign -- /dev/sda
  • Use the following command to start a CoreDNS container with a Corefile in the current directory

    podman run --security-opt label=disable --rm -v "$(pwd):/data:ro" --workdir /data --env HOST_IP=1.2.3.4 --name dns --publish 127.0.0.1:30053:5353/TCP --publish 127.0.0.1:30053:5353/UDP docker.io/coredns/coredns
  • Embed Ignition file into an already-written Live USB image on /dev/sda

    mkdir -p tmpign && \
    pushd tmpign && \
    cp ../install/bootstrap-in-place-for-live-iso.ign config.ign && \
    touch -d @0 config.ign && chmod 0600 config.ign && \
    echo config.ign | cpio -oH newc -R 0:0 --renumber-inodes --ignore-devno | xz -zc -F xz -C crc32 > ignition.img && \
    popd && \
    IGN_DEST=/dev/sda
    IGN_SIZE=$(sudo isoinfo -l -s -i ${IGN_DEST} | awk '/IGNITION\.IMG;1[[:space:]]*$/ {print $5}') && \
    IGN_OFFSET=$(sudo isoinfo -l -i ${IGN_DEST} | awk '/IGNITION\.IMG;1[[:space:]]*$/ {print $10}') && \
    sudo dd if=/dev/zero of=${IGN_DEST} bs=2048c seek=${IGN_OFFSET} count=${IGN_SIZE} conv=notrunc && \
    sudo dd if=tmpign/ignition.img of=${IGN_DEST} bs=2048c seek=${IGN_OFFSET} conv=notrunc && \
    rm -rf tmpign
  • Run CoreOS container to modify installation ISO files and other tasks

    podman run --security-opt label=disable --pull always --rm -v "$(pwd):/data" --workdir /data quay.io/coreos/coreos-installer:release iso ignition embed --force --ignition-file install/bootstrap-in-place-for-live-iso.ign --output sno_embed.iso rhcos-live.x86_64.iso
  • Sample pod consuming a PVC using a local storage directory

    oc delete -n default pod busybox && \
    oc create -f - << 'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: busybox
      namespace: default
    spec:
      containers:
        - name: busybox
          image: docker.io/library/busybox:latest
          command:
          - /bin/sleep
          - infinity
          volumeMounts:
          - mountPath: /var/local-storage
            name: test-vol
      volumes:
        - name: test-vol
          persistentVolumeClaim:
            claimName: image-registry
    EOF