
@dmick
Created May 4, 2023 09:05
Image capture flow:
The Jenkins job sepia-fog-images runs on the teuthology host and:
1) clones/updates teuthology and sets it up so that teuthology-lock can be used
2) clones/updates ceph-cm-ansible
3) locks a machine of the requested type(s) (or uses hosts passed in as
arguments), setting their descriptions to "Locked to capture FOG image
for Jenkins build ###"
4) uses /usr/local/sbin/set-next-server.sh on the store01 DHCP server to
set the targets to PXE boot from cobbler (rather than FOG) and restarts
dhcpd
5) sshes to the cobbler host to set the right cobbler profile for the
host and enable netboot
6) powercycles the hosts in question
7) while the hosts are rebooting, uses curl and the FOG API to GET the
image ID (or POST an image template to create the image), and then sets
up FOG for image capture
8) sleeps for 10s to give the hosts time to go down, so that it can...
9) ...start polling for the sentinel file /ceph-qa-ready, which is created
at the very end of the process (the cobbler install flow is documented
below)
10) If there's an error or /ceph-qa-ready isn't present yet, keeps retrying
for up to 2 hours. Once normal completion is seen, sets DHCP back to
PXE-from-FOG and runs ansible-playbook (from teuthology, against the host)
with tools/prep-fog-capture.yml, which removes some files left over from
the installation:
- /etc/udev/rules.d/70-persistent-net.rules
- /.cephlab_net_configured
- /ceph-qa-ready
It also disables network configuration, unmounts /var/lib/ceph and removes
it from fstab, removes any SSH host keys, unsubscribes from RHEL, removes
a katello.facts file, disables periodic dnf makecache jobs, cleans the
dnf cache, stops ntp/chrony, and sets the hwclock
11) restarts dhcpd
12) waits for any in-progress FOG images to complete, and pauses the
teuthology queue if there are any
13) powercycles the targets to boot into FOG and capture, and waits for
FOG task completion
14) runs teuthology-lock --unlock on any locked hosts and unpauses the queue if needed
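The polling in steps 8-10 can be sketched in Python. This is a minimal sketch, not the actual Jenkins job: it assumes the check is done by sshing to the target and testing for /ceph-qa-ready, and the helper names (ssh_has_sentinel, wait_for_sentinel) and the 60s polling interval are hypothetical. The clock and sleep functions are injectable so the loop can be exercised without waiting 2 hours.

```python
import subprocess
import time
from typing import Callable

SENTINEL = "/ceph-qa-ready"


def ssh_has_sentinel(host: str) -> bool:
    """Hypothetical check: does `test -e /ceph-qa-ready` succeed on host?"""
    result = subprocess.run(
        ["ssh", host, "test", "-e", SENTINEL],
        capture_output=True,
        timeout=30,  # raises subprocess.TimeoutExpired if ssh hangs
    )
    return result.returncode == 0


def wait_for_sentinel(
    check: Callable[[], bool],
    timeout: float = 2 * 60 * 60,  # step 10: retry for up to 2 hours
    interval: float = 60.0,        # hypothetical polling interval
    initial_delay: float = 10.0,   # step 8: let the hosts go down first
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True (-> True) or timeout passes (-> False)."""
    sleep(initial_delay)
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # ssh failures while the host reinstalls count as "not ready yet"
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

A call would look like wait_for_sentinel(lambda: ssh_has_sentinel("smithi001")), where the hostname is only an example. Errors are treated the same as a missing sentinel because the host is expected to be unreachable for most of the reinstall.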
Cobbler install flow:
1) do a normal preseed/kickstart install with cobbler-defined
preseed/kickstart files. Some extra definitions:
- a smallish set of packages to install
- grub serial console setup
- *** install an /etc/rc.local to run once on first reboot
- install with ext4 on the appropriate drive, without swap
- set up subscription manager
- add the cm user with the admin_users' keys and passwordless sudo
- turn off cobbler PXE boot
2) after rebooting into the fresh install, /etc/rc.local runs:
- search the NICs for any active interfaces and set them up for DHCP; if
they receive no DHCP address, unconfigure them, on the assumption that
they're not on any network we should configure
- touch /.cephlab_net_configured when done
- try to get a hostname from reverse DNS and configure it
- generate SSH host keys
- ping the cobbler host to make sure it's reachable
- curl the cobbler URL cblr/svc/op/trig/mode/post/system/<hostname>, which
makes cobbler run its /var/lib/cobbler/triggers/install/post/cephlab_ansible.sh
script against the target host
3) cephlab_ansible.sh (running on the cobbler host) will:
- use scl on CentOS 7 to get Python 3.8
- clone/update ceph-cm-ansible and ceph-sepia-secrets (root's SSH key allows
access to the latter on GitHub)
- check that port 22 is open; apparently this trigger can run before the
install is done, in which case port 22 won't be open yet; when it's run
from /etc/rc.local in step 2 above, it will find port 22 open
- create /var/log/ansible and write log output there to a file named <hostname>
- for CentOS 8 Stream, run tools/convert-to-centos-stream.yml
- if the Cobbler profile is named '*-stock', stop there
- run cephlab.yml with ansible-playbook, skipping users, pubkeys, and zap
4) cephlab.yml runs teuthology.yml for teuthology hosts, testnodes.yml for
testnodes, container-host.yml for docker/podman installation, cobbler.yml for
cobbler hosts, and likewise for paddles and pulpito hosts; finally, for
testnodes, it touches the /ceph-qa-ready sentinel, which the FOG capture
process above uses to notice that the installation is finished and proceed
with the capture.
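The decision logic of steps 2 and 3 can also be sketched in Python. This is a hypothetical reconstruction, not the real rc.local or cephlab_ansible.sh: the http scheme, the function names, and representing each ansible run as a (playbook, skipped items) pair are assumptions; the trigger path, playbook names, '-stock' rule, and skipped items come from the notes above.

```python
import socket
from typing import List, Tuple

# Cobbler post-install trigger path that rc.local curls in step 2.
TRIGGER_PATH = "/cblr/svc/op/trig/mode/post/system/{hostname}"

# Items cephlab.yml is run without (presumably ansible tags; an assumption).
CEPHLAB_SKIP = ("users", "pubkeys", "zap")

Run = Tuple[str, Tuple[str, ...]]


def trigger_url(cobbler_host: str, hostname: str) -> str:
    """URL rc.local would curl to fire the trigger (http scheme assumed)."""
    return "http://" + cobbler_host + TRIGGER_PATH.format(hostname=hostname)


def ssh_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds.

    If the trigger fires before the install is done, nothing is listening
    on port 22 yet, the connect fails, and this returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def plan_playbooks(profile: str, centos8_stream: bool) -> List[Run]:
    """Return the (playbook, skipped items) runs in the order step 3 lists them."""
    runs: List[Run] = []
    if centos8_stream:
        runs.append(("tools/convert-to-centos-stream.yml", ()))
    if profile.endswith("-stock"):
        # '*-stock' profiles stop here, without running cephlab.yml.
        return runs
    runs.append(("cephlab.yml", CEPHLAB_SKIP))
    return runs
```

Following the listed order, a CentOS 8 Stream host on a '-stock' profile would still get the Stream conversion but not cephlab.yml: plan_playbooks("centos-8-stock", True) returns only the convert-to-centos-stream.yml run.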