@adjam
Last active July 27, 2022 23:52
Ansible action_plugin for downloading apache projects to localhost so they can be deployed to managed hosts. Validates checksum of download.
from __future__ import (absolute_import, division, print_function)
__metaclass__ = type
__doc__ = """Ansible Action Plugin that downloads (various) Apache projects
to the local machine, for easy deployment to managed machines via the `copy`
or `unarchive` modules. Fetches projects from nearest mirror and validates
their sha1 sum against the values downloaded from the main Apache site.
This plugin _requires_ the `requests` module be installed for all HTTPish
activities.
If the `gnupg` module is installed, and the project is configured to use the
'asc' verification method, the plugin will attempt to download and import the
KEYS file from the main Apache server into an (installation-specific) GNUPG
configuration directory (so it's separate from your keys) and validate the
signature files (which have the `.asc` extension) for each release against
those keys. More detailed key management (handling updates, etc.) is left as
an exercise for the reader, but a big hint here is that the ansible-specific
GNUPG home directory is `./apache_downloads/.gnupg`. Downloaded KEYS files are
stored in that directory as `KEYS-${project.name}`, and such files are only
downloaded and imported if they aren't already present.
This plugin relies on a YAML file (`apache_projects.yml`) to locate projects,
generate filenames, set verification methods, etc., as there is some variation
among different projects (e.g. Tomcat filenames look like
`apache-tomcat-{version}.tar.gz`, Solr filenames are like `solr-{version}.tgz`,
and Solr is a subproject of Lucene and as such is usually located at
`lucene/solr/` on the mirrors).
`apache_projects.yml` is processed by starting with the requested name and
version, computing `major_version` (on the assumption that semver is in use),
and then interpolating all of these variables into the path and filename
templates. The project 'definition' is arrived at by pulling in `_default`
first, then overlaying any values from the entry matching the project's name.
The final values for `path` and `filename` (both of which are crucial for
locating the files on the mirrors!) are then computed by interpolating the
attributes already set on the project.
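The overlay-and-interpolate scheme described above can be sketched roughly
like this (a minimal standalone illustration, not the plugin's actual code;
the `tomcat` values mirror the sample `apache_projects.yml`):
```
# Sketch of project-definition resolution: start from _default,
# overlay the project entry, then interpolate path/filename templates.
defaults = {
    "path": "{name}/{version}",
    "extension": "tar.gz",
    "filename": "{name}-{version}.{extension}",
}
overrides = {  # e.g. the 'tomcat' entry from apache_projects.yml
    "path": "tomcat/tomcat-{major_version}/v{version}/bin",
    "filename": "apache-tomcat-{version}.{extension}",
}

def resolve(name, version):
    attrs = dict(defaults)           # start from _default
    attrs.update(overrides)          # overlay the project-specific entry
    attrs.update(name=name, version=version,
                 major_version=version.split(".")[0])
    # interpolate path and filename against everything set so far
    attrs["path"] = attrs["path"].format(**attrs)
    attrs["filename"] = attrs["filename"].format(**attrs)
    return attrs

print(resolve("tomcat", "8.5.23")["filename"])  # apache-tomcat-8.5.23.tar.gz
```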
If a project you need is not defined, simply add it to the YAML file, following
the examples in the accompanying sample. The plugin will try bravely to use
the defaults, based on the project name you supply, but initial testing
suggests this is not likely to Just Work.
The apache_download 'task' defined by this plugin tries to be as idempotent as
possible, so it will only download files it doesn't already have. If you get
corrupted files due to an interrupted operation, just delete them and it will
start over. Note, though, that assuming the target 'tarball' and
checksum/signature file are all downloaded, the tarball will be verified every
time (as this is a purely local operation).
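That local verification amounts to hashing the tarball and comparing the
result against the downloaded checksum file; a minimal sketch (standalone,
simplified from the plugin's own verifier):
```
# Hash a file in chunks and compare against a "sum  filename" checksum line.
import hashlib
import os
import re

def sha1_matches(path, checksum_text):
    """Return True if `path` matches the sha1 recorded in checksum_text."""
    m = re.search(r"(?P<sum>[a-f0-9]{40})\s+\*?(?P<file>\S+)", checksum_text)
    if not m or m.group("file") != os.path.basename(path):
        return False
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            h.update(chunk)
    return h.hexdigest() == m.group("sum")
```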
The 'filename' attribute of the result from this action is the full
path to the downloaded archive on the local machine, and can be used in
subsequent tasks (e.g. `unarchive`).
Since, however, you will probably want to run the `apache_download` action/task
on localhost (i.e. your 'ansible master') and copy/deploy it to the hosts
you're managing, you'll probably want to use multiple playbooks.
Basic Usage (download_tomcat.yml):
```
---
- hosts: localhost
  vars:
    tomcat_version: 8.5.23
  tasks:
    - name: "Download Tomcat {{ tomcat_version }}"
      apache_download: project=tomcat version="{{ tomcat_version }}"
      # make output available to subsequent plays
      register: tomcat_download

    # example 2: force re-download, in case a previous attempt failed in the
    # middle or you just like making apache servers work hard
    - name: "Download Tomcat {{ tomcat_version }}"
      apache_download: project=tomcat version="{{ tomcat_version }}" force=yes
      register: tomcat_download

    # example 3: download packages to an alternate directory
    # (default: ./apache_downloads); the directory will be created
    # if it does not exist
    - name: "Download Tomcat {{ tomcat_version }}"
      apache_download: project=tomcat version="{{ tomcat_version }}" output_dir=/home/user/apache_downloads
      register: tomcat_download
```
The above playbook specifically targets localhost (the machine running ansible)
for the downloads. In order to use the results in a larger playbook where you
install your preferred version to a fleet of remote servers, you'll need to
write a slightly more involved play:
```
---
# ensures the above play has been run on localhost
- import_playbook: download_tomcat.yml

- hosts: mytomcatserver.university.edu
  remote_user: ansible
  become: yes
  become_method: sudo
  tasks:
    - name: "Deploys Tomcat"
      # note use of hostvars here -- we want the 'tomcat_download' variable
      # that was registered on 'localhost'
      unarchive: creates=/opt/tomcat/bin/startup.sh copy=yes dest=/opt group=tomcat owner=tomcat src="{{ hostvars['localhost']['tomcat_download'].filename }}"
```
Note that even though we registered 'tomcat_download' in the original playbook,
it is not available without qualification to playbooks targeting other hosts.
Designed for UNIX-like systems (e.g., it relies on tarballs).
"""
import hashlib
import os
import re
import shutil
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
import tempfile

import yaml

from ansible.module_utils.parsing import convert_bool
from ansible.plugins.action import ActionBase

try:
    from __main__ import display
except ImportError:
    from ansible.utils.display import Display
    display = Display()

NO_REQUESTS_MODULE = False
try:
    import requests
except ImportError:
    NO_REQUESTS_MODULE = True

GNUPG_AVAILABLE = False
try:
    import gnupg
    GNUPG_AVAILABLE = True
except ImportError:
    pass
class HashVerifier(object):
    """Generic verifier using hashlib algorithms to check the integrity
    of a downloaded file."""

    def __init__(self, algo):
        """Creates a new verifier using a hash algorithm as a checksum.

        :param algo: the algorithm to use, e.g. `hashlib.sha1`. This
        object will be called to initialize the digester.
        """
        self.hasher = algo()

    def verify(self, filename, checksum):
        """Checks that a file's contents match the checksum content.

        :param filename: the complete path to the file to be checked.
        :param checksum: the contents of the checksum as a string"""
        pat = re.compile(r"(?P<checksum>[a-f0-9]+)\s+\*?(?P<file>\S+)")
        fname = os.path.basename(filename)
        m = pat.search(checksum)
        if m:
            expected = m.group('checksum')
            target_file = m.group('file')
            if target_file != fname:
                return False, "checksum file is for " + target_file + " not " + fname
            # hash the file in chunks so large archives don't fill memory
            with open(filename, 'rb') as f:
                while True:
                    data = f.read(4096)
                    if not data:
                        break
                    self.hasher.update(data)
            computed_digest = self.hasher.hexdigest()
            if computed_digest != expected:
                return False, "Computed hash ({}) of {} does not match expected value: {}".format(computed_digest, filename, expected)
            return True, 'Checksum matches expected value'
        return False, 'Checksum file not in expected format'
class GNUPGVerifier(object):
    """Checks downloaded files against PGP signatures"""

    def __init__(self, project, gpgdir=None, verify_callback=None):
        self.project = project
        if gpgdir is None:
            gpgdir = os.path.join(os.getcwd(), "apache_downloads", ".gnupg")
        self.gpgdir = gpgdir
        display.vvv("Using GPG directory {}".format(self.gpgdir))
        if not os.path.isdir(gpgdir):
            display.vv("Creating GPG directory {}".format(self.gpgdir))
            os.makedirs(gpgdir)
        self.gpg = gnupg.GPG(gnupghome=self.gpgdir)
        keyfile = os.path.join(self.gpgdir, "KEYS-" + project.name)
        if not os.path.isfile(keyfile):
            display.v("Downloading keyfile for " + project.name)
            keydata, encoding = self.download_keys()
            display.v("Keydata encoding: {}".format(encoding))
            # write bytes so this works under both Python 2 and 3
            with open(keyfile, 'wb') as f:
                f.write(keydata.encode(encoding))
            self.gpg.import_keys(keydata)

    def verify(self, filename, signature):
        """Attempts to verify the signature on a file.

        By default, we'll just go for a valid signature, and not
        expect that we must fully trust the signer."""
        if not hasattr(signature, 'read'):
            sigflo = StringIO(signature)
        else:
            sigflo = signature
        verify = self.gpg.verify_file(sigflo, filename)
        display.vv("GNUPG Verification")
        if verify.trust_level is None:
            return False, "Unable to validate signature"
        display.vv("\t{}, key id: {}".format(verify.username, verify.key_id))
        display.vv("\tsignature id: {}".format(verify.signature_id))
        display.vv("\tTrust level {}".format(verify.trust_text))
        # the default here corresponds to: we have imported the key
        # and it's a good signature against that key, but
        # we haven't told GNUPG that we know who really owns the key.
        if verify and verify.trust_level >= verify.TRUST_UNDEFINED:
            return True, "Signature is OK"
        return False, "Unable to verify signature"

    def _default_keyfile_location(self):
        return "/".join([ApacheProject.CHECKSUM_URL_BASE, self.project.name, 'KEYS'])

    def download_keys(self):
        keyfile = getattr(self.project, 'keyfile_path', self._default_keyfile_location())
        if not keyfile.startswith(ApacheProject.CHECKSUM_URL_BASE):
            keyfile = ApacheProject.CHECKSUM_URL_BASE + keyfile
        display.v("Downloading KEYS for {} from {}".format(self.project.name, keyfile))
        resp = requests.get(keyfile)
        resp.raise_for_status()
        for h in resp.headers:
            display.vvv("\t{} => '{}'".format(h, resp.headers.get(h)))
        display.vv("Content Type of response: {}".format(resp.headers.get('content-type')))
        display.vv("Guessed encoding: {}".format(resp.encoding))
        if resp.encoding is None:
            # for some reason the server doesn't like to tell us;
            # this seems to be a reasonable default
            display.vv("Forcing encoding to cp1252")
            resp.encoding = "cp1252"
        return resp.text, resp.encoding
class ApacheProject(object):
    """Encapsulates operations around a project, including locating
    download mirrors and generating filenames etc."""

    # main URL for checksums and KEYS files
    CHECKSUM_URL_BASE = 'https://www.apache.org/dist'

    # location of the Apache utility that locates the nearest mirror
    MIRROR_LOCATOR_URL = 'https://www.apache.org/dyn/closer.cgi'

    # support all kinds of hashes (but not you, md5)
    # 'asc' (gpg) will be added if the gnupg module is installed
    # (hashlib.algorithms is Python 2 only; fall back on Python 3)
    VERIFICATION_METHODS = [
        x for x in getattr(hashlib, 'algorithms', sorted(hashlib.algorithms_guaranteed))
        if x != 'md5'
    ]

    definitions = {}

    @classmethod
    def load_projects(cls, path=None):
        if GNUPG_AVAILABLE and 'asc' not in cls.VERIFICATION_METHODS:
            cls.VERIFICATION_METHODS.append('asc')
        cls.VERIFICATION_METHODS = tuple(cls.VERIFICATION_METHODS)
        if path is None:
            curdir = os.path.dirname(os.path.realpath(__file__))
            path = os.path.join(curdir, 'apache_projects.yml')
        with open(path, 'r') as f:
            # safe_load avoids executing arbitrary YAML tags
            projects = yaml.safe_load(f)
        # cache so subsequent instances don't re-read the file
        cls.definitions = projects
        return projects

    def __init__(self, name, version, path=None):
        """Defines a new project.

        :param name: (required) the project name, which must exist in
        the definitions file.
        :param version: (required) the specific version to be downloaded.
        :param path: (optional) the path from which to load common project
        definitions (YAML). If unspecified, `apache_projects.yml` in the
        same directory where this class is defined will be used.
        """
        pds = ApacheProject.definitions
        if '_default' not in pds:
            pds = ApacheProject.load_projects(path)
        self.name = name
        self.version = version
        self.major_version = version.split('.')[0]
        # copy the defaults so we never mutate the shared '_default' entry
        self.definition = dict(pds['_default'])
        if self.name not in pds:
            display.warning(
                "Unknown project '" + self.name + "', will continue with defaults")
            display.warning("Any errors from here on down are probably because of this!")
        else:
            self.definition.update(pds[self.name])
        display.vvv("Using project definition for '{}':".format(self.name))
        for k in self.definition:
            display.vvv("\t{} : {}".format(k, self.definition[k]))
        for prop in ('path', 'filename', 'extension', 'verify_method', 'keyfile_path'):
            if prop in self.definition:
                setattr(self, prop, self.definition[prop])
        self.verification_supported = self.verify_method in self.VERIFICATION_METHODS
        # now we interpolate!
        self.path = self.path.format(**self.__dict__)
        self.filename = self.filename.format(**self.__dict__)
        if 'keyfile_path' in self.definition:
            self.keyfile_path = self.keyfile_path.format(**self.__dict__)
        self.mirror_data = None

    def locate_mirror(self):
        """Fetches the preferred download mirror for this project"""
        if self.mirror_data is None:
            parts = [ApacheProject.MIRROR_LOCATOR_URL, self.path]
            url = "/".join(parts)
            display.vv("Fetching mirrors from " + url)
            resp = requests.get(url, params={"as_json": 1})
            resp.raise_for_status()
            display.vv("Fetched mirror data from " + resp.url)
            self.mirror_data = resp.json()
            self.download_url = self._create_mirror_url()
        return self.download_url

    def _create_mirror_url(self):
        base = self.mirror_data['preferred'] + self.mirror_data['path_info']
        display.vvv("Base URL is {}".format(base))
        if base.endswith('/'):
            return base + self.filename
        return base + '/' + self.filename

    @property
    def checksum_filename(self):
        return "{filename}.{verify_method}".format(**self.__dict__)

    @property
    def checksum_url(self):
        return "/".join([ApacheProject.CHECKSUM_URL_BASE,
                         self.path,
                         self.checksum_filename])

    def get_verifier(self):
        if self.verify_method == 'asc' and GNUPG_AVAILABLE:
            return GNUPGVerifier(self)
        if hasattr(hashlib, self.verify_method):
            return HashVerifier(getattr(hashlib, self.verify_method))
        raise ValueError("No verifier available for '" + self.verify_method + "'")
class ActionModule(ActionBase):
    '''Downloads projects from Apache, using mirrors and verifying
    checksums'''

    def _arg(self, arg, default=None):
        """Reduce visual noise when getting task arguments"""
        return self._task.args.get(arg, default)

    def error_out(self, msg):
        """Makes returning failure results less noisy"""
        self._result.update({
            'failed': True,
            'msg': msg
        })
        return self._result

    def _do_download(self, project, dest_file):
        dest_base = os.path.dirname(dest_file)
        if not os.path.isdir(dest_base):
            display.v("Creating output directory " + dest_base)
            os.makedirs(dest_base)
        download_url = project.locate_mirror()
        display.v("Downloading {} {} from {}".format(
            project.name,
            project.version,
            download_url)
        )
        # stream to a temporary file first so an interrupted transfer
        # doesn't leave a partial archive at the destination
        with tempfile.TemporaryFile(suffix='-apachedl') as tempout:
            resp = requests.get(download_url, stream=True)
            resp.raise_for_status()
            for chunk in resp.iter_content(4096):
                tempout.write(chunk)
            tempout.seek(0)
            with open(dest_file, 'wb') as output:
                shutil.copyfileobj(tempout, output)
        self._result['filename'] = dest_file

    def verification_warn(self, project):
        display.error("Project {} uses {} to verify downloads, which is not supported by this plugin (yet)".format(project.name, project.verify_method))
        display.error("Supported methods:")
        for vmeth in ApacheProject.VERIFICATION_METHODS:
            display.error("\t" + vmeth)

    def run(self, tmp=None, task_vars=None):
        if task_vars is None:
            task_vars = {}
        self._result = super(ActionModule, self).run(tmp, task_vars)
        if NO_REQUESTS_MODULE:
            return self.error_out('This plugin requires the requests module')
        project_name = self._arg('project')
        version = self._arg('version')
        if project_name is None:
            return self.error_out("'project' argument is required")
        if version is None:
            return self.error_out("'version' argument is required")
        project = ApacheProject(project_name, version)
        default_output_dir = os.path.join(os.getcwd(), 'apache_downloads')
        output_dir = self._arg('output_dir', default_output_dir)
        dest_file = os.path.abspath(os.path.join(output_dir, project.filename))
        if not project.verification_supported:
            self.verification_warn(project)
        check_file = os.path.join(output_dir, project.checksum_filename)
        force = convert_bool.boolean(self._arg('force', False))
        display.vv("Project: " + project.name)
        display.vv("Version: " + project.version)
        display.vv("Filename: " + project.filename)
        display.vv("Force download: {}".format(force))
        if not os.path.exists(dest_file) or force:
            try:
                self._do_download(project, dest_file)
            except requests.HTTPError as e:
                display.error("Unable to fetch {} from {}: {}".format(
                    project.name,
                    project.locate_mirror(),
                    str(e))
                )
                return self.error_out("unable to fetch project from Apache mirror")
        else:
            display.vv("File already downloaded")
            self._result['filename'] = dest_file
        if not os.path.exists(check_file) or force:
            display.v("Fetching checksum file from {}".format(project.checksum_url))
            try:
                resp = requests.get(project.checksum_url)
                resp.raise_for_status()
                with open(check_file, 'w') as f:
                    f.write(resp.text)
            except requests.HTTPError as e:
                display.error("Unable to retrieve checksum file: {}".format(str(e)))
                display.error("It is possible we are using the wrong path to download it")
                display.error("Check warning messages, they may provide more insight")
                return self.error_out("Unable to verify download against .sha1 sum")
        display.v("Checking downloaded checksum {} against {}".format(
            check_file, dest_file)
        )
        if project.verification_supported:
            with open(check_file) as f:
                checksum = f.read()
            check, msg = project.get_verifier().verify(dest_file, checksum)
            if not check:
                self._result.update({'failed': True, 'msg': msg})
        else:
            self._result.update({'failed': True, 'msg': "Unable to verify download"})
        return self._result
# apache_projects.yml
---
_default:
  path: "{name}/{version}"
  extension: tar.gz
  filename: "{name}-{version}.{extension}"
  verify_method: sha1
accumulo:
  filename: "{name}-{version}-bin.{extension}"
  # note this method (GPG signature verification) requires the gnupg module
  verify_method: 'asc'
solr:
  path: "lucene/solr/{version}"
  extension: tgz
# this isn't really for real, it's
# just a test for a hash algo that's not sha1
spark:
  extension: tgz
  path: '{name}/{name}-{version}'
  verify_method: sha512
tomcat:
  path: "tomcat/tomcat-{major_version}/v{version}/bin"
  filename: "apache-tomcat-{version}.{extension}"
  verify_method: asc
  keyfile_path: "/tomcat/tomcat-{major_version}/KEYS"
zookeeper:
  path: "zookeeper/zookeeper-{version}"
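# A new project entry only needs to override what differs from _default.
# Hypothetical example (values shown for illustration, not verified
# against a real mirror layout):
#
# kafka:
#   extension: tgz
#   path: "kafka/{version}"
#   filename: "kafka_2.12-{version}.{extension}"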