Last active July 27, 2022 23:52
Ansible action_plugin for downloading apache projects to localhost so they can be deployed to managed hosts. Validates checksum of download.
from __future__ import (absolute_import, division, print_function)
__metaclass__ = type
__doc__ = """Ansible Action Plugin that downloads (various) Apache projects
to the local machine, for easy deployment to managed machines via the `copy`
or `unarchive` modules. Fetches projects from the nearest mirror and validates
their sha1 sums against the values downloaded from the main Apache site.
This plugin _requires_ the `requests` module be installed for all HTTPish
activities.
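That integrity check amounts to something like this standalone sketch (an
illustration only, not the plugin's own code; the `HashVerifier` class below
additionally checks the filename embedded in the checksum file):

```python
import hashlib

# Hash the downloaded file in chunks and return the hex digest, which is
# then compared against the value published on www.apache.org/dist.
def sha1_of(path, chunk_size=4096):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```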
If the `gnupg` module is installed, and the project is configured to use the
'asc' verification method, it will attempt to download and import the KEYS file
from the main Apache server into an (installation-specific) GNUPG configuration
directory (so it's separate from your keys) and validate the signature files
(which have the `.asc` extension) for each release against those keys. More
detailed key management (handling updates, etc.) is left as an exercise for the
reader, but a big hint here is that the ansible-specific GNUPG home directory
is `./apache_downloads/.gnupg`. Downloaded KEYS files are stored in the
ansible-specific GNUPG configuration directory as `KEYS-${project.name}`, and
such files are only downloaded and imported if they aren't already present.
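Assuming the defaults, a download of (for example) Tomcat 8.5.23 with 'asc'
verification leaves a local layout along these lines (illustrative):

```
apache_downloads/
├── .gnupg/                          # plugin-specific GNUPG home (keyrings)
│   └── KEYS-tomcat                  # downloaded and imported KEYS file
├── apache-tomcat-8.5.23.tar.gz      # the release tarball
└── apache-tomcat-8.5.23.tar.gz.asc  # detached signature used for verification
```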
This plugin relies on a YAML file (`apache_projects.yml`) to locate projects,
generate filenames, set verification methods, etc., as there is some variation
among different projects (e.g. Tomcat filenames look like
`apache-tomcat-{version}.tar.gz`, Solr filenames are like `solr-{version}.tgz`,
and Solr is a subproject of Lucene and as such is usually located at
`lucene/solr/` on the mirrors).
`apache_projects.yml` is processed by starting with the name and version
requested, computing `major_version` (based on the assumption that semver
is in use), and then interpolating all of these variables into the path and
filename variables. The project 'definition' is arrived at by pulling in
`_default` first, then overlaying any values from the entry matching the
name of the project.
The final values for `path` and `filename` (both of which are crucial for
locating the files on the mirrors!) are then computed by interpolating the
attributes already set on the project.
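The overlay-and-interpolate step amounts to something like this sketch (using
the tomcat entry from the sample `apache_projects.yml`):

```python
# Start from '_default', overlay the named project's entry, then interpolate
# the name/version attributes into the path and filename templates.
defaults = {"path": "{name}/{version}", "extension": "tar.gz",
            "filename": "{name}-{version}.{extension}", "verify_method": "sha1"}
tomcat = {"path": "tomcat/tomcat-{major_version}/v{version}/bin",
          "filename": "apache-tomcat-{version}.{extension}"}

definition = dict(defaults)   # pull in _default first...
definition.update(tomcat)     # ...then overlay the project's own values

version = "8.5.23"
attrs = {"name": "tomcat", "version": version,
         "major_version": version.split(".")[0],   # semver assumption
         "extension": definition["extension"]}
path = definition["path"].format(**attrs)
filename = definition["filename"].format(**attrs)
# path     == "tomcat/tomcat-8/v8.5.23/bin"
# filename == "apache-tomcat-8.5.23.tar.gz"
```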
If a project you need is not defined, simply add it to the YAML file, following
the examples in the accompanying sample. The plugin will bravely try to use
the defaults based on the project name you supply, but initial testing
suggests this is not likely to Just Work.
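A new entry might look like this (a hypothetical project and values; check the
real mirror layout for the path and filename before relying on it):

```yaml
# hypothetical entry -- verify against the actual mirror directory layout
myproject:
  path: "myproject/{version}"
  filename: "{name}-{version}-bin.{extension}"
  extension: tgz
  verify_method: sha512
```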
The `apache_download` 'task' defined by this plugin tries to be as idempotent as
possible, so it will only download files it doesn't already have. If you get
corrupted files due to an interrupted operation, just delete them and it will
start over. Note, though, that assuming the target 'tarball' and
checksum/signature file are all downloaded, the tarball will be verified every
time (as this is a purely local operation).
The 'filename' attribute of the result from this action is the full
path to the downloaded archive on the local machine, and can be used in
subsequent tasks (e.g. `unarchive`).
Since, however, you will probably want to run the `apache_download` action/task
on localhost (i.e. your 'ansible master') and copy/deploy the archive to the
hosts you're managing, you'll probably want to use multiple playbooks.
Basic Usage (download_tomcat.yml):
```
---
- hosts: localhost
  vars:
    tomcat_version: 8.5.23
  tasks:
    - name: "Download Tomcat {{ tomcat_version }}"
      apache_download: project=tomcat version="{{ tomcat_version }}"
      # make output available to subsequent plays
      register: tomcat_download
    # example 2: force re-download, in case a previous attempt failed in the
    # middle or you just like making apache servers work hard
    - name: "Download Tomcat {{ tomcat_version }}"
      apache_download: project=tomcat version="{{ tomcat_version }}" force=yes
      # make output available to subsequent plays
      register: tomcat_download
    # example 3: download packages to an alternate directory (default: ./apache_downloads)
    # the directory will be created if it does not exist
    - name: "Download Tomcat {{ tomcat_version }}"
      apache_download: project=tomcat version="{{ tomcat_version }}" output_dir=/home/user/apache_downloads
      # make output available to subsequent plays
      register: tomcat_download
```
The above playbook specifically targets localhost (the machine running ansible)
for the downloads. In order to use the results in a larger playbook where you
install your preferred version to a fleet of remote servers, you'll need to
write a slightly more complicated play:
```
---
# ensures the above play has been run on localhost
- import_playbook: download_tomcat.yml
- hosts: mytomcatserver.university.edu
  remote_user: ansible
  become: yes
  become_method: sudo
  tasks:
    - name: "Deploy Tomcat"
      # note the use of hostvars here -- we want the 'tomcat_download'
      # variable that was registered on 'localhost'
      unarchive: creates=/opt/tomcat/bin/startup.sh copy=yes dest=/opt group=tomcat owner=tomcat src="{{ hostvars['localhost']['tomcat_download'].filename }}"
```
Note that even though we registered 'tomcat_download' in the original playbook,
it's not available without qualification to playbooks for other hosts.
Designed for UNIX-like systems (uses tarballs, e.g.).
"""
import hashlib
import os
import re
import shutil
from io import StringIO
import tempfile

import yaml

from ansible.module_utils.parsing import convert_bool
from ansible.plugins.action import ActionBase

try:
    from __main__ import display
except ImportError:
    from ansible.utils.display import Display
    display = Display()

NO_REQUESTS_MODULE = False
try:
    import requests
except ImportError:
    NO_REQUESTS_MODULE = True

GNUPG_AVAILABLE = False
try:
    import gnupg
    GNUPG_AVAILABLE = True
except ImportError:
    pass
class HashVerifier(object):
    """Generic verifier using hashlib algorithms to check the integrity
    of a downloaded file."""

    def __init__(self, algo):
        """Creates a new verifier using a hash algorithm as a checksum.
        :param algo: the algorithm to use, e.g. `hashlib.sha1`. This
        object will get `call`ed to initialize the digester.
        """
        self.hasher = algo()

    def verify(self, filename, checksum):
        """Checks that a file's contents match the checksum content.
        :param filename: the complete path to the file to be checked.
        :param checksum: the contents of the checksum as a string"""
        pat = re.compile(r"(?P<checksum>[a-f0-9]+)\s+\*?(?P<file>\S+)")
        fname = os.path.basename(filename)
        m = pat.search(checksum)
        if m:
            expected = m.group('checksum')
            target_file = m.group('file')
            if target_file != fname:
                return False, "checksum file is for " + target_file + " not " + fname
            with open(filename, 'rb') as f:
                while True:
                    data = f.read(4096)
                    if not data:
                        break
                    self.hasher.update(data)
            computed_digest = self.hasher.hexdigest()
            if computed_digest != expected:
                return False, "Computed hash ({}) of {} does not match expected value: {}".format(
                    computed_digest, filename, expected)
            return True, 'Checksum matches expected value'
        return False, 'Checksum file not in expected format'
class GNUPGVerifier(object):
    """Checks downloaded files against PGP signatures"""

    def __init__(self, project, gpgdir=None, verify_callback=None):
        self.project = project
        if gpgdir is None:
            gpgdir = os.path.join(os.getcwd(), "apache_downloads", ".gnupg")
        self.gpgdir = gpgdir
        display.vvv("Using GPG directory {}".format(self.gpgdir))
        if not os.path.isdir(gpgdir):
            display.vv("Creating GPG directory {}".format(self.gpgdir))
            os.makedirs(gpgdir)
        self.gpg = gnupg.GPG(gnupghome=self.gpgdir)
        keyfile = os.path.join(self.gpgdir, "KEYS-" + project.name)
        if not os.path.isfile(keyfile):
            display.v("Downloading keyfile for " + project.name)
            keydata, encoding = self.download_keys()
            display.v("Keydata encoding: {}".format(encoding))
            # write in binary mode, since we're encoding the text ourselves
            with open(keyfile, 'wb') as f:
                f.write(keydata.encode(encoding))
            self.gpg.import_keys(keydata)

    def verify(self, filename, signature):
        """Attempts to verify the signature on a file.
        By default, we'll just go for a valid signature, and not
        expect that we must fully trust the signer"""
        if not hasattr(signature, 'read'):
            sigflo = StringIO(signature)
        else:
            sigflo = signature
        verify = self.gpg.verify_file(sigflo, filename)
        display.vv("GNUPG Verification")
        if verify.trust_level is None:
            return False, "Unable to validate signature"
        display.vv("\t{}, key id: {}".format(verify.username, verify.key_id))
        display.vv("\tsignature id: {}".format(verify.signature_id))
        display.vv("\tTrust level {}".format(verify.trust_text))
        # default here corresponds to: we have imported the key
        # and it's a good signature against that key, but
        # we haven't told GNUPG that we know who really owns the key.
        if verify and verify.trust_level >= verify.TRUST_UNDEFINED:
            return True, "Signature is OK"
        return False, "Unable to verify signature"

    def _default_keyfile_location(self):
        return "/".join([ApacheProject.CHECKSUM_URL_BASE, self.project.name, 'KEYS'])

    def download_keys(self):
        keyfile = getattr(self.project, 'keyfile_path', self._default_keyfile_location())
        if not keyfile.startswith(ApacheProject.CHECKSUM_URL_BASE):
            keyfile = ApacheProject.CHECKSUM_URL_BASE + keyfile
        display.v("Downloading KEYS for {} from {}".format(self.project.name, keyfile))
        resp = requests.get(keyfile)
        resp.raise_for_status()
        for h in resp.headers:
            display.vvv("\t{} => '{}'".format(h, resp.headers.get(h)))
        display.vv("Content Type of response: {}".format(resp.headers.get('content-type')))
        display.vv("Guessed encoding: {}".format(resp.encoding))
        if resp.encoding is None:
            # for some reason the server doesn't like to tell us;
            # this seems to be a reasonable default
            display.vv("Forcing encoding to cp1252")
            resp.encoding = "cp1252"
        return resp.text, resp.encoding
class ApacheProject(object):
    """Encapsulates operations around a project, including locating
    download mirrors, generating filenames, etc."""

    # main URL for checksums and KEYS files
    CHECKSUM_URL_BASE = 'https://www.apache.org/dist'
    # location of the Apache utility that locates the nearest mirror
    MIRROR_LOCATOR_URL = 'https://www.apache.org/dyn/closer.cgi'
    # support all kinds of hashes (but not you, md5)
    # gpg will be available if the gnupg module is installed
    VERIFICATION_METHODS = sorted(x for x in hashlib.algorithms_guaranteed if x != 'md5')
    definitions = {}

    @classmethod
    def load_projects(cls, path=None):
        if GNUPG_AVAILABLE:
            cls.VERIFICATION_METHODS.append('asc')
        cls.VERIFICATION_METHODS = tuple(cls.VERIFICATION_METHODS)
        if path is None:
            curdir = os.path.dirname(os.path.realpath(__file__))
            path = os.path.join(curdir, 'apache_projects.yml')
        with open(path, 'r') as f:
            projects = yaml.safe_load(f)
        return projects

    def __init__(self, name, version, path=None):
        """Defines a new project.
        :param name: (required) the project name, which must exist in
        the definitions file.
        :param version: (required) the specific version to be downloaded.
        :param path: (optional) the path from which to load common project
        definitions (YAML). If unspecified, `apache_projects.yml` in the
        same directory where this class is defined will be used.
        """
        pds = ApacheProject.definitions
        if '_default' not in pds:
            # cache the loaded definitions on the class
            pds = ApacheProject.definitions = ApacheProject.load_projects(path)
        self.name = name
        self.version = version
        self.major_version = version.split('.')[0]
        # copy so overlays don't mutate the shared '_default' definition
        self.definition = dict(pds['_default'])
        if self.name not in pds:
            display.warning(
                "Unknown project '" + self.name + "', will continue with defaults")
            display.warning("Any errors from here on down are probably because of this!")
        else:
            self.definition.update(pds[self.name])
        display.vvv("Using project definition for '{}':".format(self.name))
        for k in self.definition:
            display.vvv("\t{} : {}".format(k, self.definition[k]))
        for prop in ('path', 'filename', 'extension', 'verify_method', 'keyfile_path'):
            if prop in self.definition:
                setattr(self, prop, self.definition[prop])
        self.verification_supported = self.verify_method in self.VERIFICATION_METHODS
        # now we interpolate!
        self.path = self.path.format(**self.__dict__)
        self.filename = self.filename.format(**self.__dict__)
        if 'keyfile_path' in self.definition:
            self.keyfile_path = self.keyfile_path.format(**self.__dict__)
        self.mirror_data = None

    def locate_mirror(self):
        """Fetches the preferred download mirror for this project"""
        if self.mirror_data is None:
            parts = [ApacheProject.MIRROR_LOCATOR_URL, self.path]
            url = "/".join(parts)
            display.vv("Fetching mirrors from " + url)
            resp = requests.get(url, params={"as_json": 1})
            resp.raise_for_status()
            display.vv("Fetched mirror data from " + resp.url)
            self.mirror_data = resp.json()
        self.download_url = self._create_mirror_url()
        return self.download_url

    def _create_mirror_url(self):
        base = self.mirror_data['preferred'] + self.mirror_data['path_info']
        display.vvv("Base URL is {}".format(base))
        if base.endswith('/'):
            return base + self.filename
        return base + '/' + self.filename

    @property
    def checksum_filename(self):
        return "{filename}.{verify_method}".format(**self.__dict__)

    @property
    def checksum_url(self):
        return "/".join([ApacheProject.CHECKSUM_URL_BASE,
                         self.path,
                         self.checksum_filename])

    def get_verifier(self):
        if self.verify_method == 'asc' and GNUPG_AVAILABLE:
            return GNUPGVerifier(self)
        if hasattr(hashlib, self.verify_method):
            return HashVerifier(getattr(hashlib, self.verify_method))
        raise ValueError("No verifier available for '" + self.verify_method + "'")
class ActionModule(ActionBase):
    '''Downloads projects from Apache, using mirrors and verifying
    checksums'''

    def _arg(self, arg, default=None):
        """Reduce visual noise when getting task arguments"""
        return self._task.args.get(arg, default)

    def error_out(self, msg):
        """Makes returning failure results less noisy"""
        self._result.update({
            'failed': True,
            'msg': msg
        })
        return self._result

    def _do_download(self, project, dest_file):
        dest_base = os.path.dirname(dest_file)
        if not os.path.isdir(dest_base):
            display.v("Creating output directory " + dest_base)
            os.makedirs(dest_base)
        download_url = project.locate_mirror()
        display.v("Downloading {} {} from {}".format(
            project.name,
            project.version,
            download_url))
        with tempfile.TemporaryFile(suffix='-apachedl') as tempout:
            resp = requests.get(download_url, stream=True)
            resp.raise_for_status()
            for chunk in resp.iter_content(4096):
                tempout.write(chunk)
            tempout.seek(0)
            with open(dest_file, 'wb') as output:
                shutil.copyfileobj(tempout, output)
        self._result['filename'] = dest_file

    def verification_warn(self, project):
        display.error("Project {} uses {} to verify downloads, which is not supported by this plugin (yet)".format(
            project.name, project.verify_method))
        display.error("Supported methods:")
        for vmeth in ApacheProject.VERIFICATION_METHODS:
            display.error("\t" + vmeth)

    def run(self, tmp=None, task_vars=None):
        if task_vars is None:
            task_vars = {}
        self._result = super(ActionModule, self).run(tmp, task_vars)
        if NO_REQUESTS_MODULE:
            return self.error_out('This plugin requires the requests module')
        project_name = self._arg('project')
        version = self._arg('version')
        if project_name is None:
            return self.error_out("'project' argument is required")
        if version is None:
            return self.error_out("'version' argument is required")
        project = ApacheProject(project_name, version)
        default_output_dir = os.path.join(os.getcwd(), 'apache_downloads')
        output_dir = self._arg('output_dir', default_output_dir)
        dest_file = os.path.abspath(os.path.join(output_dir, project.filename))
        if not project.verification_supported:
            self.verification_warn(project)
        check_file = os.path.join(output_dir, project.checksum_filename)
        force = convert_bool.boolean(self._arg('force', False))
        display.vv("Project: " + project.name)
        display.vv("Version: " + project.version)
        display.vv("Filename: " + project.filename)
        display.vv("Force download: {}".format(force))
        if not os.path.exists(dest_file) or force:
            try:
                self._do_download(project, dest_file)
            except requests.HTTPError as e:
                display.error("Unable to fetch {} from {}: {}".format(
                    project.name,
                    project.locate_mirror(),
                    str(e)))
                return self.error_out("unable to fetch project from Apache mirror")
        else:
            display.vv("File already downloaded")
            self._result['filename'] = dest_file
        if not os.path.exists(check_file) or force:
            display.v("Fetching checksum file from {}".format(project.checksum_url))
            try:
                resp = requests.get(project.checksum_url)
                resp.raise_for_status()
                with open(check_file, 'w') as f:
                    f.write(resp.text)
            except requests.HTTPError as e:
                display.error("Unable to retrieve checksum file: {}".format(str(e)))
                display.error("It is possible we are using the wrong path to download it")
                display.error("Check warning messages, they may provide more insight")
                return self.error_out("Unable to verify download against .sha1 sum")
        display.v("Checking downloaded checksum {} against {}".format(
            check_file, dest_file))
        if project.verification_supported:
            with open(check_file) as f:
                checksum = f.read()
            check, msg = project.get_verifier().verify(dest_file, checksum)
            if not check:
                self._result.update({'failed': True, 'msg': msg})
        else:
            self._result.update({'failed': True, 'msg': "Unable to verify download"})
        return self._result
apache_projects.yml:
---
_default:
  path: "{name}/{version}"
  extension: tar.gz
  filename: "{name}-{version}.{extension}"
  verify_method: sha1
accumulo:
  filename: "{name}-{version}-bin.{extension}"
  # note this method (GPG signature verification) requires the optional gnupg module
  verify_method: 'asc'
solr:
  path: "lucene/solr/{version}"
  extension: tgz
# this isn't really for real, it's
# just a test for a hash algo that's not sha1
spark:
  extension: tgz
  path: '{name}/{name}-{version}'
  verify_method: sha512
tomcat:
  path: "tomcat/tomcat-{major_version}/v{version}/bin"
  filename: "apache-tomcat-{version}.{extension}"
  verify_method: asc
  keyfile_path: "/tomcat/tomcat-{major_version}/KEYS"
zookeeper:
  path: "zookeeper/zookeeper-{version}"