Skip to content

Instantly share code, notes, and snippets.

@jessereynolds
Last active February 24, 2017 13:11
Show Gist options
  • Save jessereynolds/ab59c1035a1a20241cbf to your computer and use it in GitHub Desktop.
Save jessereynolds/ab59c1035a1a20241cbf to your computer and use it in GitHub Desktop.

Distributed Node Classification in Puppet Enterprise

You have geographically separated puppet masters that need to be semi-autonomous whilst having node classifier groups updated from a central source of truth. But you don't always have a reliable, or fast, connection back to the central point. To spell this out a bit more, you want / need to have the following:

  • a single source of truth for classification that all masters consume.
  • masters able to keep compiling with the latest available classification data even when the wan links are down for a while.
  • avoid doing NC requests over the WAN due to data size, latency, and reliability constraints.
  • the lightest and most reliable distributed puppet architecture possible while maintaining a single source of truth for node classification data.
  • regions to be semi-autonomous, readonly, and update from a central point.
  • be aware of failing updates.

Some possible approaches / components:

  • use replicated nc database but avoid puppet db replication (if our postgres and puppet config modules can configure separate postgres instance for NC or puppet db)
  • use a distributed object store (eg redis, memcached) to replicate classification data
  • polling daemon to retrieve classification data and write to yaml file or distributed object store
  • caching proxy for classification data (local daemon) (but puppet already caches node terminus results and can re-use this with node_cache_terminus=yaml so this is possibly equivalent)
  • enable reading of the node classification cache with node_cache_terminus=yaml - but will updates to classification data be observed by the master in a timely fashion or will it just use the cached data even when the NC service is available?
  • classifier groups synchronisation with status shown in wrapper of central console UI

Investigations

Have puppet master use cached node classification data

node_cache_terminus=yaml - the default is 'write_only_yaml' which of course doesn't allow puppet master to use the cached nodes. Tested as follows:

  • install PE 2015.3.0 rc4
  • puppet agent -t on master - no errors
  • stop puppet agent systemctl stop puppet
  • modify /etc/puppetlabs/puppet/classifier.yaml to point to an invalid api endpoint (change port from 4433 to 44333)
  • restart puppet server systemctl restart pe-puppetserver
  • puppet agent -t on master - errors expected "Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node master.vm: Classification of master.vm failed due to a Node Manager service error."
  • edit /etc/puppetlabs/puppet/puppet.conf and add the following to the master section: node_cache_terminus=yaml
  • puppet agent -t on master - success! no errors.

The above shows that the puppet master can be configured to use a cached copy of the classification data for nodes for which it already has data. What happens if there is no cached node classification data? Lets see:

  • rm /opt/puppetlabs/server/data/puppetserver/yaml/node/master.vm.yaml
  • modify /etc/puppetlabs/puppet/classifier.yaml to point to an invalid api endpoint (change port from 4433 to 44333)
  • restart puppet server systemctl restart pe-puppetserver
  • puppet run gets the same error as before "Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node master.vm: Classification of master.vm failed due to a Node Manager service error."

Work out when puppet master invalidates the cached node classification data

With node classification cache readable (node_cache_terminus=yaml) when does the puppet master invalidate this cache and update the node's classification cache?

Basic plan:

  • Delete cached classificaion data for test node
  • Puppet run - observe new cache file being created
  • Change classification for test node in the console (add a class)
  • Puppet run - what happens? (request to classifier end point? new cache file? or classification change not enforced on test node?)
  • If no change, repeat puppet runs over a period of time. Increase verbosity of puppet master's logs and try and discern what it's doing. If it has an expiry on the cache, where is that configured?

Adding class puppet_enterprise::symlinks to PE Master group:

  • rm cache file
  • puppet run
  • add class puppet_enterprise::symlinks to the PE Master group
  • puppet run - observe the class has not been added to the cache file (although the cache file has been updated with latest facts)
  • rm cache file
  • puppet run - observe that the class has now been added to the cache file

With an invalid parameter (causing compilation failure):

  • add an invalid parameter to a class (eg java_args: "-Dfoo-bar" in puppet_enterprise::profile::master in "PE Master" group)
  • rm the cache
  • puppet run - errors
  • revert the nc change (remove the java_args parameter)
  • puppet run - still errors
  • rm the cache
  • puppet run - back to normal

Summary:

  • if node classification cache file is present, master will use it and make no requests of the node classifier
  • the only way to invalidate the cache that I'm aware of is to delete the node's cache file

Can puppet db be configured to use a separate postgres instance?

So that it can have replication disabled.

Prototype classification groups sync using api

Use the groups endpoint on the central and satellite node classifiers to one-way push out classification groups:

  • fetch all groups on central nc
  • delete all groups on satellite nc
  • recreate all groups on satellite nc using groups from central nc

Adding classification groups with classes that haven't been sync'd from the master yet

What happens if you add a group to the node classifier via the api, and you reference classes that it doesn't know about yet?

PEE - view status of classifier sync

Wrapper for the Console UI that adds classification sync status. Geoff is on this.

References

Resources

@GeoffWilliams
Copy link

How to PEE

  1. Make the locations in /etc/puppetlabs/nginx/conf.d/proxy.conf look like this:
location /pee
{
root /var/www;
}
location /
{
proxy_pass http://127.0.0.1:4430;
proxy_redirect http://127.0.0.1:4430 /;
proxy_read_timeout 120;
proxy_hide_header X-Frame-Options;
proxy_set_header X-SSL-Subject $ssl_client_s_dn;
proxy_set_header X-Client-DN $ssl_client_s_dn;
proxy_set_header X-Client-Verify $ssl_client_verify;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
  1. Create content
    Put something like:
<html>
<head><title>Puppet Enterprise Enterprise</title></head>
<body>
<h1>Puppet Enterprise Enterprise</h1>
<div style="border: 1px solid black;">
Classifier sync status <span id="sync_status" style="font-color: red;">FUBAR</span>
</div>
<iframe width="100%" height="80%" src="https://192.168.33.10" />
</body>
</html>

in /var/www/pee/index.html

  1. Restart nginx
  2. Browse to https://PUPPETSERVER/pee and you should see the page you added

@GeoffWilliams
Copy link

The above does also expose you to XSS attacks that were impossible before... a nice way of doing this would be some kind of UI plugin mechanism

@jessereynolds
Copy link
Author

Note, Jeff McCune has written a great tool named ncio that extracts classification from one PE master and can write it out to a file, or transform it so it can be pushed to a second PE master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment