You have geographically separated puppet masters that need to be semi-autonomous whilst having node classifier groups updated from a central source of truth, but you don't always have a reliable, or fast, connection back to the central point. To spell this out a bit more, you want/need the following:
- a single source of truth for classification that all masters consume.
- masters able to keep compiling with the latest available classification data even when the WAN links are down for a while.
- avoid doing NC requests over the WAN due to data size, latency, and reliability constraints.
- the lightest and most reliable distributed puppet architecture possible while maintaining a single source of truth for node classification data.
- regions to be semi-autonomous, read-only, and updated from a central point.
- be aware of failing updates.
Some possible approaches / components:
- use a replicated NC database but avoid PuppetDB replication (if our postgres and puppet config modules can configure a separate postgres instance for the NC or PuppetDB)
- use a distributed object store (e.g. redis, memcached) to replicate classification data
- polling daemon to retrieve classification data and write it to a yaml file or distributed object store (see the sketch after this list)
- caching proxy for classification data (local daemon) (but puppet already caches node terminus results and can re-use this with `node_cache_terminus=yaml`, so this is possibly equivalent)
- enable reading of the node classification cache with `node_cache_terminus=yaml` - but will updates to classification data be observed by the master in a timely fashion, or will it just use the cached data even when the NC service is available?
- classifier groups synchronisation with status shown in wrapper of central console UI
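The polling daemon needn't be much more than a cron-driven script. A minimal sketch, assuming the classifier's v1 groups endpoint on port 4433, a whitelisted client certificate, and made-up cert and output paths:

```
#!/bin/bash
# Snapshot all classification groups from the central NC to a local file.
# Only replace the local copy on success, so stale-but-valid data survives
# WAN outages; log failures so failing updates are visible.
NC="https://central.example.com:4433/classifier-api/v1"
OUT="/etc/puppetlabs/local/classification-groups.json"
TMP=$(mktemp)

if curl -sf --cert /etc/puppetlabs/sync/client.pem \
        --key /etc/puppetlabs/sync/client.key \
        --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem \
        "$NC/groups" -o "$TMP"; then
  mv "$TMP" "$OUT"
else
  rm -f "$TMP"
  logger -t nc-poll "failed to fetch classification groups from $NC"
  exit 1
fi
```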
node_cache_terminus=yaml
The default is `write_only_yaml`, which of course doesn't allow the puppet master to use the cached nodes. Tested as follows:
- install PE 2015.3.0 rc4
- `puppet agent -t` on master - no errors
- stop puppet agent: `systemctl stop puppet`
- modify `/etc/puppetlabs/puppet/classifier.yaml` to point to an invalid api endpoint (change port from `4433` to `44333`)
- restart puppet server: `systemctl restart pe-puppetserver`
- `puppet agent -t` on master - errors expected: "Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node master.vm: Classification of master.vm failed due to a Node Manager service error."
- edit `/etc/puppetlabs/puppet/puppet.conf` and add `node_cache_terminus=yaml` to the master section
- `puppet agent -t` on master - success! no errors.
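For reference, the same puppet.conf change can be scripted with `puppet config set` (which ships with PE under `/opt/puppetlabs/bin`):

```
# add node_cache_terminus=yaml to the [master] section of puppet.conf
/opt/puppetlabs/bin/puppet config set node_cache_terminus yaml --section master
# restart so puppet server picks up the new setting
systemctl restart pe-puppetserver
```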
The above shows that the puppet master can be configured to use a cached copy of the classification data for nodes for which it already has data. What happens if there is no cached node classification data? Let's see:
- rm the cached node data: `rm /opt/puppetlabs/server/data/puppetserver/yaml/node/master.vm.yaml`
- modify `/etc/puppetlabs/puppet/classifier.yaml` to point to an invalid api endpoint (change port from `4433` to `44333`)
- restart puppet server: `systemctl restart pe-puppetserver`
- puppet run gets the same error as before: "Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node master.vm: Classification of master.vm failed due to a Node Manager service error."
With the node classification cache readable (`node_cache_terminus=yaml`), when does the puppet master invalidate this cache and update the node's cached classification?
Basic plan:
- Delete cached classification data for the test node
- Puppet run - observe new cache file being created
- Change classification for the test node in the console (add a class)
- Puppet run - what happens? (request to the classifier endpoint? new cache file? or classification change not enforced on the test node?)
- If no change, repeat puppet runs over a period of time (see the loop sketched after this list). Increase verbosity of the puppet master's logs and try to discern what it's doing. If it has an expiry on the cache, where is that configured?
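For the repeat-over-time step, a loop like this would do (the interval is arbitrary; the cache path is the one used in the tests above):

```
# re-run the agent every 5 minutes and watch the cache file's mtime
while true; do
  puppet agent -t >/dev/null 2>&1
  stat -c '%y %n' /opt/puppetlabs/server/data/puppetserver/yaml/node/master.vm.yaml
  sleep 300
done
```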
Adding class `puppet_enterprise::symlinks` to the PE Master group:
- rm cache file
- puppet run
- add class `puppet_enterprise::symlinks` to the PE Master group
- puppet run - observe the class has not been added to the cache file (although the cache file has been updated with the latest facts)
- rm cache file
- puppet run - observe that the class has now been added to the cache file
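A quick way to see whether the class has landed in the cache file:

```
grep puppet_enterprise::symlinks /opt/puppetlabs/server/data/puppetserver/yaml/node/master.vm.yaml
```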
With an invalid parameter (causing compilation failure):
- add an invalid parameter to a class (e.g. `java_args: "-Dfoo-bar"` in `puppet_enterprise::profile::master` in the "PE Master" group)
- rm the cache
- puppet run - errors
- revert the nc change (remove the java_args parameter)
- puppet run - still errors
- rm the cache
- puppet run - back to normal
Summary:
- if a node's classification cache file is present, the master will use it and make no requests of the node classifier
- the only way to invalidate the cache that I'm aware of is to delete the node's cache file
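Given that, the blunt invalidation step after any classification change would be something like this on each master (path from the tests above), forcing the next run to re-query the classifier:

```
rm -f /opt/puppetlabs/server/data/puppetserver/yaml/node/*.yaml
```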
Separate postgres instance for PuppetDB: so that it can have replication disabled.
Use the groups endpoint on the central and satellite node classifiers to one-way push out classification groups (a sketch follows this list):
- fetch all groups on central nc
- delete all groups on satellite nc
- recreate all groups on satellite nc using groups from central nc
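A sketch of that push with `curl` and `jq`; the hostnames and client cert paths are made up, and parent/child ordering is glossed over (children would need to be deleted before, and recreated after, their parents):

```
#!/bin/bash
# One-way push of classifier groups, central -> satellite.
CENTRAL="https://central.example.com:4433/classifier-api/v1"
SATELLITE="https://satellite.example.com:4433/classifier-api/v1"
CURL="curl -sf --cert /etc/puppetlabs/sync/client.pem \
  --key /etc/puppetlabs/sync/client.key \
  --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem"
ROOT="00000000-0000-4000-8000-000000000000"  # 'All Nodes' group; can't be deleted

# 1. fetch all groups on central nc
groups=$($CURL "$CENTRAL/groups") || { echo "fetch from central failed" >&2; exit 1; }

# 2. delete all (non-root) groups on satellite nc
for id in $($CURL "$SATELLITE/groups" | jq -r '.[].id'); do
  [ "$id" = "$ROOT" ] || $CURL -X DELETE "$SATELLITE/groups/$id"
done

# 3. recreate all groups on satellite nc from the central groups,
#    PUTting each one back with its original id
echo "$groups" | jq -c '.[]' | while read -r group; do
  id=$(echo "$group" | jq -r '.id')
  [ "$id" = "$ROOT" ] && continue
  echo "$group" | $CURL -X PUT -H 'Content-Type: application/json' \
    -d @- "$SATELLITE/groups/$id" || echo "failed to push group $id" >&2
done
```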
What happens if you add a group to the node classifier via the api, and you reference classes that it doesn't know about yet?
- set synchronization period to 0 (never synchronise)
- use the groups endpoint to create a new group referencing a new class (sketched below)
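Something like this would exercise it (a sketch: the group name, rule, and class name `some_brand_new_class` are invented; the parent is the root "All Nodes" group):

```
# create a group on the satellite nc referencing a class it has never seen
curl -sf -X PUT \
  --cert /etc/puppetlabs/sync/client.pem \
  --key /etc/puppetlabs/sync/client.key \
  --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem \
  -H 'Content-Type: application/json' \
  "https://satellite.example.com:4433/classifier-api/v1/groups/$(uuidgen)" \
  -d '{
        "name": "unknown-class-test",
        "environment": "production",
        "parent": "00000000-0000-4000-8000-000000000000",
        "rule": ["~", "name", ".*test.*"],
        "classes": { "some_brand_new_class": {} }
      }'
```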
Wrapper for the Console UI that adds classification sync status. Geoff is on this.
How to PEE
Make `/etc/puppetlabs/nginx/conf.d/proxy.conf` look like this:

Put something like:

in `/var/www/pee/index.html`