Last active
June 4, 2018 18:38
A proposal for a mechanism for creating locks around critical sections in CFME.
###############################################################################################################
# Author: Chris Ruffalo <[email protected]>
#
# -------------------------------------------------------------------------------------------------------------
# Description:
# -------------------------------------------------------------------------------------------------------------
# In some situations in CFME automation it is necessary to access a resource that does not
# support shared access. This code was initially developed to work with a web API that could
# return the same result to multiple workflows, leaving virtual machines with identical
# properties where they should have been unique.
#
# To prevent this, a locking scheme was created so that a workflow can execute an automation step
# that acquires a lock by setting a property on the VMDB object that serves as the root
# of the lock.
#
# The overall intent of this implementation is to do the best job possible of protecting limited critical
# sections from simultaneous execution. While this implementation tries to follow the rough semantics of
# various types of locks, it does not possess the power to provide similar guarantees.
#
# -------------------------------------------------------------------------------------------------------------
# Operation:
# -------------------------------------------------------------------------------------------------------------
# When speaking of locks in CFME it is important to realize that these act more like a pseudo-lock and
# do not follow the exact semantics of an actual semaphore as usually seen in many programming languages.
# Instead this implementation uses the VMDB as a simple provider for locks by setting a property on the
# objects that will serve as the root for the operation.
#
# When the lock is first acquired it sets a property (configured by the 'key' input) to a given value
# (configured by the 'value' input). On some object types this is done through `set_option` and on
# others through `custom_set`, depending on the type of object.
#
# After setting the lock this method queries for all other active records of the same type that have a
# property 'key' whose value matches 'value'. Locks whose expiration time has passed are removed and
# ignored during this query, which allows a lock to expire without being manually released.
#
# When a lock cannot be acquired (the number of other holders has reached 'allowed') the property is set
# back to nil and the step is retried after a random backoff. The random backoff makes it likely that if
# two methods set the lock at the same time they will back off for different amounts of time.
#
# This operation results in a slow but moderately safe locking mechanism.
#
# -------------------------------------------------------------------------------------------------------------
# Configuration Inputs:
# -------------------------------------------------------------------------------------------------------------
# action - Action to perform. Accepted values are 'acquire' and 'release'. The default value is 'acquire'.
#
# object - Key on the root object used to look up the object that will be the target of the lock. Defaults
#   to the value of $evm.root['vmdb_object_type']. This can be used to create a lock on a type
#   or value that is not the target of the current operation. (Ex: 'vm' or 'miq_provision') Supported
#   types are 'vm', 'miq_request', 'service', and 'miq_provision'. Other types will raise an error.
#
# key - Key to use to set the lock value on the target object. Required. A good example might be
#   'network_name', 'enclave', 'group' or whatever the unique description of the locked resource should be.
#
# value - Value that describes any subgroup that is being locked. This allows this locking code
#   to operate on a smaller subsection of the provided resource. For example, if using 'network_name'
#   the value might be something like 'production' or 'sandbox'. The default value is 'default_lock'.
#
# expires - One of 'true' to use the default expire time (300 seconds), 'false' to have the lock never
#   expire automatically, or an integer giving the number of seconds until expiration (ex: 1000). If
#   an expiration value is given the lock will be automatically removed or ignored after the
#   time elapses, which gives some safety for processes that can fail after acquiring a lock.
#   The default value is 'false'.
#
# backoff - Time, in seconds, to wait (at minimum) before attempting to reacquire the lock. The
#   default value is 15. Prevents the process from tripping over itself and creating churn
#   as the process is retried.
#
# random - Maximum random number of seconds to add to the backoff time. The default value is 45. This
#   helps keep processes that are in the backoff state from colliding by spreading out the times
#   when they will restart.
#
# allowed - The number of simultaneous locks allowed. The default value is 1, which makes this behave like
#   a mutex. Setting a number greater than 1 makes the lock behave like a counting semaphore.
#
###############################################################################################################
# constants
ACTION_ACQUIRE = 'acquire'
ACTION_RELEASE = 'release'
DEFAULT_ACTION = ACTION_ACQUIRE
DEFAULT_OBJECT_KEY = 'vmdb_object_type'
DEFAULT_VALUE = 'default_lock'
DEFAULT_EXPIRES = 'false'
DEFAULT_EXP_TIME = 300
DEFAULT_BACKOFF = 15
DEFAULT_RANDOM = 45
DEFAULT_ALLOWED = 1
VALUE_PROPERTY = :value
EXPIRES_PROPERTY = :expires_after

# instance scoped variables
@method = $evm.current_method
@org = $evm.root['tenant'].name
@debug = true #$evm.root['debug'] || false

########################### Utility ###########################
def log(level, msg)
  $evm.log(level, "#{@org} - #{@method} :: #{msg}")
end

############################ Query ############################
# get active vms that don't match the given id
def get_active_vms(vm_id)
  return $evm.vmdb(:vm).where("id != ?", vm_id).select { |vm|
    vm.archived == false && vm.orphaned == false
  }
end

# get active provisioning requests that don't match the given id
def get_active_provs(prov_id)
  return $evm.vmdb(:miq_provision).where("id != ? and state in ('active','queued','pending')", prov_id)
end

# get active requests that don't match the given id
def get_active_requests(req_id)
  # the original passed an undefined `prov_id` here; the parameter is `req_id`
  return $evm.vmdb(:miq_request).where("id != ? and request_state in ('active','queued','pending')", req_id)
end

# get active (non-retired) services that don't match the given id
def get_active_services(srv_id)
  return $evm.vmdb(:service).where("id != ? and retired = false", srv_id)
end

# get locked objects and select valid locked objects based on the key and value of the lock as well
# as the current time compared to the expiration time if present
def get_locked(object, type, key, value)
  # default to empty list
  locked = []
  # get values from vmdb
  log(:info, "Selecting (active) objects according to type=#{type}") if @debug
  case type
  when 'vm'
    locked = get_active_vms(object.id)
  when 'service'
    locked = get_active_services(object.id)
  when 'miq_provision'
    locked = get_active_provs(object.id)
  when 'miq_request'
    locked = get_active_requests(object.id)
  else
    raise "Other object types are not supported by this method."
  end
  log(:info, "Selected #{locked.size} objects of type=#{type}") if @debug
  # select only the values that are locked (object[key] == value) and
  # that have a non-expired lock
  selected = locked.select { |item|
    # get the value hash and return false if nil or empty
    value_hash = prop_get(item, type, key)
    next false if value_hash.nil? || value_hash.empty?
    # check expiration if it exists
    if value_hash.key?(EXPIRES_PROPERTY)
      # if the expiration happened before now
      if value_hash[EXPIRES_PROPERTY] < Time.now
        # force unlock on that item because it is expired
        # (unlock takes the object, its type, and the key)
        unlock(item, type, key)
        # do not select because it is expired
        next false
      end
    end
    # check value if other checks have passed and use it to determine if the lock is the same
    value_hash.key?(VALUE_PROPERTY) && value_hash[VALUE_PROPERTY] == value
  }
  # log
  log(:info, "Found #{selected.size} locked items from #{locked.size} items") if @debug
  return selected
end

############################ Prop #############################
# get lock property for any object
def prop_get(object, type, key)
  case type
  when 'vm', 'service'
    return object.custom_get(key)
  when 'miq_provision', 'miq_request'
    return object.get_option(key)
  else
    raise "Other object types are not supported by this method."
  end
end

# set lock property for any object
def prop_set(object, type, key, value)
  # the `case type` line was missing in the original, which is a syntax error
  case type
  when 'vm', 'service'
    return object.custom_set(key, value)
  when 'miq_provision', 'miq_request'
    return object.set_option(key, value)
  else
    raise "Other object types are not supported by this method."
  end
end

# unset lock property for any object
def prop_unset(object, type, key)
  prop_set(object, type, key, nil)
end

############################ Lock #############################
# locks the object by setting the key and value on the target
# object as well as (optionally) the expiration time
def lock(object, type, key, value, expires)
  # lock value and log message
  value_hash = {VALUE_PROPERTY => value}
  log_msg = "Setting lock #{type}.#{key} => '#{value}'"
  # parse the expires input, which may arrive as a string or an integer:
  # 'true' uses the default time, a positive integer is a number of seconds,
  # and anything else ('false' included) means the lock never expires
  expires_str = expires.to_s.strip
  expire_time = nil
  if 'true'.casecmp(expires_str) == 0
    expire_time = Time.now + DEFAULT_EXP_TIME
  elsif expires_str.to_i > 0
    expire_time = Time.now + expires_str.to_i
  end
  # if expire time is available add the value
  # to the value hash before setting
  unless expire_time.nil?
    value_hash[EXPIRES_PROPERTY] = expire_time
    log_msg = "#{log_msg}, expires at #{expire_time}"
  end
  # first set property
  prop_set(object, type, key, value_hash)
  log(:info, log_msg)
end

# unlocks the object (deletes the key from its options)
def unlock(object, type, key)
  return unless key.present?
  # remove property from object
  prop_unset(object, type, key)
  # log
  log(:info, "Removed lock #{type || object.type}.#{key}")
end

# performs abort/retry action
def do_backoff(object, key, value, backoff, random, allowed, current_acquired)
  # inputs may arrive as strings, so coerce to integers before doing arithmetic;
  # rand(0) returns a float, so only add jitter when random is positive
  jitter = random.to_i > 0 ? rand(random.to_i) : 0
  random_time = backoff.to_i + jitter
  # set message
  msg = "backing off #{object.type}.#{key} => '#{value}' for #{random_time} seconds because lock count is at #{current_acquired} of #{allowed}"
  log(:info, msg)
  # set retry for process
  $evm.root['ae_result'] = 'retry'
  $evm.root['ae_reason'] = msg
  $evm.root['ae_retry_interval'] = "#{random_time}.seconds"
  # exit from here
  exit MIQ_OK
end

# acquires the lock, checks to see if we are under the allowed
# count of acquired locks, backs off if it cannot acquire the
# lock
def acquire(object, type, key, value, expires, backoff, random, allowed)
  # get lock
  lock(object, type, key, value, expires)
  # get other locked items
  locked = get_locked(object, type, key, value)
  # this is the same, or nearly the same, as locked.size > allowed - 1 which
  # is because we know that we have at least one lock that won't show up in
  # the query at this point
  if locked.size >= allowed
    # immediately unlock
    unlock(object, type, key)
    # do backoff and attempt to reacquire lock later
    do_backoff(object, key, value, backoff, random, allowed, locked.size)
  end
  # otherwise proceed in locked state
end

############################ Body ############################
@object = nil
@input_hash = {}
begin
  # inputs/configuration
  ['action','object','key','value','expires','backoff','random','allowed'].each do |k|
    @input_hash[k] = $evm.object[k]
  end
  # fall back to defaults for inputs that were not provided
  @input_hash['action'] ||= DEFAULT_ACTION
  @input_hash['object'] ||= $evm.root[DEFAULT_OBJECT_KEY]
  @input_hash['value'] ||= DEFAULT_VALUE
  @input_hash['expires'] ||= DEFAULT_EXPIRES
  @input_hash['backoff'] ||= DEFAULT_BACKOFF
  @input_hash['random'] ||= DEFAULT_RANDOM
  @input_hash['allowed'] = @input_hash['allowed'].to_i > 0 ? @input_hash['allowed'].to_i : DEFAULT_ALLOWED
  # log configuration values
  @input_hash.each { |key, value|
    log(:info, "Config: #{key}=>#{value}") if @debug
  }
  type_target = @input_hash['object']
  # ensure that there is a target object key
  raise "No value found for input object's key, aborting" unless type_target.present?
  # ensure that target object is supported (the original split this condition across
  # lines with leading `||` operators, which Ruby does not parse)
  unless ['vm', 'miq_provision', 'miq_request', 'service'].any? { |t| t.casecmp(type_target.to_s) == 0 }
    raise "Unexpected value '#{type_target}' given for the target object type. Expected 'vm', 'service', 'miq_request', or 'miq_provision'."
  end
  # ensure that we have an object
  @object = $evm.root[type_target]
  raise "Could not get object from $evm.root[#{@input_hash['object']}] to use as lock target, aborting" if @object.nil?
  # ensure that we have a target key
  raise "No input key found, aborting" unless @input_hash['key'].present?
  # decide what action to take
  case @input_hash['action']
  when ACTION_ACQUIRE
    acquire(@object, type_target, @input_hash['key'], @input_hash['value'], @input_hash['expires'], @input_hash['backoff'], @input_hash['random'], @input_hash['allowed'])
  when ACTION_RELEASE
    unlock(@object, type_target, @input_hash['key'])
  else
    raise "The action #{@input_hash['action']} cannot be performed. Expected 'acquire' or 'release'."
  end
  # confirm exit OK
  exit MIQ_OK
rescue => err
  log(:error, "could not complete lock action, exiting with error => #{err}")
  log(:error, "stack trace: #{err.backtrace.join("\n")}")
  # make sure to always unlock during an error
  unless @object.nil?
    unlock(@object, @input_hash['object'], @input_hash['key'])
  end
  # nothing else to be done if an error happens while getting the lock
  exit MIQ_ABORT
end
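
The set-then-check protocol described in the Operation section of the script can be tried outside CFME with a small in-memory model. The following is a minimal sketch under stated assumptions, not the method itself: `Record`, `other_holders`, `try_acquire`, and `backoff_seconds` are hypothetical names invented for illustration, a plain Ruby array stands in for the VMDB query, and returning `:retry` stands in for setting `ae_result` to `'retry'`. No CFME API is called.

```ruby
# In-memory stand-in for VMDB records (hypothetical model). The :lock slot plays
# the role of the property written with custom_set or set_option in the script.
Record = Struct.new(:id, :lock)

# Count the OTHER records that hold an unexpired lock with the same value.
# Expired locks are cleared along the way, mirroring what get_locked does.
def other_holders(records, me, value, now)
  records.count do |r|
    next false if r.id == me.id || r.lock.nil?
    expires = r.lock[:expires_after]
    if expires && expires < now
      r.lock = nil
      next false
    end
    r.lock[:value] == value
  end
end

# Set the lock first, then check. If too many others already hold it,
# undo the write and report :retry so the caller can back off.
def try_acquire(records, me, value, allowed: 1, ttl: nil, now: Time.now)
  me.lock = { value: value }
  me.lock[:expires_after] = now + ttl if ttl
  if other_holders(records, me, value, now) >= allowed
    me.lock = nil
    return :retry
  end
  :acquired
end

# Backoff interval: a fixed minimum plus up to `random` seconds of jitter.
def backoff_seconds(backoff, random)
  backoff + (random > 0 ? rand(random) : 0)
end

records = [Record.new(1), Record.new(2), Record.new(3)]
now = Time.at(1_000)

# Record 1 takes the lock with a 300 second expiry.
first = try_acquire(records, records[0], 'production', ttl: 300, now: now)
# Record 2 collides with it and must retry.
second = try_acquire(records, records[1], 'production', now: now)
# Once the expiry has passed, record 3 clears the stale lock and acquires.
third = try_acquire(records, records[2], 'production', now: now + 301)

puts [first, second, third].inspect
```

Run as written, this prints `[:acquired, :retry, :acquired]`. Because each caller writes its own lock before checking, two callers that write at the same moment can each see the other and both return `:retry`; the jitter in `backoff_seconds` is what makes it likely they come back at different times.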