Last active
January 10, 2016 18:54
A bash script that probes a sequence of Shibboleth IdPs to determine which are based on the Shibboleth IdP V2 software
#!/bin/bash

#######################################################################
# Copyright 2015--2016 InCommon, LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#######################################################################

script_version="1.2"
user_agent_string="Shibboleth IdP Probe ${script_version}"

#######################################################################
# help message
#######################################################################
display_help () {
/bin/cat <<- HELP_MSG
${user_agent_string}

Given a list of identifiers (usually entityIDs), determine which
of those identifiers correspond to Shibboleth IdP V2 deployments.
Non-IdPs are ignored. Non-Shibboleth IdPs are also ignored. This
script probes Shibboleth IdP deployments only.

Usage: ${0##*/} [-hvq] [-t CONNECT_TIME -m MAX_TIME] (-u MDQ_BASE_URL | -f MD_PATH) [-b BIN_DIR] [-d OUT_DIR] [ID ...]

The script optionally takes a sequence of identifiers on the command
line. If none are given, the script takes its input from stdin.

The script iterates over all input identifiers. For each identifier,
if the corresponding entity is a Shibboleth IdP, the script sends a
Shibboleth IdP V2 Status request to a well-known endpoint location at
that IdP. If the HTTP response code is 200 and the response starts with
"ok", then we know the IdP is based on the Shibboleth IdP V2 software.
If, on the other hand, the HTTP response code is 404, it is likely the
IdP is based on the Shibboleth IdP V3 software (since V3 has no such
Status endpoint). Other results are inconclusive.

Options:
  -h Display this message
  -v Write verbose messages to stdout
  -q Run quietly (i.e., write no messages to stdout)
  -t Time (in secs) to connect to the host
  -m Maximum time (in secs) of a complete probe
  -u Base URL of a Metadata Query Server
  -f Path to a local metadata file
  -b Path to a directory containing one or more scripts
  -d Path to an output directory

Option -h is mutually exclusive with all other options. Options
-q and -v are mutually exclusive with each other. Options -u and -f
are mutually exclusive with each other as well.

The argument of the -t option is the TCP connect time, that is,
the maximum time (in secs) allotted to the TCP connection. Note
that the TCP connect time includes the time it takes to do a
DNS name lookup. Since the latter is unconstrained, it may
consume all available TCP connect time. Thus the TCP connect
time should be kept small (on the order of a few seconds) since
larger values will slow this script considerably.

The argument of the -m option is the maximum total time (in secs)
allotted to each probe. A reasonable value is a few seconds
beyond the TCP connect time. Any value less than or equal to the
TCP connect time causes the script to fail immediately. Options
-t and -m must be given together or not at all.

Entity metadata is required to process each identifier. Metadata is
obtained in one of two ways: by consulting a Metadata Query Server
just-in-time or by using a pre-provisioned metadata aggregate. These
correspond to options -u and -f, respectively. Exactly one of these
options is required.

Option -f takes a file argument (MD_PATH), the absolute path to a
local SAML metadata file. The script searches this file for a
corresponding entity descriptor as it processes each identifier.

Option -u takes a URI argument (MDQ_BASE_URL), the base URL of a
Metadata Query Server (i.e., a server that conforms to the Metadata
Query Protocol). The base URL is used to construct an MDQ request
URL, which the script uses to request entity metadata just-in-time.

The script requires a helper script (md_tools.sh) to resolve entity
metadata. By default, the helper script is assumed to be in the same
directory as this script. If not, use option -b to specify the
directory containing the helper script.

STDOUT

By default, the script outputs an abbreviated log to stdout (but
this may be suppressed by use of the -q option). A line of
standard output has the following space-delimited fields:

  1) code: a curl exit code
  2) output: a curl output string
  3) statusURL: the URL of the probed Status endpoint
  4) SHIBV: Shibboleth version indicator

See the curl man page (http://linux.die.net/man/1/curl) for a
brief description of possible exit codes.

The output string has the following format:

  response:999;dns:9.999;tcp:9.999;ssl:9.999;total:9.999

The response in the output string is the HTTP response code of the
probed web server. If the probe does not complete, the HTTP response
will be 000. The remaining four values in the output string are times
(in secs) computed by curl:

  dns is the elapsed time up to and including the DNS lookup
    (curl time_namelookup variable)
  tcp is the elapsed time up to and including the TCP connection
    (curl time_connect variable)
  ssl is the elapsed time up to and including the SSL exchange
    (curl time_appconnect variable) (only curl 7.19.0 and later)
  total is the total elapsed time of the probe
    (curl time_total variable)

See the curl man page (curl --write-out option) for detailed
explanations of these timings.

By definition, a probe succeeds if its exit code is 0. For our
purposes, a probe completely fails if its exit code is either 6
or 7. (Exit code 6 indicates a DNS lookup failure while code 7
means the host is unreachable on the network.) A probe that times
out (exit code 28) is labeled as nonresponsive. All other exit codes
are regarded as indeterminate.

The statusURL is the actual URL probed by this script. It is
computed from an HTTP endpoint location in metadata.

The Shibboleth version indicator (SHIBV) takes on one of three
values: SHIB2, SHIB3, or SHIB?. These strings indicate Shibboleth
IdP V2, Shibboleth IdP V3, or an unknown version of the Shibboleth
IdP software, respectively.

Note: This script detects Shibboleth IdP V2 deployments with high
probability, that is, there is little or no chance of a false
positive. However, some V2 deployments may evade this script (for
various reasons) and thus be reported as "SHIB?". Similarly, the
script detects Shibboleth IdP V3 deployments with reasonable
likelihood, but there is a significant chance of a false positive
in this case.

FILES

The script writes a number of output files if (and only if) the
-d option is specified on the command line. The output files are
written to the given OUT_DIR.

${NO_SAML2_HTTP_ENDPOINT_FILENAME}

A list of IdPs that do not expose a suitable SAML2 HTTP endpoint
location in metadata. A suitable endpoint supports one of the
following SAML2 HTTP bindings: HTTP-Redirect, HTTP-POST, or
HTTP-POST-SimpleSign. An IdP that supports SAML1 only will
necessarily appear on this list, and will therefore not be probed.

A line in the output file has the following space-delimited fields:

  1) entityID: the entityID of the IdP
  2) registrarID: the registrar ID

The entityID is the name of the IdP. An entityID is an arbitrary URI,
as given by the entityID XML attribute on the <md:EntityDescriptor>
element in SAML metadata.

The registrarID is the name of the registrar that registered the IdP
metadata in the first place. By convention, a registrar ID is an
arbitrary URI, as given by the registrationAuthority XML attribute
on the <mdrpi:RegistrationInfo> element in SAML metadata. Since the
latter element is optional in metadata, this field may be blank in
the log file (which is why it is always the last field on any given
output line).

${NOT_SHIB_FILENAME}

A list of non-Shibboleth IdPs, determined by inspecting a suitable
SAML2 HTTP endpoint location in metadata. Such IdPs are not probed
by this script.

A line in the output file has the following space-delimited fields:

  1) location: a SAML2 HTTP endpoint location
  2) entityID: the entityID of the IdP
  3) registrarID: the registrar ID

The location field gives the HTTP endpoint location used to identify
the IdP. If the location URL indicates the IdP is a Shibboleth IdP,
the statusURL is computed from the HTTP location on the fly.

The entityID and the registrarID fields are the same as in the
previous output file.

${SHIB_LOG_FILENAME}

A log of each probe. Each line records the result of the probe of
a single Shibboleth IdP. A line in the log file has the following
space-delimited fields:

  1) code: a curl exit code
  2) output: a curl output string
  3) statusURL: the URL of the probed Status endpoint
  4) location: a SAML2 HTTP endpoint location
  5) entityID: the entityID of the Shibboleth IdP
  6) registrarID: the registrar ID

The code, output, and statusURL fields are the same as those printed
to stdout.

The location, entityID, and registrarID fields are the same as in the
previous output file.

${SHIB2_LOG_FILENAME}

A log of each probe made to a Shibboleth IdP V2 deployment. If the HTTP
response code is 200 and the response body starts with "ok", then we
know the deployment is based on the Shibboleth IdP V2 software.

The format of this file is identical to the format of the previous file.

${SHIB3_LOG_FILENAME}

A log of each probe made to a Shibboleth IdP V3 deployment. If the HTTP
response code is 404, the deployment is likely based on the Shibboleth
IdP V3 software.

The format of this file is identical to the format of the previous file.

${SHIB_UNKNOWN_LOG_FILENAME}

A log of each probe made to an IdP deployment based on an unknown
version of the Shibboleth IdP software.

The format of this file is identical to the format of the previous file.

Examples:

  ${0##*/} -h
  ${0##*/} -u \$mdq_base_url -t ${connect_timeout_default} -m ${max_time_default} \$id
  cat \$id_file | ${0##*/} -v -u \$mdq_base_url -t 4 -m 6
  ${0##*/} -q -f /path/to/md_file.xml \$id1 \$id2 \$id3

Note that the -t and -m values in the second example are the defaults,
so omitting both options there gives the same result. One of options
-u or -f is always required.
HELP_MSG
}

#######################################################################
# Bootstrap
#######################################################################

script_bin=${0%/*}    # equivalent to dirname $0 when $0 contains a slash
[ "$script_bin" = "$0" ] && script_bin=.  # no slash in $0: use the current dir
script_name=${0##*/}  # equivalent to basename $0

connect_timeout_default=2
max_time_default=4

# output file names
NO_SAML2_HTTP_ENDPOINT_FILENAME="idps-no-saml2-http-endpoint.txt"
NOT_SHIB_FILENAME="idps-not-shibboleth.txt"
SHIB_LOG_FILENAME="idps-shibboleth-log.txt"
SHIB2_LOG_FILENAME="idps-shibboleth2-log.txt"
SHIB3_LOG_FILENAME="idps-shibboleth3-log.txt"
SHIB_UNKNOWN_LOG_FILENAME="idps-shibboleth-version-unknown-log.txt"

init_out_files () {

  local out_dir=$1 # TODO
  local exit_status

  # create the dir if necessary
  if [ ! -d "$out_dir" ]; then
    mkdir "$out_dir"
    exit_status=$?
    if [ $exit_status -ne 0 ]; then
      echo "ERROR: $FUNCNAME failed to create dir: $out_dir" >&2
      exit $exit_status
    fi
  fi

  # output files
  NO_SAML2_HTTP_ENDPOINT_FILE="$out_dir/$NO_SAML2_HTTP_ENDPOINT_FILENAME"
  NOT_SHIB_FILE="$out_dir/$NOT_SHIB_FILENAME"
  SHIB_LOG_FILE="$out_dir/$SHIB_LOG_FILENAME"
  SHIB2_LOG_FILE="$out_dir/$SHIB2_LOG_FILENAME"
  SHIB3_LOG_FILE="$out_dir/$SHIB3_LOG_FILENAME"
  SHIB_UNKNOWN_LOG_FILE="$out_dir/$SHIB_UNKNOWN_LOG_FILENAME"
}

#######################################################################
# Process command-line options and arguments
#######################################################################

help_mode=false; quiet_mode=false; verbose_mode=false
md_query_mode=false; md_file_mode=false
local_opts=; connect_timeout=; max_time=

while getopts ":hqvt:m:u:f:b:d:" opt; do
  case $opt in
    h)
      help_mode=true
      ;;
    q)
      quiet_mode=true
      verbose_mode=false
      #local_opts="$local_opts -$opt"
      exec 1>/dev/null # redirect stdout to the bit bucket
      ;;
    v)
      quiet_mode=false
      verbose_mode=true
      local_opts="$local_opts -$opt"
      ;;
    t)
      connect_timeout="$OPTARG"
      ;;
    m)
      max_time="$OPTARG"
      ;;
    u)
      md_query_mode=true
      md_file_mode=false
      mdq_base_url="$OPTARG"
      ;;
    f)
      md_query_mode=false
      md_file_mode=true
      md_path="$OPTARG"
      ;;
    b)
      bin_dir="$OPTARG"
      ;;
    d)
      out_dir="$OPTARG"
      ;;
    \?)
      echo "ERROR: $script_name: Unrecognized option: -$OPTARG" >&2
      exit 2
      ;;
    :)
      echo "ERROR: $script_name: Option -$OPTARG requires an argument" >&2
      exit 2
      ;;
  esac
done

if $help_mode; then
  display_help
  exit 0
fi

# determine the metadata source
if $md_query_mode; then
  if [ -z "$mdq_base_url" ]; then
    echo "ERROR: $script_name: option -u requires an argument" >&2
    exit 2
  fi
  $verbose_mode && printf "$script_name using base URL: %s\n" "$mdq_base_url"
  # global var for getEntityFromServer function
  MDQ_BASE_URL="$mdq_base_url"
elif $md_file_mode; then
  if [ -z "$md_path" ]; then
    echo "ERROR: $script_name: option -f requires an argument" >&2
    exit 2
  fi
  if [ ! -f "$md_path" ]; then
    echo "ERROR: $script_name: file does not exist: $md_path" >&2
    exit 2
  fi
  $verbose_mode && printf "$script_name using metadata file: %s\n" "$md_path"
  # global var for getEntityFromFile function
  MD_PATH="$md_path"
else
  echo "ERROR: $script_name: one of options -u or -f required" >&2
  exit 2
fi

# determine the bin directory
if [ -n "$bin_dir" ]; then
  if [ ! -d "$bin_dir" ]; then
    echo "ERROR: $script_name: directory does not exist: $bin_dir" >&2
    exit 2
  fi
  BIN_DIR="$bin_dir"
else
  BIN_DIR="$script_bin"
fi
$verbose_mode && printf "$script_name using bin directory: %s\n" "$BIN_DIR"

# determine the output directory
if [ -z "$out_dir" ]; then
  DO_NOT_PRINT_OUT_FILES=true
  $verbose_mode && printf "$script_name not printing output files\n"
else
  DO_NOT_PRINT_OUT_FILES=false
  init_out_files "$out_dir"
  $verbose_mode && printf "$script_name using output dir: %s\n" "$out_dir"
fi

# check consistency of timeout options (both or neither are required)
if [ -z "$connect_timeout" ] && [ -z "$max_time" ]; then
  connect_timeout=$connect_timeout_default
  max_time=$max_time_default
elif [ -n "$connect_timeout" ] && [ -n "$max_time" ]; then
  # Note: "[ ! X -gt 0 ]" returns status 2 (not 0) on non-numeric input,
  # which would silently skip the error branch, so negate the whole test.
  if ! [ "${connect_timeout}" -gt 0 ] 2>/dev/null; then
    echo "ERROR: $script_name: connect timeout must be a positive integer: ${connect_timeout}" >&2
    exit 2
  fi
  if ! [ "${max_time}" -gt "${connect_timeout}" ] 2>/dev/null; then
    echo "ERROR: $script_name: max time must be an integer greater than the connect timeout: ${max_time}" >&2
    exit 2
  fi
else
  echo "ERROR: $script_name: both (or neither) options -t and -m are required" >&2
  exit 2
fi
if $verbose_mode; then
  printf "$script_name using connect timeout: %d secs\n" $connect_timeout
  printf "$script_name using max time: %d secs\n" $max_time
fi

shift $(( OPTIND - 1 ))

#####################################################################
# Initialization
#####################################################################

# create a temporary directory
TMP_DIR=$( mktemp -d 2>/dev/null || mktemp -d -t "${script_name%%.*}" )
if [ ! -d "$TMP_DIR" ] ; then
  printf "ERROR: Unable to create temporary dir\n" >&2
  exit 2
fi
$verbose_mode && printf "$script_name creating temp dir: %s\n" "$TMP_DIR"

# remove the temporary directory when the script exits
trap '/bin/rm -rf "$TMP_DIR"' EXIT

# temp files
HTTP_RESPONSE_FILE="${TMP_DIR}/http_response.txt"

# read the input into a temporary file
IN_FILE="${TMP_DIR}/tmp_infile.txt"
if [ "$#" -gt 0 ]; then
  # read input from the command line
  while (( "$#" )); do
    # copy command-line arg into the temp file
    echo "$1" >> "$IN_FILE"
    shift
  done
else
  # read input from stdin
  /bin/cat - > "$IN_FILE"
fi
$verbose_mode && printf "$script_name processing temp input file: %s\n" "$IN_FILE"

# load metadata tools
md_tools_script="$BIN_DIR/md_tools.sh"
source "$md_tools_script" >&2
exit_status=$?
if [ $exit_status -ne 0 ]; then
  echo "ERROR: ${script_name} failed to source script ${md_tools_script}" >&2
  exit $exit_status
fi

#####################################################################
# Functions
#####################################################################

# remove output files left over from a previous run
clean_up_files () {

  $DO_NOT_PRINT_OUT_FILES && return

  # clean up
  /bin/rm -f "$NO_SAML2_HTTP_ENDPOINT_FILE"
  /bin/rm -f "$NOT_SHIB_FILE"
  /bin/rm -f "$SHIB_LOG_FILE"
  /bin/rm -f "$SHIB2_LOG_FILE"
  /bin/rm -f "$SHIB3_LOG_FILE"
  /bin/rm -f "$SHIB_UNKNOWN_LOG_FILE"
}

print_no_saml2_http_endpoint_logfile () {

  $DO_NOT_PRINT_OUT_FILES && return

  local entityID=$1
  local registrarID=$2

  printf "%s %s\n" "$entityID" "$registrarID" >> "$NO_SAML2_HTTP_ENDPOINT_FILE"
}

print_not_shib_logfile () {

  $DO_NOT_PRINT_OUT_FILES && return

  local location=$1
  local entityID=$2
  local registrarID=$3

  printf "%s %s %s\n" "$location" "$entityID" "$registrarID" >> "$NOT_SHIB_FILE"
}

# Note: besides its logfile argument, this function reads the globals
# status_code, output, statusURL, location, entityID, and registrarID
# set in the main loop.
print_logfile () {

  $DO_NOT_PRINT_OUT_FILES && return

  local logfile=$1

  printf "%s %s %s " "$status_code" "$output" "$statusURL" >> "$logfile"
  printf "%s %s %s\n" "$location" "$entityID" "$registrarID" >> "$logfile"
}

#####################################################################
# Main processing
#####################################################################

clean_up_files

if $verbose_mode; then
  num_entityIDs=$( wc -l < "$IN_FILE" )
  printf "$script_name processing %d entityIDs\n" $num_entityIDs
fi

# compute curl command-line options
curl_opts="--connect-timeout ${connect_timeout} --max-time ${max_time}"
curl_opts="${curl_opts} --insecure --tlsv1"

# iterate over all entityIDs in the file
# (the input is redirected, not piped, so that "exit" inside the loop
# terminates the script rather than just a pipeline subshell)
while read -r entityID; do

  # get the entity descriptor for this entityID
  if $md_file_mode; then
    entityDescriptor=$( getEntityFromFile "$entityID" )
  else
    entityDescriptor=$( getEntityFromServer "$entityID" )
  fi
  return_code=$?
  if [ "$return_code" -ne 0 ]; then
    echo "ERROR: $script_name: unable to obtain metadata for entityID: $entityID" >&2
    [ "$return_code" -gt 1 ] && exit 1
    continue
  fi

  # short-circuit the while-loop if this is not an IdP
  if ! echo "$entityDescriptor" | grep -Fq 'IDPSSODescriptor '; then
    echo "WARNING: $script_name: entity is not an IdP: $entityID" >&2
    continue
  fi

  # extract the registrar ID from the entity descriptor
  registrarID=$( echo "$entityDescriptor" \
    | grep -F -m 1 ' registrationAuthority=' \
    | sed -e 's/^.* registrationAuthority="\([^"]*\)".*$/\1/'
  )

  # extract a SAML2 HTTP endpoint location from the entity descriptor
  for binding in Redirect POST POST-SimpleSign; do
    location=$( echo "$entityDescriptor" \
      | grep -E '<(md:)?SingleSignOnService ' \
      | grep -F -m 1 ' Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-'$binding'"' \
      | sed -e 's/^.* Location="\([^"]*\)".*$/\1/'
    )
    # terminate the for-loop if a location was found
    [ -n "$location" ] && break
  done

  # if there is no SAML2 HTTP endpoint location, short-circuit the while-loop
  if [ -z "$location" ]; then
    print_no_saml2_http_endpoint_logfile "$entityID" "$registrarID"
    echo "INFO: $script_name: IdP has no SAML2 HTTP endpoint location: $entityID"
    continue
  fi

  # if the endpoint location indicates non-Shibboleth, short-circuit the while-loop
  if [[ "$location" != */$binding/SSO ]]; then
    print_not_shib_logfile "$location" "$entityID" "$registrarID"
    echo "INFO: $script_name: entity is not a Shibboleth IdP: $entityID"
    continue
  fi

  # compute the Status URL for a typical Shibboleth V2 IdP
  statusURL=$( echo "$location" \
    | sed -e 's/'$binding'\/SSO$//' -e 's/SAML2\/$//' -e 's/Shibboleth\/$//' -e 's/\/$/\/Status/'
  )

  # discard any response body left over from the previous probe
  /bin/rm -f "$HTTP_RESPONSE_FILE"

  # request the Status URL
  # ($curl_opts is intentionally unquoted so that it splits into words)
  output=$( /usr/bin/curl --silent \
    --output "$HTTP_RESPONSE_FILE" \
    $curl_opts \
    --write-out 'response:%{http_code};dns:%{time_namelookup};tcp:%{time_connect};ssl:%{time_appconnect};total:%{time_total}' \
    "$statusURL"
  )
  status_code=$?

  # If the response code is 200 and the response starts with "ok", then the
  # IdP is a Shibboleth V2 IdP. If the response code is 404, then the IdP is
  # probably a Shibboleth V3 IdP. All other results are indeterminate.
  # ('SHIB?' is quoted below so the shell does not treat it as a glob.)
  response_code=$( echo "$output" | sed -e 's/^response:\([^;]*\).*$/\1/' )
  if [[ "$response_code" == 200 ]]; then
    if /usr/bin/head -n 1 "$HTTP_RESPONSE_FILE" 2>/dev/null | grep -q '^ok'; then
      print_logfile "$SHIB2_LOG_FILE"
      printf "%s %s %s %s\n" "$status_code" "$output" "$statusURL" SHIB2
    else
      print_logfile "$SHIB_UNKNOWN_LOG_FILE"
      printf "%s %s %s %s\n" "$status_code" "$output" "$statusURL" 'SHIB?'
    fi
  elif [[ "$response_code" == 404 ]]; then
    print_logfile "$SHIB3_LOG_FILE"
    printf "%s %s %s %s\n" "$status_code" "$output" "$statusURL" SHIB3
  else
    print_logfile "$SHIB_UNKNOWN_LOG_FILE"
    printf "%s %s %s %s\n" "$status_code" "$output" "$statusURL" 'SHIB?'
  fi
  print_logfile "$SHIB_LOG_FILE"

done < "$IN_FILE"

exit 0
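
The two core steps of the script, deriving the Status URL from an SSO endpoint location and mapping the probe result to a version indicator, can be tried without metadata or network access. The sketch below is a standalone illustration rather than part of the script: the function names `status_url_from_location` and `classify_probe` are hypothetical, and `idp.example.org` is a placeholder host. The sed expressions and the 200/"ok" and 404 rules mirror those in the script.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; they do not appear in the script.

# Derive the V2 Status URL from a SAML2 SSO endpoint location.
#   $1: endpoint location, $2: binding (Redirect, POST, or POST-SimpleSign)
status_url_from_location () {
  printf '%s\n' "$1" \
    | sed -e "s/$2\/SSO\$//" -e 's/SAML2\/$//' \
          -e 's/Shibboleth\/$//' -e 's/\/$/\/Status/'
}

# Map a probe result to a version indicator.
#   $1: HTTP response code, $2: first line of the response body
classify_probe () {
  case "$1" in
    200)
      case "$2" in
        ok*) echo SHIB2 ;;     # Status page answered "ok": V2
        *)   echo 'SHIB?' ;;   # 200 but not a Status page: unknown
      esac
      ;;
    404) echo SHIB3 ;;         # no Status endpoint: probably V3
    *)   echo 'SHIB?' ;;       # anything else is inconclusive
  esac
}

# Example with a placeholder host:
status_url_from_location 'https://idp.example.org/idp/profile/SAML2/POST/SSO' POST
# prints https://idp.example.org/idp/profile/Status
classify_probe 200 ok
# prints SHIB2
```

Keeping the classification separate from the curl call makes the decision table easy to check by hand; in the script itself the same logic is inlined in the main loop.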