Ramkumar Devanathan ramkumardevanathan

ramkumardevanathan / gist:d25bb91d8ab170d3b656
Created November 25, 2014 03:31
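simple gist to get docker ps output as a tab-separated (TSV) file
# Split on runs of two or more spaces so values such as "Up 2 hours" stay in one field.
docker ps -a --no-trunc | awk -F '  +' -v OFS='\t' '{$1=$1}1'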
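If real CSV is needed, for example because the PORTS column itself contains commas, the same conversion can be sketched in Python. This is a minimal sketch, not part of the original gist. It assumes the `docker ps` output is column-aligned under its header row, and it slices each line at the header's column offsets so that empty cells do not shift later fields. The function name `docker_ps_to_csv` is hypothetical.

```python
import csv
import io
import re


def docker_ps_to_csv(text: str) -> str:
    """Convert column-aligned `docker ps` text into quoted CSV.

    Assumption: every row is padded so that its values start at the same
    offsets as the header names (header names may contain single spaces,
    e.g. "CONTAINER ID").
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ""
    # Column start offsets, taken from the header row.
    starts = [m.start() for m in re.finditer(r"\S+(?: \S+)*", lines[0])]
    bounds = list(zip(starts, starts[1:] + [None]))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for ln in lines:
        # Slice by offset rather than splitting on whitespace, so an
        # empty PORTS cell yields an empty field instead of a shift.
        writer.writerow([ln[a:b].strip() for a, b in bounds])
    return out.getvalue()


if __name__ == "__main__":
    import sys

    # Usage: docker ps -a --no-trunc | python this_script.py > containers.csv
    sys.stdout.write(docker_ps_to_csv(sys.stdin.read()))
```

The `csv` module quotes any field that contains a comma or a double quote, which the awk one-liner cannot do.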
# The CPU bottleneck symptom default is influenced mostly by the overall
# CPU utilization. Note that CPU utilization may be high even though
# there is no bottleneck. The run queue indicates that processes are
# waiting for CPU resources and that the CPU may be bottlenecked.
symptom CPU_Bottleneck type=CPU
rule GBL_CPU_TOTAL_UTIL > 75 prob 25
rule GBL_CPU_TOTAL_UTIL > 85 prob 25
rule GBL_CPU_TOTAL_UTIL > 90 prob 25
rule GBL_RUN_QUEUE > 2 prob 25
# The Disk bottleneck symptom default is influenced mostly by the busiest
# disk's utilization. The disk request queue is an indicator that processes
# may be waiting for disk resources.
symptom Disk_Bottleneck type=DISK
rule GBL_DISK_UTIL_PEAK > 50 prob GBL_DISK_UTIL_PEAK
rule GBL_DISK_REQUEST_QUEUE > 3 prob 25
alarm Disk_Bottleneck > 50 for 5 minutes
type = "Disk"
start
# The Memory bottleneck symptom default is triggered by a combination
# of several metrics. Excessive pageouts can be an indicator of memory
# pressure when the memory utilization is high; however, memory-mapped
# file writes also generate pageouts. Under heavy memory pressure, data
# will start to be swapped out.
symptom Memory_Bottleneck type=MEMORY
rule GBL_MEM_UTIL > 95 prob 30
rule GBL_MEM_UTIL > 98 prob 20
rule GBL_MEM_PAGEOUT_BYTE_RATE > 200 prob 20
rule GBL_MEM_SWAPOUT_BYTE_RATE > 0 prob 20
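To make the symptom definitions above easier to reason about, here is a minimal Python sketch of how a symptom value could be computed from its rules. It assumes that the probabilities of all rules whose condition is true are added together, and that a `prob` given as a metric name (as in the Disk_Bottleneck rule) contributes that metric's current value. The cap at 100, the `Rule` class, and the `symptom_probability` function are assumptions made for illustration and are not part of the alarmdef syntax.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence, Union


@dataclass(frozen=True)
class Rule:
    metric: str               # metric tested by the rule
    threshold: float          # rule fires when metric > threshold
    prob: Union[float, str]   # fixed probability, or a metric name to read


def symptom_probability(rules: Sequence[Rule],
                        metrics: Mapping[str, float]) -> float:
    """Sum the probabilities of all rules whose condition holds."""
    total = 0.0
    for rule in rules:
        if metrics[rule.metric] > rule.threshold:
            if isinstance(rule.prob, str):
                total += metrics[rule.prob]   # e.g. prob GBL_DISK_UTIL_PEAK
            else:
                total += rule.prob
    return min(total, 100.0)  # assumption: the value is capped at 100


# The CPU_Bottleneck rules from the definition above.
CPU_RULES = [
    Rule("GBL_CPU_TOTAL_UTIL", 75, 25),
    Rule("GBL_CPU_TOTAL_UTIL", 85, 25),
    Rule("GBL_CPU_TOTAL_UTIL", 90, 25),
    Rule("GBL_RUN_QUEUE", 2, 25),
]

if __name__ == "__main__":
    sample = {"GBL_CPU_TOTAL_UTIL": 88, "GBL_RUN_QUEUE": 3}
    print(symptom_probability(CPU_RULES, sample))
```

Under these assumptions, stacking three thresholds on the same metric means that rising CPU utilization raises the symptom value in steps, and an alarm such as `alarm Disk_Bottleneck > 50 for 5 minutes` fires only once the combined value has stayed above its limit for the stated duration.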
ramkumardevanathan / default alarmdef rule for network card bottleneck
Created August 28, 2016 09:37
network bottleneck (+network error rate check)
# The Network bottleneck symptom default relies on general throughput
# metrics. Not all network interfaces report collision data. To be
# useful as a bottleneck indicator, the rate thresholds should be
# adjusted based on values seen in historical data for a particular
# system or network. For example, 100 Mbit networks cannot sustain
# packet rates as high as gigabit networks can without a bottleneck.
symptom Network_Bottleneck type=NETWORK
rule GBL_NFS_CALL_RATE > 500 prob 25
rule GBL_NET_COLLISION_PCT > 10 prob 10
rule GBL_NET_COLLISION_PCT > 25 prob 20
##############################################################################
#
# This alarm monitors two filesystems and sends an alert when the space
# utilization on a given filesystem exceeds the given value.
#
# Initialize the variables root_util and var_util which will hold the
# current FS_SPACE_UTIL value. The first time they are accessed, they will
# be initialized to zero. Loop through the filesystems each interval and save
# the FS_SPACE_UTIL for each one. Send an alert if the space utilization
# exceeds the given threshold. A repeat alert will be sent every 30 minutes
ramkumardevanathan / gist:fffb5cd5cc9dc95033a13d0e4ae8196d
Created July 10, 2019 14:50
CF template to deploy StarCluster
{
"Description" : "An example template which launches and bootstraps a cluster of eight CC2 EC2 instances for high performance computational tasks using spot pricing. Includes StarCluster, Grid Engine and NFS.",
"AWSTemplateFormatVersion" : "2010-09-09",
"Parameters" : {
"AccountNumber" : {
"Description" : "Twelve digit AWS account number.",
"Type" : "String",
"NoEcho" : "True"