Skip to content

Instantly share code, notes, and snippets.

@thiagokokada
Last active August 29, 2015 14:19
Show Gist options
  • Save thiagokokada/c3e5a4465d8d9ae00158 to your computer and use it in GitHub Desktop.
Save thiagokokada/c3e5a4465d8d9ae00158 to your computer and use it in GitHub Desktop.
Testing if-guards in OpenMP
max_vector
max_vector.c

Testing if-guards with OpenMP

Introduction

This program was made to test the influence of if-guards (checking a value before entering the critical section) in parallel code, as part of a question in MAC5742 course in IME-USP. This involve the use of a Python script to automatically generate a C program that tries to find the max value in a vector, compiles it, run it and do a statiscal analysis of the data. The original C code is from this article from MSDN, with some modifications to allow time analysis and bigger vectors.

The program assumes a GNU/Linux environment with a reasonable recently GCC and Linux kernel. It needs Python 3.4 to run since it uses statistics library. You may get it working with Python 2.7-3.3 by using backports.statistics, but I didn't test it. And it may run with other compilers/OS by making some small modifications.

Tests and results

The script max_vector.py does the majority of the job. It creates the C source-file max_vector.c (not included in this repo) from a template, already adapts the code for each test (vector size, memory allocation, number of if-guards), compiles it and runs it with each configuration. You can see basic usage by running it as:

python3 max_vector.py -h

The resulting C code is compiled without optimizations (-O0), so the compiler should not influence in the final result. The program runs a couple of times (10 by default) to get some relevant data to statistic analysis. The vector is allocated with increasing integer values based on their indix. This may not be a real world example, but this way we can force the code to do the largest number of comparisons. Of course, all this parameters may be changed in the source code.

You will find in this repository three more files: run_on_aguia.sh script and test_max_ifs.log/test_zero_ifs.log log files. The first is a script to run two different tests in a relatively powerful computer in IME-USP called aguia. It's powered with two Intel(R) Xeon(R) CPU [email protected], each with 6 physical cores and 12 threads (resulting in 12 physical cores and 24 threads) and 32GB of RAM.

The first test compares the performance of not using if-guards at all (instead making the whole code inside the parallel-for critical). It's shows that running the code without the if-guards to be up to 2 order of magnitude slower than any other test configuration. Since it would be too slow to run it with bigger vector sizes, we choose to run this test with a smaller vector. The result of this test on aguia can be seem in file test_zero_ifs.log.

The second test compares the performance of different if-guards configuration, from 1 up to 10. It shows that too many if-guards to be harmful, but not in a meaningful way (the difference between the best and worst times is less than 0.3 seconds if you consider the standard deviations between measurements). And it only shows that the ideal value of if-guards seems to be less or equal than 8, since everything else is within the standard deviation. Since this test is very fast, it was run with a very big vector (using 2000000000 positions, or almost 16GiB of RAM from our test machine). This is almost the limit for this hardware, since anything bigger than this could easily consume all available RAM and would need more changes in the code (using long instead of int for vector allocation code, for example), so we thought this to be sufficient. The result of this test on aguia can be seem in file test_max_ifs.log.

Final considerations

The conclusion is that only 1 if-guards seems to be sufficient in critical-regions for performance, at least on current high-performance memory-sharing machines. The result may be different with the use of more complex data structures, like Strings or Structs.

The MIT License (MIT)
Copyright (c) 2015 Thiago Kenji Okada
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
set term postscript enhanced color
set encoding iso_8859_1
set xlabel "Number of if-guards"
set ylabel "Execution time (seconds)"
set style line 1 lw 2
set style line 2 lw 2
# ploting test_zero_ifs graph
set xtics -1,1,11
set xrange [-1:11]
set output '| ps2pdf - test_zero_ifs.pdf'
plot "test_zero_ifs.data" using 1:2:3 title "Comparisson between if-guards (Test 1)" with errorbars ls 1
# ploting test_max_ifs graph
set xtics 0,1,11
set xrange [0:11]
set output '| ps2pdf - test_max_ifs.pdf'
plot "test_max_ifs.data" using 1:2:3 title "Comparisson between if-guards (Test 2)" with errorbars ls 2
#!/usr/env/python3
import re
import statistics
import subprocess
import sys
from distutils.util import strtobool
from string import Template
template = Template(
r"""#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define SEED 1
#define VECTOR_SIZE $vector_size
int main(int argc, char *argv[])
{
long int *vector;
int i, max;
struct timespec start, finish;
double elapsed;
/* Seed the pseudo-random generator with a constant number so we can
* replicate the experiments. */
srand(SEED);
/* We could allocate this vector directly in the code, but since we
* can use very long vectors this may be bigger than stack size. Of
* course, in this case we would have a stack overflow. Using malloc
* to force the allocation on memory solves this problem (at the
* expense of slower memory access, but shouldn't be a problem in this
* experiment). */
vector = (long int *) malloc(VECTOR_SIZE * sizeof(long int));
if (vector == NULL) {
fprintf(stderr, "Could not allocate enough memory!\n");
exit(EXIT_FAILURE);
}
for (i = 0; i < VECTOR_SIZE; ++i) {
vector[i] = $vector_alloc;
}
/* This method only works on Linux from version 2.6.28 onwards, but it's
* the most precise way to measure time.
* See "man 2 clock_gettime" and http://stackoverflow.com/a/2962914 for
* details.
* If you need to run this program in other plataforms, you may need
* another way to measure time. But don't use gettimeofday(), this
* function was not made to measure time. */
clock_gettime(CLOCK_MONOTONIC_RAW, &start);
max = vector[0];
#pragma omp parallel for
for (i = 1; i < VECTOR_SIZE; ++i) {
/* Using $n_ifs ifs */
$ifs{
#pragma omp critical
{
if(vector[i] > max) {
max = vector[i];
}
}
}
}
printf("Max value: %d\n", max);
clock_gettime(CLOCK_MONOTONIC_RAW, &finish);
elapsed = (finish.tv_sec - start.tv_sec);
elapsed += (finish.tv_nsec - start.tv_nsec) / 1000000000.0;
printf("Total elapsed time: %lf\n", elapsed);
exit(EXIT_SUCCESS);
}
""")
def generate_code(vector_size=1024, n_ifs=0, outfile="max_vector.c",
vector_alloc=None):
if not vector_alloc:
vector_alloc = "rand()"
else:
vector_alloc = "i * {}".format(int(vector_alloc))
ifs = "if (vector[i] > max) " * n_ifs
source_code = template.substitute(vector_size=vector_size,
ifs=ifs, n_ifs=n_ifs,
vector_alloc=vector_alloc)
with open(outfile, "w") as source_file:
source_file.write(source_code)
def compile_code(source_file="max_vector.c", outfile="max_vector",
optimization=0, compiler="gcc", fopenmp=True):
cmd = [compiler, "-O{}".format(optimization), "-o", outfile, source_file]
if fopenmp:
cmd.append("-fopenmp")
subprocess.call(cmd)
def run_code(program_name="max_vector", repetitions=3):
if repetitions < 2:
raise ValueError("number_of_times should be at least 2")
result_times = []
for i in range(repetitions):
output = subprocess.check_output("./" + program_name)
time = re.findall("Total elapsed time: (.*)", output.decode())[0]
result_times.append(float(time))
return result_times
if __name__ == "__main__":
try:
include_zero_ifs = strtobool(sys.argv[1])
vector_size = int(sys.argv[2])
repetitions = int(sys.argv[3])
max_n_ifs = int(sys.argv[4])
except IndexError:
include_zero_ifs = 1
vector_size = 100000
repetitions = 10
max_n_ifs = 10
except ValueError:
sys.exit("usage: {} INCLUDE_ZERO_IFS VECTOR_SIZE REPETITIONS MAX_N_IFS"
.format(sys.argv[0]))
print("Running with vector_size={}".format(vector_size), end="\n\n")
for n_ifs in range(include_zero_ifs, max_n_ifs + 1):
print("Running tests with number_of_ifs={}".format(n_ifs))
# Since we are interested in the increase of performance according to
# the number of if-guards, we are creating a vector with increasing
# numbers instead of random.
generate_code(vector_size=vector_size, n_ifs=n_ifs, vector_alloc=2)
compile_code()
result_times = run_code(repetitions=repetitions)
mean_times = statistics.mean(result_times)
stdev_times = statistics.stdev(result_times)
print("Results after {} repetitions".format(repetitions))
print("Mean={}".format(mean_times))
print("Stdev={}".format(stdev_times), end="\n\n")
#!/bin/bash
for logfile in *.log; do
basename=$(basename -s .log ${logfile})
awk '
BEGIN {
RS=""
FS="\n"
OFS="\t"
print "number_of_ifs", "mean", "stdev"
}
NR != 1 {
gsub("Running tests with number_of_ifs=", "", $1)
gsub("Results after [0-9]+ repetitions", "", $2)
gsub("Mean=", "", $3)
gsub("Stdev=", "", $4)
# print $1, $3, $3 - (0.95 * $4), $3 + (0.95 * $4)
print $1, $3, 0.95 * $4
}' ${logfile} > ${basename}.data
echo "Finished processing ${logfile}. Resulting file is ${basename}.data."
done
echo "Plotting graphs."
gnuplot max_vector.gp
#!/bin/sh
# Test 1: this is just to show how slow is if number_of_ifs is 0. Since
# this test is so slow we run it with a smaller vector.
VECTOR_SIZE=10000000
REPETITIONS=30
MAX_N_IFS=10
python3 -u max_vector.py FALSE ${VECTOR_SIZE} ${REPETITIONS} ${MAX_N_IFS} | tee test_zero_ifs.log
# Test 2: testing bigger vectors with the rest of configurations, to see
# if there is a performance improvement with increasing the number of ifs.
# WARNING: with the following setup, max_vector allocates lots of memory
# (my laptop crashed after allocating over 6GB of memory). Unless you have
# lots of memory like our aguia cluster (that has 32GB of memory), I don't
# recommend running this program with the settings below.
# The largest value here is probably MAX_INT, or 2147483647, since anything
# larger than this would not even compile. Of course, you could change the
# code to accept a long value instead and the maximum would change to
# 9223372036854775807, but with the value below we are already using almost
# 16GiB of RAM to calculate.
VECTOR_SIZE=2000000000
REPETITIONS=30
MAX_N_IFS=10
python3 -u max_vector.py TRUE ${VECTOR_SIZE} ${REPETITIONS} ${MAX_N_IFS} | tee test_max_ifs.log
number_of_ifs mean stdev
1 4.077462 0.186467
2 4.0943832 0.217625
3 4.138659033333334 0.243405
4 4.157179066666667 0.218832
5 4.1921231 0.201012
6 4.2292190666666665 0.177328
7 4.2028757 0.191749
8 4.361061433333333 0.169482
9 4.4696335 0.220793
10 4.532657733333334 0.216173
Running with vector_size=2000000000
Running tests with number_of_ifs=1
Results after 30 repetitions
Mean=4.077462
Stdev=0.1962807242604502
Running tests with number_of_ifs=2
Results after 30 repetitions
Mean=4.0943832
Stdev=0.22907929765994228
Running tests with number_of_ifs=3
Results after 30 repetitions
Mean=4.138659033333334
Stdev=0.25621591277946154
Running tests with number_of_ifs=4
Results after 30 repetitions
Mean=4.157179066666667
Stdev=0.23034927326636
Running tests with number_of_ifs=5
Results after 30 repetitions
Mean=4.1921231
Stdev=0.2115917429945517
Running tests with number_of_ifs=6
Results after 30 repetitions
Mean=4.2292190666666665
Stdev=0.1866612982445484
Running tests with number_of_ifs=7
Results after 30 repetitions
Mean=4.2028757
Stdev=0.20184155385678318
Running tests with number_of_ifs=8
Results after 30 repetitions
Mean=4.361061433333333
Stdev=0.17840162233406526
Running tests with number_of_ifs=9
Results after 30 repetitions
Mean=4.4696335
Stdev=0.23241350675521985
Running tests with number_of_ifs=10
Results after 30 repetitions
Mean=4.532657733333334
Stdev=0.2275507173723807
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
number_of_ifs mean stdev
1 2.9228521 0.126274
2 2.9066992666666667 0.0963213
3 2.9123384000000003 0.0988245
4 2.9428944 0.0950482
5 2.9241033666666665 0.0785975
6 2.9227638666666667 0.136429
7 2.920264766666667 0.0860136
8 2.9573495666666667 0.157104
9 2.9332350000000003 0.177683
10 2.9301745666666665 0.0861207
Running with vector_size=2000000000
Running tests with number_of_ifs=1
Results after 30 repetitions
Mean=2.9228521
Stdev=0.13291960762893879
Running tests with number_of_ifs=2
Results after 30 repetitions
Mean=2.9066992666666667
Stdev=0.10139086745956112
Running tests with number_of_ifs=3
Results after 30 repetitions
Mean=2.9123384000000003
Stdev=0.10402576897679629
Running tests with number_of_ifs=4
Results after 30 repetitions
Mean=2.9428944
Stdev=0.10005071456140767
Running tests with number_of_ifs=5
Results after 30 repetitions
Mean=2.9241033666666665
Stdev=0.08273422304500672
Running tests with number_of_ifs=6
Results after 30 repetitions
Mean=2.9227638666666667
Stdev=0.1436092251065569
Running tests with number_of_ifs=7
Results after 30 repetitions
Mean=2.920264766666667
Stdev=0.09054058280803998
Running tests with number_of_ifs=8
Results after 30 repetitions
Mean=2.9573495666666667
Stdev=0.16537223082857516
Running tests with number_of_ifs=9
Results after 30 repetitions
Mean=2.9332350000000003
Stdev=0.1870345476606413
Running tests with number_of_ifs=10
Results after 30 repetitions
Mean=2.9301745666666665
Stdev=0.0906533359477792
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
number_of_ifs mean stdev
0 3.0187754 0.337857
1 0.03997166666666666 0.00435966
2 0.040733733333333334 0.00446794
3 0.0400266 0.00453253
4 0.040493 0.00462049
5 0.042162733333333334 0.00430125
6 0.04147686666666666 0.00471233
7 0.0403342 0.00424692
8 0.0425857 0.00413146
9 0.045170133333333334 0.00425583
10 0.044149100000000004 0.00525438
Running with vector_size=10000000
Running tests with number_of_ifs=0
Results after 30 repetitions
Mean=3.0187754
Stdev=0.35563893090203846
Running tests with number_of_ifs=1
Results after 30 repetitions
Mean=0.03997166666666666
Stdev=0.0045891153781554665
Running tests with number_of_ifs=2
Results after 30 repetitions
Mean=0.040733733333333334
Stdev=0.004703097247510435
Running tests with number_of_ifs=3
Results after 30 repetitions
Mean=0.0400266
Stdev=0.004771081519840708
Running tests with number_of_ifs=4
Results after 30 repetitions
Mean=0.040493
Stdev=0.004863670786272585
Running tests with number_of_ifs=5
Results after 30 repetitions
Mean=0.042162733333333334
Stdev=0.004527631145367933
Running tests with number_of_ifs=6
Results after 30 repetitions
Mean=0.04147686666666666
Stdev=0.0049603466420397885
Running tests with number_of_ifs=7
Results after 30 repetitions
Mean=0.0403342
Stdev=0.004470438398620662
Running tests with number_of_ifs=8
Results after 30 repetitions
Mean=0.0425857
Stdev=0.004348905095260898
Running tests with number_of_ifs=9
Results after 30 repetitions
Mean=0.045170133333333334
Stdev=0.004479823308810395
Running tests with number_of_ifs=10
Results after 30 repetitions
Mean=0.044149100000000004
Stdev=0.00553092224865706
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment