This program was made to test the influence of if-guards (checking a value before entering the critical section) in parallel code, as part of a question in the MAC5742 course at IME-USP. It consists of a Python script that automatically generates a C program that tries to find the maximum value in a vector, compiles it, runs it, and does a statistical analysis of the data. The original C code is from this article from MSDN, with some modifications to allow time analysis and bigger vectors.
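For reference, the if-guard pattern under test looks roughly like the following (a minimal sketch assuming an OpenMP `parallel for`; the actual code is generated by the script and may differ in detail):

```c
#include <omp.h>

/* Sketch of the if-guard pattern: an unsynchronized check filters out
   most iterations, so only candidate maxima pay for the lock. The guard
   may read a stale max, but the re-check inside the critical section
   keeps the result correct. */
int find_max(const int *v, long n)
{
    int max = v[0];
    #pragma omp parallel for
    for (long i = 1; i < n; i++) {
        if (v[i] > max) {           /* if-guard: cheap, lock-free check */
            #pragma omp critical
            {
                if (v[i] > max)     /* re-check: max may have changed */
                    max = v[i];
            }
        }
    }
    return max;
}
```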
The program assumes a GNU/Linux environment with a reasonably recent GCC and Linux kernel. It needs Python 3.4 to run, since it uses the statistics library. You may get it working on Python 2.7-3.3 with backports.statistics, but I didn't test it. It may also run with other compilers/OSes after some small modifications.
The script `max_vector.py` does the majority of the job. It creates the C source file `max_vector.c` (not included in this repo) from a template, adapts the code for each test (vector size, memory allocation, number of if-guards), then compiles and runs it with each configuration. You can see basic usage by running it as:

```
python3 max_vector.py -h
```
The resulting C code is compiled without optimizations (`-O0`), so the compiler should not influence the final result. The program is run several times (10 by default) to gather enough data for statistical analysis. The vector is filled with increasing integer values based on each index. This may not be a real-world example, but it forces the code to do the largest number of comparisons. Of course, all these parameters may be changed in the source code.
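For illustration, the initialization described above might look like this (the names `vector` and `size` are hypothetical, since the generated source is not included in the repo):

```c
/* Increasing values make the if-guard comparison succeed as often as
   possible, forcing the worst case for the critical section. */
for (long i = 0; i < size; i++)
    vector[i] = (int)i;
```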
You will find three more files in this repository: the `run_on_aguia.sh` script and the `test_max_ifs.log`/`test_zero_ifs.log` log files. The first is a script to run two different tests on a relatively powerful computer at IME-USP called `aguia`. It is powered by two Intel(R) Xeon(R) CPU [email protected], each with 6 physical cores and 12 threads (resulting in 12 physical cores and 24 threads), and 32GB of RAM.
The first test compares the performance of not using if-guards at all (instead making the whole body of the parallel for critical). It shows that running the code without the if-guards is up to two orders of magnitude slower than any other test configuration. Since it would be too slow with bigger vector sizes, we chose to run this test with a smaller vector. The result of this test on `aguia` can be seen in the file `test_zero_ifs.log`.
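In terms of the sketch above, the zero-if-guard configuration presumably reduces the loop to something like this, which serializes every iteration on the lock:

```c
#pragma omp parallel for
for (long i = 1; i < n; i++) {
    /* No if-guard: every iteration enters the critical section,
       so threads spend most of their time waiting on the lock. */
    #pragma omp critical
    {
        if (v[i] > max)
            max = v[i];
    }
}
```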
The second test compares the performance of different if-guard configurations, from 1 up to 10. It shows that too many if-guards are harmful, but not in a meaningful way (the difference between the best and worst times is less than 0.3 seconds once you consider the standard deviations between measurements). It only shows that the ideal number of if-guards seems to be 8 or less, since everything else is within the standard deviation. Since this test is very fast, it was run with a very big vector (2000000000 positions, or almost 16GiB of RAM on our test machine). This is almost the limit for this hardware: anything bigger could easily consume all available RAM and would need more changes in the code (using long instead of int in the vector allocation code, for example), so we considered this sufficient. The result of this test on `aguia` can be seen in the file `test_max_ifs.log`.
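As an illustration of what a multi-if-guard configuration might look like (an assumption on my part; the generated code may differ), each extra guard presumably just repeats the unsynchronized check before the lock, as in this 2-if-guard loop body:

```c
/* Sketch of a 2-if-guard loop body: the repeated check buys little
   beyond what the first guard already provides, which matches the
   result that extra guards add only noise-level overhead. */
if (v[i] > max) {
    if (v[i] > max) {
        #pragma omp critical
        {
            if (v[i] > max)
                max = v[i];
        }
    }
}
```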
The conclusion is that a single if-guard seems to be sufficient for the performance of critical regions, at least on current high-performance shared-memory machines. The result may be different with more complex data structures, like strings or structs.