Measuring & improving performance of scientific software.
- Static: variable types are declared at initialization and cannot change. [C++]
- Duck: if it looks, walks, & quacks like a duck, treat it like a duck. [Python, R]
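As a rough illustration of the duck-typing bullet, the sketch below passes two unrelated objects to the same function; it works only because both happen to provide a `quack()` method. The class and function names (`Duck`, `Robot`, `make_it_quack`) are hypothetical and exist only for this example.

```python
class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    # Unrelated to Duck, but it also provides a quack() method.
    def quack(self) -> str:
        return "Beep-quack!"


def make_it_quack(thing) -> str:
    # No type check here: any object with a quack() method is accepted.
    return thing.quack()


sounds = [make_it_quack(obj) for obj in (Duck(), Robot())]
print(sounds)

# A name can also be rebound to a value of a different type at run time.
# A statically typed language such as C++ would reject the equivalent
# re-assignment at compile time.
x = 1
x = "one"
print(type(x).__name__)
```

The flexibility comes at a cost: type errors surface only when the offending line runs, and the interpreter has to look up types at run time.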
Developer time vs. compute time.
If you already know Python or R but not C++, it may not be worth the time spent learning and debugging C++.
- CPU
- Memory
- I/O
The solution to the problem depends on what's causing the bottleneck. Switching to a compiled language often helps for CPU-bound programs, may or may not help for memory-bound, and likely will not help for I/O-bound programs.
Analyze the performance of your code.
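One possible way to do this in Python is with the standard-library `timeit` and `cProfile` modules. The sketch below is only an illustration: the two functions and the problem size `N` are hypothetical stand-ins for a real workload. It times a `for` loop against a list comprehension, then profiles one call to see where the time is spent.

```python
import cProfile
import io
import pstats
import timeit


def squares_loop(n: int) -> list[int]:
    # Build the list with an explicit for loop and repeated append calls.
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n: int) -> list[int]:
    # Build the same list with a list comprehension.
    return [i * i for i in range(n)]


N = 100_000  # hypothetical problem size

# timeit: total wall-clock time for 20 repeated calls of each version.
loop_seconds = timeit.timeit(lambda: squares_loop(N), number=20)
comp_seconds = timeit.timeit(lambda: squares_comprehension(N), number=20)
print(f"for loop:      {loop_seconds:.3f} s")
print(f"comprehension: {comp_seconds:.3f} s")

# cProfile: per-function breakdown of a single call.
profiler = cProfile.Profile()
profiler.enable()
squares_loop(N)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Timings vary between machines and Python versions, so measure on the system where the code will actually run before deciding what to optimize.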
- Run it on a high-performance machine. [Great Lakes, XSEDE]
- Write it in a compiled language.
  - Julia is a good compromise between C++ & Python.
- Re-write only the limiting steps in a compiled language.
  - Rcpp (for R)
  - Cython (for Python)
- Make sure you're using the language as optimally as possible.
  - R: vectorize with `apply` functions instead of using `for` loops.
  - Python: use comprehensions instead of `for` loops to build lists, generators, dicts, & sets.
- Multiprocessing.
  - Workflow manager: Snakemake, Nextflow. Useful if you have steps with separate I/O files.
  - Python library: `multiprocessing`
    - Sometimes the gains are minimal due to the additional overhead of coordinating multiple processes.
  - R library: `parallel`
- Write compressed binary instead of plain text files for I/O-bound software.
  - Pickling objects. Python library: `jsonpickle`.
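The multiprocessing item in the list above can be sketched with Python's standard-library `multiprocessing.Pool`. This is a minimal sketch under stated assumptions: `count_primes` is a hypothetical CPU-bound toy task, and the input sizes and worker count are arbitrary.

```python
import math
from multiprocessing import Pool


def count_primes(limit: int) -> int:
    """CPU-bound toy task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it evenly.
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]  # hypothetical workloads

    # Each limit is handed to a separate worker process.
    with Pool(processes=4) as pool:
        parallel_counts = pool.map(count_primes, limits)

    # Serial version for comparison; the results should be identical.
    serial_counts = [count_primes(limit) for limit in limits]
    assert parallel_counts == serial_counts
    print(parallel_counts)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the script. For small tasks, the cost of starting processes and passing data between them can outweigh the speed-up, so it is worth timing both versions.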
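For the compressed-binary item in the list above, here is one hedged sketch. Note the swap: `jsonpickle` serializes objects to JSON text, so this example instead uses the standard-library `pickle` module, which writes a binary format, together with `gzip` for compression. The `records` data and the file names are made up for illustration.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# Hypothetical data set: a list of small records.
records = [{"sample": i, "value": i / 3} for i in range(10_000)]

with tempfile.TemporaryDirectory() as tmp:
    text_path = Path(tmp) / "records.tsv"
    binary_path = Path(tmp) / "records.pkl.gz"

    # Plain text: one tab-separated line per record.
    with open(text_path, "w") as handle:
        for rec in records:
            handle.write(f"{rec['sample']}\t{rec['value']}\n")

    # Compressed binary: pickle serializes the object, gzip compresses it.
    with gzip.open(binary_path, "wb") as handle:
        pickle.dump(records, handle, protocol=pickle.HIGHEST_PROTOCOL)

    # Reading it back restores the original Python objects directly,
    # with no text parsing step.
    with gzip.open(binary_path, "rb") as handle:
        restored = pickle.load(handle)

    text_bytes = text_path.stat().st_size
    binary_bytes = binary_path.stat().st_size

print(f"plain text:    {text_bytes} bytes")
print(f"gzip + pickle: {binary_bytes} bytes")
```

Only unpickle files you trust, because loading a pickle can execute arbitrary code. Whether the binary file is smaller or faster to read depends on the data, so compare both on a realistic sample.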