Measuring & improving performance of scientific software.
- Static: variable types are declared at initialization and cannot change. [
C++] - Duck: if it looks, walks, & quacks like a duck, treat it like a duck. [
Python,R]
Developer time vs. compute time.
If you already know Python or R but not C++, it may not be worth the time spent learning and debugging C++.
- CPU
- Memory
- I/O
The solution to the problem depends on what's causing the bottleneck. Switching to a compiled language often helps for CPU-bound programs, may or may not help for memory-bound, and likely will not help for I/O-bound programs.
Analyze the performance of your code.
- Run it on a high-performance machine. [
Great Lakes,XSEDE] - Write it in a compiled language.
Juliais a good compromise betweenC++&Python.
- Re-write only the limiting steps in a compiled language
RcppCython
- Make sure you're using the language as optimally as possible.
R: vectorize withapplyfunctions instead of usingforloops.Python: use comprehensions instead offorloops to build lists, generators, dicts, & sets.
- Multiprocessing.
- Workflow manager:
Snakemake,Nextflow. If you have steps with separate I/O files. Pythonlibrary:multiprocessing- Sometimes the gains are minimal due to the additional overhead of handling multiple processors.
Rlibrary:parallel
- Workflow manager:
- Write compressed binary instead of plain text files for I/O-bound software.
- Pickling objects.
Pythonlibrary:jsonpickle.
- Pickling objects.