Today I learned about the Intel Distribution for Python and, after browsing their website for a bit, I wanted to run one of the benchmarks myself and play with it to compare the results.
Here is how I got two different environments ready for testing:
$ docker run -i -t -p 8888:8888 intelpython/intelpython3_full /bin/bash \
-c "/opt/conda/bin/conda install jupyter -y --quiet && mkdir /opt/notebooks && /opt/conda/bin/jupyter \
notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser --allow-root"
Jupyter Notebook Kernel: Python 3.7.7 (default, Mar 13 2020, 13:32:22) [GCC 7.3.0]
$ docker run -i -t -p 8887:8887 continuumio/anaconda3 /bin/bash \
-c "/opt/conda/bin/conda install jupyter -y --quiet && mkdir /opt/notebooks && /opt/conda/bin/jupyter \
notebook --notebook-dir=/opt/notebooks --ip='*' --port=8887 --no-browser --allow-root"
Jupyter Notebook Kernel: Python 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
When the containers are ready, you can access the Jupyter notebooks in your browser.
In the terminal you'll find an address containing a token, like this:
http://127.0.0.1:8888/?token=a4b4631ec9f42b7d9d321fed71deff5cc4ed68e8a093dd12
Notice that we are using a different port for each environment. That's how you know
which environment you are executing the code in.
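Besides the port number, you can also confirm from inside each notebook which distribution the kernel is running on. A minimal sketch, assuming nothing beyond sys and NumPy (both ship with the two images):

import sys
import numpy as np

print(sys.version)     # the kernel's Python build string, as shown above
print(np.__version__)  # NumPy version shipped with the distribution
np.show_config()       # shows which BLAS/LAPACK libraries NumPy is linked against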
Copy and paste the code below into a notebook cell in each environment and execute it. You will immediately see the difference.
I adapted the example below, which simulates a stochastic differential equation, from the IPython Cookbook, Second Edition, and executed it under the two profiles, Intel Python and "plain" Anaconda. The results are interesting but not surprising.
%%timeit
import numpy as np
try:
import mkl_random as rnd
except ImportError:
import numpy.random as rnd
import matplotlib.pyplot as plt
%matplotlib inline
sigma = 1. # Standard deviation.
mu = 10. # Mean.
tau = .05 # Time constant.
dt = .001 # Time step.
T = 1. # Total time.
n = int(T / dt) # Number of time steps.
t = np.linspace(0., T, n) # Vector of times.
sigma_bis = sigma * np.sqrt(2. / tau)
sqrtdt = np.sqrt(dt)
ntrials = 1000000
X = np.zeros(ntrials)
bins = np.linspace(-2., 14., 100)
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
for i in range(n):
    # We update the process independently for all trials
    X += dt * (-(X - mu) / tau) + \
        sigma_bis * sqrtdt * rnd.randn(ntrials)
    # We display the histogram for a few points in time
    if i in (5, 50, 900):
        hist, _ = np.histogram(X, bins=bins)
        ax.plot((bins[1:] + bins[:-1]) / 2, hist,
                {5: '-', 50: '.', 900: '-.', }[i],
                label=f"t={i * dt:.2f}")
ax.legend()
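A large share of the per-step work in this simulation is drawing normal random numbers, so a minimal sketch like the one below isolates just that part. It reuses the same try/except fallback as above, so rnd resolves to mkl_random on the Intel kernel and to numpy.random on plain Anaconda (the first link below has more background on random number generation in the Intel Distribution):

try:
    import mkl_random as rnd    # available in the Intel Distribution
except ImportError:
    import numpy.random as rnd  # fallback for plain Anaconda

ntrials = 1000000

# Time only the normal sampling that the loop above performs once per time step.
%timeit rnd.randn(ntrials)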
- More about random number generation: https://software.intel.com/en-us/blogs/2016/06/15/faster-random-number-generation-in-intel-distribution-for-python
- The benchmarks page: https://software.intel.com/en-us/distribution-for-python/benchmarks
- The complete list of packages, a good way to see what the Intel Distribution actually adds: https://software.intel.com/en-us/articles/complete-list-of-packages-for-the-intel-distribution-for-python
- Accelerating Scientific Python with Intel Optimizations Paper: http://conference.scipy.org/proceedings/scipy2017/pdfs/oleksandr_pavlyk.pdf