Spark
If you have petabytes of (larger-than-memory) JSON/XML/CSV files, a simple workflow, and a thousand-node cluster, Spark is likely the better fit; see the sketch below.
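For concreteness, a minimal PySpark sketch of that kind of workflow, under stated assumptions: the file name events.json and the country column are hypothetical stand-ins for the data, and the same code would run unchanged against a distributed file system on a real cluster.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("simple-etl").getOrCreate()

    # Load semi-structured JSON records (hypothetical file name).
    df = spark.read.json("events.json")

    # A simple workflow: one filter, one aggregate.
    summary = df.filter(df["country"].isNotNull()).groupBy("country").count()
    summary.show()

    spark.stop()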
Dask
If you have tens to thousands of gigabytes of (larger-than-memory) binary or numeric data (e.g., HDF5, netCDF4, CSV.gz), complex algorithms, and a single large multi-core workstation, Dask is likely the better fit; see the sketch below.
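Again a minimal sketch, this time using dask.array over an HDF5 file; measurements.hdf5 and its dataset "/x" are hypothetical, and the chunk shape is an arbitrary choice that should roughly match how the data is laid out on disk.

    import h5py
    import dask.array as da

    # Hypothetical input: an HDF5 file holding a large 2-D numeric dataset.
    f = h5py.File("measurements.hdf5", mode="r")
    x = da.from_array(f["/x"], chunks=(10_000, 10_000))

    # A NumPy-style computation, built lazily and then evaluated in
    # parallel across the cores of a single workstation.
    zscores = (x - x.mean(axis=0)) / x.std(axis=0)
    print(zscores.sum().compute())

The same lazily built graph can later be pointed at a dask.distributed cluster without changing the algorithm itself.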