The Cori acceptance test for both the Lustre file system (cscratch) and the burst buffer used IOR to obtain the peak numbers that were advertised.
DataWarp Phase I used 4,480 processes (ppn=4, i.e., 1,120 nodes) with the following IOR command-line options:
./IOR -a MPIIO -g -t 512k -b 8g -o $DW_JOB_STRIPED/IOR_file -v
./IOR -a POSIX -F -e -g -t 512k -b 8g -o $DW_JOB_STRIPED/IOR_file -v
./IOR -a POSIX -F -e -g -t 4k -b 1g -o $DW_JOB_STRIPED/IOR_file -v -z
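These runs were launched from a Slurm batch job carrying a DataWarp allocation; $DW_JOB_STRIPED is set by DataWarp to the job's striped scratch namespace. A minimal sketch, assuming ppn=4 across 1,120 nodes (the capacity value here is illustrative, not the actual acceptance configuration):

#!/bin/bash
#SBATCH -N 1120
#DW jobdw type=scratch access_mode=striped capacity=700TiB
# $DW_JOB_STRIPED points at the job's striped burst-buffer namespace
srun -n 4480 ./IOR -a POSIX -F -e -g -t 512k -b 8g -o $DW_JOB_STRIPED/IOR_file -v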
To summarize, the peak numbers were:
- 832,451.89 MiB/sec (POSIX file-per-process write)
- 862,616.35 MiB/sec (POSIX file-per-process read)
- 334,627.84 MiB/sec (MPI-IO shared-file write)
- 765,847.30 MiB/sec (MPI-IO shared-file read)
- 12,527,427.06 IOP/sec (POSIX file-per-process write)
- 12,591,977.74 IOP/sec (POSIX file-per-process read)
The Phase II IOR runs used between 44,000 and 44,080 processes (again, ppn=4) with the following IOR command-line options:
./IOR -a POSIX -F -e -g -t 1M -b 8G -o $DW_JOB_STRIPED/IOR_file -v
./IOR -a MPIIO -g -t 1M -b 8G -o $DW_JOB_STRIPED/IOR_file -v
./IOR -a POSIX -F -e -g -t 4k -b 1g -o $DW_JOB_STRIPED/IOR_file -v -z
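As a quick sanity check on the scale of these runs, the aggregate data written per pass follows directly from the process count and block size (shell arithmetic, assuming 44,000 processes):

echo $((44000 * 8)) GiB    # 352000 GiB, roughly 344 TiB per write phase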
To summarize, the peak numbers were:
- 1,493,373.74 MiB/sec (POSIX file-per-process write)
- 1,663,914.47 MiB/sec (POSIX file-per-process read)
- 1,300,578.87 MiB/sec (MPI-IO shared-file write; independent I/O per IOR log)
- 1,259,295.00 MiB/sec (MPI-IO shared-file read; independent I/O per IOR log)
- 13,135,292.56 IOP/sec (POSIX file-per-process write)
- 28,260,132.42 IOP/sec (POSIX file-per-process read)
For the peak Lustre performance (Phase I), we ran
./IOR -w -a POSIX -F -C -e -g -k -b 4m -t 4m -s 1638 -o $SCRATCH/IOR_file -v
- 960 nodes, 4 processes per node
- 4 MiB transfer and block size
- 24 TiB total write size, which determined the segment count (see the arithmetic check after this list)
- -w and -r were run as separate sruns (to ensure the page cache was dropped between write and read)
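The segment count of 1638 falls out of that 24 TiB target; a quick check with shell arithmetic (960 nodes x 4 ppn = 3,840 processes), in the same style as the inline expression in the MPI-IO command below:

echo $((24*1024*1024/4/(960*4)))    # total MiB / 4 MiB blocks / 3840 processes = 1638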
./IOR -w -a MPIIO -c -C -g -b 8m -t 8m -k -H -v -s $((12*1024*1024/8/(960*32))) -o $SCRATCH/IOR_file
- 960 nodes, 32 processes per node
- 8 MiB transfers and block size
- 12 TiB total write size
- -w and -r run separately
- collective buffering explicitly disabled
- stripe size set to 8 MiB
- stripe count set to the total number of OSTs, as reported by lfs df $SCRATCH (see the setstripe sketch below)
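A minimal sketch of that striping setup, assuming the target file was pre-created with lfs setstripe before the IOR run (-c -1 stripes across all available OSTs, matching the count reported by lfs df):

lfs setstripe -S 8m -c -1 $SCRATCH/IOR_file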
To summarize:
- 716,886.15 MiB/sec (POSIX file-per-process write)
- 646,835.37 MiB/sec (POSIX file-per-process read)
- 344,016.32 MiB/sec (MPI-IO shared-file write)
- 614,328.95 MiB/sec (MPI-IO shared-file read)
The same tests were performed at Phase II acceptance, but the Lustre performance was diminished because the OSTs had filled. To summarize, 960 nodes gave:
- 562,701.01 MiB/sec (POSIX file-per-process write, KNL with 8 ppn, buffered I/O)
- 389,576.59 MiB/sec (POSIX file-per-process read, KNL with 8 ppn, buffered I/O)
- 624,666.33 MiB/sec (POSIX file-per-process write, KNL with 8 ppn, direct I/O)
- 397,262.65 MiB/sec (POSIX file-per-process read, KNL with 8 ppn, direct I/O)
- 478,170.92 MiB/sec (POSIX file-per-process write, Haswell with 4 ppn; 66% of the Phase I acceptance result)
- 346,969.69 MiB/sec (POSIX file-per-process read, Haswell with 4 ppn; 53% of the Phase I acceptance result)
There were no peak I/O numbers for MPI-IO shared-file I/O for Phase II.
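For reference, the buffered and direct-I/O POSIX runs above would differ only in whether the page cache is bypassed; a sketch of the direct-I/O form, assuming IOR's -B option (O_DIRECT) was used, with illustrative transfer and block sizes:

./IOR -w -a POSIX -F -B -C -e -g -k -t 1M -b 8G -o $SCRATCH/IOR_file -v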