Created
July 9, 2010 19:57
-
-
Save jvoorhis/469969 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2010 07 09 10:30 | |
Galois Tech Talk: Requirements and Performance of Data Intensive, Irregular Applications | |
Iavor S. Diatchki | |
Pacific NW National Laboratory | |
Analytics | |
Security | |
Anomaly detection | |
Semantic web | |
Trends | |
Data is graph based | |
Linear algebra methods fall down | |
favors special architectures | |
Cray XMT used in speaker's research | |
http://en.wikipedia.org/wiki/Cray_XMT | |
http://www.cray.com/products/xmt/ | |
BSD Microkernel schedules only user code on Threadstorm CPUs | |
500 MHz maximal processor count from 256 to 8192, and maximum memory to 128 TB | |
single core may run up to 15 programs simultaneously | |
system IO performed by Opterons running Linux | |
code manually moved to compute cores via software | |
global address space | |
sponsored by Cray | |
low diameter graphs | |
work explodes | |
difficult to partition | |
high % of nodes visited | |
scale-free graphs | |
few nodes are highly connected hubs | |
difficult to partition | |
concentrates in few nodes | |
graph methods | |
paths (shortest path, min/max, etc) | |
strutcures (SCC, etc) | |
groups (coloring, etc) | |
orderings (topological, etc) | |
graph properties | |
degree distribution – normal vs scale-free | |
planar / non-planar | |
static / dynamic | |
weighted / unweighted | |
typed / untyped edges | |
challenges | |
memory intensive | |
low locality | |
hide latency w/ parallelism | |
low computation:communication | |
fine grained synchronization (per entity) | |
work is imbalanced | |
favors work-stealing? | |
software implications | |
!! place data near computation | |
access data in order / reuse | |
!! partition into tasks | |
minimize communication / synchronization | |
avoid modifying shared data | |
multithreading | |
avoid stalling cpu pipeline | |
context switch on memory cache miss | |
high memory bandwidth | |
many threads per core, small thread state | |
PNNL: software tools to facilitate use of local memory | |
Maximum matching | |
matching is a subset of edges with no incident vertices | |
maximize cost function | |
Hoepman's Algorithm | |
greedy, fast, terminating | |
Parallel partitioning | |
Halappanavar, Dobrina, Pothen | |
partition graph on shared memory machine | |
independent computation followed by shared computation | |
Tools | |
METIS Partitioner | |
DataFlow algorithm for maximum matching | |
nodes propose by setting signals on edges | |
state is Maybe Bit per (edge x vertex) | |
fine grain synchronization | |
performance insensitive to structure / weight | |
!! deadlock prone | |
pre-emption is too expensive (XMT does not pre-empt by default!) | |
manually order tasks to guarantee progress | |
heaviest edge will always match | |
add is-active field to vertices | |
if CAS succeeds, make recursive call to visit connected vertex | |
would benefit from CPS? | |
techniques extend to graph coloring, other trad algorithms | |
Summary | |
unstrutured / irregular data is graph-centric | |
dataflow techniques useful for parallelizing graph algorithms | |
parallel architecture increases productivity | |
Q/A | |
PNNL not researching FPGA / GPGPU yet | |
other research in progress, future area of interest |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment