Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save jvoorhis/469969 to your computer and use it in GitHub Desktop.
Save jvoorhis/469969 to your computer and use it in GitHub Desktop.
2010 07 09 10:30
Galois Tech Talk: Requirements and Performance of Data Intensive, Irregular Applications
Iavor S. Diatchki
Pacific NW National Laboratory
Analytics
Security
Anomaly detection
Semantic web
Trends
Data is graph based
Linear algebra methods fall down
favors special architectures
Cray XMT used in speaker's research
http://en.wikipedia.org/wiki/Cray_XMT
http://www.cray.com/products/xmt/
BSD Microkernel schedules only user code on Threadstorm CPUs
500 MHz maximal processor count from 256 to 8192, and maximum memory to 128 TB
single core may run up to 15 programs simultaneously
system IO performed by Opterons running Linux
code manually moved to compute cores via software
global address space
sponsored by Cray
low diameter graphs
work explodes
difficult to partition
high % of nodes visited
scale-free graphs
few nodes are highly connected hubs
difficult to partition
concentrates in few nodes
graph methods
paths (shortest path, min/max, etc)
strutcures (SCC, etc)
groups (coloring, etc)
orderings (topological, etc)
graph properties
degree distribution – normal vs scale-free
planar / non-planar
static / dynamic
weighted / unweighted
typed / untyped edges
challenges
memory intensive
low locality
hide latency w/ parallelism
low computation:communication
fine grained synchronization (per entity)
work is imbalanced
favors work-stealing?
software implications
!! place data near computation
access data in order / reuse
!! partition into tasks
minimize communication / synchronization
avoid modifying shared data
multithreading
avoid stalling cpu pipeline
context switch on memory cache miss
high memory bandwidth
many threads per core, small thread state
PNNL: software tools to facilitate use of local memory
Maximum matching
matching is a subset of edges with no incident vertices
maximize cost function
Hoepman's Algorithm
greedy, fast, terminating
Parallel partitioning
Halappanavar, Dobrina, Pothen
partition graph on shared memory machine
independent computation followed by shared computation
Tools
METIS Partitioner
DataFlow algorithm for maximum matching
nodes propose by setting signals on edges
state is Maybe Bit per (edge x vertex)
fine grain synchronization
performance insensitive to structure / weight
!! deadlock prone
pre-emption is too expensive (XMT does not pre-empt by default!)
manually order tasks to guarantee progress
heaviest edge will always match
add is-active field to vertices
if CAS succeeds, make recursive call to visit connected vertex
would benefit from CPS?
techniques extend to graph coloring, other trad algorithms
Summary
unstrutured / irregular data is graph-centric
dataflow techniques useful for parallelizing graph algorithms
parallel architecture increases productivity
Q/A
PNNL not researching FPGA / GPGPU yet
other research in progress, future area of interest
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment