- BOLT: A Practical Binary Optimizer for Data Centers and Beyond
- Optimizing Function Placement for Large-Scale Data-Center Applications
- AutoFDO: Automatic Feedback-Directed Optimization for Warehouse-Scale Applications
- HHVM JIT: A Profile-Guided, Region-Based Compiler for PHP and Hack
- Branch Prediction for Free
- Static Branch Frequency and Program Profile Analysis
- Profile Guided Context-Sensitive Program Analysis
- Profile Guided Code Positioning
- Improved Basic Block Reordering
- Codestitcher: Inter-procedural Basic Block Layout Optimization
- Basic-block Reordering Using Neural Networks
- An Overview of Software Performance Analysis Tools and Techniques: From GProf to DTrace
- Coz: Finding Code that Counts with Causal Profiling
- REV.NG: A Unified Binary Analysis Framework to Recover CFGs and Function Boundaries
- Binary Rewriting without Control Flow Recovery
- Regular Expressions and State Graphs for Automata: McNaughton-Yamada-Thompson algorithm.
-
The BSD Packet Filter: A New Architecture for User-level Packet Capture
- Minimizes packet copying by implementing filters in the kernel.
- Uses a register-based virtual machine.
- Represents filters as a control flow graph (CFG) instead of a tree model.
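
The register-based VM and CFG points can be illustrated with a minimal sketch of a toy filter interpreter. The opcode names (`LDH`, `LDB`, `JEQ`, `RET`) and the tuple encoding are assumptions made for illustration, not the actual BPF instruction format. Each `JEQ` has a true and a false target, so the program forms a small CFG whose leaves are `RET` instructions.

```python
# Toy register-based packet filter, loosely modeled on the ideas in the notes.
# Opcode names and the tuple encoding are hypothetical, not real BPF bytecode.

def run_filter(program, packet):
    """Run a filter over packet bytes; return bytes to accept (0 means drop)."""
    acc = 0  # accumulator register
    pc = 0   # program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "LDB":    # load one byte at a fixed offset into the accumulator
            (offset,) = args
            if offset >= len(packet):
                return 0   # out-of-bounds load drops the packet
            acc = packet[offset]
            pc += 1
        elif op == "LDH":  # load a big-endian 16-bit value into the accumulator
            (offset,) = args
            if offset + 2 > len(packet):
                return 0
            acc = int.from_bytes(packet[offset:offset + 2], "big")
            pc += 1
        elif op == "JEQ":  # conditional branch; jt/jf are relative forward skips
            value, jt, jf = args
            pc += 1 + (jt if acc == value else jf)
        elif op == "RET":  # leaf of the CFG: accept `length` bytes or drop
            (length,) = args
            return length
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return 0  # falling off the end drops the packet


# Accept IPv4/TCP over Ethernet: EtherType at offset 12, IP protocol at offset 23.
ACCEPT_IPV4_TCP = [
    ("LDH", 12),
    ("JEQ", 0x0800, 0, 3),  # not IPv4: skip ahead to the drop instruction
    ("LDB", 23),
    ("JEQ", 6, 0, 1),       # not TCP: skip ahead to the drop instruction
    ("RET", 65535),         # accept
    ("RET", 0),             # drop
]
```

Because jumps only move forward in this sketch, every run terminates after at most one pass over the program, which is the kind of property that makes it reasonable to run such filters inside the kernel.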
-
Paving the Way for NFV: Simplifying Middlebox Modifications Using StateAlyzr
- Static analysis of middlebox code to identify variables related to middlebox state.
- Careful data flow analysis to improve precision without losing soundness.
- Improves precision by separating variables that are used only at middlebox initialization from the variables that are actually needed for packet processing.
- Identifies whether each state variable is read-only or updatable. Based on this, developers can decide which variables can be accessed concurrently.
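
The classification described above can be sketched on a toy input. This is not StateAlyzr's algorithm: the input format (a call graph plus per-function read and write sets) and the function and variable names are hypothetical simplifications, whereas a real analysis works on the program's code and must handle pointers and aliasing.

```python
# Toy variable classification inspired by the bullets above.
# All names and the input format are hypothetical.

def reachable(call_graph, entry):
    """Return every function reachable from `entry` in the call graph."""
    seen, stack = set(), [entry]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(call_graph.get(fn, ()))
    return seen


def classify(call_graph, accesses, packet_entry):
    """Label each variable 'init-only', 'read-only', or 'updatable'."""
    packet_path = reachable(call_graph, packet_entry)
    # Start by assuming every variable is touched only outside packet processing.
    labels = {}
    for reads, writes in accesses.values():
        for var in reads | writes:
            labels[var] = "init-only"
    for fn in packet_path:
        reads, writes = accesses.get(fn, (set(), set()))
        for var in reads:
            if labels[var] == "init-only":
                labels[var] = "read-only"
        for var in writes:
            labels[var] = "updatable"  # a write anywhere on the packet path wins
    return labels


CALL_GRAPH = {
    "main": ["load_config", "process_packet"],
    "process_packet": ["lookup_flow"],
}
ACCESSES = {  # function -> (variables read, variables written)
    "load_config": (set(), {"config", "log_level"}),
    "process_packet": ({"config"}, {"pkt_count"}),
    "lookup_flow": ({"flow_table"}, {"flow_table"}),
}
LABELS = classify(CALL_GRAPH, ACCESSES, "process_packet")
```

In this example `log_level` is init-only, `config` is read-only during packet processing, and `pkt_count` and `flow_table` are updatable, so only the last two would need coordinated access.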
-
Batchy: Batch-scheduling Data Flow Graphs with Service-level Objectives
- Processing in batches makes better use of the CPU cache and of data-level parallelism.
- Queuing can remove batch fragmentation. Batchy describes efficient scheduling for queuing.
- Finds the weights for the WFQ/CFS scheduler based on the static batch size.
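
A minimal sketch of batch fragmentation and how a queue can undo it. The class and function names, the batch size of 32, and the four-way split are illustrative assumptions, not taken from Batchy itself.

```python
from collections import deque


def split_by_branch(batch, num_branches):
    """Scatter one batch across branches, producing smaller fragments."""
    fragments = [[] for _ in range(num_branches)]
    for pkt in batch:
        fragments[pkt % num_branches].append(pkt)
    return fragments


class BatchQueue:
    """Buffers packets and releases them only in full batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = deque()

    def push(self, packets):
        self.buffer.extend(packets)

    def pop_batches(self):
        """Return every full batch currently available, in arrival order."""
        batches = []
        while len(self.buffer) >= self.batch_size:
            batches.append([self.buffer.popleft() for _ in range(self.batch_size)])
        return batches


# Four input batches of 32 packets each, scattered over 4 branches.
queue = BatchQueue(batch_size=32)
fragment_sizes = []  # batch sizes branch 0 would see without a queue
rebuilt = []         # full batches released by the queue on branch 0
for i in range(4):
    batch = list(range(32 * i, 32 * (i + 1)))
    fragment = split_by_branch(batch, 4)[0]
    fragment_sizes.append(len(fragment))
    queue.push(fragment)
    rebuilt.extend(queue.pop_batches())
```

Without the queue, branch 0 processes four fragments of 8 packets. With it, the branch processes one batch of 32, at the cost of holding early packets until the batch fills. That added delay is why the queuing decisions have to be scheduled against the service-level objectives.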
-
- Arbitrary memory operations executed by the remote NIC's CPU.
- Reduces network load and client CPU usage by reducing the number of operations for memory workloads.