EasyPerf 1 August 2021 Twitter Space

Guest Thomas Dullien

https://twitter.com/halvarflake

https://optimyze.cloud/

Offensive security is understanding large scale legacy systems.
Whole stack analysis required.
Tool built for Microsoft patch analysis (BinDiff) is useful for seeing compiler changes.
Weird machines, exploitability, and provable unexploitability
Two ARM cores in a modern USB cable.
Security is human conflict - performance is environmental impact.
Moore's law has many versions; cost per transistor is not falling.
What is a software vendor? Snowflake, Spotify? Vendors now have hardware and performance costs.
Hyperscalers in 2009 started building performance monitoring - led to TPU and YouTube transcoding hardware. FFmpeg was a huge bill for Google.
Torch/TensorFlow are Python interfaces to GPUs. Going to see more specialized accelerator libraries.
All cloud providers including Azure are adopting ARM.
Telemetry can build better chips. Apple and Amazon have in-house processors. Intel at a disadvantage - has to buy the data.
There will be a C-level executive in charge of digital operating expense.
Utility computing has low margins. Cloud providers make profit from value added services.
Compiler design integration will improve.
There is no typical workload, but there are classes: Java, database, Python calling into C++. Linux is ubiquitous. AWS Graviton is possible because of open source. One recompile between CPU architectures.
Most money spent on 3rd party open source packages.
Bottlenecks are mostly the database. Amazon Linux has a default that spends 15% on getting clocks. Kubernetes does a disk quota that eats CPU.
There is way more Java in the routine enterprise. Allocation overhead is huge - 15%.
Publicly traded SaaS companies. Gross margin is growing linearly with market cap.
(Thomas' mobile battery died, short ironic intermission) - Denis discussion of power use.
Open source alternative to Thomas' Prodfiler - Pyroscope. Secret sauce for logger is bccutils https://github.com/iovisor/bcc/blob/master/tools/profile.py
Netflix invested in Brendan Gregg's system tracing.
Car manufacturers know cost down to a screw. We need to measure at that granularity to optimize cost.
Google and Facebook built fleet wide profilers.
std::map data visible etc. https://abseil.io/
FB and Google compile with frame pointers to help profiling.
Optimyze.cloud profiler works on the operating system interrupt to log all stack traces. File ID and offset pairs. Index of major package debug symbols - can upload your own symbols.
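A minimal sketch of how (file ID, offset) pairs could be turned back into function names using an index of debug symbols. This is not the Optimyze implementation; `SymbolIndex`, `add_file`, `symbolize`, and the example file ID and symbol names are all hypothetical and exist only to illustrate the lookup.

```python
import bisect


class SymbolIndex:
    """Hypothetical index: file ID -> sorted function start offsets and names."""

    def __init__(self):
        self._tables = {}  # file_id -> (list of start offsets, list of names)

    def add_file(self, file_id, symbols):
        """Register symbols for one file as (start_offset, name) pairs."""
        ordered = sorted(symbols)
        starts = [start for start, _ in ordered]
        names = [name for _, name in ordered]
        self._tables[file_id] = (starts, names)

    def symbolize(self, file_id, offset):
        """Map a (file ID, offset) pair to 'function+0xdelta'.

        Falls back to 'file_id+0xoffset' when no symbols are known,
        so unsymbolized frames can still be aggregated.
        """
        table = self._tables.get(file_id)
        if table is None:
            return f"{file_id}+0x{offset:x}"
        starts, names = table
        # Find the last function whose start offset is <= offset.
        i = bisect.bisect_right(starts, offset) - 1
        if i < 0:
            return f"{file_id}+0x{offset:x}"
        return f"{names[i]}+0x{offset - starts[i]:x}"
```

Keeping only the pair on the profiled host and resolving names later means the host itself does not need debug symbols installed.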
Extra info for JVM, Ruby, Python virtual machine traces.
Last branch record in hardware could help (Denis).
ELF format tutorial
5 minute DWARF 5 overview
ld and gold linkers
lld linker
Regex JIT was turned off. Lots of time spent on UTF-8 when it was ASCII. Low hanging fruit on many web back ends.
A/B testing of compiler flags - especially with LTO/PGO. Hard problem to isolate root cause.
LTO can increase code size with inlining - create cache issues - because of bad profiling data.
REST API in the Fall for PGO with profiling data.
Too much time in serialization/deserialization. Python <-> Java in Spark workloads eating 30%.
Microservices need both profiling and distributed tracing. Akita, OpenTelemetry.
hash callstack, keep a counter for a histogram
reduce sampling rate on larger fleets
have some workers increase sample rate for more fine grain - target less than 1% of CPU
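The three notes above (hash the call stack, count into a histogram, lower the sampling rate as the fleet grows) can be sketched roughly as follows. This is an illustrative sketch, not any vendor's implementation; `stack_key`, `StackHistogram`, `per_host_sample_hz`, and the budget and clamp numbers are assumptions made up for the example.

```python
import hashlib
from collections import Counter


def stack_key(frames):
    """Hash a call stack (sequence of frame strings) into a short fixed-size key."""
    joined = "\n".join(frames).encode("utf-8")
    return hashlib.blake2b(joined, digest_size=8).hexdigest()


class StackHistogram:
    """Count how often each distinct call stack is sampled."""

    def __init__(self):
        self.counts = Counter()      # stack hash -> number of samples
        self.frames_by_key = {}      # stack hash -> frames, stored once

    def record(self, frames):
        key = stack_key(frames)
        self.counts[key] += 1
        self.frames_by_key.setdefault(key, tuple(frames))

    def top(self, n):
        """Return the n most frequently sampled stacks with their counts."""
        return [(self.frames_by_key[k], c) for k, c in self.counts.most_common(n)]


def per_host_sample_hz(fleet_size, fleet_budget_hz=2000.0, min_hz=1.0, max_hz=100.0):
    """Split a fleet-wide sample budget across hosts (all numbers are assumed).

    Larger fleets get a lower per-host rate, clamped to [min_hz, max_hz].
    """
    rate = fleet_budget_hz / fleet_size
    return max(min_hz, min(max_hz, rate))
```

Storing a counter per hash keeps memory proportional to the number of distinct stacks rather than the number of samples. A few designated workers could be given a higher rate than `per_host_sample_hz` returns when finer grain is needed.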
fork() heavy code has a lot of stack profiling overhead. Like full Linux kernel compile.
Want IO profiling.
Want better memory profiling.
(My mobile battery died. Lol) My phone also shut down while charging when I opened the Twitter app after a short recharge. The Twitter app is an extreme power hog when running Twitter Spaces.

chadbrewbaker/dulien.md