Guest Thomas Dullien
https://twitter.com/halvarflake
- Offensive security is understanding large-scale legacy systems.
- Whole stack analysis required.
- A tool built for Microsoft patch analysis (BinDiff) is also useful for seeing compiler changes.
- Weird machines, exploitability, and provable unexploitability
- Two ARM cores in a modern USB cable.
- Security is human conflict - performance is environmental impact.
- Moore's law has many versions; cost per transistor is no longer falling.
- What is a software vendor? Snowflake, Spotify? Vendors now have hardware and performance costs.
- Hyperscalers started building fleet-wide performance monitoring around 2009, which led to the TPU and YouTube transcoding hardware. FFmpeg was a huge bill for Google.
- Torch/TensorFlow are Python interfaces to GPUs. We are going to see more specialized accelerator libraries.
- All cloud providers including Azure are adopting ARM.
- Telemetry helps build better chips. Apple and Amazon have in-house processors; Intel is at a disadvantage - it has to buy the data.
- There will be a C-level executive in charge of digital operating expense.
- Utility computing has low margins. Cloud providers make their profit from value-added services.
- Compiler design integration will improve.
- There is no typical workload, but there are classes: Java, databases, Python calling into C++. Linux is ubiquitous. AWS Graviton is possible because of open source - one recompile moves you between CPU architectures.
- Most compute spend goes to third-party open source packages.
- Bottlenecks are mostly the database. Amazon Linux has a default clock-source setting that can spend 15% of CPU just reading clocks. Kubernetes disk-quota accounting eats CPU.
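The clocks note matches a known EC2 pattern (my interpretation, not stated in the episode): the default xen clocksource forces timestamp reads through a syscall instead of the vDSO. A minimal sketch to check the clock source and time clock reads on Linux:

```python
# Minimal sketch: check the current Linux clock source and time how
# expensive timestamp calls are. The sysfs path is standard on Linux;
# the "xen vs tsc" interpretation is my assumption, not from the talk.
import time

CLOCKSOURCE = "/sys/devices/system/clocksource/clocksource0/current_clocksource"

def current_clocksource() -> str:
    with open(CLOCKSOURCE) as f:
        return f.read().strip()

def time_calls(n: int = 1_000_000) -> float:
    """Return average nanoseconds per time.monotonic() call."""
    start = time.perf_counter()
    for _ in range(n):
        time.monotonic()
    return (time.perf_counter() - start) / n * 1e9

if __name__ == "__main__":
    print(f"clocksource: {current_clocksource()}")  # e.g. 'xen' or 'tsc'
    print(f"~{time_calls():.0f} ns per clock read")
```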
- There is far more Java in the routine enterprise than people expect. Allocation overhead is huge - around 15%.
- For publicly traded SaaS companies, gross margin grows linearly with market cap.
- (Thomas' mobile battery died - a short ironic intermission.) Denis led a discussion of power use.
- Open source alternative to Thomas' Prodfiler: Pyroscope. The secret sauce for the profiler is BCC's profile tool: https://github.com/iovisor/bcc/blob/master/tools/profile.py
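profile.py does its sampling in-kernel with eBPF; as a much simpler illustration of the same stack-sampling idea, here is a toy in-process Python sampler (pure stdlib, my sketch - not how BCC works):

```python
# Toy sampling profiler: periodically grab the main thread's stack and
# count identical stacks. Illustrates the sampling idea only; BCC's
# profile.py does this in-kernel with eBPF at far lower overhead.
import collections
import sys
import threading
import time

samples: collections.Counter = collections.Counter()
done = threading.Event()

def sampler(target_ident: int, interval: float = 0.01) -> None:
    while not done.is_set():
        frame = sys._current_frames().get(target_ident)
        stack = []
        while frame is not None:
            stack.append(f"{frame.f_code.co_name}:{frame.f_lineno}")
            frame = frame.f_back
        samples[tuple(reversed(stack))] += 1
        time.sleep(interval)

t = threading.Thread(target=sampler, args=(threading.main_thread().ident,), daemon=True)
t.start()

# ... the workload under test runs here ...
sum(i * i for i in range(5_000_000))

done.set(); t.join()
for stack, count in samples.most_common(3):
    print(count, " -> ".join(stack))
```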
- Netflix invested in Brendan Gregg's system tracing.
- Car manufacturers know cost down to a screw. We need to measure at that granularity to optimize cost.
- Google and Facebook built fleet wide profilers.
- Data structures like std::map become visible in profiles; see https://abseil.io/
- FB and Google compile with frame pointers to help profiling.
- The Optimyze.cloud profiler hooks the operating system interrupt to log all stack traces as (file ID, offset) pairs. They keep an index of debug symbols for major packages - you can upload your own symbols. (Symbolization sketch below.)
- Extra info for JVM, Ruby, Python virtual machine traces.
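A sketch of that symbolization model as I understand it: the agent ships compact (file ID, offset) pairs and a backend resolves them later against a symbol index. All names and the layout here are my assumptions:

```python
# Sketch of offline symbolization: the agent records compact
# (file_id, offset) pairs; a backend resolves them against a symbol
# index built from debug symbols. All names here are hypothetical.
from bisect import bisect_right

# file_id (e.g. a build ID hash) -> sorted [(start_offset, symbol)]
SYMBOL_INDEX: dict[str, list[tuple[int, str]]] = {
    "libc-2.31-buildid": [(0x1000, "malloc"), (0x2400, "memcpy"), (0x5A00, "free")],
}

def symbolize(file_id: str, offset: int) -> str:
    syms = SYMBOL_INDEX.get(file_id)
    if not syms:
        return f"<{file_id}>+{offset:#x}"         # no symbols uploaded yet
    i = bisect_right(syms, (offset, "\xff")) - 1  # nearest symbol at or below offset
    if i < 0:
        return f"<{file_id}>+{offset:#x}"         # offset before first known symbol
    start, name = syms[i]
    return f"{name}+{offset - start:#x}"

trace = [("libc-2.31-buildid", 0x2410), ("libc-2.31-buildid", 0x10)]
print([symbolize(f, o) for f, o in trace])  # ['memcpy+0x10', '<libc-2.31-buildid>+0x10']
```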
- Last branch record (LBR) in hardware could help (Denis).
- ELF format tutorial
- 5 minute DWARF 5 overview
- ld and gold linkers
- lld linker
- A regex JIT had been turned off. Lots of time was spent on UTF-8 handling when the data was plain ASCII. There is low-hanging fruit like this on many web back ends.
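One hedged way to chase that kind of finding: micro-benchmark the hot path both ways before committing to a fix. Here, the same pattern matched over decoded text versus raw ASCII bytes (the workload and any conclusion are my assumptions; measure your own):

```python
# Sketch: compare regex matching over decoded text vs. raw ASCII bytes.
# Which wins depends on your runtime; the point is to measure before
# and after, as in the episode's UTF-8-on-ASCII example.
import re
import timeit

raw = b"GET /api/v1/items?id=12345 HTTP/1.1\r\n" * 1_000
pat_s = re.compile(r"id=(\d+)")
pat_b = re.compile(rb"id=(\d+)")

decode_and_match = lambda: pat_s.findall(raw.decode("utf-8"))
match_bytes = lambda: pat_b.findall(raw)

for name, fn in [("decode+str regex", decode_and_match), ("bytes regex", match_bytes)]:
    print(name, f"{timeit.timeit(fn, number=200):.3f}s")
```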
- A/B testing of compiler flags - especially with LTO/PGO. Isolating the root cause is a hard problem. (Harness sketch below, after the PGO notes.)
- LTO can increase code size through inlining and create cache issues when driven by bad profiling data.
- A REST API is coming in the fall for PGO using the collected profiling data.
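The harness sketch promised above: build the same benchmark with two flag sets, run them interleaved, and compare distributions. The binary names and the idea that they differ only in LTO/PGO flags are hypothetical:

```python
# Sketch: A/B test two builds of the same benchmark. Interleaving runs
# reduces drift from thermal/noisy-neighbor effects. './bench_a' and
# './bench_b' (e.g. built with and without LTO/PGO) are hypothetical.
import statistics
import subprocess
import time

def run_once(binary: str) -> float:
    start = time.perf_counter()
    subprocess.run([binary], check=True, capture_output=True)
    return time.perf_counter() - start

def ab_test(a: str, b: str, rounds: int = 20) -> None:
    times: dict[str, list[float]] = {a: [], b: []}
    for _ in range(rounds):            # interleave A and B each round
        for binary in (a, b):
            times[binary].append(run_once(binary))
    for binary, ts in times.items():
        print(f"{binary}: mean={statistics.mean(ts):.4f}s "
              f"stdev={statistics.stdev(ts):.4f}s")

if __name__ == "__main__":
    ab_test("./bench_a", "./bench_b")
```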
- Too much time goes into serialization/deserialization. Python <-> Java in Spark workloads can eat 30%.
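A minimal way to see serialization cost in isolation: time just the round-trip encoding of a batch of rows. Pickle stands in here for whatever codec crosses your Python/JVM boundary (my example, not from the episode):

```python
# Sketch: measure pure serialization round-trip cost for a batch of
# rows. In Spark, rows cross the Python/JVM boundary similarly; Arrow
# largely exists to cut exactly this overhead.
import pickle
import time

rows = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(100_000)]

start = time.perf_counter()
blob = pickle.dumps(rows)
_ = pickle.loads(blob)
elapsed = time.perf_counter() - start

print(f"{len(blob) / 1e6:.1f} MB round-trip in {elapsed:.3f}s")
```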
- Microservices need both profiling and distributed tracing (Akita, OpenTelemetry).
- Hash the callstack and keep a counter per hash to build a histogram (sketch after this list).
- Reduce the sampling rate on larger fleets.
- Have some workers sample at a higher rate for finer grain - target less than 1% of CPU overall.
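A sketch of the histogram and sampling ideas from the three bullets above: hash each stack to a fixed-size key, count per key, and shrink the per-host sampling rate as the fleet grows so aggregate overhead stays under the ~1% CPU budget. The constants are illustrative:

```python
# Sketch: aggregate sampled stacks into a histogram keyed by a stack
# hash, and pick a per-host sampling rate that shrinks as the fleet
# grows. The 1% CPU budget and constants are illustrative.
import hashlib
from collections import Counter

histogram: Counter = Counter()
stacks: dict[str, tuple[str, ...]] = {}   # hash -> one representative stack

def record(stack: tuple[str, ...]) -> None:
    key = hashlib.sha1("\n".join(stack).encode()).hexdigest()[:16]
    histogram[key] += 1
    stacks.setdefault(key, stack)

def sampling_hz(fleet_size: int, base_hz: int = 97) -> float:
    """Lower the per-host rate on big fleets; aggregate coverage stays high."""
    return max(1.0, base_hz / max(1, fleet_size // 100))

record(("main", "handle_request", "json_decode"))
record(("main", "handle_request", "json_decode"))
record(("main", "gc"))
print(histogram.most_common(1), sampling_hz(fleet_size=5_000))
```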
- fork()-heavy code, like a full Linux kernel compile, has a lot of stack-profiling overhead.
- Want IO profiling.
- Want better memory profiling.
- (My mobile battery died too. Lol.) Even while charging, my phone shut down again when I opened the Twitter app after a short recharge - running Twitter Spaces, that application is an extreme power hog.