- Do a quick performance check in 60 seconds
- Use a range of the different tools available in Unix
- Use flame graphs of the call stack if you have access to them
- The biggest performance wins come from eliminating unnecessary work, for example fixing a thread stuck in a loop or removing a bad configuration
- Mantras: don't do it (eliminate); do it, but don't do it again (caching); do it less (poll less often); do it when they're not looking (schedule work off-peak); do it concurrently; do it more cheaply
-
Latency is an essential performance metric: the time for an operation to complete, such as:
- An application request
- A database query
- A file system operation
Latency can often be improved by reducing disk reads, for example through caching.
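A minimal sketch of measuring latency as "time for an operation to complete". It assumes the operation can be called repeatedly from Python; `measure_latency` and `summarize` are hypothetical helpers. Percentiles use the nearest-rank method, because the tail (p99) often matters more than the mean.

```python
import math
import statistics
import time
from typing import Callable


def measure_latency(operation: Callable[[], object], runs: int = 100) -> list[float]:
    """Time each call of `operation` and return the latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Mean and nearest-rank percentiles of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)

    def percentile(p: float) -> float:
        rank = max(1, math.ceil(p * len(ordered) / 100))  # nearest-rank method
        return ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(50),
        "p99": percentile(99),
    }


if __name__ == "__main__":
    # time.sleep stands in for a hypothetical slow operation (about 1 ms each).
    print(summarize(measure_latency(lambda: time.sleep(0.001), runs=50)))
```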
Counter --> Statistics --> Metrics --> Alerts
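A sketch of the counter to alert pipeline. It assumes a cumulative counter (for example, total disk reads since boot) is read at two points in time. The two readings become a rate (statistic), the rate is tracked under a name (metric), and an alert fires past a threshold. All names and the 400/s threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterSample:
    timestamp_s: float  # when the counter was read
    value: int          # cumulative count, e.g. total disk reads since boot


def rate_per_second(prev: CounterSample, curr: CounterSample) -> float:
    """Statistic: derive a per-second rate from two readings of a cumulative counter."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.value - prev.value) / elapsed


def check_alert(metric: str, value: float, threshold: float) -> Optional[str]:
    """Alert: return a message when a metric exceeds its (hypothetical) threshold."""
    if value > threshold:
        return f"ALERT: {metric}={value:.1f} exceeds {threshold:.1f}"
    return None


if __name__ == "__main__":
    reads = rate_per_second(CounterSample(0.0, 1_000), CounterSample(10.0, 6_000))
    metrics = {"disk_reads_per_s": reads}  # metric: a named statistic to monitor
    for name, value in metrics.items():
        print(check_alert(name, value, threshold=400.0))
        # prints ALERT: disk_reads_per_s=500.0 exceeds 400.0
```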
Profiling tools characterize CPU usage by sampling, for example capturing stack traces at a timed interval; flame graphs visualize these samples to show where CPU time is spent.
The x-axis shows the stack profile population, sorted alphabetically (it is not the passage of time), and the y-axis shows stack depth, counting from zero at the bottom. Each rectangle represents a stack frame. The wider a frame is, the more often it was present in the stacks. The top edge shows what is on-CPU, and beneath it is its ancestry. Original flame graphs use random colors to help visually differentiate adjacent frames. Variations include inverting the y-axis (an "icicle graph"), changing the hue to indicate code type, and using a color spectrum to convey an additional dimension.
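A sketch of the data behind a flame graph, assuming stack samples are given root-first as lists of function names (hypothetical input). `fold_stacks` produces "folded" lines of the form `a;b;c count`, sorted alphabetically as on the x-axis; this is the kind of input the FlameGraph `flamegraph.pl` script renders. `rectangle_widths` shows why wider means more frequent: each rectangle is a root-to-frame path, and its width is the number of samples containing that path.

```python
from collections import Counter


def fold_stacks(samples: list[list[str]]) -> list[str]:
    """Collapse sampled stacks (root frame first) into "a;b;c count" lines.

    Lines are sorted alphabetically, like the flame graph x-axis (not time order).
    """
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def rectangle_widths(samples: list[list[str]]) -> Counter:
    """Width of each rectangle, keyed by its path from the root.

    A path of length d + 1 is drawn at depth d (y-axis, zero at the bottom);
    its width is the number of samples that start with that path.
    """
    widths: Counter = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return widths


if __name__ == "__main__":
    samples = [
        ["main", "parse", "read"],
        ["main", "parse", "read"],
        ["main", "compute"],
        ["main", "parse"],
    ]
    print("\n".join(fold_stacks(samples)))
```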
Tracing - Event-based recording where data is saved for later analysis.
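A minimal in-process illustration of tracing, assuming Python functions are the events of interest. The `Tracer` class and its `trace` decorator are hypothetical, not a real tracing tool: each call is recorded as an event (name, start, duration) in a buffer, and analysis happens later over the saved events.

```python
import functools
import time
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    name: str
    start_ns: int
    duration_ns: int


@dataclass
class Tracer:
    """Hypothetical in-process tracer: record events now, analyze them later."""
    events: list[TraceEvent] = field(default_factory=list)

    def trace(self, name: str):
        """Decorator that records one event (start time and duration) per call."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.perf_counter_ns()
                try:
                    return fn(*args, **kwargs)
                finally:
                    self.events.append(
                        TraceEvent(name, start, time.perf_counter_ns() - start))
            return wrapper
        return decorator

    def summarize(self) -> dict[str, tuple[int, int]]:
        """Later analysis: (event count, total duration in ns) per event name."""
        summary: dict[str, tuple[int, int]] = {}
        for event in self.events:
            count, total = summary.get(event.name, (0, 0))
            summary[event.name] = (count + 1, total + event.duration_ns)
        return summary


if __name__ == "__main__":
    tracer = Tracer()

    @tracer.trace("sleep")
    def nap() -> None:
        time.sleep(0.01)

    for _ in range(3):
        nap()
    print(tracer.summarize())
```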
The 60-second checklist: for when you only have a little time to analyze a system.
In 60 seconds you can get a high-level idea of system resource usage and running processes by running the following ten commands. Look for errors and saturation metrics first, as they are both easy to interpret, and then resource utilization. Saturation is where a resource has more load than it can handle, and can be exposed either as the length of a request queue or as time spent waiting.
Don't use only top because it's the one tool you know; this creates a streetlight effect (looking for problems only where it is easiest to look).
uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
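`uptime` prints the 1-, 5-, and 15-minute load averages. A sketch of reading them the way the checklist suggests: compare the 1- and 15-minute values for a trend, and normalize by CPU count as a rough saturation hint. The "more than 1 per CPU" threshold is a rule-of-thumb assumption; on Linux, load averages also count tasks in uninterruptible sleep (usually disk I/O), so high load is not necessarily CPU saturation. `os.getloadavg` is Unix-only.

```python
import os


def load_per_cpu(load_avg: float, ncpus: int) -> float:
    """Normalize a load average by CPU count."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return load_avg / ncpus


def saturation_hint(load_avg: float, ncpus: int) -> str:
    """Rough heuristic: more runnable (or waiting) tasks than CPUs means queueing."""
    if load_per_cpu(load_avg, ncpus) > 1.0:
        return "possible saturation"
    return "no queueing indicated"


def trend(load_1m: float, load_15m: float) -> str:
    """Compare the 1- and 15-minute averages, as when reading uptime output."""
    if load_1m > load_15m:
        return "increasing"
    if load_1m < load_15m:
        return "decreasing"
    return "steady"


if __name__ == "__main__":
    try:
        one, _five, fifteen = os.getloadavg()  # Unix only: the averages uptime prints
    except (AttributeError, OSError):
        print("load averages are not available on this platform")
    else:
        ncpus = os.cpu_count() or 1
        print(saturation_hint(one, ncpus), trend(one, fifteen))
```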
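- IOPS: input/output operations per second, a measure of the rate of data transfer operations
- Latency: the time an operation spends waiting to be serviced; in some contexts, the entire time for the operation to complete (response time)
- Saturation: the degree to which a resource has queued work it cannot service
- Hit ratio: the number of times needed data is found in the cache versus total accesses (hits + misses)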
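A sketch of the IOPS and hit ratio definitions above, plus the average access latency that a given hit ratio implies. The 0.1 ms hit and 10 ms miss latencies are hypothetical. With those numbers, raising the hit ratio from 98% to 99% lowers the mean latency from about 0.30 ms to about 0.20 ms, because misses dominate the cost.

```python
def iops(operations: int, interval_s: float) -> float:
    """I/O operations per second over a measurement interval."""
    return operations / interval_s


def hit_ratio(hits: int, misses: int) -> float:
    """hits / (hits + misses); 0.0 when there were no accesses."""
    total = hits + misses
    return hits / total if total else 0.0


def mean_access_ms(ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average access latency when hits cost hit_ms and misses cost miss_ms."""
    return ratio * hit_ms + (1.0 - ratio) * miss_ms


if __name__ == "__main__":
    print(iops(5_000, 10.0))  # prints 500.0
    print(hit_ratio(99, 1))   # prints 0.99
    # Hypothetical latencies: 0.1 ms per hit, 10 ms per miss.
    for r in (0.98, 0.99):
        print(r, round(mean_access_ms(r, 0.1, 10.0), 3))
```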
Trade-off: good, fast, cheap (pick two); for IT projects: high-performance, on-time, inexpensive
File system record size: smaller record sizes, close to the application I/O size, perform better for random I/O workloads; larger record sizes improve streaming workloads
- Performance tuning is most effective when done closest to where the work is performed
- MRU: most recently used
- LRU: least recently used
- MFU: most frequently used
- LFU: least frequently used
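A sketch of how the recency-based policies differ, assuming a fixed-capacity key-value cache (the `Cache` class is hypothetical). An `OrderedDict` keeps keys from least to most recently used: LRU evicts from the front, MRU from the back. MFU and LFU would instead need a per-key access count.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class Cache:
    """Fixed-size cache with LRU or MRU eviction (a sketch, not production code)."""

    def __init__(self, capacity: int, policy: str = "lru") -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if policy not in ("lru", "mru"):
            raise ValueError("policy must be 'lru' or 'mru'")
        self.capacity = capacity
        self.policy = policy
        # Ordered from least recently used (front) to most recently used (back).
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        if key in self._data:
            self.hits += 1
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data[key] = value
            self._data.move_to_end(key)
            return
        if len(self._data) >= self.capacity:
            # LRU evicts the front (least recent); MRU evicts the back (most recent).
            self._data.popitem(last=(self.policy == "mru"))
        self._data[key] = value
```

Eviction happens before the new key is inserted, so MRU never evicts the item being added.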
Cold cache - empty, or populated with unwanted data; its hit ratio is zero (or near zero as it begins to warm up). Warm cache - populated with useful data, but without a high enough hit ratio to be considered hot.
Cold --> Warm --> Hot (hit ratio improving)
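A small simulation of a cache warming up, assuming an unbounded cache (every missed key is inserted and never evicted) and a workload that repeats keys. Both the helper `hit_ratio_per_window` and the workload are hypothetical. The hit ratio per window starts at or near zero (cold) and climbs as the cache fills with useful data.

```python
import random
from typing import Hashable, Iterable


def hit_ratio_per_window(requests: Iterable[Hashable], window: int) -> list[float]:
    """Hit ratio for each full window of requests, starting from a cold cache.

    Assumes an unbounded cache: every missed key is inserted and never evicted.
    A trailing partial window is ignored.
    """
    cache: set = set()
    ratios: list[float] = []
    hits = 0
    for i, key in enumerate(requests, start=1):
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # miss: fetch and populate the cache
        if i % window == 0:
            ratios.append(hits / window)
            hits = 0
    return ratios


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    workload = [rng.randrange(50) for _ in range(1_000)]  # hypothetical: 50 distinct keys
    print([round(r, 2) for r in hit_ratio_per_window(workload, window=100)])
```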
Cache tuning: aim to cache as high in the stack as possible, closer to where the work is performed, which reduces the operational overhead of cache hits.
p. 61: performance Mantras
State the goals of the study and define system boundaries
List system services and possible outcomes
Select performance metrics
List system and workload parameters
Select factors and their values
Select the workload
Design the experiments
Analyze and interpret the data
Present the results
If necessary, start over
Disk utilization can become a problem even before it hits 100%, because queueing delays grow as utilization rises. To find the bottleneck:
- Measure the rate of server requests, and monitor this rate over time
- Measure hardware and software resource usage, and monitor it over time
- Express server requests in terms of the resources they use
- Extrapolate server requests to the known (or experimentally determined) limit of each resource
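The four steps above can be sketched as arithmetic. This assumes that resource usage scales linearly with request rate, which real systems often violate near saturation (see the note on disk utilization). All resource names, measurements, and limits below are hypothetical. The resource that reaches its limit at the lowest request rate is the bottleneck.

```python
def max_request_rates(request_rate: float,
                      usage: dict[str, float],
                      limits: dict[str, float]) -> dict[str, float]:
    """For each resource, the request rate at which it would reach its limit.

    Assumes usage scales linearly with request rate (a simplification).
    """
    if request_rate <= 0:
        raise ValueError("request_rate must be positive")
    rates = {}
    for resource, used in usage.items():
        per_request = used / request_rate  # express requests in resource terms
        # Extrapolate to the limit; an unused resource never becomes the limit.
        rates[resource] = limits[resource] / per_request if per_request > 0 else float("inf")
    return rates


def bottleneck(rates: dict[str, float]) -> str:
    """The resource that runs out first."""
    return min(rates, key=rates.__getitem__)


if __name__ == "__main__":
    # Hypothetical measurements at 1,000 requests/s.
    usage = {"cpu_cores": 4.0, "disk_iops": 3_000.0}
    limits = {"cpu_cores": 16.0, "disk_iops": 5_000.0}
    rates = max_request_rates(1_000.0, usage, limits)
    print(rates, "bottleneck:", bottleneck(rates))
```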
Constraints:
**Hardware:**
- CPU utilization
- Memory usage
- Disk IOPS
- Disk throughput
- Disk capacity
**Software:**
- Virtual memory usage
- Processes/tasks
- File descriptors
Sharding - a common strategy for databases, where data is split into logical parts, each managed by its own database
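A minimal sketch of routing keys to shards by hashing, assuming string keys and a fixed number of shards. The shard names are hypothetical. A stable digest is used because Python's built-in `hash()` for strings is randomized per process. Note that with simple modulo placement, changing the shard count remaps most keys; consistent hashing is a common alternative that moves fewer keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a hash that is stable across runs."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    shards = ["db0", "db1", "db2", "db3"]  # hypothetical database names
    for user in ("user:1", "user:2", "user:42"):
        print(user, "->", shards[shard_for(user, len(shards))])
```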
p. 106 - CPU-bound versus I/O-bound:
- CPU-bound: performing heavy compute, such as scientific and mathematical workloads
- I/O-bound: performing I/O, such as web servers and file servers, where low latency is important
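One rough way to tell the two apart for a single-threaded piece of work: compare the CPU time the process consumed with the elapsed wall-clock time. A ratio near 1 means the work was mostly on-CPU; a low ratio means it was mostly waiting (on I/O, locks, or sleep). This is a sketch: the 0.8 threshold is a hypothetical choice, `time.sleep` stands in for I/O wait, and multi-threaded work can push the ratio above 1.

```python
import time
from typing import Callable


def cpu_to_wall_ratio(fn: Callable[[], object]) -> float:
    """CPU time consumed by this process divided by elapsed wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of the process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def classify(ratio: float, threshold: float = 0.8) -> str:
    """Hypothetical threshold: mostly on-CPU means CPU-bound, otherwise it was waiting."""
    return "CPU-bound" if ratio >= threshold else "I/O-bound (or otherwise waiting)"


if __name__ == "__main__":
    busy = cpu_to_wall_ratio(lambda: sum(i * i for i in range(2_000_000)))
    waiting = cpu_to_wall_ratio(lambda: time.sleep(0.2))  # sleep stands in for I/O wait
    print(classify(busy), "|", classify(waiting))
```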