Created
April 4, 2017 22:13
Thoughts About Logging Trade-offs
| > We certainly don't want to log all that with every message. Given that | |
| > UUID is UU, can't that be squirreled away in a lookup table somewhere else | |
| > out of the fast path? | |
| Unfortunately, it seems we have a clash of two valid concerns with logging: | |
| 1. efficiency: How do we emit logs efficiently, so that emitting, collecting, and shipping them does not over-burden the environment and make it unusable? | |
| 1. utility: How do we include sufficient metadata with the logs so that they are useful when an entity consumes them? | |
| These are age-old trade-offs that we are discussing. | |
| At the point where somebody is consuming these logs, some number of components in the system will have already paid the cost of CPU, network bandwidth, or memory to provide the data needed to make the logs useful. | |
| However, if we don't construct the system right, we may substantially lower the utility of the logs in order to handle them as efficiently as possible. | |
| Where we place log enrichment with metadata is key. Too far away from the source, enrichment can be cost-prohibitive in the memory and CPU time it needs; too close to the source, network bandwidth and local storage can become the dominating costs in the system. | |
| The trends today are towards much faster networks and much larger memory and storage. It seems prudent to stay away from choices that increase CPU-intensive JOIN operations in centralized environments, and to work on performing the enrichment closer to the sources. | |
| A possible model to consider: | |
| - collector: gathers raw logs and adds only the minimal metadata needed to provide: | |
|   - the total order of the logs emitted | |
|   - the timestamp of each log | |
|   - the UUID of the entity emitting the logs | |
|   - *GOAL*: collector emits logs with good utilization of available network bandwidth, low memory and CPU usage, and minimal local storage | |
| - enricher: uses the UUID of the entity in each log to add metadata to the logs, avoiding CPU-intensive JOIN operations later | |
|   - one enricher handles many collectors | |
|   - one enricher uses various compression techniques to bundle aggregated log data for efficient use of bandwidth; the enriched logs still require more bandwidth, a cost traded off against larger JOINs later | |
|   - *GOAL*: enricher performs the CPU-intensive JOIN operations needed to enrich the logs | |
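
The collector/enricher split above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Collector` and `Enricher` classes, the record field names (`seq`, `ts`, `uuid`, `message`), and the sample metadata are all hypothetical, and the per-collector sequence number is just one possible way to make ordering recoverable.

```python
import itertools
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Hypothetical collector: emits raw logs with minimal metadata only."""
    entity_uuid: str
    # Per-collector counter, used here as an assumed way to recover emit order.
    _seq: itertools.count = field(default_factory=itertools.count,
                                  init=False, repr=False)

    def emit(self, message: str) -> dict:
        return {
            "seq": next(self._seq),    # order of emission for this entity
            "ts": time.time(),         # timestamp of the log
            "uuid": self.entity_uuid,  # UUID of the entity emitting the log
            "message": message,
        }


class Enricher:
    """Hypothetical enricher: serves many collectors, adds metadata by UUID."""

    def __init__(self, metadata_by_uuid: dict):
        self.metadata_by_uuid = metadata_by_uuid

    def enrich(self, record: dict) -> dict:
        # A keyed lookup close to the source, done once per record, in place
        # of a large JOIN in a centralized store later on.
        metadata = self.metadata_by_uuid.get(record["uuid"], {})
        return {**record, "metadata": metadata}


entity = str(uuid.uuid4())
collector = Collector(entity)
# Sample metadata is made up for the example.
enricher = Enricher({entity: {"namespace": "demo", "pod": "web-1"}})

raw = [collector.emit("starting"), collector.emit("ready")]
enriched = [enricher.enrich(r) for r in raw]
```

The raw records stay small on the fast path, and the larger, enriched records only appear after the enricher, which is where the extra bandwidth cost discussed above is paid.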