These instructions are adapted from the comprehensive Manta Debugging Guide, with added clarification. See the debugging guide for more information on everything related to log-based debugging.
Each zone maintains a log file for the current hour (a real-time log) at /var/log/muskie.log. At the top of every hour, the past hour's log file (now a historical log) is uploaded to the datacenter's manta at /poseidon/stor/logs/COMPONENT/YYYY/MM/DD/HH/SHORTZONE.log, where:
- COMPONENT varies based on the component you're looking for
- YYYY, MM, DD, and HH represent the year, month, day, and hour for the entries in the log file
- SHORTZONE is the first 8 characters of the zone's uuid.
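The path layout above can be sketched as a small shell helper. This is only an illustration: the function name historical_log_path and the zone uuid in the example are made up (only its first 8 characters matter), and the hour 14 is arbitrary.

```shell
# Build the Manta path of one historical log file.
# Arguments: component, zone uuid, year, month, day, hour
# (date parts are passed as zero-padded strings).
# historical_log_path is a hypothetical helper, not part of any Manta tool.
historical_log_path() {
    component=$1
    zone_uuid=$2
    # SHORTZONE is the first 8 characters of the zone's uuid.
    shortzone=$(printf '%s' "$zone_uuid" | cut -c1-8)
    printf '/poseidon/stor/logs/%s/%s/%s/%s/%s/%s.log\n' \
        "$component" "$3" "$4" "$5" "$6" "$shortzone"
}

# Example with a made-up zone uuid:
historical_log_path muskie f6817865-0000-0000-0000-000000000000 2018 12 05 14
# prints /poseidon/stor/logs/muskie/2018/12/05/14/f6817865.log
```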
To retrieve logs from all muskie instances from the current hour thus far, we can run manta-oneach from the headnode:
manta-oneach -s webapi 'cat /var/log/muskie.log'
In practice, it may be more useful to place an upper bound on the number of log entries we're retrieving (using tail) and filter to include error-level logs only (using grep). Here's a representative example, which also extracts specific fields from the log entries using json, sorts the results, and reports a count of how many times each tuple of fields appears:
manta-oneach -s webapi 'tail -n 900 /var/log/muskie.log | grep "handled: 5" |
json -gaH res.statusCode route err.message | sort | uniq -c'
Note that it may also be useful to tee or redirect to a file, so the results don't get lost in your terminal scrollback.
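To see what the counting stage at the end of that pipeline does, here is a minimal sketch that runs only sort and uniq -c over a few hand-written lines. The sample lines and the tally_tuples name are hypothetical; they stand in for the fields that json would print and are not real muskie output.

```shell
# Count how many times each distinct line (tuple of fields) appears.
# tally_tuples is a hypothetical name for the final pipeline stage.
tally_tuples() {
    sort | uniq -c
}

# Hand-written sample lines: status code, route, error message.
printf '%s\n' \
    '503 putobject service unavailable' \
    '500 getstorage internal error' \
    '503 putobject service unavailable' | tally_tuples
```

Each output line carries a count followed by the tuple it counts, so the most frequent failures are easy to spot.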
To retrieve logs from all muskie instances outside of the current hour, we can use mfind and mls. This can be done anywhere you've set up the node-manta sdk to connect to your manta deployment.
Here's a representative example for December 5, 2018:
mfind -t o /poseidon/stor/logs/muskie/2018/12/05 | mjob create -o -m 'cat'
As with the current-hour logs, you can (and should) replace cat with whatever filtering and sorting pipeline you'd like. In production deployments, an hour's worth of logs can be so large you almost certainly don't want the whole thing!
You can adjust the mfind invocation as needed to scan a broader or narrower time range. You can also use the -n argument to mfind to select log files for a particular zone over the given time range:
mfind -n f6817865.log -t o /poseidon/stor/logs/muskie/2018/12/05 |
mjob create -o -m 'cat'
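One way to narrow the time range is to point mfind at individual hour directories rather than a whole day. The sketch below only generates those directory paths from the layout described earlier; hourly_log_dirs is a hypothetical helper, and the hours must be given as plain integers without leading zeros.

```shell
# Print the per-hour log directories for one day, from first_hour
# through last_hour inclusive.
# Arguments: component, YYYY/MM/DD, first_hour, last_hour.
# hourly_log_dirs is a hypothetical helper, not part of node-manta.
hourly_log_dirs() {
    component=$1
    day_path=$2
    hour=$3
    last_hour=$4
    while [ "$hour" -le "$last_hour" ]; do
        # %02d restores the zero padding used in the log paths.
        printf '/poseidon/stor/logs/%s/%s/%02d\n' "$component" "$day_path" "$hour"
        hour=$((hour + 1))
    done
}

hourly_log_dirs muskie 2018/12/05 8 10
# prints:
# /poseidon/stor/logs/muskie/2018/12/05/08
# /poseidon/stor/logs/muskie/2018/12/05/09
# /poseidon/stor/logs/muskie/2018/12/05/10
```

Each printed directory can then take the place of the day-level directory in the mfind invocations shown above.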
In manta deployments where jobs don't work, you can instead route the output of mfind into mget and process it locally, like so:
mfind -t o /poseidon/stor/logs/muskie/2018/12/05 | while read -r f; do mget "$f"; done
Note that in this case, you should use head instead of tail to grab a set number of lines of log output, so mget will return early once the specified number of lines have been read. Here's an example:
mfind -t o /poseidon/stor/logs/muskie/2018/12/05 | while read -r f; do mget "$f" | head -n 1000; done
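The reason head lets the download stop early can be shown without Manta at all. In this sketch, the standard yes command stands in for a very large download (it is only an analogy for mget, and first_lines is a hypothetical name): yes would print forever on its own, but the pipeline finishes as soon as head has read the requested number of lines and closes the pipe.

```shell
# Print the first N lines of an endless stream.
# "yes" never stops by itself; it exits only because head closes the
# pipe after reading N lines. tail could not do this, since it must
# read to the end of its input before printing anything.
first_lines() {
    yes 'log entry' | head -n "$1"
}

first_lines 3
# prints "log entry" three times, then returns
```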
- The archival process for historical logs first rotates the logs to new files under /var/log/manta/upload. A few minutes later, these are uploaded to Manta and then removed from the local filesystem. If the upload fails, the files are kept in /var/log/manta/upload for up to two days. In extreme situations where Manta has been down for over an hour, you may find recent historical log files in /var/log/manta/upload, and you can scan them similar to the live log files using manta-oneach.