Last active
May 21, 2024 22:17
-
-
Save rubyroobs/77cc19f8985a12ca81e7d53789c144c5 to your computer and use it in GitHub Desktop.
a starting point for investigating anomalous contributions in git repositories
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# ruby's git repo investigation zsh one-liner-ish thingy | |
# (a starting point for investigating anomalous contributions in git repositories) | |
echo "$(find . -type f ! -size 0 ! -path './.git*' -exec grep -IL . "{}" \;)" | \ | |
sed -e "s/^\.\///g" | \ | |
while read line; \ | |
do \ | |
echo ">>>>>>>>$line"; \ | |
echo "$(git log --follow --find-renames=40% --pretty=format:"%ad%x0A%h%x0A%an%x20<%ae>%x0A%s" -- "$line" | head -n 4)"; \ | |
commitdates="$(git log --follow --find-renames=40% --pretty=format:"%ae" -- "$line" | head -n 1 | xargs -I {} git log --author={} --pretty=format:"%ad")"; \ | |
echo "$(echo -n "$commitdates" | grep -c '^') commits authored by email (first commit on $(echo -n "$commitdates" | tail -n 1))"; \ | |
echo "========binwalk"; \ | |
binwalk --disasm --signature --opcodes --extract --matryoshka --depth=32 --rm $line; \ | |
echo "^^^^^^^^strings"; \ | |
echo "$(strings $line | grep -E "^.*([a-z]{3,}|[A-Z]{3,}|[A-Z][a-z]{2,}).*$" | sort | uniq -c | sort -nr | awk '{$1="";print}' | sed 's/^.//' | head -n 25)"; \ | |
echo "<<<<<<<<"; \ | |
done | less # or save it to a file with ">> output.txt", up to you! | |
# before/after running you might want to clean up the repo to HEAD state | |
# if there's untracked files (especially matroyshka extractions etc) the commit data might just be completely wrong | |
# WARNING: THIS WILL DELETE ALL UNTRACKED FILES IN YOUR GIT REPO!!!!! | |
# git clean -fxd | |
# | |
# what it does: | |
# - for each binary files in the current git HEAD | |
# - prints the filenamename | |
# - get the last commit changing it and prints commit sha, date, author name/email and commit msg | |
# - does very crude stats about the authors contribution from that email (first date and number of commits - bear in mind this couuuuld be spoofed) | |
# - prints binwalk output running in matroyshka extraction mode (it'll keep extracted nested archives etc...) | |
# - print top 25 strings that look of interest (ABC, abc, Abc or longer words) | |
# - pipe it to less or whatever you prefer | |
# | |
# caveats/warnings: | |
# - only tested on macos zsh with my local env | |
# - i dont remember which grep/git/posix utils etc i have | |
# - incredibly unoptimizied | |
# - you will need at least binwalk, git, strings for this? | |
# | |
# revisions: | |
# - better instructions, clean up slightly and expand it over multiple lines (thank you Raven!) | |
# - add strings and committer stats | |
# - add sample output for easier sharing | |
# - added another sample output showing some of the benign files in the project containing an actual compressed ELF binary | |
# | |
# future ideas: | |
# - structured outputting (🥲) | |
# - automate running this on repositories that security critical software is dependent on (like sshd -> systemd -> xz) | |
# - use git(hub|lab) APIs to enhance the contributor information shown | |
# - cluster results by contributor | |
# - time bound investigation (get recently changed files from git) and include non-binary but anomalous extensions | |
# |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# This is what the current revision would output for one of the commits adding the exploit-carrying test fixtures in CVE-2024-3094 | |
# (see https://www.openwall.com/lists/oss-security/2024/03/29/4 for more info) | |
# | |
# Sadly this looks pretty similar to other random binary fixtures in the repository (example below), but may inspire your own tweaks for other things to add~ | |
>>>>>>>>tests/files/good-2cat.xz | |
Fri Feb 23 23:09:59 2024 +0800 | |
cf44e4b7 | |
Jia Tan <redacted> | |
Tests: Add a few test files. | |
451 commits authored by email (first commit on Fri Jan 28 20:47:55 2022 +0800) | |
========binwalk | |
Scan Time: 2024-03-30 15:15:13 | |
Target File: /fakepath/xz/tests/files/good-2cat.xz | |
MD5 Checksum: b5615ce7d8388853cdb843b86d92fd57 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
Scan Time: 2024-03-30 15:15:13 | |
Target File: /fakepath/xz/tests/files/good-2cat.xz | |
MD5 Checksum: b5615ce7d8388853cdb843b86d92fd57 | |
Signatures: 445 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
0 0x0 xz compressed data | |
68 0x44 xz compressed data | |
Scan Time: 2024-03-30 15:15:13 | |
Target File: /fakepath/xz/tests/files/_good-2cat.xz.extracted/0 | |
MD5 Checksum: 2d99a8021719ca8257afa1fde6c29902 | |
Signatures: 445 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
Scan Time: 2024-03-30 15:15:13 | |
Target File: /fakepath/xz/tests/files/_good-2cat.xz.extracted/44 | |
MD5 Checksum: 30cfe120f91122e304a5db4e71d3c112 | |
Signatures: 445 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
^^^^^^^^strings | |
__World__ | |
__Hello__ | |
<<<<<<<< | |
# Here is another benign test file - at the very least it's interesting to see some of the lengths the attacker went to so that their binary would blend in with the other test fixtures. | |
>>>>>>>>tests/files/bad-1-stream_flags-3.xz | |
Wed Nov 19 20:46:52 2008 +0200 | |
e114502b | |
Lasse Collin <redacted> | |
Oh well, big messy commit again. Some highlights: - Updated to the latest, probably final file format version. - Command line tool reworked to not use threads anymore. Threading will probably go into liblzma anyway. - Memory usage limit is now about 30 % for uncompression and about 90 % for compression. - Progress indicator with --verbose - Simplified --help and full --long-help - Upgraded to the last LGPLv2.1+ getopt_long from gnulib. - Some bug fixes | |
1801 commits authored by email (first commit on Sun Dec 9 00:42:33 2007 +0200) | |
========binwalk | |
Scan Time: 2024-03-30 15:15:04 | |
Target File: /fakepath/xz/tests/files/bad-1-stream_flags-3.xz | |
MD5 Checksum: c570f7455b47752caf9cad0aa46f2445 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
Scan Time: 2024-03-30 15:15:04 | |
Target File: /fakepath/xz/tests/files/bad-1-stream_flags-3.xz | |
MD5 Checksum: c570f7455b47752caf9cad0aa46f2445 | |
Signatures: 445 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
0 0x0 xz compressed data | |
Scan Time: 2024-03-30 15:15:04 | |
Target File: /fakepath/xz/tests/files/_bad-1-stream_flags-3.xz.extracted/0 | |
MD5 Checksum: fbf68a8e34b2ded53bba54e68794b4fe | |
Signatures: 445 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
^^^^^^^^strings | |
World! | |
Hello | |
<<<<<<<< | |
# Including for comparison also, this is an older benign test file from 2009 that is a compressed x86 ELF binary (the reason for it is explained in the commit message) | |
>>>>>>>>tests/files/good-1-x86-lzma2.xz | |
Fri Feb 6 09:13:15 2009 +0200 | |
975d8fd7 | |
Lasse Collin <redacted> | |
Recreated the BCJ test files for x86 and SPARC. The old files were linked with crt*.o, which are copyrighted, and thus the old test files were not in the public domain as a whole. They are freely distributable though, but it is better to be careful and avoid including any copyrighted pieces in the test files. The new files are just compiled and assembled object files, and thus don't contain any copyrighted code. | |
1801 commits authored by email (first commit on Sun Dec 9 00:42:33 2007 +0200) | |
========binwalk | |
Scan Time: 2024-03-30 15:15:35 | |
Target File: /fakepath/xz/tests/files/good-1-x86-lzma2.xz | |
MD5 Checksum: cac74e22da04c4e067d21ae170863bf2 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
Scan Time: 2024-03-30 15:15:35 | |
Target File: /fakepath/xz/tests/files/good-1-x86-lzma2.xz | |
MD5 Checksum: cac74e22da04c4e067d21ae170863bf2 | |
Signatures: 445 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
0 0x0 xz compressed data | |
Scan Time: 2024-03-30 15:15:35 | |
Target File: /fakepath/xz/tests/files/_good-1-x86-lzma2.xz.extracted/0 | |
MD5 Checksum: ce212d6a1cfe73d8395a2b42f94c2419 | |
Signatures: 445 | |
DECIMAL HEXADECIMAL DESCRIPTION | |
-------------------------------------------------------------------------------- | |
0 0x0 ELF, 32-bit LSB relocatable, Intel 80386, version 1 (SYSV) | |
^^^^^^^^strings | |
Ebv) | |
<<<<<<<< |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment