@relaxdiego
Last active February 17, 2019 08:59
#!/bin/bash
set -e
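# Both arguments are optional: $1 is the architecture, $2 the repository URL
arch=${1:-amd64}
repo=${2:-"http://ftp.us.debian.org/debian/dists/stable/main"}
url="${repo}/Contents-${arch}.gz"
# Mirror the URL path (minus the scheme) under ~/.deb_content_files
gz_path="$HOME/.deb_content_files/$(sed -r 's|^https?://||' <<< "$url")"
txt_path=${gz_path%.gz}
mkdir -p "$(dirname "$gz_path")"
cd "$(dirname "$gz_path")"
# -O overwrites an earlier download instead of saving a ".1" copy
wget -O "$gz_path" "$url"
gunzip --force "$gz_path"
printf "Processing %'d lines\n" "$(wc -l < "$txt_path")"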
echo "Yeah just go grab a coffee while I do my thing..."
time awk '{print $NF}' "$txt_path" | # Last column holds the package list (file paths may contain spaces)
sed -r 's|,|\n|g' | # Split comma-separated packages by newline
sed -r 's|(.*/)+||g' | # Strip the section prefix, leaving only the package name
sort | uniq -c | # Count the number of times each package occurred
sort -rn | # Sort by count, highest first
head -n 10 # List only the top 10
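
To try the counting pipeline without downloading a full Contents file, here is a minimal sketch that runs the same steps on a few hand-written lines. The function name `count_packages` and the sample data are hypothetical and only for illustration. The sketch assumes each line ends with a comma-separated list of `section/package` entries, and it takes the last field because file paths may themselves contain spaces.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): reads Contents-style
# lines on stdin and prints "<count> <package>", highest count first.
count_packages() {
  awk '{print $NF}' |     # last field: comma-separated section/package list
    tr ',' '\n' |         # one section/package entry per line
    sed 's|.*/||' |       # strip the section prefix (e.g. "utils/")
    sort | uniq -c |      # count occurrences of each package
    sort -k1,1nr -k2 |    # highest count first, ties broken by name
    awk '{print $1, $2}'  # drop the padding that uniq -c adds
}

# Made-up sample; the last line has a path containing a space
# and a file shared by two packages.
sample='bin/bash shells/bash
bin/ls utils/coreutils
bin/cat utils/coreutils
usr/share/doc/a b/README utils/coreutils,shells/bash'

printf '%s\n' "$sample" | count_packages
```

Piping the result through `head -n 10` gives the same top-10 view as the script above.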