@ggirtsou
Created November 17, 2018 09:11
Bash script that extracts the HTML <title> tag from each file and prints a table of contents. Assumes the files are named {number}.html and numbered consecutively from 1.
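#!/bin/bash
# Count the .html files in the current directory (expected names: 1.html, 2.html, ...).
# A glob avoids parsing `ls` output, where the unescaped "." in grep matched any character.
shopt -s nullglob
files=(*.html)
END=${#files[@]}
START=1
for (( i=START; i<=END; i++ ))
do
  # Print "{number}.html {title}" for the first <title> line found in the file.
  grep -oE "<title>.*</title>" "./$i.html" | head -n 1 | sed -e 's/<title>/'"$i"'.html /' -e 's/<\/title>//'
done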
@ggirtsou

Usage:

cd directory-with-html-files
chmod +x generate_table_of_contents.sh
./generate_table_of_contents.sh
1.html title 1
2.html title 2
3.html title 3

You might want to redirect the output to a file, for example: ./generate_table_of_contents.sh > table_of_contents.txt
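
If the files are not numbered consecutively from 1 (say 1.html, 2.html, 10.html with gaps), a variant that loops over the files actually present may be more forgiving. The following is only a sketch under assumptions: `generate_toc` is a hypothetical function name that is not part of the script above, each title is assumed to sit on a single line, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/bash
# generate_toc is a hypothetical helper, not part of the original script.
# Prints "<file> <title>" for each *.html file in the given directory
# (default: current directory), ordered by the leading number in the file name.
# The function body runs in a subshell, so the cd does not affect the caller.
generate_toc() (
  cd "${1:-.}" || exit 1
  for f in *.html; do
    # Skip the literal pattern when no .html files exist.
    if [ -e "$f" ]; then printf '%s\n' "$f"; fi
  done | sort -n |
  while IFS= read -r f; do
    # Take the first <title>...</title> match and strip the tags.
    title=$(grep -oE '<title>.*</title>' "$f" | head -n 1 | sed -e 's/<title>//' -e 's/<\/title>//')
    # Files without a title are skipped, as in the original script.
    if [ -n "$title" ]; then printf '%s %s\n' "$f" "$title"; fi
  done
)
```

With the function defined in the current shell, it could be called as: generate_toc directory-with-html-files > table_of_contents.txt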
