cd securedrop-docs
First collect the output of make docs-linkcheck:
make docs-linkcheck > linkcheck.log
It looks like this (head linkcheck.log and tail linkcheck.log):
rm -rf _build/*
make[1]: Leaving directory '/home/user/src/securedrop-docs/docs'
Running Sphinx v2.3.1
making output directory... done
building [mo]: targets for 0 po files that are out of date
building [linkcheck]: targets for 81 source files that are out of date
updating environment: [new config] 81 added, 0 changed, 0 removed
reading sources... [ 1%] admin
reading sources... [ 2%] backup_and_restore
[snip]
(line 8) ok https://blog.torproject.org/v2-deprecation-timeline
writing output... [ 98%] what_makes_securedrop_unique
(line 76) ok https://www.reuters.com/article/us-media-cybercrime/journalists-media-under-attack-from-hackers-google-researchers-idUSBREA2R0EU20140328
writing output... [100%] yubikey_setup
(line 59) ok https://www.yubico.com/wp-content/uploads/2015/03/YubiKeyManual_v3.4.pdf
(line 15) ok https://support.yubico.com/hc/en-us/articles/360016614780-OATH-HOTP-Yubico-Best-Practices-Guide
(line 15) redirect https://www.yubico.com/products/yubikey-hardware/fido-u2f-security-key - permanently to https://www.yubico.com/authentication-standards/fido-u2f/
Remove all the lines that are not reporting permanent redirects:
cat linkcheck.log | grep redirect | grep permanently > permanent-redirects.log
# grep redirect # only keeps the lines that contain the word 'redirect'
# grep permanently # only keeps the lines that contain the word 'permanently'
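The two greps could also be collapsed into a single pattern (an equivalent variant, not used in the rest of this post):
grep 'redirect.*permanently' linkcheck.log > permanent-redirects.log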
The new file looks like this (tail permanent-redirects.log):
[snip]
(line 1) redirect https://itunes.apple.com/us/app/freeotp-authenticator/id872559395 - permanently to https://apps.apple.com/us/app/freeotp-authenticator/id872559395
(line 6) redirect https://itunes.apple.com/us/app/google-authenticator/id388497605 - permanently to https://apps.apple.com/us/app/google-authenticator/id388497605
(line 7) redirect https://pypi.python.org/pypi/authenticator - permanently to https://pypi.org/project/authenticator/
(line 14) redirect https://arstechnica.com/security/2013/12/scientist-developed-malware-covertly-jumps-air-gaps-using-inaudible-sound/ - permanently to https://arstechnica.com/information-technology/2013/12/scientist-developed-malware-covertly-jumps-air-gaps-using-inaudible-sound/
(line 647) redirect https://blog.torproject.org/blog/critique-website-traffic-fingerprinting-attacks - permanently to https://blog.torproject.org/critique-website-traffic-fingerprinting-attacks
(line 55) redirect https://docs.securedrop.org - permanently to https://docs.securedrop.org/en/stable/
(line 14) redirect https://tails.boum.org/doc/first_steps/startup_options/administration_password/ - permanently to https://tails.boum.org/doc/first_steps/welcome_screen/administration_password/
(line 14) redirect https://tails.boum.org/doc/first_steps/startup_options/administration_password/ - permanently to https://tails.boum.org/doc/first_steps/welcome_screen/administration_password/
(line 15) redirect https://tails.boum.org/doc/first_steps/startup_options/administration_password/ - permanently to https://tails.boum.org/doc/first_steps/welcome_screen/administration_password/
(line 15) redirect https://www.yubico.com/products/yubikey-hardware/fido-u2f-security-key - permanently to https://www.yubico.com/authentication-standards/fido-u2f/
Edit those lines to only keep one original URL and one updated URL on each of them:
cat permanent-redirects.log | cut -d')' -f2 | cut -d' ' -f4,8 > old-new-links.log
# cut -d')' -f2 # drops everything up to and including the (only) closing parenthesis, keeping the rest of the line
# cut -d' ' -f4,8 # splits each line on spaces and only keeps the 4th and 8th fields (both URLs!)
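The exact field numbers depend on the spacing Sphinx puts around the word redirect, so it is worth making sure every line of the new file really ended up with exactly two URLs. One way to check (a suggestion, not part of the original pipeline) is to print any line that does not contain exactly two fields:
awk 'NF != 2' old-new-links.log
# NF is the number of space-separated fields on the line; the command prints nothing if all is well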
The new file looks like this (tail old-new-links.log):
[snip]
https://itunes.apple.com/us/app/freeotp-authenticator/id872559395 https://apps.apple.com/us/app/freeotp-authenticator/id872559395
https://itunes.apple.com/us/app/google-authenticator/id388497605 https://apps.apple.com/us/app/google-authenticator/id388497605
https://pypi.python.org/pypi/authenticator https://pypi.org/project/authenticator/
https://arstechnica.com/security/2013/12/scientist-developed-malware-covertly-jumps-air-gaps-using-inaudible-sound/ https://arstechnica.com/information-technology/2013/12/scientist-developed-malware-covertly-jumps-air-gaps-using-inaudible-sound/
https://blog.torproject.org/blog/critique-website-traffic-fingerprinting-attacks https://blog.torproject.org/critique-website-traffic-fingerprinting-attacks
https://docs.securedrop.org https://docs.securedrop.org/en/stable/
https://tails.boum.org/doc/first_steps/startup_options/administration_password/ https://tails.boum.org/doc/first_steps/welcome_screen/administration_password/
https://tails.boum.org/doc/first_steps/startup_options/administration_password/ https://tails.boum.org/doc/first_steps/welcome_screen/administration_password/
https://tails.boum.org/doc/first_steps/startup_options/administration_password/ https://tails.boum.org/doc/first_steps/welcome_screen/administration_password/
https://www.yubico.com/products/yubikey-hardware/fido-u2f-security-key https://www.yubico.com/authentication-standards/fido-u2f/
Transform the list of URLs into a list of commands that replace every occurrence of the original URL in a directory with the updated one.
Copy the URLs to a new file so that the list is not lost:
cp old-new-links.log replace.sh
Each line looks like this:
https://www.yubico.com/products/yubikey-hardware/fido-u2f-security-key https://www.yubico.com/authentication-standards/fido-u2f/
Observations:
- there is a single space, and it sits between the original URL and the updated URL
- the URLs contain / characters, which we'll have to escape before using sed
- the original URL comes first, the updated URL second
In order to update all URLs in all files, we want to achieve the following command (for each line):
find ./docs -type f -exec sed -i -e 's/https:\/\/www.yubico.com\/products\/yubikey-hardware\/fido-u2f-security-key/https:\/\/www.yubico.com\/authentication-standards\/fido-u2f\//g' {} \;
# find ./docs -type f -exec COMMAND \; # applies COMMAND to each file in the 'docs' directory and its sub-directories
# # the name of the file is substituted wherever the COMMAND contains {} (a pair of curly braces)
# sed -i -e 's/A/B/g' FILE # replaces all occurrences of A with B in FILE, editing the file in place
# # note that the / characters are meaningful (they delimit A and B), so we should make sure any / in A or B
# # is escaped as \/. Because \ is the escape character, writing a literal \ takes two: \\.
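As an aside, the same list of commands could be built without any of the escaping gymnastics below, for instance with a while-read loop and printf, using | instead of / as the sed delimiter (a sketch of an alternative; the rest of this post sticks to the sed rewriting described next):
# read each 'old new' pair and print one find/sed command per pair
while read -r old new; do
    printf "find ./docs -type f -exec sed -i -e 's|%s|%s|g' {} \\\\;\n" "$old" "$new"
done < old-new-links.log > replace-alt.sh
# caveat: assumes the URLs contain no | characters; the unescaped dots still match any character, as in the command above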
First, let's escape the / characters by replacing them with \/. Remember that we have to write \\\/ in the sed expression to get \/ in the file:
sed -i -e 's/\//\\\//g' replace.sh
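To check the effect, tail -n 1 replace.sh should now show something like:
https:\/\/www.yubico.com\/products\/yubikey-hardware\/fido-u2f-security-key https:\/\/www.yubico.com\/authentication-standards\/fido-u2f\/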
Then let's replace the space between the URLs with / (the separation between A and B in our example). This has to happen before prepending the find command in the next step, because that text contains spaces of its own. Remember to escape the /:
sed -i -e 's/ /\//g' replace.sh
Then let's replace the start of the line (represented by ^) with find ./docs -type f -exec sed -i -e 's/, taking care to escape any / and '. Because the second sed is inside quotes, it is not a command, just text; and since the whole expression is itself wrapped in single quotes, the embedded ' has to be written as '\'':
sed -i -e 's/^/find .\/docs -type f -exec sed -i -e '\''s\//g' replace.sh
Finally, let's replace the end of the line (represented by $) with /g' {} \;. Take care as usual to escape /, \ and ':
sed -i -e 's/$/\/g'\'' {} \\;/g' replace.sh
Some URLs may have been present multiple times, resulting in duplicate commands. Remove the duplicates:
cat replace.sh | uniq > replace-unique.sh
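Note that uniq only removes consecutive duplicates; if identical commands were not adjacent, sorting the file first would be needed (a variant, not required for the output shown here):
sort replace.sh | uniq > replace-unique.sh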
Applying the replacements from longer URL to shorter URL avoids some mix-ups (but not all of them!): for instance the command for https://docs.securedrop.org/ has to run before the one for the shorter https://docs.securedrop.org, which would otherwise rewrite the prefix of every longer URL first:
cat replace-unique.sh | perl -e 'print sort { length($b) <=> length($a) } <>' > replace-unique-sorted.sh
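# perl -e 'print sort { length($b) <=> length($a) } <>' # reads all the lines (<>) and prints them sorted by decreasing length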
At this point the file looks like this (tail replace-unique-sorted.sh):
[snip]
find ./ -type f -exec sed -i -e 's/https:\/\/pypi.python.org\/pypi\/authenticator/https:\/\/pypi.org\/project\/authenticator\//g' {} \;
find ./ -type f -exec sed -i -e 's/https:\/\/pypi.python.org\/pypi\/html-linter\//https:\/\/pypi.org\/project\/html-linter\//g' {} \;
find ./ -type f -exec sed -i -e 's/http:\/\/www.vagrantup.com\/downloads.html/https:\/\/www.vagrantup.com\/downloads.html/g' {} \;
find ./ -type f -exec sed -i -e 's/http:\/\/docs.seleniumhq.org\/docs\//https:\/\/www.selenium.dev\/documentation\//g' {} \;
find ./ -type f -exec sed -i -e 's/https:\/\/docs.securedrop.org\//https:\/\/docs.securedrop.org\/en\/stable\//g' {} \;
find ./ -type f -exec sed -i -e 's/https:\/\/docs.securedrop.org/https:\/\/docs.securedrop.org\/en\/stable\//g' {} \;
find ./ -type f -exec sed -i -e 's/http:\/\/weblate.securedrop.org\//https:\/\/weblate.securedrop.org\//g' {} \;
find ./ -type f -exec sed -i -e 's/https:\/\/hstspreload.appspot.com\//https:\/\/hstspreload.org\//g' {} \;
find ./ -type f -exec sed -i -e 's/http:\/\/www.ansible.com\//https:\/\/www.ansible.com\//g' {} \;
find ./ -type f -exec sed -i -e 's/https:\/\/ossec.github.io\//https:\/\/www.ossec.net\//g' {} \;
Finally we can apply the replacement:
sh replace-unique-sorted.sh
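Before reviewing the changes, re-running the link check is a quick way to confirm that the redirects we just rewrote are gone (linkcheck-after.log is just a scratch name):
make docs-linkcheck > linkcheck-after.log
grep redirect linkcheck-after.log | grep permanently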
And review the URLs as we add them to version control:
git add -p
You'll notice that some URLs get overwritten multiple times, resulting in for example: https://docs.securedrop.org/en/latest/enlatest/en/latest/. I didn't think it was worth trying to avoid that this time around. Comment below if you know how! : )