Scrape An Entire Website with wget
This worked very nicely for a single-page site:

```
wget \
    --recursive \
    --page-requisites \
    --convert-links \
    [website]
```
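For instance, with `https://example.org` standing in for `[website]` (a sketch, not a run against any particular site), wget creates a local directory named after the host, containing the page and the assets it references, with links rewritten to the local copies:

```
# Hypothetical run with example.org substituted for [website].
wget \
    --recursive \
    --page-requisites \
    --convert-links \
    https://example.org

# Afterwards the saved copy can be opened locally, e.g.:
#   open example.org/index.html        # macOS
#   xdg-open example.org/index.html    # Linux
```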
wget options:

```
wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains website.org \
    --no-parent \
    www.website.org
```

- `--recursive`: download the entire website.
- `--domains website.org`: don't follow links outside website.org.
- `--no-parent`: don't ascend to directories above the starting URL.
- `--page-requisites`: get all the elements that compose each page (images, CSS, and so on).
- `--html-extension`: save files with the `.html` extension (newer wget versions call this `--adjust-extension`).
- `--convert-links`: convert links so that they work locally, offline.
- `--restrict-file-names=windows`: modify filenames so that they also work on Windows.
- `--no-clobber`: don't overwrite existing files (useful if the download is interrupted and resumed).
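When mirroring a larger site, it can also help to slow wget down so the target server isn't hammered. A possible variant of the command above (example.org is a placeholder; tune the numbers to the site):

```
# Hypothetical, politer variant of the mirror command above.
#   --wait=1          pause one second between requests
#   --random-wait     vary that pause so requests look less robotic
#   --limit-rate=200k cap the download speed
#   --user-agent=...  some servers refuse the default wget user agent
wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains example.org \
    --no-parent \
    --wait=1 \
    --random-wait \
    --limit-rate=200k \
    --user-agent="Mozilla/5.0 (compatible; offline-mirror)" \
    www.example.org
```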
There is also [node-wget](https://github.com/wuchengwei/node-wget).
Can you scrape this one?
https://roadmap.sh/frontend?r=frontend-beginner
It's empty every time I try to scrape it.