The official Elasticsearch documentation site is protected by a Cloudflare CAPTCHA, so it can't be scraped directly.
The built-docs repo isn't self-contained (it references external resources, both images and JS). I don't understand how its air_gapped docs work, and there are no instructions on how to use them.
I want to read documentation offline.
This setup bundles the entire ES documentation into an offline, searchable archive.
Operation:
- Do a shallow clone of the documentation
git clone --depth=10 https://github.com/elastic/built-docs
- Put the scripts from this repo inside the cloned built-docs directory
- Run
./start.sh
Browsertrix crawls the documentation, which is served locally by nginx. A custom sitemap is generated so that Browsertrix can find and crawl every page.
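As a rough illustration of the sitemap step, here is a minimal sketch of how a sitemap could be built from the HTML files on disk. It is not the script shipped here: the function name generate_sitemap, its two arguments, and the example URL are all assumptions made for illustration. It also assumes file paths contain no characters that need XML escaping (such as &).

```shell
# Sketch only: print a sitemap listing every .html file under a directory.
# generate_sitemap, docs_root and base_url are hypothetical names.
generate_sitemap() {
    docs_root=$1   # directory that nginx serves
    base_url=$2    # URL prefix under which nginx serves it

    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
    printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'

    # List HTML files relative to the docs root, sorted for stable output.
    (cd "$docs_root" && find . -type f -name '*.html' | sort) |
    while IFS= read -r path; do
        # Drop the leading "./" that find prints before building the URL.
        printf '  <url><loc>%s/%s</loc></url>\n' "$base_url" "${path#./}"
    done

    printf '%s\n' '</urlset>'
}
```

For example, generate_sitemap ./html http://localhost:8080 > sitemap.xml would write one entry per HTML page, assuming the pages live in ./html and nginx listens on port 8080. Walking the filesystem rather than following links means pages that nothing links to still end up in the crawl.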