Web Crawling: Data Scraping vs. Data Crawling
- https://www.promptcloud.com/data-scraping-vs-data-crawling/
- https://www.mkyong.com/java/jsoup-basic-web-crawler-example/
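
The difference between the two terms can be sketched in a few lines of plain Java. This is a minimal illustration, not how any library listed here works: `CrawlSketch`, `scrapeLinks`, and `crawl` are hypothetical names, the "site" is an in-memory map instead of real HTTP requests, and the regex is a naive stand-in for a proper HTML parser. Scraping extracts data from one page; crawling repeatedly follows the extracted links to reach further pages.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrawlSketch {
    // Naive href matcher (assumption: double-quoted attributes only).
    // Real pages need a real HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Scraping: pull data (here, link targets) out of a single page. */
    static List<String> scrapeLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Crawling: follow links breadth-first, visiting each known page once. */
    static List<String> crawl(String start, Map<String, String> site) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String url = frontier.remove();
            String html = site.get(url);
            if (html == null || !visited.add(url)) {
                continue; // unknown page, or already visited
            }
            frontier.addAll(scrapeLinks(html));
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        // Hypothetical three-page site held in memory instead of fetched over HTTP.
        Map<String, String> site = Map.of(
                "/a", "<a href=\"/b\">b</a> <a href=\"/c\">c</a>",
                "/b", "<a href=\"/a\">back</a>",
                "/c", "<a href=\"/missing\">dead link</a>");
        System.out.println(scrapeLinks(site.get("/a"))); // prints [/b, /c]
        System.out.println(crawl("/a", site));           // prints [/a, /b, /c]
    }
}
```

The visited set is what keeps the crawl from looping on the `/b` to `/a` back-link; a real crawler would also need URL normalization, politeness delays, and robots.txt handling.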
HTML parsers: https://en.wikipedia.org/wiki/Comparison_of_HTML_parsers
- crawler4j : open-source web crawler for Java - https://github.com/yasserg/crawler4j
- jsoup : Java HTML parser - https://jsoup.org/
- Jaunt : Java web scraping & JSON querying - http://jaunt-api.com/
- mechanize : stateful HTTP web services client library - https://github.com/GistLabs/mechanize
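- Ui4j : Java-based web automation library (wrapper around the JavaFX WebKit engine) - https://github.com/webfolderio/ui4j
- Apache HttpComponents : low-level Java components for the HTTP protocol (HttpCore, HttpClient) - http://hc.apache.org/index.html
- HtmlUnit : GUI-less browser for Java programs - http://htmlunit.sourceforge.net/gettingStarted.html
- HtmlCleaner : Java HTML parser that turns malformed HTML into well-formed XML - http://htmlcleaner.sourceforge.net/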