Last active November 3, 2024 12:14
Gist: fastfingertips/8cc0fbb5c35c22dd7238297ca742ecf8
Scraping tools
| LXML | |
| SCRAPY | |
| HTTPLIB | |
| REQUESTS | |
| SELENIUM | |
| HTMLPARSER | |
| BEAUTIFULSOUP | |
| URLLIB / URLLIB2 | |
| https://lxml.de/ | |
| https://axiom.ai/ | |
| https://apify.com/ | |
| https://jsoup.org/ | |
| https://scrapy.org | |
| https://texau.app/ | |
| https://page.rest/ | |
| https://80legs.com/ | |
| https://agenty.com/ | |
| http://gearman.org/ | |
| http://sikulix.com/ | |
| https://crawlee.dev/ | |
| https://serpapi.com/ | |
| https://wrapapi.com/ | |
| http://go-colly.org/ | |
| http://gigablast.com/ | |
| https://jaunt-api.com/ | |
| https://www.listly.io/ | |
| https://webscraping.ai/ | |
| https://cheerio.js.org/ | |
| https://playwright.dev/ | |
| https://gimmeproxy.com/ | |
| https://www.mixnode.com/ | |
| https://browserbird.com/ | |
| https://webautomation.io/ | |
| https://scrapingfish.com/ | |
| https://www.opengraph.io/ | |
| https://www.page2api.com/ | |
| https://www.parsehub.com/ | |
| https://serpapi.com/status | |
| https://www.nongnu.org/txr/ | |
| https://www.browserless.io/ | |
| https://python-rq.org/docs/ | |
| https://www.botscraper.com/ | |
| https://www.scrapingbee.com/ | |
| https://pypi.org/project/sh/ | |
| https://www.rainforestqa.com/ | |
| https://chrome.browserless.io/ | |
| https://stedolan.github.io/jq/ | |
| https://videlibri.de/xidel.html | |
| https://github.com/scrapy/scrapy | |
| https://github.com/gocolly/colly | |
| https://altilunium.my.id/psedex/ | |
| https://estela.bitmaker.la/docs/ | |
| https://github.com/scrapy/scrapyd | |
| https://github.com/ericchiang/pup | |
| https://github.com/dmi3kno/polite | |
| https://www.zyte.com/scrapy-cloud/ | |
| https://github.com/featurist/coypu | |
| https://pypi.org/project/explicit/ | |
| https://github.com/jahaynes/crawler | |
| http://novosial.org/perl/one-liner/ | |
| https://github.com/AutomaApp/automa | |
| http://docs.pyspider.org/en/latest/ | |
| https://codecanyon.net/user/ikajian | |
| https://docs.celeryq.dev/en/stable/ | |
| https://github.com/bitmakerla/estela | |
| https://github.com/cheeriojs/cheerio | |
| https://github.com/AlexMili/Scraptory | |
| https://substack.thewebscraping.club/ | |
| https://github.com/altilunium/wi-page | |
| https://github.com/browserless/chrome | |
| https://github.com/scrapinghub/portia | |
| https://github.com/altilunium/wistalk | |
| https://til.simonwillison.net/gpt3/jq | |
| https://github.com/JCMais/node-libcurl | |
| https://github.com/segmentio/nightmare | |
| https://github.com/altilunium/arachnid | |
| https://github.com/gotripod/ssscraper/ | |
| https://github.com/puppeteer/puppeteer | |
| https://github.com/google/gumbo-parser | |
| https://en.wikipedia.org/wiki/ISO_8601 | |
| https://github.com/clj-commons/hickory | |
| https://github.com/microsoft/playwright | |
| https://github.com/matthewmueller/x-ray | |
| https://github.com/altilunium/makalahIF | |
| https://github.com/kanishka-linux/hlspy | |
| https://csvkit.readthedocs.io/en/latest/ | |
| https://mercury.postlight.com/web-parser/ | |
| https://en.wikipedia.org/wiki/Web_ARChive | |
| https://github.com/sananth12/ImageScraper | |
| https://github.com/sparklemotion/mechanize | |
| https://github.com/brutuscat/medusa-crawler | |
| https://cssselect.readthedocs.io/en/latest/ | |
| https://newspaper.readthedocs.io/en/latest/ | |
| https://github.com/prisma-archive/chromeless | |
| https://github.com/lwthiker/curl-impersonate | |
| https://github.com/alixaxel/chrome-aws-lambda | |
| https://www.lambdatest.com/automation-testing | |
| https://robobrowser.readthedocs.io/en/latest/ | |
| https://news.ycombinator.com/item?id=15694118 | |
| https://github.com/python-mechanize/mechanize | |
| https://github.com/ruippeixotog/scala-scraper | |
| https://github.com/vsupalov/docker-puppeteer-dev | |
| https://github.com/MechanicalSoup/MechanicalSoup | |
| https://www.drupal.org/project/example_web_scraper | |
| https://splash.readthedocs.io/en/stable/index.html | |
| https://developer.chrome.com/blog/headless-chrome/ | |
| https://github.com/sunra/php-simple-html-dom-parser | |
| https://til.simonwillison.net/aws/boto-command-line | |
| https://github.com/mherrmann/selenium-python-helium | |
| https://developer.chrome.com/docs/devtools/recorder/ | |
| https://apify.com/petr_cermak/anti-captcha-recaptcha | |
| https://splash.readthedocs.io/en/latest/install.html | |
| https://brycematheson.io/webscraping-with-powershell/ | |
| https://blog.jeaye.com/2017/02/28/clojure-apartments/ | |
| https://vsupalov.com/headless-chrome-puppeteer-docker/ | |
| https://github.com/aaronhoffman/WebsiteContactHarvester | |
| https://github.com/sambaiz/puppeteer-lambda-starter-kit | |
| https://simplehtmldom.sourceforge.io/docs/1.9/index.html | |
| https://dev.woob.tech/guides/module.html#parsing-of-pages | |
| https://ui.vision/rpa/docs/selenium-ide/capturescreenshot | |
| https://webarchive.jira.com/wiki/spaces/Heritrix/overview | |
| https://sangaline.com/post/advanced-web-scraping-tutorial/ | |
| https://www.cloudflare.com/pg-lp/bot-mitigation-fight-mode/ | |
| https://bitmaker.la/blog/2022/06/24/estela-oss-release.html | |
| https://www.npmjs.com/package/puppeteer-extra-plugin-stealth | |
| https://github.com/ultrafunkamsterdam/undetected-chromedriver | |
| https://www.chrismytton.com/2015/01/19/web-scraping-with-ruby/ | |
| https://franciskim.co/why-im-extremely-bullish-on-open-source-rpa/ | |
| https://www.imperva.com/blog/web-scraping-bots/?redirect=Incapsula | |
| https://github.com/reanalytics-databoutique/webscraping-open-project | |
| https://docs.browserflow.app/tutorials/tutorial-scrape-a-list-of-urls | |
| https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html | |
| https://simonwillison.net/2022/Mar/14/scraping-web-pages-shot-scraper/ | |
| https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser | |
| https://sites.google.com/site/scriptsexamples/learn-by-example/parsing-html | |
| https://blitapp.com/blog/take-screenshots-of-multiple-pages-behind-a-login/ | |
| https://www.guru99.com/page-object-model-pom-page-factory-in-selenium-ultimate-guide.html | |
| https://medium.com/phantombuster/web-scraping-in-2017-headless-chrome-tips-tricks-4d6521d695e8 | |
| https://developers.cloudflare.com/logs/get-started/enable-destinations/s3-compatible-endpoints | |
| https://sourcegraph.com/search?q=context:global+repo:chromium/chromium+kHeadless&patternType=literal | |
| https://stackoverflow.com/questions/33225947/can-a-website-detect-when-you-are-using-selenium-with-chromedriver/41220267#41220267 | |
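For orientation, here is a minimal sketch of what the two standard-library entries in the list (URLLIB and HTMLPARSER) look like in practice, assuming Python 3.9+. It fetches a page with `urllib.request` and collects absolute link targets with `html.parser.HTMLParser`. The names `LinkCollector`, `extract_links` and `fetch`, and the User-Agent string, are hypothetical and chosen only for illustration; none of them come from the tools listed above.

```python
# Illustrative sketch only: stdlib fetching (urllib) plus parsing (html.parser).
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute href values from <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>, so skip it.
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Parse an HTML string and return its link targets in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset the server declares."""
    # Hypothetical User-Agent; set one that identifies your own scraper.
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    sample = '<p><a href="/docs/">Docs</a> <a href="https://scrapy.org">Scrapy</a></p>'
    print(extract_links(sample, "https://example.com/"))
    # -> ['https://example.com/docs/', 'https://scrapy.org']
```

Parsing is kept separate from fetching so it can be exercised on a saved page without network access; with a live site you would call `extract_links(fetch(url), url)`. Libraries such as BEAUTIFULSOUP or LXML would typically replace `LinkCollector` with a selector query while the fetch step stays much the same.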