- pattern: git clone git@github.com:clips/pattern.git
  Pattern is a web mining module for Python. It has tools for:
  - Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
  - Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
  - Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
  - Network Analysis: graph centrality and visualization
  It is well documented, thoroughly tested with 350+ unit tests, and comes bundled with 50+ examples. The source code is licensed under BSD.
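  As a quick taste of Pattern's NLP side, a minimal sketch (assuming pattern is installed; `parse` and `sentiment` are from its documented pattern.en module, the sample sentences are placeholders):

  ```python
  # A minimal Pattern sketch: POS tagging and sentiment scoring.
  from pattern.en import parse, sentiment

  # parse() returns a tagged string (token/part-of-speech/chunk).
  print(parse("The quick brown fox jumped over the lazy dog."))

  # sentiment() returns a (polarity, subjectivity) tuple; polarity is in [-1, 1].
  print(sentiment("Pattern is a well documented web mining module."))
  ```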
- Official documentation for the BS4 package: http://www.crummy.com/software/BeautifulSoup/bs4/doc
- Official documentation for the Scrapy package: http://doc.scrapy.org/en/latest
- Official documentation for the mechanize package: http://wwwsearch.sourceforge.net/mechanize
- Scrapy commands: https://doc.scrapy.org/en/latest/topics/commands.html
- Comparison between Portia and ParseHub: https://www.parsehub.com/blog/portia-vs-parsehub-comparison-which-alternative-is-the-best-option-for-web-scraping/
- Twint: https://github.com/twintproject/twint
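Since these links all serve the same task, a minimal BeautifulSoup sketch for orientation (the URL is a placeholder; requests and beautifulsoup4 are assumed installed):

```python
# Fetch a page and list its links with BeautifulSoup.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com").text
soup = BeautifulSoup(html, "html.parser")
for a in soup.find_all("a", href=True):
    print(a["href"], a.get_text(strip=True))
```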
- SpiderFoot is an open source intelligence (OSINT) automation tool. It integrates with just about every data source available and utilises a range of methods for data analysis, making that data easy to navigate.
- SpiderFoot has an embedded web server providing a clean and intuitive web-based interface, but it can also be used completely via the command line. It's written in Python 3 and GPL-licensed.
- Trape is an OSINT analysis and research tool which allows people to track and execute intelligent social-engineering attacks in real time. It was created with the aim of teaching the world how large Internet companies could obtain confidential information, such as the status of sessions on their websites or services, and control their users through their browser without their knowledge, but it has evolved with the aim of helping government organizations, companies, and researchers track cybercriminals.
- socialbearing: enter a keyword and it produces a dashboard
- twint
- twint source code walkthrough
- twintproject
- Twint-Distributed [IN PROGRESS]
- twint_kibana
- Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.
- hashtags: topics
- cashtags: finance-related topics, e.g. stocks
  Related packages (pip freeze of the environment; see the storage sketch after this list):
  aiodns 2.0.0
  aiohttp 3.5.4
  aiohttp-socks 0.2.2
  async-timeout 3.0.1
  attrs 19.1.0
  beautifulsoup4 4.7.1
  cchardet 2.1.4
  certifi 2019.3.9
  cffi 1.12.3
  chardet 3.0.4
  elasticsearch 7.0.0
  fake-useragent 0.1.11
  geographiclib 1.49
  geopy 1.19.0
  idna 2.8
  idna-ssl 1.1.0
  multidict 4.5.2
  numpy 1.16.3
  oauthlib 3.0.1
  pandas 0.24.2
  pip 19.1
  pycares 3.0.0
  pycparser 2.19
  PySocks 1.6.8
  python-dateutil 2.8.0
  pytz 2019.1
  requests 2.22.0
  requests-oauthlib 1.2.0
  schedule 0.6.0
  setuptools 41.0.1
  six 1.12.0
  soupsieve 1.9.1
  tweepy 3.7.0
  twint 1.2.3 /home/james/app/twitter_crawler/src/twint
  typing 3.6.6
  typing-extensions 3.7.2
  urllib3 1.25.2
  wheel 0.33.1
  yarl 1.3.0
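The elasticsearch and pandas packages in this list are what twint's storage backends use. A hedged sketch of wiring them up (attribute names per the twint wiki and may differ across versions; the cashtag query and ES address are placeholders):

```python
# Store twint results in Elasticsearch and/or an in-memory pandas DataFrame.
import twint
from twint.storage import panda

c = twint.Config()
c.Search = "$TSLA"                  # a cashtag query
c.Elasticsearch = "localhost:9200"  # index tweets into a local ES instance
c.Pandas = True                     # also keep tweets in a DataFrame
twint.run.Search(c)

df = panda.Tweets_df                # the collected tweets as a pandas DataFrame
print(df.head())
```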
Twint utilizes Twitter's search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags, and trends, or extract sensitive information from Tweets such as e-mail addresses and phone numbers. I find this very useful, and you can get really creative with it too.
Twint also makes special queries to Twitter that let you scrape a Twitter user's followers, the Tweets a user has liked, and who they follow, all without any authentication, API, Selenium, or browser emulation. Some command-line examples, with a Python-API sketch after them:
twint -s "Trevor Noah" --verified                       # Tweets containing "Trevor Noah" from verified accounts
twint -g="48.2045507,16.3577661,1km" -o file.csv --csv  # Tweets geotagged within 1 km of the coordinates, exported to CSV
twint -u username --since "2019-10-11 21:30:15"         # Tweets from a user since the given timestamp
twint -u username --resume file.csv                     # resume a scrape from the last scroll-id saved in file.csv
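The same scrapes can be driven from Python; a sketch of twint's module-level Config/run API as shown in its README (the username is a placeholder):

```python
# The command-line examples above, driven from Python instead.
import twint

# Equivalent of: twint -s "Trevor Noah" --verified -o file.csv --csv
c = twint.Config()
c.Search = "Trevor Noah"
c.Verified = True
c.Store_csv = True
c.Output = "file.csv"
twint.run.Search(c)

# Followers of a user -- no authentication, API keys, or browser emulation.
c = twint.Config()
c.Username = "username"
twint.run.Followers(c)
```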
- Python Modules for Scraping