
pattern

  • pattern (git@github.com:clips/pattern.git): Pattern is a web mining module for Python. It has tools for:

    • Data Mining: web services (Google, Twitter, Wikipedia), a web crawler, an HTML DOM parser
    • Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
    • Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
    • Network Analysis: graph centrality and visualization

    It is well documented, thoroughly tested with 350+ unit tests, and comes bundled with 50+ examples. The source code is BSD-licensed.
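A minimal sketch of Pattern's NLP calls (assuming pattern 3.6 on Python 3; the sample sentences are placeholders):

from pattern.en import parse, sentiment

# Part-of-speech tagging: returns a slash-tagged string.
print(parse("The quick brown fox jumped over the lazy dog."))

# Sentiment analysis: returns a (polarity, subjectivity) pair.
print(sentiment("Pattern is a well-documented and genuinely useful library."))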

tools

  • BeautifulSoup (bs4) official documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc
  • Scrapy official documentation: http://doc.scrapy.org/en/latest
  • mechanize official documentation: http://wwwsearch.sourceforge.net/mechanize
  • Scrapy commands: https://doc.scrapy.org/en/latest/topics/commands.html
  • Comparison between Portia and ParseHub: https://www.parsehub.com/blog/portia-vs-parsehub-comparison-which-alternative-is-the-best-option-for-web-scraping/
  • Twint: https://github.com/twintproject/twint
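A minimal sketch of the usual requests + BeautifulSoup workflow (the URL is a placeholder; requests is assumed to be installed alongside bs4):

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com").text
soup = BeautifulSoup(html, "html.parser")

# Print the target and text of every hyperlink on the page.
for a in soup.find_all("a", href=True):
    print(a["href"], a.get_text(strip=True))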

  • awesome-osint

  • spiderfoot

  • SpiderFoot is an open source intelligence (OSINT) automation tool. It integrates with just about every data source available and utilises a range of methods for data analysis, making that data easy to navigate.

  • SpiderFoot has an embedded web server that provides a clean and intuitive web-based interface, but it can also be used entirely from the command line. It's written in Python 3 and GPL-licensed.
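Per the SpiderFoot README, the embedded web server is started from the command line and the interface is then opened in a browser (address and port are your choice):

python3 ./sf.py -l 127.0.0.1:5001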

  • trape

  • Trape is an OSINT analysis and research tool that lets people track and execute intelligent social-engineering attacks in real time. It was created to show how large Internet companies can obtain confidential information, such as the session status of their websites or services, and control users through the browser without their knowledge. It has since evolved to help government organizations, companies, and researchers track cybercriminals.

twitter

  • socialbearing: enter a keyword to generate an analytics dashboard
  • twint
  • twint source-code walkthrough
  • twintproject
  • Twint-Distributed [IN PROGRESS]
  • twint_kibana: Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.
    • hashtags: topics
    • cashtags: finance-related topics, e.g. stocks

Related packages (pip list from the author's environment):

aiodns 2.0.0
aiohttp 3.5.4
aiohttp-socks 0.2.2
async-timeout 3.0.1
attrs 19.1.0
beautifulsoup4 4.7.1
cchardet 2.1.4
certifi 2019.3.9
cffi 1.12.3
chardet 3.0.4
elasticsearch 7.0.0
fake-useragent 0.1.11
geographiclib 1.49
geopy 1.19.0
idna 2.8
idna-ssl 1.1.0
multidict 4.5.2
numpy 1.16.3
oauthlib 3.0.1
pandas 0.24.2
pip 19.1
pycares 3.0.0
pycparser 2.19
PySocks 1.6.8
python-dateutil 2.8.0
pytz 2019.1
requests 2.22.0
requests-oauthlib 1.2.0
schedule 0.6.0
setuptools 41.0.1
six 1.12.0
soupsieve 1.9.1
tweepy 3.7.0
twint 1.2.3 (/home/james/app/twitter_crawler/src/twint)
typing 3.6.6
typing-extensions 3.7.2
urllib3 1.25.2
wheel 0.33.1
yarl 1.3.0
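Given that elasticsearch appears in the package list above, the twint_kibana setup evidently indexes Twint output into Elasticsearch for visualization in Kibana; Twint's -es flag does this (assuming Elasticsearch is listening on localhost:9200):

twint -u username -es localhost:9200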

Twint utilizes Twitter's search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags & trends, or sort out sensitive information from Tweets like e-mail and phone numbers. I find this very useful, and you can get really creative with it too.

Twint also makes special queries to Twitter allowing you to also scrape a Twitter user's followers, Tweets a user has liked, and who they follow without any authentication, API, Selenium, or browser emulation.
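The same scrapes can be driven from Python; a minimal sketch of Twint's Config/run API (the username, search term, and output file are placeholders):

import twint

c = twint.Config()
c.Username = "username"        # profile to scrape
c.Search = "Trevor Noah"       # optional keyword filter
c.Limit = 100                  # stop after roughly 100 Tweets
c.Store_csv = True             # write results to CSV
c.Output = "tweets.csv"

twint.run.Search(c)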

Display Tweets by verified users that Tweeted about Trevor Noah.

twint -s "Trevor Noah" --verified

Scrape Tweets within a 1 km radius of the Hofburg in Vienna and export them to a CSV file.

twint -g="48.2045507,16.3577661,1km" -o file.csv --csv

Collect Tweets published since 2019-10-11 21:30:15.

twint -u username --since "2019-10-11 21:30:15"

Resume a search, starting from the last saved Tweet in the provided file.

twint -u username --resume file.csv

other

  • Python modules for scraping: