@SRatna
Last active August 16, 2017 11:39
Set up for Scrapy
I used Anaconda as my Python distribution.
So first install Anaconda.
Then set the PATH in .profile in your home directory:
--> PATH="$HOME/bin:$HOME/.local/bin:$HOME/anaconda3/bin:$PATH"
Then create a virtual environment using conda:
: conda create --name your_name_for_env
This creates a named env directory inside anaconda3/envs.
Activate it with: source activate your_name_for_env
Then install Scrapy: pip install scrapy
scrapy startproject project_name
cd project_name
scrapy genspider spider_name website_address
scrapy list -- to see the list of defined spiders
-- to test for errors before crawling pages:
scrapy crawl name_of_spider
-- solved an error: in settings.py, change ROBOTSTXT_OBEY from True to False
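For reference, the robots.txt switch lives in the project's generated settings.py. A minimal sketch of the relevant fragment (the project name is whatever you passed to startproject):

```python
# project_name/settings.py -- generated by `scrapy startproject`.
# Scrapy obeys robots.txt by default; setting this to False makes the
# spider crawl pages the site's robots.txt would otherwise exclude,
# so flip it only when you understand the implications for the target site.
ROBOTSTXT_OBEY = False
```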
scrapy shell ----
fetch('url_of_website')
response.body --- to see the raw response
view(response) --- to view the response in a browser
We can use both CSS and XPath to extract any node.
Examples:
response.css('h1')
response.xpath('//h1')
response.xpath('//h1/a/text()').extract()
response.xpath('//h1/a/text()').extract_first()
response.xpath('//*[@class="tag-item"]')
response.css('.tag-item')
response.xpath('//*[@class="tag-item"]/a/text()').extract()
Note: double slashes '//' mean search everywhere in the document (not just direct children) and return all matching instances.
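The descendant-search behaviour of '//' can be demonstrated without a live site. As a rough stdlib analogy (Python's xml.etree supports a limited XPath subset, where './/' plays the role of scrapy's '//'), using a made-up HTML fragment shaped like the quotes site above:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for a fetched page.
html = """
<div>
  <h1><a>Quotes to Scrape</a></h1>
  <span class="tag-item"><a>love</a></span>
  <span class="tag-item"><a>books</a></span>
</div>
"""
root = ET.fromstring(html)

# './/h1/a' searches all levels beneath root, like scrapy's '//h1/a';
# .text plays the role of text()...extract().
texts = [a.text for a in root.findall('.//h1/a')]
print(texts)  # ['Quotes to Scrape']

# Attribute predicate, like //*[@class="tag-item"]/a/text() -- it finds
# every matching element anywhere in the tree, not just top-level ones.
tags = [el.find('a').text for el in root.findall('.//*[@class="tag-item"]')]
print(tags)  # ['love', 'books']
```

In scrapy proper, the equivalent calls are response.xpath('//h1/a/text()').extract() and response.xpath('//*[@class="tag-item"]/a/text()').extract() as shown above.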