Created March 16, 2021 21:02
Scrapy Oregon health inspection output
(open-health-inspection-scraper) bash-3.2$ ./scrapeHealthData.py | |
/Users/jpheasly/Development/open-health-inspection-scraper/scraper/spiders/healthspace_spider.py:206: SyntaxWarning: "is" with a literal. Did you mean "=="? | |
'critical': critical is "critical", | |
2021-03-16 15:49:34 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot) | |
2021-03-16 15:49:34 [scrapy.utils.log] INFO: Overridden settings: {'DOWNLOAD_DELAY': 10, 'SPIDER_MODULES': ['scraper.spiders']} | |
2021-03-16 15:49:34 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.spiderstate.SpiderState'] | |
2021-03-16 15:49:34 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.retry.RetryMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2021-03-16 15:49:34 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2021-03-16 15:49:34 [py.warnings] WARNING: /Users/jpheasly/Development/open-health-inspection-scraper/scraper/pipelines.py:87: SyntaxWarning: "is not" with a literal. Did you mean "!="? | |
if result['n'] is not 1: | |
2021-03-16 15:49:34 [py.warnings] WARNING: /Users/jpheasly/Development/open-health-inspection-scraper/scraper/pipelines.py:101: SyntaxWarning: "is not" with a literal. Did you mean "!="? | |
if result['n'] is not 1: | |
2021-03-16 15:49:34 [scrapy.middleware] INFO: Enabled item pipelines: | |
['scraper.pipelines.MongoDBPipeline'] | |
2021-03-16 15:49:34 [scrapy.core.engine] INFO: Spider opened | |
2021-03-16 15:49:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2021-03-16 15:49:34 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024 | |
2021-03-16 15:49:35 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/clients/oregon/state/statewebportal.nsf/module_healthRegions.xsp?showview=region> from <GET https://healthspace.com/clients/oregon/state/statewebportal.nsf/module_healthRegions.xsp?showview=region> | |
2021-03-16 15:49:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://healthspace.com/clients/oregon/state/statewebportal.nsf/module_healthRegions.xsp?showview=region> (referer: None) | |
2021-03-16 15:49:49 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.co.washington.or.us': <GET http://www.co.washington.or.us/HHS/EnvironmentalHealth/FoodSafety/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:50:00 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Benton/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Benton/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:50:00 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://healthspace.com/Clients/Oregon/Benton/web.nsf/module_facilities.xsp?module=Food> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates) | |
2021-03-16 15:50:15 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Baker/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Baker/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:50:26 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/HoodRiver/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/HoodRiver/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:50:34 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min) | |
2021-03-16 15:50:40 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Harney/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Harney/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:50:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Grant/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Grant/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:51:07 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Douglas/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Douglas/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:51:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Deschutes/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Deschutes/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:51:31 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Curry/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Curry/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:51:34 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2021-03-16 15:51:47 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Crook/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Crook/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:51:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Coos/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Coos/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:52:07 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Columbia/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Columbia/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:52:17 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Clatsop/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Clatsop/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:52:28 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Clackamas/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Clackamas/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:52:34 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2021-03-16 15:52:44 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/NorthCentral/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/NorthCentral/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:52:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Multnomah/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Multnomah/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:53:08 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://healthspace.com/Clients/Oregon/Morrow/web.nsf/module_facilities.xsp?module=Food> (referer: https://healthspace.com/clients/oregon/state/statewebportal.nsf/module_healthRegions.xsp?showview=region) | |
2021-03-16 15:53:08 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://healthspace.com/Clients/Oregon/Morrow/web.nsf/module_facilities.xsp?module=Food>: HTTP status code is not handled or not allowed | |
2021-03-16 15:53:21 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Yamhill/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Yamhill/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:53:33 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Wheeler/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Wheeler/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:53:34 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 1 pages/min), scraped 0 items (at 0 items/min) | |
2021-03-16 15:53:47 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Wallowa/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Wallowa/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:54:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Union/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Union/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:54:13 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Umatilla/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Umatilla/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:54:26 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Tillamook/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Tillamook/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:54:34 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2021-03-16 15:54:38 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Polk/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Polk/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:54:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Marion/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Marion/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:54:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Malheur/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Malheur/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:55:11 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Linn/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Linn/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:55:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Lincoln/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Lincoln/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:55:34 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2021-03-16 15:55:36 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Lane/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Lane/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:55:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Lake/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Lake/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:56:03 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Klamath/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Klamath/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:56:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Josephine/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Josephine/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:56:34 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2021-03-16 15:56:35 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Jefferson/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Jefferson/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:56:46 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Jackson/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Jackson/web.nsf/module_facilities.xsp?module=Food> | |
2021-03-16 15:56:46 [scrapy.core.engine] INFO: Closing spider (finished) | |
2021-03-16 15:56:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 13574, | |
'downloader/request_count': 35, | |
'downloader/request_method_count/GET': 35, | |
'downloader/response_bytes': 17678, | |
'downloader/response_count': 35, | |
'downloader/response_status_count/200': 1, | |
'downloader/response_status_count/302': 33, | |
'downloader/response_status_count/404': 1, | |
'dupefilter/filtered': 32, | |
'finish_reason': 'finished', | |
'finish_time': datetime.datetime(2021, 3, 16, 20, 56, 46, 437621), | |
'log_count/DEBUG': 38, | |
'log_count/INFO': 15, | |
'log_count/WARNING': 2, | |
'offsite/domains': 1, | |
'offsite/filtered': 1, | |
'request_depth_max': 1, | |
'response_received_count': 2, | |
'scheduler/dequeued': 35, | |
'scheduler/dequeued/disk': 35, | |
'scheduler/enqueued': 35, | |
'scheduler/enqueued/disk': 35, | |
'start_time': datetime.datetime(2021, 3, 16, 20, 49, 34, 336531)} | |
2021-03-16 15:56:46 [scrapy.core.engine] INFO: Spider closed (finished) | |
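The stats dump above suggests why no items were scraped: 33 of the 35 responses were 302 redirects, and 32 requests were dropped by the duplicate filter. That matches the log lines in which each redirect points back to the same URL it came from. A minimal Python sketch of that reading follows; the `summarize` helper and its `likely_redirect_loop` heuristic are hypothetical and not part of the scraper, and the numbers are copied from the stats dump.

```python
# Values copied from the "Dumping Scrapy stats" block in this log.
STATS = {
    'downloader/response_count': 35,
    'downloader/response_status_count/200': 1,
    'downloader/response_status_count/302': 33,
    'downloader/response_status_count/404': 1,
    'dupefilter/filtered': 32,
    'response_received_count': 2,
}


def summarize(stats):
    """Hypothetical helper: condense the counters that explain an empty crawl."""
    redirects = stats.get('downloader/response_status_count/302', 0)
    filtered = stats.get('dupefilter/filtered', 0)
    ok = stats.get('downloader/response_status_count/200', 0)
    return {
        'redirects': redirects,
        'dropped_as_duplicates': filtered,
        'pages_with_200': ok,
        # Assumed heuristic: nearly every redirect was then filtered as a
        # duplicate, so the spider never reached the county pages.
        'likely_redirect_loop': redirects > 0 and filtered >= redirects - 1,
    }


print(summarize(STATS))
```

If that reading is right, one thing to try is passing `dont_filter=True` on the county-page `Request` objects so the redirected request is not discarded. This log does not confirm that this alone would fix the crawl.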
(open-health-inspection-scraper) bash-3.2$ ./scrapeHealthData.py |
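The three SyntaxWarning lines in the log come from comparing against a literal with `is` / `is not`, which tests object identity rather than value. A minimal sketch of the likely fix, following the interpreter's own suggestion to use `==` and `!=`; the function names here are hypothetical stand-ins, since the surrounding code in `healthspace_spider.py` and `pipelines.py` is not shown in this log.

```python
def is_critical(critical):
    # Hypothetical stand-in for the dict entry flagged at healthspace_spider.py:206.
    # `==` compares string values; `is` would compare object identity.
    return critical == "critical"


def count_mismatch(result):
    # Hypothetical stand-in for `if result['n'] is not 1:` in pipelines.py.
    return result['n'] != 1


built = "".join(["crit", "ical"])  # equal value, assembled at runtime
print(is_critical(built))          # True
print(count_mismatch({'n': 1}))    # False
print(count_mismatch({'n': 0}))    # True
```

Identity checks on strings and small integers can appear to work because of interpreter caching, but that behavior is not guaranteed, which is why newer Python versions emit the warning.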