Last active November 10, 2022 13:14
scrapy crawl quotes
# scrapy crawl quotes
```
Hero3C@DESKTOP-O093POU MINGW64 /c/tim/prog10_4 | |
$ scrapy crawl quotes | |
2022-11-10 21:04:58 [scrapy.utils.log] INFO: Scrapy 2.7.1 started (bot: prog10_4) | |
2022-11-10 21:04:58 [scrapy.utils.log] INFO: Versions: lxml 4.9.1.0, libxml2 2.9.12, cssselect 1.2.0, parsel 1.7.0, w3lib 2.0.1, Twisted 22.10.0, Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)], pyOpenSSL 22.1.0 (OpenSSL 3.0.7 1 Nov 2022), cryptography 38.0.3, Platform Windows-10-10.0.19043-SP0
2022-11-10 21:04:58 [scrapy.crawler] INFO: Overridden settings: | |
{'BOT_NAME': 'prog10_4', | |
'NEWSPIDER_MODULE': 'prog10_4.spiders', | |
'ROBOTSTXT_OBEY': True, | |
'SPIDER_MODULES': ['prog10_4.spiders']} | |
2022-11-10 21:04:58 [py.warnings] WARNING: C:\Users\Hero3C\AppData\Local\Programs\Python\Python310\lib\site-packages\scrapy\utils\request.py:231: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. | |
It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy. | |
See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation. | |
return cls(crawler) | |
2022-11-10 21:04:58 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor | |
2022-11-10 21:04:58 [scrapy.extensions.telnet] INFO: Telnet Password: c026ecd4fa77d643 | |
2022-11-10 21:04:58 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.logstats.LogStats'] | |
2022-11-10 21:04:58 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', | |
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.retry.RetryMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2022-11-10 21:04:58 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2022-11-10 21:04:59 [scrapy.middleware] INFO: Enabled item pipelines:['prog10_4.pipelines.Prog104Pipeline'] | |
2022-11-10 21:04:59 [scrapy.core.engine] INFO: Spider opened | |
2022-11-10 21:04:59 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2022-11-10 21:04:59 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2022-11-10 21:05:00 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None) | |
2022-11-10 21:05:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None) | |
2022-11-10 21:05:01 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET http://quotes.toscrape.com/author/Albert-Einstein> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates) | |
2022-11-10 21:05:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Jane-Austen/> from <GET http://quotes.toscrape.com/author/Jane-Austen> | |
2022-11-10 21:05:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Andre-Gide/> from <GET http://quotes.toscrape.com/author/Andre-Gide> | |
2022-11-10 21:05:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Eleanor-Roosevelt/> from <GET http://quotes.toscrape.com/author/Eleanor-Roosevelt> | |
2022-11-10 21:05:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Thomas-A-Edison/> from <GET http://quotes.toscrape.com/author/Thomas-A-Edison> | |
2022-11-10 21:05:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Marilyn-Monroe/> from <GET http://quotes.toscrape.com/author/Marilyn-Monroe> | |
2022-11-10 21:05:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Steve-Martin/> from <GET http://quotes.toscrape.com/author/Steve-Martin> | |
2022-11-10 21:05:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Albert-Einstein/> from <GET http://quotes.toscrape.com/author/Albert-Einstein> | |
2022-11-10 21:05:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/J-K-Rowling/> from <GET http://quotes.toscrape.com/author/J-K-Rowling> | |
2022-11-10 21:05:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: http://quotes.toscrape.com/) | |
2022-11-10 21:05:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Andre-Gide/> (referer: http://quotes.toscrape.com/) | |
2022-11-10 21:05:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Thomas-A-Edison/> (referer: http://quotes.toscrape.com/) | |
2022-11-10 21:05:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Jane-Austen/> (referer: http://quotes.toscrape.com/) | |
2022-11-10 21:05:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Eleanor-Roosevelt/> (referer: http://quotes.toscrape.com/) | |
2022-11-10 21:05:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Marilyn-Monroe/> (referer: http://quotes.toscrape.com/) | |
2022-11-10 21:05:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/J-K-Rowling/> (referer: http://quotes.toscrape.com/) | |
2022-11-10 21:05:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Steve-Martin/> (referer: http://quotes.toscrape.com/) | |
2022-11-10 21:05:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Andre-Gide/> | |
{'author': 'André Gide', | |
'birthday': 'November 22, 1869', | |
'bornplace': 'in Paris, France', | |
'quote': '“It is better to be hated for what you are than to be loved for ' | |
'what you are not.”'} | |
2022-11-10 21:06:02 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Thomas-A-Edison/> | |
{'author': 'Thomas A. Edison', | |
'birthday': 'February 11, 1847', | |
'bornplace': 'in Milan, Ohio, The United States', | |
'quote': "“I have not failed. I've just found 10,000 ways that won't work.”"} | |
2022-11-10 21:06:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Albert-Einstein/> (referer: http://quotes.toscrape.com/) | |
2022-11-10 21:06:33 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Jane-Austen/> | |
{'author': 'Jane Austen', | |
'birthday': 'December 16, 1775', | |
'bornplace': 'in Steventon Rectory, Hampshire, The United Kingdom', | |
'quote': '“The person, be it gentleman or lady, who has not pleasure in a ' | |
'good novel, must be intolerably stupid.”'} | |
2022-11-10 21:06:33 [scrapy.extensions.logstats] INFO: Crawled 11 pages (at 11 pages/min), scraped 3 items (at 3 items/min) | |
2022-11-10 21:07:03 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Eleanor-Roosevelt/> | |
{'author': 'Eleanor Roosevelt', | |
'birthday': 'October 11, 1884', | |
'bornplace': 'in The United States', | |
'quote': '“A woman is like a tea bag; you never know how strong it is until ' | |
"it's in hot water.”"} | |
2022-11-10 21:07:33 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Marilyn-Monroe/> | |
{'author': 'Marilyn Monroe', | |
'birthday': 'June 01, 1926', | |
'bornplace': 'in The United States', | |
'quote': "“Imperfection is beauty, madness is genius and it's better to be " | |
'absolutely ridiculous than absolutely boring.”'} | |
2022-11-10 21:08:04 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/J-K-Rowling/> | |
{'author': 'J.K. Rowling', | |
'birthday': 'July 31, 1965', | |
'bornplace': 'in Yate, South Gloucestershire, England, The United Kingdom', | |
'quote': '“It is our choices, Harry, that show what we truly are, far more ' | |
'than our abilities.”'} | |
2022-11-10 21:08:34 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Steve-Martin/> | |
{'author': 'Steve Martin', | |
'birthday': 'August 14, 1945', | |
'bornplace': 'in Waco, Texas, The United States', | |
'quote': '“A day without sunshine is like, you know, night.”'} | |
2022-11-10 21:08:34 [scrapy.extensions.logstats] INFO: Crawled 11 pages (at 0 pages/min), scraped 7 items (at 4 items/min) | |
2022-11-10 21:09:04 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Albert-Einstein/> | |
{'author': 'Albert Einstein', | |
'birthday': 'March 14, 1879', | |
'bornplace': 'in Ulm, Germany', | |
'quote': '“The world as we have created it is a process of our thinking. It ' | |
'cannot be changed without changing our thinking.”'} | |
2022-11-10 21:09:04 [scrapy.extensions.logstats] INFO: Crawled 11 pages (at 0 pages/min), scraped 8 items (at 1 items/min) | |
2022-11-10 21:09:04 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://quotes.toscrape.com/author/Douglas-Adams> (failed 1 times): User timeout caused connection failure: Getting http://quotes.toscrape.com/author/Douglas-Adams took longer than 180.0 seconds.. | |
2022-11-10 21:09:04 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://quotes.toscrape.com/author/Dr-Seuss> (failed 1 times): User timeout caused connection failure: Getting http://quotes.toscrape.com/author/Dr-Seuss took longer than 180.0 seconds.. | |
2022-11-10 21:09:04 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://quotes.toscrape.com/author/Bob-Marley> (failed 1 times): User timeout caused connection failure: Getting http://quotes.toscrape.com/author/Bob-Marley took longer than 180.0 seconds.. | |
2022-11-10 21:09:04 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Allen-Saunders/> from <GET http://quotes.toscrape.com/author/Allen-Saunders> | |
2022-11-10 21:09:04 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Friedrich-Nietzsche/> from <GET http://quotes.toscrape.com/author/Friedrich-Nietzsche> | |
2022-11-10 21:09:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: http://quotes.toscrape.com/page/2/) | |
2022-11-10 21:09:04 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Mark-Twain/> from <GET http://quotes.toscrape.com/author/Mark-Twain> | |
2022-11-10 21:09:04 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Elie-Wiesel/> from <GET http://quotes.toscrape.com/author/Elie-Wiesel> | |
2022-11-10 21:09:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Friedrich-Nietzsche/> (referer: http://quotes.toscrape.com/page/2/) | |
2022-11-10 21:09:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Allen-Saunders/> (referer: http://quotes.toscrape.com/page/2/) | |
2022-11-10 21:09:05 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Dr-Seuss/> from <GET http://quotes.toscrape.com/author/Dr-Seuss>
2022-11-10 21:09:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Mark-Twain/> (referer: http://quotes.toscrape.com/page/2/) | |
2022-11-10 21:09:05 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Bob-Marley/> from <GET http://quotes.toscrape.com/author/Bob-Marley> | |
2022-11-10 21:09:05 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (308) to <GET http://quotes.toscrape.com/author/Douglas-Adams/> from <GET http://quotes.toscrape.com/author/Douglas-Adams> | |
2022-11-10 21:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Friedrich-Nietzsche/> | |
{'author': 'Friedrich Nietzsche', | |
'birthday': 'October 15, 1844', | |
'bornplace': 'in Röcken bei Lützen, Prussian Province of Saxony, Germany', | |
'quote': '“It is not a lack of love, but a lack of friendship that makes ' | |
'unhappy marriages.”'} | |
2022-11-10 21:09:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Dr-Seuss/> (referer: http://quotes.toscrape.com/page/2/) | |
2022-11-10 21:09:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Bob-Marley/> (referer: http://quotes.toscrape.com/page/2/) | |
2022-11-10 21:09:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Elie-Wiesel/> (referer: http://quotes.toscrape.com/page/2/) | |
2022-11-10 21:09:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/author/Douglas-Adams/> (referer: http://quotes.toscrape.com/page/2/) | |
2022-11-10 21:10:05 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Allen-Saunders/> | |
{'author': 'Allen Saunders', | |
'birthday': 'April 24, 1899', | |
'bornplace': 'in The United States', | |
'quote': '“Life is what happens to us while we are making other plans.”'} | |
2022-11-10 21:10:36 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Mark-Twain/> | |
{'author': 'Mark Twain', | |
'birthday': 'November 30, 1835', | |
'bornplace': 'in Florida, Missouri, The United States', | |
'quote': '“Good friends, good books, and a sleepy conscience: this is the ' | |
'ideal life.”'} | |
2022-11-10 21:10:36 [scrapy.extensions.logstats] INFO: Crawled 19 pages (at 8 pages/min), scraped 11 items (at 3 items/min) | |
2022-11-10 21:11:06 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Dr-Seuss/> | |
{'author': 'Dr. Seuss', | |
'birthday': 'March 02, 1904', | |
'bornplace': 'in Springfield, MA, The United States', | |
'quote': '“I like nonsense, it wakes up the brain cells. Fantasy is a '
'necessary ingredient in living.”'} | |
2022-11-10 21:11:36 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Bob-Marley/> | |
{'author': 'Bob Marley', | |
'birthday': 'February 06, 1945', | |
'bornplace': 'in Nine Mile, Saint Ann, Jamaica', | |
'quote': '“You may not be her first, her last, or her only. She loved before ' | |
'she may love again. But if she loves you now, what else matters? ' | |
"She's not perfect—you aren't either, and the two of you may never " | |
'be perfect together but if she can make you laugh, cause you to ' | |
'think twice, and admit to being human and making mistakes, hold ' | |
'onto her and give her the most you can. She may not be thinking ' | |
'about you every second of the day, but she will give you a part of '
"her that she knows you can break—her heart. So don't hurt her, "
"don't change her, don't analyze and don't expect more than she can " | |
'give. Smile when she makes you happy, let her know when she makes ' | |
"you mad, and miss her when she's not there.”"} | |
2022-11-10 21:12:07 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Elie-Wiesel/> | |
{'author': 'Elie Wiesel', | |
'birthday': 'September 30, 1928', | |
'bornplace': 'in Sighet, Romania', | |
'quote': "“The opposite of love is not hate, it's indifference. The opposite "
"of art is not ugliness, it's indifference. The opposite of faith is " | |
"not heresy, it's indifference. And the opposite of life is not " | |
"death, it's indifference.”"} | |
2022-11-10 21:12:37 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/author/Douglas-Adams/> | |
{'author': 'Douglas Adams', | |
'birthday': 'March 11, 1952', | |
'bornplace': 'in Cambridge, England, The United Kingdom', | |
'quote': '“I may not have gone where I intended to go, but I think I have ' | |
'ended up where I needed to be.”'} | |
2022-11-10 21:12:37 [scrapy.extensions.logstats] INFO: Crawled 19 pages (at 0 pages/min), scraped 15 items (at 4 items/min) | |
2022-11-10 21:12:37 [scrapy.core.engine] INFO: Closing spider (finished) | |
2022-11-10 21:12:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/exception_count': 3, | |
'downloader/exception_type_count/twisted.internet.error.TimeoutError': 3, | |
'downloader/request_bytes': 10240, | |
'downloader/request_count': 37, | |
'downloader/request_method_count/GET': 37, | |
'downloader/response_bytes': 93932, | |
'downloader/response_count': 34, | |
'downloader/response_status_count/200': 18, | |
'downloader/response_status_count/308': 15, | |
'downloader/response_status_count/404': 1, | |
'dupefilter/filtered': 16, | |
'elapsed_time_seconds': 458.005964, | |
'finish_reason': 'finished', | |
'finish_time': datetime.datetime(2022, 11, 10, 13, 12, 37, 535270), | |
'item_scraped_count': 15, | |
'log_count/DEBUG': 54, | |
'log_count/INFO': 15, | |
'log_count/WARNING': 1, | |
'request_depth_max': 3, | |
'response_received_count': 19, | |
'retry/count': 3, | |
'retry/reason_count/twisted.internet.error.TimeoutError': 3, | |
'robotstxt/request_count': 1, | |
'robotstxt/response_count': 1, | |
'robotstxt/response_status_count/404': 1, | |
'scheduler/dequeued': 36, | |
'scheduler/dequeued/memory': 36, | |
'scheduler/enqueued': 36, | |
'scheduler/enqueued/memory': 36, | |
'start_time': datetime.datetime(2022, 11, 10, 13, 4, 59, 529306)} | |
2022-11-10 21:12:37 [scrapy.core.engine] INFO: Spider closed (finished) | |
```
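
The counters in the stats dump above can be cross-checked with a little arithmetic. What follows is a minimal sketch and not part of the original project: the `stats` values are copied from the log, while `check_stats` and the check names are hypothetical helpers introduced here. The relations it tests are observations that hold for this particular run, not documented Scrapy guarantees.

```python
from datetime import datetime

# Values copied from the "Dumping Scrapy stats" block in the log above.
stats = {
    "downloader/exception_count": 3,
    "downloader/request_count": 37,
    "downloader/response_count": 34,
    "downloader/response_status_count/200": 18,
    "downloader/response_status_count/308": 15,
    "downloader/response_status_count/404": 1,
    "elapsed_time_seconds": 458.005964,
    "finish_time": datetime(2022, 11, 10, 13, 12, 37, 535270),
    "response_received_count": 19,
    "robotstxt/request_count": 1,
    "scheduler/enqueued": 36,
    "start_time": datetime(2022, 11, 10, 13, 4, 59, 529306),
}

STATUS_PREFIX = "downloader/response_status_count/"


def check_stats(s):
    """Hypothetical helper: return named checks, True when the numbers agree."""
    # Sum of the per-status counters (200, 308, 404).
    by_status = sum(v for k, v in s.items() if k.startswith(STATUS_PREFIX))
    # Wall-clock duration recomputed from the two timestamps.
    elapsed = (s["finish_time"] - s["start_time"]).total_seconds()
    return {
        # Every downloader response carries exactly one status code.
        "responses_by_status": by_status == s["downloader/response_count"],
        # Each request ended in either a response or an exception (the timeouts).
        "requests_accounted": s["downloader/response_count"]
        + s["downloader/exception_count"]
        == s["downloader/request_count"],
        # In this run the 308 redirects are not counted as received responses.
        "received_excludes_redirects": s[STATUS_PREFIX + "200"]
        + s[STATUS_PREFIX + "404"]
        == s["response_received_count"],
        # In this run: scheduled requests plus the robots.txt fetch.
        "scheduled_plus_robots": s["scheduler/enqueued"]
        + s["robotstxt/request_count"]
        == s["downloader/request_count"],
        # elapsed_time_seconds should equal finish_time - start_time.
        "elapsed_matches": abs(elapsed - s["elapsed_time_seconds"]) < 1e-6,
    }


if __name__ == "__main__":
    for name, ok in check_stats(stats).items():
        print(f"{name}: {ok}")
```

All five checks pass for this run. The three requests without a response match the three `TimeoutError` retries reported in the log. The log lines are stamped 21:04 to 21:12 while `start_time` and `finish_time` read 13:04 and 13:12, an eight-hour offset that is consistent with the stats being recorded in UTC and the log in local time.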