Sigit Dewanto seagatesoft

  • Sleman, Indonesia
@seagatesoft
seagatesoft / bandung.py
Created May 29, 2021 04:20
Bandung.py Scrapy demo
from math import ceil
from scrapy import Request, Spider


class BookSpider(Spider):
    name = 'books_toscrape_com'
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        books = response.css("ol.row > li")
@seagatesoft
seagatesoft / memusg
Created February 21, 2017 06:57 — forked from netj/memusg
memusg -- Measure memory usage of processes
#!/usr/bin/env bash
# memusg -- Measure memory usage of processes
# Usage: memusg COMMAND [ARGS]...
#
# Author: Jaeho Shin <[email protected]>
# Created: 2010-08-16
############################################################################
# Copyright 2010 Jaeho Shin. #
# #
# Licensed under the Apache License, Version 2.0 (the "License"); #
import json
from scrapy.http import Request
from scrapy.selector import Selector
from forumbot.spiders.blogs import BlogSpider
from forumbot.spiders.mixins.livefyre import LivefyreMixin
from bot_engines.utils import error
from forumbot.items import BlogPostLoader, AuthorLoader
@seagatesoft
seagatesoft / challenge.html
Last active August 29, 2015 14:04
A simplified example of a JavaScript challenge page and how the spider handles it
<html>
  <head>
    <meta http-equiv="Pragma" content="no-cache"/>
    <meta http-equiv="Expires" content="-1"/>
    <meta http-equiv="CacheControl" content="no-cache"/>
    <script type="text/javascript">
      function challenge() {
        var x = 17;
        var y = 25;
        var challenge_answer = (x * y) + 9;
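The preview above is cut off before it shows how the answer is submitted, so the following is only a minimal sketch of the arithmetic step: reading `x` and `y` out of the script text and computing `(x * y) + 9` in Python. The function name `solve_challenge`, the `CHALLENGE_JS` excerpt, and the regular expression are assumptions for illustration, not the gist's actual spider code.

```python
import re

# Excerpt of the script body from the preview above; a real page may differ.
CHALLENGE_JS = """
function challenge() {
var x = 17;
var y = 25;
var challenge_answer = (x * y) + 9;
"""


def solve_challenge(script_text):
    """Extract x and y from the script and compute (x * y) + 9.

    The formula and the constant 9 are hardcoded from the preview; this is
    an assumption and would break if the page changed its expression.
    """
    # Matches lines such as "var x = 17;" and captures the name and the number.
    values = dict(re.findall(r"var\s+(x|y)\s*=\s*(\d+)\s*;", script_text))
    return int(values["x"]) * int(values["y"]) + 9


answer = solve_challenge(CHALLENGE_JS)
print(answer)  # 434
```

Parsing the two operands instead of executing the script keeps the spider free of a JavaScript runtime, at the cost of being tied to this exact expression.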
@seagatesoft
seagatesoft / middlewares.py
Last active January 3, 2024 20:58
An example of RotateUserAgentMiddleware
from random import choice
from scrapy import signals
from scrapy.exceptions import NotConfigured


class RotateUserAgentMiddleware(object):
    """Rotate user-agent for each request."""

    def __init__(self, user_agents):
        self.enabled = False
        self.user_agents = user_agents
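The preview stops at the constructor, so the rest of the middleware is not visible here. As a rough sketch of the rotation idea only, the following plain-Python stand-in picks a random user agent for each request. It mirrors the shape of Scrapy's downloader-middleware hook `process_request(request, spider)`, but the class name `RotatingUserAgentSketch` and the `FakeRequest` helper are hypothetical, and nothing here is taken from the remainder of the gist.

```python
from random import choice


class RotatingUserAgentSketch:
    """Pick a random User-Agent for each request (illustrative stand-in)."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("user_agents must not be empty")
        self.user_agents = list(user_agents)

    def process_request(self, request, spider=None):
        # Overwrite the header on every request; returning None lets
        # processing continue, as in Scrapy's downloader middlewares.
        request.headers["User-Agent"] = choice(self.user_agents)
        return None


class FakeRequest:
    """Hypothetical minimal request object with a dict of headers."""

    def __init__(self):
        self.headers = {}


agents = ["agent-a/1.0", "agent-b/2.0"]
middleware = RotatingUserAgentSketch(agents)
request = FakeRequest()
middleware.process_request(request)
print(request.headers["User-Agent"] in agents)  # True
```

In a real Scrapy project the list of agents would more likely come from settings, which is presumably why the gist imports `NotConfigured`; that wiring is not shown in the preview and is left out of this sketch.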