R Max Espinoza rmax

@rmax
rmax / txspider.py
Last active February 15, 2024 17:00
Using twisted deferreds in a scrapy spider!
$ scrapy runspider txspider.py
2016-07-05 23:11:39 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-07-05 23:11:39 [scrapy] INFO: Overridden settings: {}
2016-07-05 23:11:40 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
2016-07-05 23:11:40 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
rmax / interviewitems.MD
Last active August 6, 2019 15:53 — forked from amaxwell01/interviewitems.MD
My answers to over 100 Google interview questions

## Google Interview Questions: Product Marketing Manager

  • Why do you want to join Google?

  • What do you know about Google’s product and technology?

  • If you were the Product Manager for Google’s AdWords, how would you plan to market it?

  • What would you say during an AdWords or AdSense product seminar?

  • Who are Google’s competitors, and how does Google compete with them?

  • Have you ever used Google’s products? Gmail?

  • What’s a creative way of marketing Google’s brand name and product?

  • If you were the product marketing manager for Google’s Gmail product, how would you market it to reach 100 million customers in 6 months?

2016-06-04 04:36:49+0000 [-] Log opened.
2016-06-04 04:36:49.933156 [-] Splash version: 2.1
2016-06-04 04:36:49.937837 [-] Qt 5.5.1, PyQt 5.5.1, WebKit 538.1, sip 4.17, Twisted 16.1.1, Lua 5.2
2016-06-04 04:36:49.938075 [-] Python 3.4.3 (default, Oct 14 2015, 20:28:29) [GCC 4.8.4]
2016-06-04 04:36:49.938282 [-] Open files limit: 1048576
2016-06-04 04:36:49.938430 [-] Can't bump open files limit
2016-06-04 04:36:50.046541 [-] Xvfb is started: ['Xvfb', ':1', '-screen', '0', '1024x768x24']
2016-06-04 04:36:50.213871 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2016-06-04 04:36:50.383912 [-] verbosity=1
2016-06-04 04:36:50.384060 [-] slots=50
Traceback (most recent call last):
  File "/Users/rolando/miniconda3/envs/tmp-splash/bin/scrapy", line 7, in <module>
    from scrapy.cmdline import execute
  File "/Users/rolando/miniconda3/envs/tmp-splash/lib/python2.7/site-packages/scrapy/__init__.py", line 34, in <module>
    from scrapy.spiders import Spider
  File "/Users/rolando/miniconda3/envs/tmp-splash/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 10, in <module>
    from scrapy.http import Request
  File "/Users/rolando/miniconda3/envs/tmp-splash/lib/python2.7/site-packages/scrapy/http/__init__.py", line 11, in <module>
    from scrapy.http.request.form import FormRequest
  File "/Users/rolando/miniconda3/envs/tmp-splash/lib/python2.7/site-packages/scrapy/http/request/form.py", line 9, in <module>
$ scrapy shell "http://ssd.jpl.nasa.gov/?planet_phys_par"
...
In [1]: data = []
In [2]: for row in response.xpath('//table[count(./tr) > 3 and count(./tr[1]/td) > 3]/tr'):
   ...:     data.append([' '.join(filter(None, map(unicode.strip, td.css('::text').extract()))) for td in row.xpath('td')])
   ...:
In [3]: pd.DataFrame(data)
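
The shell session above runs under Python 2, where `unicode.strip` exists. A minimal Python 3 sketch of the same per-cell cleanup is shown here, assuming each cell arrives as a list of text fragments such as those returned by `td.css('::text').extract()`. The helper name `clean_cell` is hypothetical and is not part of the original gist.

```python
def clean_cell(fragments):
    """Strip each text fragment, drop the empty ones, and join with single spaces.

    `fragments` is assumed to be an iterable of str, for example the text
    nodes extracted from one table cell.
    """
    # str.strip replaces Python 2's unicode.strip; filter(None, ...) removes
    # fragments that became empty strings after stripping.
    return " ".join(filter(None, map(str.strip, fragments)))


if __name__ == "__main__":
    print(clean_cell(["  Mercury ", "\n", " 0.33 "]))
```

In the session above, this helper would stand in for the inline `' '.join(filter(None, map(unicode.strip, ...)))` expression inside the list comprehension.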
rmax / myspider.py
Last active April 7, 2021 18:37
An example of a Scrapy spider returning a Twisted deferred.
from scrapy import Spider, Item, Field
from twisted.internet import defer, reactor


class MyItem(Item):
    url = Field()


class MySpider(Spider):
rmax / xpathfuncs.py
Last active August 29, 2015 14:04 — forked from shirk3y/lxml_has_class.py
"""XPath extension functions for lxml, inspired by:
https://gist.github.com/shirk3y/458224083ce5464627bc
Usage:
import xpathfuncs; xpathfuncs.setup()
"""
import string
rmax / 01-square-detector.py
Last active December 29, 2015 07:59
Solution to Facebook's Hacker Cup 2014 first problem: Square Detector
#!/usr/bin/env python
# encoding: utf-8
"""Square Detector
https://www.facebook.com/hackercup/problems.php?pid=318555664954399&round=598486203541358
You want to write an image detection system that is able to recognize different geometric shapes. In the first version of the system you settled with just being able to detect filled squares on a grid.
You are given a grid of NxN square cells. Each cell is either white or black. Your task is to detect whether all the black cells form a square shape.
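
The preview of the solution is cut off here, so the following is only a sketch of one way to approach the task described above, not the gist's own code. It assumes the grid is given as a list of equal-length strings in which `#` marks a black cell and any other character marks a white cell, and it treats a grid with no black cells as not forming a square. The function name `is_filled_square` is hypothetical.

```python
def is_filled_square(grid):
    """Return True if the '#' cells in `grid` form exactly one filled square."""
    # Collect the coordinates of every black cell.
    black = [
        (r, c)
        for r, row in enumerate(grid)
        for c, ch in enumerate(row)
        if ch == "#"
    ]
    if not black:
        # Assumption: an empty grid does not count as a square.
        return False

    rows = [r for r, _ in black]
    cols = [c for _, c in black]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1

    # Every black cell lies inside the bounding box, so the box is a filled
    # square exactly when it is square and the black-cell count equals its area.
    return height == width and len(black) == height * width


if __name__ == "__main__":
    print(is_filled_square(["..##", "..##", "....", "...."]))
    print(is_filled_square(["####", "#..#", "#..#", "####"]))
```

The bounding-box check avoids scanning for the square's corner explicitly: a hollow square or a stray black cell either breaks the count or stretches the box into a non-square rectangle.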
rmax / hoja_doblada.py
Last active December 25, 2015 11:49
Computing the area covered by a folded sheet. Visualization in GeoGebra: http://www.geogebratube.org/student/m52908?mobile=true
"""Computes the area covered by a folded sheet."""
from __future__ import division
import math
import numpy as np
def solve(w, h, x1, x2):
    """Solves for the area covered by a folded sheet.
@requestGenerator
def parse_profile(self, response):
    base_url = response.url
    ul = UserLoader(response=response)
    ul.add_xpath('name', '//h1[1]/text()')
    ul.add_xpath('website', '//*[@rel="me" and @class="url"]/text()')
    ul.add_xpath('location', '//*[@class="label adr"]/text()')
    ul.add_value('url', base_url)
    item = ul.load_item()