R Max Espinoza (rmax)
$ scrapy shell "http://ssd.jpl.nasa.gov/?planet_phys_par"
...
In [1]: data = []
In [2]: for row in response.xpath('//table[count(./tr) > 3 and count(./tr[1]/td) > 3]/tr'):
   ...:     data.append([' '.join(filter(None, map(unicode.strip, td.css('::text').extract())))
   ...:                  for td in row.xpath('td')])
   ...:
In [3]: pd.DataFrame(data)
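The session above is Python 2 (note `unicode.strip`); on Python 3 with a recent Scrapy (where `.getall()` is available) the same extraction would look roughly like this sketch:

# Python 3 sketch of the same table extraction; names match the shell session
data = []
for row in response.xpath('//table[count(./tr) > 3 and count(./tr[1]/td) > 3]/tr'):
    data.append([
        ' '.join(filter(None, (t.strip() for t in td.css('::text').getall())))
        for td in row.xpath('td')
    ])
df = pd.DataFrame(data)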
Traceback (most recent call last):
File "/Users/rolando/miniconda3/envs/tmp-splash/bin/scrapy", line 7, in <module>
from scrapy.cmdline import execute
File "/Users/rolando/miniconda3/envs/tmp-splash/lib/python2.7/site-packages/scrapy/__init__.py", line 34, in <module>
from scrapy.spiders import Spider
File "/Users/rolando/miniconda3/envs/tmp-splash/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 10, in <module>
from scrapy.http import Request
File "/Users/rolando/miniconda3/envs/tmp-splash/lib/python2.7/site-packages/scrapy/http/__init__.py", line 11, in <module>
from scrapy.http.request.form import FormRequest
File "/Users/rolando/miniconda3/envs/tmp-splash/lib/python2.7/site-packages/scrapy/http/request/form.py", line 9, in <module>
2016-06-04 04:36:49+0000 [-] Log opened.
2016-06-04 04:36:49.933156 [-] Splash version: 2.1
2016-06-04 04:36:49.937837 [-] Qt 5.5.1, PyQt 5.5.1, WebKit 538.1, sip 4.17, Twisted 16.1.1, Lua 5.2
2016-06-04 04:36:49.938075 [-] Python 3.4.3 (default, Oct 14 2015, 20:28:29) [GCC 4.8.4]
2016-06-04 04:36:49.938282 [-] Open files limit: 1048576
2016-06-04 04:36:49.938430 [-] Can't bump open files limit
2016-06-04 04:36:50.046541 [-] Xvfb is started: ['Xvfb', ':1', '-screen', '0', '1024x768x24']
2016-06-04 04:36:50.213871 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2016-06-04 04:36:50.383912 [-] verbosity=1
2016-06-04 04:36:50.384060 [-] slots=50
@rmax
rmax / interviewitems.MD
Last active August 6, 2019 15:53 — forked from amaxwell01/interviewitems.MD
My answers to over 100 Google interview questions

## Google Interview Questions: Product Marketing Manager

  • Why do you want to join Google?

  • What do you know about Google’s products and technology?

  • If you were the Product Manager for Google’s AdWords, how would you plan to market it?

  • What would you say during an AdWords or AdSense product seminar?

  • Who are Google’s competitors, and how does Google compete with them?

  • Have you ever used Google’s products? Gmail?

  • What’s a creative way of marketing Google’s brand name and product?

  • If you are the product marketing manager for Google’s Gmail product, how do you plan to market it so as to achieve 100 million customers in 6 months?

@rmax
rmax / txspider.py
Last active February 15, 2024 17:00
Using twisted deferreds in a scrapy spider!
$ scrapy runspider txspider.py
2016-07-05 23:11:39 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-07-05 23:11:39 [scrapy] INFO: Overridden settings: {}
2016-07-05 23:11:40 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
2016-07-05 23:11:40 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
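The spider file itself isn't rendered in this capture. As a rough sketch of the idea on a modern Scrapy (async callbacks arrived in 2.0, maybe_deferred_to_future in 2.6), awaiting a plain Twisted Deferred from a spider callback looks like this; the seed URL and the one-second delay are illustrative, not from the gist:

import scrapy
from twisted.internet import reactor
from twisted.internet.task import deferLater
from scrapy.utils.defer import maybe_deferred_to_future

class TxSpider(scrapy.Spider):
    name = 'txspider'
    start_urls = ['https://example.com']  # illustrative seed URL

    async def parse(self, response):
        # wait on an arbitrary Twisted Deferred before yielding the item
        await maybe_deferred_to_future(deferLater(reactor, 1.0, lambda: None))
        yield {'url': response.url}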
import pandas as pd
import numpy as np

# set up a comparable dataframe
df = pd.DataFrame(np.random.randint(20, 100, size=(50, 4)), columns=['A', 'B', 'C', 'D'])

# these two columns become a multi-column index
df['year_idx'] = np.random.randint(2000, 2004, 50)
df['id_idx'] = np.random.randint(10000, 19999, 50)
df.drop_duplicates(subset=['year_idx', 'id_idx'], inplace=True)
df = df.set_index(['year_idx', 'id_idx']).sort_index()
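With the MultiIndex in place, rows can be selected by index level; the lookups below are my illustration, not part of the gist:

# pick a year guaranteed to be present, then slice on it
first_year = df.index.get_level_values('year_idx').min()
df.loc[first_year]                              # every id_idx row in that year
df.loc[(first_year, slice(None)), ['A', 'B']]   # same rows, columns A and B only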
@rmax
rmax / sqlite-kv-restful.py
Created August 13, 2016 17:11 — forked from georgepsarakis/sqlite-kv-restful.py
Simple SQLite-backed key-value storage Rest API. Built with Flask & flask-restful.
import os
import sqlite3
from hashlib import md5
from time import time

import simplejson as json
from flask import Flask, g, request
import flask_restful as restful  # the flask.ext namespace was removed in Flask 1.0
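Only the imports are captured above. As a minimal sketch of what such a key-value resource can look like with flask-restful (the kv table, route, and handlers here are my illustration, not the gist's actual code):

import sqlite3
from flask import Flask, request
from flask_restful import Api, Resource

DB_PATH = 'kv.db'  # hypothetical database path

def get_db():
    db = sqlite3.connect(DB_PATH)
    db.execute('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)')
    return db

class KeyValue(Resource):
    def get(self, key):
        row = get_db().execute('SELECT value FROM kv WHERE key = ?', (key,)).fetchone()
        if row is None:
            return {'error': 'not found'}, 404
        return {key: row[0]}

    def put(self, key):
        db = get_db()
        db.execute('REPLACE INTO kv (key, value) VALUES (?, ?)', (key, request.form['value']))
        db.commit()
        return {key: request.form['value']}, 201

app = Flask(__name__)
api = Api(app)
api.add_resource(KeyValue, '/kv/<string:key>')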
@rmax
rmax / demo.py
Created September 28, 2016 15:13
import scrapy  # note: CrawlerBot below is an API sketch, not part of Scrapy

settings = {}
bot = scrapy.CrawlerBot(name="mybot/1.0", settings=settings)

def follow_links(response):
    for link in response.iter_links():
        bot.crawl(link.url, callback=follow_links, referer=response)
    bot.emit({
        "url": response.url,
        "status": response.status,
/home/rolando/miniconda3/envs/datascience/lib/python3.5/site-packages/distributed/protocol/pickle.py - INFO - Failed to serialize <_io.BufferedReader name='/home/shared/input-01.jl.gz'>
Traceback (most recent call last):
File "/home/rolando/miniconda3/envs/datascience/lib/python3.5/site-packages/distributed/protocol/pickle.py", line 30, in dumps
result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
TypeError: cannot serialize '_io.BufferedReader' object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/rolando/miniconda3/envs/datascience/lib/python3.5/site-packages/distributed/protocol/pickle.py", line 43, in dumps