Last active
April 18, 2018 21:52
'''
Google deprecated their search API.
Google results pages do not immediately contain result URLs (I checked).
Here is a really bad script to get the first page of results and a bunch of other stupid irrelevant URLs.
You need selenium, lxml, and Firefox.
Works for me on OSX.
Suck it Google, you can't control me.
Note: This might be against some Google TOS or get you blocked or banned or something idk.
@AlexHuleatt
'''
from urllib.parse import quote_plus

import lxml.html
from selenium import webdriver


def query(q):
    # Open a Firefox window driven by selenium
    driver = webdriver.Firefox()
    try:
        # URL-encode the query so spaces and symbols survive in the URL
        driver.get("https://www.google.com/search?q=" + quote_plus(q))
        # page_source is the DOM after the browser has rendered the page
        src = driver.page_source
    finally:
        # quit() ends the whole browser session, even if the request failed
        driver.quit()
    dom = lxml.html.fromstring(src)
    # Keep only absolute links; this drops relative hrefs but not Google's own absolute ones
    return [link for link in dom.xpath('//a/@href') if link.startswith("http")]


print(query("llama"))
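The script returns every absolute link on the page, which is where the "stupid irrelevant URLs" come from. One way to trim them, sketched here as an assumption and not part of the original script, is to drop links whose host is a Google domain. The helper name `drop_google_links` is hypothetical, and the rule that "irrelevant" means "hosted on google.com" is a guess about what the results page contains.

```python
from urllib.parse import urlparse


def drop_google_links(urls):
    """Hypothetical helper: keep only links whose host is not a Google domain."""
    kept = []
    for url in urls:
        # hostname is None for malformed URLs, so fall back to an empty string
        host = urlparse(url).hostname or ""
        # Assumption: google.com and its subdomains are Google's own pages
        if host == "google.com" or host.endswith(".google.com"):
            continue
        kept.append(url)
    return kept
```

It could be applied to the script's output as `drop_google_links(query("llama"))`. Links on other Google-owned domains would still get through, so treat it as a first pass only.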