
@nicksyna01
Created July 22, 2019 20:35
Python code to run a Google search, scrape each result page, and save the text to a document
import urllib.request as ur

from bs4 import BeautifulSoup

try:
    # googlesearch is provided by the "google" package on PyPI
    from googlesearch import search
except ImportError:
    raise SystemExit("No module named 'google' found; install it with: pip install google")

# to search
query = "G20 Summit"
for j in search(query, tld="co.in", num=10, stop=1, pause=2):
    # print(j)
    # many sites reject urllib's default User-Agent, so send a browser-like one
    req = ur.Request(j, headers={"User-Agent": "Mozilla/5.0"})
    html = ur.urlopen(req, timeout=10).read()
    soup = BeautifulSoup(html, "lxml")
    text = soup.get_text()
    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines (separated by double spaces) into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)
    # print(text)
    # append this page's text to the output file
    with open("G20 Summit.txt", "a", encoding="utf-8") as file:
        file.write(text + "\n")
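The three generator steps that clean the `soup.get_text()` output can be tried on their own, without any network access. The sketch below wraps them in a hypothetical `clean_text` helper (not part of the gist) and runs it on a made-up sample string. It assumes, as the comment in the script says, that headlines run together on one line are separated by two spaces.

```python
def clean_text(raw: str) -> str:
    """Collapse page text into one non-blank chunk per line.

    Mirrors the script's steps: strip each line, split on double
    spaces (where run-together headlines are assumed to meet), and
    drop empty chunks.
    """
    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in raw.splitlines())
    # break multi-headlines (separated by double spaces) into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    return "\n".join(chunk for chunk in chunks if chunk)


sample = "  G20 Summit  \n\n   Leaders meet  Talks begin \n\t\n"
print(clean_text(sample))
```

Splitting on a single space instead would put every word on its own line, which is why the separator is two spaces.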
@nicksyna01 (Author)

Running this too often will get your access to Google search blocked, so use it sparingly.
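One way to follow that advice is to space out and cap the requests you make. The sketch below is an illustration, not something from the gist: a hypothetical `throttled` generator that yields at most `max_items` items and sleeps a fixed delay plus random jitter between them. The default delays are arbitrary guesses, not documented Google limits, and the `sleep` parameter exists only so the pacing can be tested without actually waiting.

```python
import itertools
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_delay: float = 5.0,  # assumed value, not a documented limit
    jitter: float = 2.0,  # extra random delay so requests are not periodic
    max_items: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield at most max_items items, pausing between consecutive ones."""
    # islice stops after max_items without pulling an extra item,
    # so a lazy source (such as a search generator) is not over-consumed
    for i, item in enumerate(itertools.islice(items, max_items)):
        if i > 0:
            sleep(min_delay + random.uniform(0, jitter))
        yield item
```

For example, to run several queries in a row, `for q in throttled(["G20 Summit", "G20 Osaka"], min_delay=30): ...` would wait at least 30 seconds between consecutive searches, on top of the `pause` that `search` already applies.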
