@flovv
Last active September 25, 2020 05:10
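scrapeGoogleImages_file1 (the PhantomJS script, saved as imageScrape.js; the R script rewrites its first `var url` line, so keep that line first)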
var url ='https://www.google.de/search?q=Yahoo+logo&source=lnms&tbm=isch&sa=X';
var page = require('webpage').create();
var fs = require('fs');
// Portrait viewport; sc() temporarily changes its height while scrolling
var vWidth = 1080;
var vHeight = 1920;
page.viewportSize = {
  width: vWidth,
  height: vHeight
};
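// Scroll through the page to trigger loading of further images.
// s is the current scroll offset, sBase the page height in pixels.
var s = 0;
// Before page.open() this reads the blank page; sc() re-reads it on every step.
var sBase = page.evaluate(function () { return document.body.scrollHeight; });
page.scrollPosition = {
  top: sBase,
  left: 0
};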
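function sc() {
  // The page grows as content loads, so refresh its height on every step
  var sBase2 = page.evaluate(function () { return document.body.scrollHeight; });
  if (sBase2 != sBase) {
    sBase = sBase2;
  }
  // Past the bottom: restore the original viewport and stop scrolling
  if (s > sBase) {
    page.viewportSize = {width: vWidth, height: vHeight};
    return;
  }
  page.scrollPosition = {
    top: s,
    left: 0
  };
  // Grow the viewport height to the current scroll offset
  page.viewportSize = {width: vWidth, height: s};
  // Advance by 1/20 of the page height (at most 400 px) every 110 ms
  s += Math.min(sBase / 20, 400);
  setTimeout(sc, 110);
}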
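// Save the rendered HTML to 1.html after 2.5 s, whether or not sc()
// has finished, then quit PhantomJS
function just_wait() {
  setTimeout(function () {
    fs.write('1.html', page.content, 'w');
    phantom.exit();
  }, 2500);
}
page.open(url, function (status) {
  if (status !== 'success') {
    console.log('Failed to load ' + url);
    phantom.exit(1);
    return;
  }
  sc();
  just_wait();
});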
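A minimal sketch of the scroll arithmetic in `sc()`, assuming the page height stays constant while scrolling (the real script re-reads `sBase` on every step, so this is only an approximation). The helpers `scrollPositions` and `scrollDurationMs` are hypothetical and run under Node.js rather than PhantomJS; they only replay the numbers: each step advances by `Math.min(height / 20, 400)` and the next step is scheduled 110 ms later.

```javascript
// Hypothetical helper (not part of the gist): the scroll offsets sc()
// visits for a page whose scrollHeight never changes.
function scrollPositions(scrollHeight) {
  var step = Math.min(scrollHeight / 20, 400);
  var positions = [];
  // Guard: with a zero (or NaN) step, sc() would never advance
  if (!(step > 0)) return positions;
  // sc() scrolls while s <= height and stops once s > height
  for (var s = 0; s <= scrollHeight; s += step) {
    positions.push(s);
  }
  return positions;
}

// Approximate time until the loop stops: one step per delayMs
// (110 ms in the gist); the first step runs immediately, and the
// final call that detects s > height runs after the last delay.
function scrollDurationMs(scrollHeight, delayMs) {
  return scrollPositions(scrollHeight).length * delayMs;
}

console.log(scrollPositions(2000).length); // 21
console.log(scrollDurationMs(2000, 110));  // 2310
console.log(scrollDurationMs(20000, 110)); // 5610
```

Because `just_wait()` writes `1.html` 2500 ms after the page has loaded, independently of `sc()`, a 20,000 px page would be saved after about 2.5 s of the roughly 5.6 s the scroll loop needs. Under the constant-height assumption, that means the file is written before scrolling reaches the bottom.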
scrapeGoogleImages.r
library(rvest)  # read_html(), html_nodes(), html_attr() and the %>% pipe
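scrapeJSSite <- function(searchTerm){
  # searchTerm must already be URL-encoded, e.g. "Adidas+logo"
  url <- paste0("https://www.google.de/search?q=", searchTerm, "&source=lnms&tbm=isch&sa=X")
  # Overwrite the first line of the PhantomJS script with the new URL
  lines <- readLines("imageScrape.js")
  lines[1] <- paste0("var url ='", url, "';")
  writeLines(lines, "imageScrape.js")
  ## Render the page with PhantomJS (must be on the PATH); it writes 1.html
  status <- system("phantomjs imageScrape.js")
  if (status != 0) stop("phantomjs failed with exit status ", status)
  pg <- read_html("1.html")
  # Collect the src attribute of every <img> on the saved page
  files <- pg %>% html_nodes("img") %>% html_attr("src")
  df <- data.frame(images = files, search = searchTerm, stringsAsFactors = FALSE)
  return(df)
}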
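downloadImages <- function(files, brand, outPath = "images"){
  # brand is a string used in the file names, e.g. "Adidas"
  dir.create(outPath, showWarnings = FALSE, recursive = TRUE)
  for(i in seq_along(files)){
    download.file(files[i], destfile = paste0(outPath, "/", brand, "_", i, ".jpg"), mode = 'wb')
  }
}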
### exchange the search terms here!
gg <- scrapeJSSite(searchTerm = "Adidas+logo")
# The second argument is the brand label for the file names (a string, not i)
downloadImages(as.character(gg$images), "Adidas")
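`scrapeJSSite()` expects a search term that is already URL-encoded (`"Adidas+logo"`) and pastes it into a JavaScript string literal. A minimal sketch, assuming Node.js, of building the same URL with the standard `URL`/`URLSearchParams` classes so that a plain term such as `"Adidas logo"` is encoded automatically. The helpers `googleImagesUrl` and `urlLine` are hypothetical and not part of the gist.

```javascript
// Hypothetical helper: the Google Images URL that scrapeJSSite() builds
// with paste0(), but with URLSearchParams doing the encoding.
function googleImagesUrl(searchTerm) {
  var url = new URL('https://www.google.de/search');
  url.searchParams.set('q', searchTerm); // spaces become "+"
  url.searchParams.set('source', 'lnms');
  url.searchParams.set('tbm', 'isch');
  url.searchParams.set('sa', 'X');
  return url.toString();
}

// Hypothetical helper: the first line written into imageScrape.js
function urlLine(searchTerm) {
  return "var url ='" + googleImagesUrl(searchTerm) + "';";
}

console.log(googleImagesUrl('Adidas logo'));
// https://www.google.de/search?q=Adidas+logo&source=lnms&tbm=isch&sa=X
```

This encoder also percent-encodes characters such as `'` (as `%27`), which would otherwise end the string literal in the rewritten first line. Pass the plain term, though: an already encoded `"Adidas+logo"` would have its `+` turned into `%2B`.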
@geotheory

Line 34 should be: downloadImages(as.character(gg$images), 'yahoo')

@andreaangeli

I ran your code, but it returns this error:
Error in paste0(outPath, "/", brand, "_", i, ".jpg") :
object 'i' not found

@LucaWRGF

LucaWRGF commented Jul 19, 2017

@andreaangeli, this worked for me; hope it helps. Lines 25 to 34 in scrapeGoogleImages.r:

```
# outPath has to be adapted to your own directory!
downloadImages <- function(files, brand, outPath="D://scrape_images//brand"){
  for(i in 1:length(files)){
    download.file(files[i], destfile = paste0(outPath, "/", brand, "_", i, ".jpg"), mode = 'wb')
  }
}

### exchange the search terms here!
gg <- scrapeJSSite(searchTerm = "Hermes+logo")
downloadImages(as.character(gg$images), 'Hermes')
```
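The `object 'i' not found` error arises because `brand` was given as `i`, which only exists inside the loop of `downloadImages()`. Passing a string such as `'Hermes'` avoids it, as in the comment above. A minimal sketch of the same bookkeeping in JavaScript (Node.js), using a hypothetical `downloadPlan()` helper. It builds the `<outPath>/<brand>_<i>.jpg` destination for each source and rejects a missing brand up front. Skipping sources that are not http(s) URLs, such as inline `data:` URIs or empty `src` attributes, is an added assumption and not something the gist does. Because of that filter, only the kept images are numbered.

```javascript
// Hypothetical helper mirroring downloadImages(): pairs each image
// source with <outPath>/<brand>_<i>.jpg. The http(s)-only filter is
// an assumption, not part of the original R code.
function downloadPlan(srcs, brand, outPath) {
  // A missing brand is the JavaScript analogue of the 'i' not found error
  if (typeof brand !== 'string' || brand === '') {
    throw new Error('brand must be a non-empty string such as "Hermes"');
  }
  outPath = outPath || 'images'; // same default as the R function
  var plan = [];
  var i = 0;
  srcs.forEach(function (src) {
    // Skip null/empty values and non-http(s) sources such as data: URIs
    if (!/^https?:\/\//i.test(src || '')) return;
    i += 1; // 1-based numbering, like R's loop index
    plan.push({ src: src, dest: outPath + '/' + brand + '_' + i + '.jpg' });
  });
  return plan;
}

var plan = downloadPlan(['https://a.example/x.jpg', 'data:image/gif;base64,R0lGOD'], 'Hermes');
console.log(plan[0].dest); // images/Hermes_1.jpg
```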

@markusdumke

How can I download more than 20 images?

@ArindamRouth

How can I download more than 20 images? Please help.

@flovv
Author

flovv commented Dec 29, 2019 via email
