Last active
January 17, 2025 17:32
Get web traffic data from the SimilarWeb API with R
library(httr)
library(jsonlite)
# https://dataseolabs.com
# Doc: https://www.similarweb.com/corp/developer/
# Create your key here: https://pro.similarweb.com/#/account/api-management
# You can get 3 months of web traffic data for free
# conf
myList <- c("cuisineaz.com", "marmiton.org", "odelices.com", "allrecipes.fr")
myKey <- "YOURKEY"
dateStart <- "2018-03"
dateEnd <- "2018-05"
# create empty dataframe
results <- data.frame(site = character(), date = character(), visits = integer())
for (site in myList) {
  # query similarweb
  url <- paste0("https://api.similarweb.com/v1/website/", site, "/total-traffic-and-engagement/visits?api_key=", myKey, "&start_date=", dateStart, "&end_date=", dateEnd, "&main_domain_only=false&granularity=monthly")
  result <- GET(url)
  text <- content(result, as = "text", encoding = "UTF-8")
  json <- fromJSON(text)
  # add lines if no error; isTRUE() keeps the loop running when the
  # response has no meta$status field
  if (isTRUE(grepl("Success", json$meta$status))) {
    tmp <- cbind(site, json$visits)
    results <- rbind(results, tmp)
  }
}
# delete tmp objects (tmp only exists if at least one request succeeded)
rm(json)
rm(result)
if (exists("tmp")) rm(tmp)
print(results)
Updated 👍 with
- easy conf with dateStart, dateEnd, myKey, myList
- myList = website list
- error handling
- writing to a dataframe
- memory optimization
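As a possible refinement of the configuration step, the request URL can be assembled from a named list of parameters instead of one long `paste0()` call, so that each value is URL-encoded by httr. This is only a sketch: `build_visits_url` is a hypothetical helper name, and the endpoint and parameter names are copied from the script above rather than checked against the current SimilarWeb documentation.

```r
library(httr)

# Hypothetical helper (not part of the original script): builds the
# request URL for one site from the same settings the script uses.
build_visits_url <- function(site, key, start, end) {
  modify_url(
    "https://api.similarweb.com",
    path = paste0("v1/website/", site, "/total-traffic-and-engagement/visits"),
    query = list(
      api_key = key,
      start_date = start,
      end_date = end,
      main_domain_only = "false",
      granularity = "monthly"
    )
  )
}

url <- build_visits_url("marmiton.org", "YOURKEY", "2018-03", "2018-05")
print(url)
```

Inside the loop, `GET(build_visits_url(site, myKey, dateStart, dateEnd))` would then replace the `paste0()` line.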
The script worked properly a week ago, but now I am getting a 400 status code. Could you please re-run it and check whether anything has changed on the website's side?
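The cause of the 400 is not clear from this thread, but the script can at least report a failed request instead of silently skipping it. Below is a minimal sketch that checks the HTTP status before parsing. `parse_visits` is a hypothetical helper, and the sample response body is made up to match the fields the script reads (`meta$status` and `visits`); the real API response may differ.

```r
library(jsonlite)

# Hypothetical helper: turns one API response into rows for `results`,
# or returns NULL (with a message) when the request failed.
parse_visits <- function(site, status, text) {
  if (status != 200) {
    message(site, ": request failed with HTTP status ", status)
    return(NULL)
  }
  json <- fromJSON(text)
  if (!isTRUE(grepl("Success", json$meta$status))) {
    message(site, ": response did not report success")
    return(NULL)
  }
  cbind(site = site, json$visits)
}

# Made-up response body, shaped the way the script above expects it.
sample_ok <- '{"meta":{"status":"Success"},"visits":[{"date":"2018-03-01","visits":1000},{"date":"2018-04-01","visits":1200}]}'

ok <- parse_visits("example.com", 200, sample_ok)
bad <- parse_visits("example.com", 400, "")
print(ok)
```

In the loop, the helper could be called with `status_code(result)` and the text returned by `content()`, appending to `results` only when the return value is not `NULL`.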
your_site <- "yourwebsite.com"
your_site1 <- "yourwebsite1.com"
your_site2 <- "yourwebsite2.com"
your_site3 <- "yourwebsite3.com"
your_site4 <- "yourwebsite4.com"
I think this is not ideal, and we can go much further, for example by adding the results to a data frame and comparing the data.
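One way to sketch that comparison with base R alone: pivot a long `results` data frame into one column per site, then compute each site's share of the monthly visits. The numbers below are made up for illustration; in practice `results` would come from the API loop.

```r
# Made-up numbers standing in for the `results` data frame built by the script.
results <- data.frame(
  site = rep(c("yourwebsite.com", "yourwebsite1.com"), each = 3),
  date = rep(c("2018-03-01", "2018-04-01", "2018-05-01"), times = 2),
  visits = c(1000, 1100, 1210, 500, 450, 600)
)

# One row per month, one column per site.
wide <- tapply(results$visits, list(results$date, results$site), sum)

# Each site's share of the total visits within a month (rows sum to 1).
share <- round(prop.table(wide, margin = 1), 3)

print(wide)
print(share)
```

Keeping every site in a single data frame avoids one variable per site and makes it easy to add more sites later.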
What I've been doing: https://gist.github.com/ArthurCa/8dd9dd4c07c74d19861477da77adef84
Many thanks, Vincent, for the ideas and scripts.