Last active July 29, 2024 13:36
Gist: primaryobjects/0160c84f4dafe51f7e6f
Grabs all LinkedIn URLs from a Coursera forum thread. Perfect for the "Let's Connect!" threads.
library(XML)
#
# Grabs all LinkedIn URLs from a Coursera forum thread. Perfect for the "Let's Connect!" threads.
# Usage:
# 1. Open a Coursera forum thread containing LinkedIn links.
# 2. Scroll all the way to the bottom of the page to load all posts in the thread.
# 3. Save the web page to an HTML file named post.htm.
# 4. Call linkedInLinks("post.htm").
# 5. To save the result to a file: write.csv(linkedInLinks("post.htm"), "links.csv", quote = FALSE, row.names = FALSE)
#
linkedInLinks <- function(fileName) {
  # Parse the saved HTML page into an internal (XPath-queryable) document.
  data <- htmlTreeParse(fileName, useInternalNodes = TRUE)
  # Find all links whose href contains "linkedin" inside forum post bodies.
  commentElements <- xpathApply(data, "//div[@class='course-forum-post-text']//a[contains(@href, 'linkedin')]")
  # Extract the href of each link. vapply always returns a character vector,
  # even when nothing matches (character(0)), unlike sapply, which returns list().
  vapply(commentElements, xmlGetAttr, character(1), "href")
}
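linkedInLinks() returns each href exactly as it appears in the posts, so the same profile may show up more than once, for example with and without a trailing slash or with a "?trk=" tracking parameter. Below is a minimal clean-up sketch, assuming those are the only variations worth collapsing. normalizeLinkedInLinks is a hypothetical helper, not part of the original gist, and the example URLs are made up.

```r
# Hypothetical helper (not part of the original gist): tidy the character
# vector returned by linkedInLinks() before saving it.
normalizeLinkedInLinks <- function(links) {
  # Drop any query string or fragment, e.g. "?trk=..." tracking parameters.
  links <- sub("[?#].*$", "", links)
  # Remove trailing slashes so ".../in/jane-doe/" and ".../in/jane-doe" match.
  links <- sub("/+$", "", links)
  # Keep only the first occurrence of each URL, preserving thread order.
  unique(links)
}

# Example with made-up URLs (illustration only).
links <- c(
  "https://www.linkedin.com/in/jane-doe/",
  "https://www.linkedin.com/in/jane-doe?trk=profile",
  "https://www.linkedin.com/in/john-roe"
)
cleaned <- normalizeLinkedInLinks(links)
# cleaned now holds two URLs: the jane-doe and john-roe profiles.
print(cleaned)

# Wrap the vector in a data frame to give the CSV column an explicit "url" header.
out <- file.path(tempdir(), "links.csv")
write.csv(data.frame(url = cleaned), out, quote = FALSE, row.names = FALSE)
```

In practice you would call normalizeLinkedInLinks(linkedInLinks("post.htm")) and write to "links.csv" instead of a temporary file. Lowercasing or stripping the "https://www." prefix is deliberately left out, since it is not clear every variant should be merged.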