@revox
Created March 9, 2015 12:23
Using a HashSet and JSoup to process a List of URLs
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class HashSetExample {

    // URLs that have already been fetched, so none is processed twice.
    static Set<String> urlsProcessed = new HashSet<String>();

    public static void main(String[] args) throws Exception {
        List<String> toProcess = new ArrayList<String>();
        toProcess.add("http://bbc.co.uk");
        toProcess.add("http://wikipedia.com");
        toProcess.add("http://example.com");
        processURLs(toProcess);
    }

    // Fetches each URL not seen before and prints the page title.
    public static void processURLs(List<String> urls) throws IOException {
        for (String url : urls) {
            if (!urlsProcessed.contains(url)) {
                Document doc = Jsoup.connect(url).get();
                // Document.title() returns the text of the page's <title> element.
                System.out.println(doc.title());
                // Record the URL only after a successful fetch.
                urlsProcessed.add(url);
            }
        }
    }
}
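
The duplicate check in processURLs can be tried without any network access. The following is a minimal sketch, not part of the original gist: the class name DedupSketch and the method firstVisits are hypothetical, and it relies on Set.add returning false when the element is already present, which folds the contains/add pair into one call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration; not part of the original gist.
class DedupSketch {

    // Returns the URLs in first-seen order, skipping any repeats.
    static List<String> firstVisits(List<String> urls) {
        Set<String> seen = new HashSet<>();
        List<String> fresh = new ArrayList<>();
        for (String url : urls) {
            // Set.add returns false when the element was already in the set.
            if (seen.add(url)) {
                fresh.add(url);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://example.com", "http://bbc.co.uk", "http://example.com");
        // prints [http://example.com, http://bbc.co.uk]
        System.out.println(firstVisits(urls));
    }
}
```

One difference from the gist is worth noting: this sketch marks a URL as seen before any work is done on it, whereas processURLs records the URL only after the fetch succeeds, so a URL whose fetch throws can be retried later.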