Andy Freeman revox
@revox
revox / bond_bars.html
Last active November 1, 2018 09:54
Starting point for class exercise in using Google Charts
<!DOCTYPE html>
<html>
<head>
<title>Bonds - who is the most violent?</title>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load('visualization', '1', {packages: ['corechart', 'bar']});
google.setOnLoadCallback(drawBasic);
function drawBasic() {
@revox
revox / tweetdate_to_sqldate.sql
Created July 22, 2015 08:25
Convert tweet dates into MySQL dates
/* Tweet dates look like this Tue Apr 21 18:07:47 +0000 2015 */
/* test */
SELECT STR_TO_DATE('Tue Apr 21 18:07:47 +0000 2015', '%a %b %d %H:%i:%s +0000 %Y');
/* Import the date string into a placeholder column (fake) of type VARCHAR, add a DATETIME column (tweetdate), and then do this */
UPDATE tweets SET tweetdate = STR_TO_DATE(fake, '%a %b %d %H:%i:%s +0000 %Y');
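The same conversion can be sketched outside the database, for example to clean the dates before import. This is a minimal Python sketch, not part of the gist: the function name `tweet_date_to_sql` is hypothetical, and it assumes an English locale for the day and month names. Unlike the SQL above, it parses the UTC offset with `%z` rather than matching `+0000` as literal text, but it does not shift the time to UTC.

```python
from datetime import datetime

# Layout of Twitter's created_at field, e.g. "Tue Apr 21 18:07:47 +0000 2015".
# %z parses the "+0000" offset instead of treating it as literal text.
TWEET_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_date_to_sql(tweet_date):
    """Return the timestamp as a MySQL DATETIME string (YYYY-MM-DD HH:MM:SS)."""
    parsed = datetime.strptime(tweet_date, TWEET_DATE_FORMAT)
    # The wall-clock time is kept as written; no timezone conversion is applied.
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(tweet_date_to_sql("Tue Apr 21 18:07:47 +0000 2015"))
```

The resulting string can be inserted straight into a DATETIME column without calling STR_TO_DATE.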
@revox
revox / mapitmysociety_kml_scraper.py
Created March 13, 2015 12:47
Scrape electoral boundaries from mapitmysociety using BS4
# Scrapes KML files of UK parliamentary electoral units from mapitmysociety
# There are no files for Northern Ireland so their entries are empty
# Result is a CSV with name and geometry columns
import urllib, bs4, csv, sys, time
URL = 'http://mapit.mysociety.org/areas/WMC.html'
csvfile = open('electoral_units.csv', 'w')
csvwriter = csv.writer(csvfile)
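The output step described in the comments above (a CSV with name and geometry columns, and empty entries where no KML exists) might look roughly like the sketch below. It is written for Python 3 and is not the gist's own code: the helper `write_units`, the header row, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def write_units(rows, out):
    """Write (name, geometry) pairs as CSV; a missing geometry becomes an empty cell."""
    writer = csv.writer(out)
    writer.writerow(["name", "geometry"])  # header row is an assumption
    for name, geometry in rows:
        writer.writerow([name, geometry or ""])


# Hypothetical placeholder rows; real rows would come from the scraped KML files.
sample = [
    ("Unit A", "<Polygon/>"),
    ("Unit B", None),  # no KML available, so the geometry cell stays empty
]

buffer = io.StringIO()
write_units(sample, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing an open file object instead would produce a file like electoral_units.csv.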
@revox
revox / Brokens.java
Created March 9, 2015 14:03
JSoup crawler variation
import java.util.ArrayList;
import java.util.HashSet;
import java.util.HashSet.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class Brokens
@revox
revox / Crawler.java
Created March 9, 2015 12:33
Basic crawler that accepts a single starting URL and the total length of the crawl (an int)
import java.util.ArrayList;
import java.util.HashSet;
import java.util.HashSet.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class Crawler
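The idea in the description, a crawl that starts from one URL and stops after a fixed number of pages, can be sketched in Python as follows. This is an illustration rather than a port of Crawler.java: the breadth-first order, the `crawl` function, the `get_links` callback and the in-memory `site` dictionary are all assumptions, and no network requests are made.

```python
from collections import deque


def crawl(start_url, max_pages, get_links):
    """Breadth-first crawl from start_url, visiting at most max_pages pages."""
    visited = []         # pages in the order they were processed
    seen = {start_url}   # every URL ever queued, so nothing is queued twice
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory site standing in for real HTTP fetches.
site = {
    "http://example.test/": ["http://example.test/a", "http://example.test/b"],
    "http://example.test/a": ["http://example.test/", "http://example.test/c"],
    "http://example.test/b": ["http://example.test/c"],
    "http://example.test/c": [],
}

pages = crawl("http://example.test/", 3, lambda url: site.get(url, []))
print(pages)
```

Passing the link extractor in as a function keeps the traversal logic separate from fetching and parsing, so a real HTML parser could be swapped in later.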
@revox
revox / HashSetExample.java
Created March 9, 2015 12:23
Using a HashSet and JSoup to process a List of URLs
import java.util.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
class HashSetExample {
static Set<String> urlsProcessed = new HashSet<String>();
@revox
revox / ExtractLinks.java
Created March 9, 2015 11:34
JSoup simple routine to extract all HREF links as absolute URLs
@revox
revox / JSoupExampleThree.java
Created March 9, 2015 11:12
JSoup using abs:href to get absolute URLs
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class JSoupExampleThree {
public static void main(String[] args) throws Exception {
Document doc = Jsoup.connect("http://gold.ac.uk").get();
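In jsoup, the abs:href attribute key resolves a link against the document's base URI. For comparison, the same kind of resolution can be sketched with Python's standard library. The href values below are hypothetical examples, not links taken from the page.

```python
from urllib.parse import urljoin

# Base URL of the page the links were found on.
base = "http://gold.ac.uk/"

# Hypothetical href values as they might appear in anchor tags.
hrefs = ["/about/", "courses/index.html", "http://en.wikipedia.org/"]

# Relative links are resolved against the base; absolute links pass through unchanged.
absolute = [urljoin(base, href) for href in hrefs]

for url in absolute:
    print(url)
```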
@revox
revox / JSoupExampleTwo.java
Created March 9, 2015 10:35
JSoup example, print links and anchor text
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class JSoupExampleTwo {
public static void main(String[] args) throws Exception {
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();