@adrianshort
adrianshort / extract-urls.py
Created September 6, 2012 17:30
Extract URLs from a web page
# Extract URLs from a web page to a CSV file
# $ python extract-urls.py http://mysite.com/mypage.html myfile.csv
# By Adrian Short 6 Sep 2012
import sys
import urllib
import csv
from bs4 import BeautifulSoup
url = sys.argv.pop(1)
adrianshort / scrape.rb
Last active December 10, 2015 13:59
Cheam North and Worcester Park local committee podcast feed creator. Scrapes the webpage and outputs an iTunes-friendly podcast RSS feed.
# Scrape webpage into a podcast RSS feed
# https://www.sutton.gov.uk/index.aspx?articleid=4332
require 'nokogiri'
require 'open-uri'
require 'time'
require 'pp'
FEED_TITLE = "Cheam North and Worcester Park Local Committee"
FEED_IMAGE = "https://dl.dropbox.com/u/300783/logo.png"
adrianshort / csv2georss.rb
Last active December 17, 2015 03:59
Convert Open Plaques CSV export file to GeoRSS
require 'csv'
require 'pp'
require 'erb'
require 'time'
# https://gist.github.com/adrianshort/5547284
# $ ruby csv2georss.rb myfile.csv > feed.xml
template = ERB.new <<-EOF
<?xml version="1.0" encoding="UTF-8" ?>
# D'Hondt method calculations
# https://en.wikipedia.org/wiki/D'Hondt_method
# By Adrian Short (@adrianshort) 26 May 2014
# European Parliament election, London region, 22 May 2014
@parties = {
'4 Freedoms Party (UK EPP)' => 28014,
'An Independence from Europe' => 26675,
'Animal Welfare Party' => 21092,
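The preview above stops partway through the vote table, so the gist's own calculation is not shown. As a minimal sketch of the D'Hondt method it names (not the author's code): each seat in turn goes to the party with the highest quotient, votes / (seats won so far + 1). The method name `dhondt` and the vote totals below are hypothetical, chosen for illustration only; they are not election results.

```ruby
# D'Hondt seat allocation sketch.
# `dhondt` and EXAMPLE_VOTES are illustrative names and made-up figures.
def dhondt(votes, seats)
  # Every party starts with zero seats.
  allocation = {}
  votes.each_key { |party| allocation[party] = 0 }

  seats.times do
    # Award the next seat to the party with the highest current quotient.
    # Rational avoids floating-point rounding when comparing quotients.
    # On an exact tie, the party listed first in `votes` wins.
    winner, = votes.max_by { |party, count| Rational(count, allocation[party] + 1) }
    allocation[winner] += 1
  end

  allocation
end

EXAMPLE_VOTES = { 'A' => 100_000, 'B' => 80_000, 'C' => 30_000, 'D' => 20_000 }

dhondt(EXAMPLE_VOTES, 8).each do |party, won|
  puts "#{party}: #{won}"
end
```

With these made-up totals and 8 seats, the allocation is A 4, B 3, C 1, D 0. Real elections usually define a tie-break rule; this sketch simply favours the earlier entry.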
adrianshort / _template.txt
Created January 29, 2015 14:56
DokuWiki template for local wiki (suttonwiki.org in this case)
====== @!!PAGE@ ======
**@!!PAGE@** is
===== External links =====
* [[http://www.example.com/|Official website]]
* [[wp>@!!PAGE@]]
{{tag>tag1 tag2 "tag3 with spaces"}}
# Convert Jekyll blog posts to DokuWiki pages
# Adrian Short (https://adrianshort.org/) 15 Feb 2015
require 'fileutils'
require 'yaml'
require 'pp'
require 'pandoc-ruby'
INPUT_DIR = "./_posts"
OUTPUT_BASEDIR = "./blog"
#!/usr/bin/env sh
# Download PDF files for a planning application from Sutton Council planning website
# If you run this more than once it'll only download the new files uploaded for that application.
# Usage: $ get.sh <application number>, e.g. $ get.sh B2015/71962
# Install curl and wget before use. Mac users can install them with Homebrew.
# Windows users: Try running this in Cygwin or install Linux in a virtual machine.
# Adrian Short 26 Feb 2016
COOKIEJAR=cookiejar.txt
require 'uk_planning_scraper'
require 'csv'
apps = UKPlanningScraper::Authority.named('Aberdeen').scrape({ validated_days: 200, keywords: 'bt phone kiosk'})
puts "#{apps.size} applications scraped."
CSV.open("aberdeen.csv", "w") do |csv|
  csv << apps.first.keys # header row
  apps.each { |app| csv << app.values }
end