adrianshort’s gists

adrianshort / extract-urls.py

Created September 6, 2012 17:30

Extract URLs from a web page

	# Extract URLs from a web page to a CSV file
	# $ python extract-urls.py http://mysite.com/mypage.html myfile.csv
	# By Adrian Short 6 Sep 2012

	import sys
	import urllib
	import csv
	from bs4 import BeautifulSoup

	url = sys.argv.pop(1)

adrianshort / scrape.rb

Last active December 10, 2015 13:59

Cheam North and Worcester Park local committee podcast feed creator. Scrapes the webpage and outputs an iTunes-friendly podcast RSS feed.

	# Scrape webpage into a podcast RSS feed
	# https://www.sutton.gov.uk/index.aspx?articleid=4332

	require 'nokogiri'
	require 'open-uri'
	require 'time'
	require 'pp'

	FEED_TITLE = "Cheam North and Worcester Park Local Committee"
	FEED_IMAGE = "https://dl.dropbox.com/u/300783/logo.png"

adrianshort / csv2georss.rb

Last active December 17, 2015 03:59

Convert Open Plaques CSV export file to GeoRSS

	require 'csv'
	require 'pp'
	require 'erb'
	require 'time'

	# https://gist.github.com/adrianshort/5547284
	# $ ruby csv2georss.rb myfile.csv > feed.xml

	template = ERB.new <<-EOF
	<?xml version="1.0" encoding="UTF-8" ?>
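
The preview above stops at the start of the ERB template. As a rough illustration of the CSV-to-GeoRSS idea, here is a minimal sketch, not the gist's actual code. The column names (`title`, `latitude`, `longitude`), the sample row, and the `georss_feed` helper are all assumptions made for this example; the real Open Plaques export may use different headers. The channel is deliberately bare and omits elements a full RSS feed would carry.

```ruby
require 'csv'
require 'erb'

# Hypothetical sample data; real Open Plaques column names may differ.
SAMPLE_CSV = <<~DATA
  title,latitude,longitude
  Example plaque,51.5,-0.1
DATA

# Minimal feed template using GeoRSS-Simple <georss:point> elements.
TEMPLATE = ERB.new(<<~XML, trim_mode: '-')
  <?xml version="1.0" encoding="UTF-8" ?>
  <rss version="2.0" xmlns:georss="http://www.georss.org/georss">
    <channel>
      <title>Plaques</title>
  <%- rows.each do |row| -%>
      <item>
        <title><%= ERB::Util.html_escape(row['title']) %></title>
        <georss:point><%= row['latitude'] %> <%= row['longitude'] %></georss:point>
      </item>
  <%- end -%>
    </channel>
  </rss>
XML

# Parse CSV text (with a header row) and render it through the template.
def georss_feed(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  TEMPLATE.result_with_hash(rows: rows)
end

feed = georss_feed(SAMPLE_CSV)
puts feed
```

GeoRSS-Simple writes a point as latitude then longitude, separated by a space, which is why the template emits the two columns in that order.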

adrianshort / dhondt.rb

Created May 26, 2014 13:45

	# D'Hondt method calculations
	# https://en.wikipedia.org/wiki/D'Hondt_method
	# By Adrian Short (@adrianshort) 26 May 2014

	# European Parliament election, London region, 22 May 2014

	@parties = {
	'4 Freedoms Party (UK EPP)' => 28014,
	'An Independence from Europe' => 26675,
	'Animal Welfare Party' => 21092,
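
The preview above ends before any calculation is shown. For readers unfamiliar with the D'Hondt method, the allocation step can be sketched as follows. This is a minimal illustration, not the gist's own implementation: the `dhondt` function name and the party figures are hypothetical, and ties between quotients are not handled specially.

```ruby
# Allocate seats with the D'Hondt highest-averages method.
# votes: Hash of party name => vote count; seats: number of seats to fill.
def dhondt(votes, seats)
  won = Hash.new(0)
  seats.times do
    # Each party's quotient is its votes divided by (seats already won + 1);
    # the party with the highest quotient takes the next seat.
    winner, = votes.max_by { |party, count| count.to_f / (won[party] + 1) }
    won[winner] += 1
  end
  won
end

# Hypothetical figures for illustration only, not real election results.
example = {
  'Party A' => 100_000,
  'Party B' => 80_000,
  'Party C' => 30_000,
  'Party D' => 20_000
}

result = dhondt(example, 8)
result.each { |party, seats| puts "#{party}: #{seats}" }
```

With these made-up figures and 8 seats, Party A wins 4 seats, Party B wins 3, Party C wins 1 and Party D wins none.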

adrianshort / _template.txt

Created January 29, 2015 14:56

DokuWiki template for local wiki (suttonwiki.org in this case)

	====== @!!PAGE@ ======

	@!!PAGE@ is

	===== External links =====

	* [[http://www.example.com/|Official website]]
	* [[wp>@!!PAGE@]]

	{{tag>tag1 tag2 "tag3 with spaces"}}

adrianshort / jekyll2dokuwiki.rb

Created February 19, 2015 09:25

	# Convert Jekyll blog posts to DokuWiki pages
	# Adrian Short (https://adrianshort.org/) 15 Feb 2015

	require 'fileutils'
	require 'yaml'
	require 'pp'
	require 'pandoc-ruby'

	INPUT_DIR = "./_posts"
	OUTPUT_BASEDIR = "./blog"

adrianshort / get.sh

Last active January 12, 2017 16:38

	#!/usr/bin/env sh

	# Download PDF files for a planning application from Sutton Council planning website
	# If you run this more than once it'll only download the new files uploaded for that application.
	# Usage: $ get.sh <application number>, e.g. $ get.sh B2015/71962
	# Install curl and wget before use. Mac users can install them with Homebrew.
	# Windows users: Try running this in Cygwin or install Linux in a virtual machine.
	# Adrian Short 26 Feb 2016

	COOKIEJAR=cookiejar.txt

adrianshort / aberdeen.rb

Created October 11, 2018 11:34

	require 'uk_planning_scraper'
	require 'csv'

	apps = UKPlanningScraper::Authority.named('Aberdeen').scrape({ validated_days: 200, keywords: 'bt phone kiosk'})
	puts "#{apps.size} applications scraped."

	CSV.open("aberdeen.csv", "w") do |csv|
	  csv << apps.first.keys # header row
	  apps.each { |app| csv << app.values }
	end