@cruvolo
Created April 5, 2020 16:25
Scraping NJ Covid19 Municipal Data
#!/usr/bin/perl
# (c) Chris Ruvolo 2020.
# 2-clause BSD license: https://opensource.org/licenses/BSD-2-Clause
use strict;
use utf8;
use feature 'unicode_strings';
binmode(STDOUT, ":utf8");
my $url = "https://www.nj.com/coronavirus/2020/04/where-is-the-coronavirus-in-nj-latest-map-update-on-county-by-county-cases-april-5-2020.html";
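my $county;    # most recent county heading seen in the dump
print "County,Municipality,Cases,Deaths\n";
# Render the page as plain text with elinks and read its output line by line.
# The list form of open passes the URL to elinks without going through a shell.
open(DUMP, '-|', 'elinks', '--dump', $url) or die "cannot run elinks: $!";
binmode(DUMP, ":utf8");
while (<DUMP>) {
    # County headings look like "CAPE MAY COUNTY"; store them as "Cape May".
    if (/([A-Z][A-Z ]+) COUNTY/) {
        $county = lc $1;
        $county =~ s/\b([a-z])/\u$1/g;
    }
    # Municipality bullets look like "• Town Name: 12", optionally followed
    # by a death count such as "with 1 death". Skip every other line.
    next unless /• *([A-Z ]+): (\d+)/i;
    my ($muni, $cases) = ($1, $2);
    # Avoid "my ... if ...": a conditionally declared lexical can keep a
    # stale value from an earlier iteration.
    my $deaths = /(?:with|including) (\d+) (?:death|fatal)/ ? $1 : '';
    print join(',', $county // '', $muni, $cases, $deaths), "\n";
}
close(DUMP);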