@spiculator
Created August 19, 2012 15:14
Convert a livelib.ru page such as http://www.livelib.ru/reader/teak/print into zim-wiki format.
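#!/usr/bin/perl -w
use strict;
use utf8;    # the month names below are UTF-8 literals in the source
# Input lines are decoded explicitly in the main loop, so only STDOUT gets a
# UTF-8 layer here. The "encoding" pragma (deprecated since Perl 5.18) and the
# :utf8 layer on STDIN are dropped, since they could decode the input twice.
binmode( STDOUT, ':utf8' );
use Encode qw(decode_utf8);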
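# Russian month names in calendar order, mapped to their numbers 1..12.
my @month_names = qw/Январь Февраль Март Апрель Май Июнь Июль Август Сентябрь Октябрь Ноябрь Декабрь/;
my %months = ();
$months{shift @month_names} = $_ foreach 1 .. 12;

# Convert a "<month name> <year>" heading into a zim-wiki date link.
sub format_date($) {
    shift =~ /^(.*) (\d+)/ or die "cannot parse date heading";
    my ($year, $mname) = ($2, $1);
    my $mnum = $months{$mname} or die "unknown month name";
    $mnum = sprintf "%02d", $mnum;
    return "[[Date:$year:$mnum|$year-$mnum]]";
}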

my ($date, $title, $author);
while(<>) {
    # Decode each line as UTF-8 and die on malformed input.
    $_ = decode_utf8( $_, Encode::FB_CROAK );
    if( /<tr><td colspan="5"><h2 style="margin-top: 14px; border:0;">(.*)<\/h2><\/td><\/tr>/ ) {
        # Month heading: remember the date and start a new group.
        $date = format_date($1);
        print "\n";
    } elsif( /<a href="\/book\/\d+" style="font-weight: bold">(.*)<\/a><br>/ ) {
        die "duplicate title" if defined $title;
        $title = $1;
    } elsif( /<a href="\/author\/\d+" style="font-size:80%">(.*)<\/a>/ ) {
        die "duplicate author" if defined $author;
        $author = $1;
    } elsif( /<span style="font-size:80%">(.*)<\/span>/ ) {
        # Author given as plain text instead of a link.
        die "duplicate author" if defined $author;
        $author = $1;
    } elsif( /^\s*<\/tr>\s*$/ ) {
        # End of a table row: print the collected entry and reset.
        die "no author" unless defined $author;
        die "no title" unless defined $title;
        die "no date" unless defined $date;
        print "$date **$title** ($author)\n";
        undef $title;
        undef $author;
    }
}
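
To check the date handling on its own, here is a minimal standalone sketch of the month lookup and link formatting. The sample heading "Август 2012" is an assumed input, not taken from a real livelib.ru page, and the hash-slice initialization is a stylistic substitute for the shift-based loop in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the month names below are UTF-8 literals in the source

# Map Russian month names to their numbers, 1..12, with a hash slice.
my @month_names = qw/Январь Февраль Март Апрель Май Июнь Июль Август Сентябрь Октябрь Ноябрь Декабрь/;
my %months;
@months{@month_names} = 1 .. 12;

# Turn a "<month name> <year>" heading into a zim-wiki date link.
sub format_date {
    my ($heading) = @_;
    my ($mname, $year) = $heading =~ /^(.*) (\d+)/
        or die "cannot parse date heading\n";
    my $mnum = $months{$mname} or die "unknown month name\n";
    return sprintf "[[Date:%s:%02d|%s-%02d]]", $year, $mnum, $year, $mnum;
}

# Assumed sample heading; the output is plain ASCII.
print format_date("Август 2012"), "\n";    # prints [[Date:2012:08|2012-08]]
```

The full script is meant to be run against a saved copy of the page, with the file name as an argument or the page on standard input.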
@spiculator
Author

The thing I love about Perl is that I spent 6 lines out of 43 making it work with Unicode.
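
As a rough illustration of why those lines matter, the sketch below compares a UTF-8 byte string with its decoded form. It is not part of the gist. The single-entry `%months` hash and the sample month name are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # string literals in this file are UTF-8
use Encode qw(encode decode);

# A month name as it would arrive from a file: UTF-8 encoded bytes.
my $bytes = encode( 'UTF-8', 'Август' );

# LEAVE_SRC keeps $bytes intact; with a check value such as FB_CROAK alone,
# decode() is allowed to modify its argument.
my $chars = decode( 'UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC );

printf "%d bytes, %d characters\n", length $bytes, length $chars;
# prints: 12 bytes, 6 characters

# A hash keyed on character strings only matches the decoded form.
my %months = ( 'Август' => 8 );
print exists $months{$chars} ? "decoded: found\n" : "decoded: missing\n";
print exists $months{$bytes} ? "raw: found\n"     : "raw: missing\n";
```

Without the decoding step, the month lookup in `format_date` would fail in the same way as the raw lookup here.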
