@gtklocker
Last active January 12, 2018 20:38
delos.uoi.gr scraper
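#!/usr/bin/perl
# Last tested: 2017-01-22
#
# Usage: perl delos-scraper.pl LESSON_ID
# Prints the MP4 URL, title and description of every lecture listed
# for the given course on delos.uoi.gr.
use strict;
use warnings;

my $lesson_id = $ARGV[0];
die "usage: $0 LESSON_ID\n" unless defined $lesson_id && length $lesson_id;

# Walk the search results page by page until a page yields no lectures.
for (my $page_id = 1; ; ++$page_id) {
    my $url = "http://delos.uoi.gr/opendelos/search?crs=$lesson_id&sa=$page_id";

    # Fetch the page with curl. The list form of open bypasses the shell,
    # so a malicious LESSON_ID cannot inject shell commands.
    open(my $fh, '-|', 'curl', '--silent', $url)
        or die "cannot run curl: $!\n";
    my $page = do { local $/; <$fh> };
    close $fh;
    $page = '' unless defined $page;

    my $had_matches = 0;
    # $1: 8-hex-digit resource id, $2: lecture title, $3: description line.
    while ($page =~ /[?]rid=([[:xdigit:]]{8})" class="lecture-title">\s*<strong>\s*([^<]+)[^)]+[)]\s*<[\/]span>\s*<br>\s*<[\/]div>\s*<[\/]div>\s*([^\n\r]+)/g) {
        my $video_url   = "http://delos.uoi.gr/delosrc/resources/vl/$1/$1.mp4";
        my $title       = $2;
        my $description = $3;
        # Trim leading and trailing whitespace from both fields.
        s/^\s+|\s+$//g for $title, $description;
        print "$video_url\n";
        print "$title ($description)\n\n";
        $had_matches = 1;
    }
    last unless $had_matches;
}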
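Because the scraper relies entirely on one regular expression over undocumented page markup, it can help to check the pattern offline before pointing it at the site. The sketch below moves the matching loop into a hypothetical `parse_lectures` helper (not part of the original script) and runs it on a sample fragment. That fragment is an assumption reconstructed from the regex itself, not markup captured from delos.uoi.gr, so a real results page may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the scraper's pattern to one results page
# and returns a list of hashrefs (url, title, description).
sub parse_lectures {
    my ($page) = @_;
    my @lectures;
    while ($page =~ /[?]rid=([[:xdigit:]]{8})" class="lecture-title">\s*<strong>\s*([^<]+)[^)]+[)]\s*<[\/]span>\s*<br>\s*<[\/]div>\s*<[\/]div>\s*([^\n\r]+)/g) {
        my ($rid, $title, $description) = ($1, $2, $3);
        s/^\s+|\s+$//g for $title, $description;    # trim both fields
        push @lectures, {
            url         => "http://delos.uoi.gr/delosrc/resources/vl/$rid/$rid.mp4",
            title       => $title,
            description => $description,
        };
    }
    return @lectures;
}

# Assumed sample markup, inferred from the regex (not taken from the site).
my $sample = <<'HTML';
<a href="?rid=1a2b3c4d" class="lecture-title">
  <strong>Lecture 1: Introduction</strong> <span>(01:23:45)</span>
  <br>
</div>
</div>
  First lecture of the course
HTML

for my $lecture (parse_lectures($sample)) {
    print "$lecture->{url}\n$lecture->{title} ($lecture->{description})\n";
}
# Prints:
# http://delos.uoi.gr/delosrc/resources/vl/1a2b3c4d/1a2b3c4d.mp4
# Lecture 1: Introduction (First lecture of the course)
```

If the site's markup changes, a sample like this is the quickest way to see which part of the pattern stopped matching.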