Skip to content

Instantly share code, notes, and snippets.

@eqhmcow
Last active December 31, 2022 18:49
Show Gist options
  • Save eqhmcow/5389877 to your computer and use it in GitHub Desktop.
Save eqhmcow/5389877 to your computer and use it in GitHub Desktop.
Perl unzip example with IO::Uncompress::Unzip
#!/usr/bin/perl
# example perl code, but this should now actually work properly, even on
# Windows
# thanks to everyone who tested this, reported bugs and suggested or
# implemented fixes!
# this code is licensed under GPL 2 and/or Artistic license;
# aka free perl software
use strict;
use warnings;
=pod
IO::Uncompress::Unzip works great to process zip files; but, it doesn't
include a routine to actually extract an entire zip file.
Other modules like Archive::Zip include their own unzip routines, which aren't
as robust as IO::Uncompress::Unzip; eg. they don't work on zip64 archive files.
So, the following is code to actually use IO::Uncompress::Unzip to extract
a zip file.
THIS IS EXAMPLE CODE
It certainly works and does a good bit of error handling, but it doesn't do
every sanity check you might expect. For example, it will, without warning
you, gladly overwrite any existing files that have the same name as files in
the archive if you extract it to an existing directory.
=cut
use File::Spec::Functions qw(splitpath);
use IO::File;
use IO::Uncompress::Unzip qw($UnzipError);
use File::Path qw(mkpath);
# example code to call unzip:
unzip(shift);
=head2 unzip
Extract a zip file, using IO::Uncompress::Unzip.
Arguments: file to extract, destination path
unzip('stuff.zip', '/tmp/unzipped');
=cut
sub unzip {
my ($file, $dest) = @_;
die 'Need a file argument' unless defined $file;
$dest = "." unless defined $dest;
$dest =~ s!/|\\$!!;
die "File argument is a directory: $file"
if -d $file;
die "No such file: $file $!"
unless -e $file;
my $u = IO::Uncompress::Unzip->new($file)
or die "Cannot open $file: $UnzipError";
my $status;
my %dirs;
for ($status = 1; $status > 0; $status = $u->nextStream()) {
# bail on error
last if $status < 0;
my $header = $u->getHeaderInfo();
my $stored_time = $header->{'Time'};
my (undef, $path, $name) = splitpath($header->{Name});
$path =~ s!/|\\$!!;
$name =~ s!/|\\$!!;
my $destdir = "$dest/$path";
my $destfile = "$destdir/$name";
# https://cwe.mitre.org/data/definitions/37.html
# CWE-37: Path Traversal
die "unsafe $destfile" if $destfile =~ m!\Q..\E(/|\\)!;
# don't try to overwrite an extant file by creating a directory
if (-e $destdir and not -d $destdir) {
die "Cannot create directory $destdir: File or path already exists.\nTry extracting " .
"to a different directory.";
}
# skip if the entire path is just an extant directory
if (-d $destfile) {
next;
}
# ok let's make a directory for this zip archive entry
unless (-d $destdir) {
mkpath($destdir) or die "Couldn't mkdir $destdir: $!";
# we're done if the entire path is simply the directory we
# just created
if (-d $destfile) {
# this entry is probably for the directory itself, so store
# its mtime, because we have to touch all the dirs after
# creating all the files; otherwise as we process the archive,
# file creation will just reset each directory's mtime
$dirs{$destdir} = $stored_time;
next;
}
}
# ok we should have a valid file here we can extract
my $buff;
my $fh = IO::File->new($destfile, "w")
or die "Couldn't write to $destfile: $!";
$fh->binmode();
while (($status = $u->read($buff)) > 0) {
$fh->write($buff);
}
$fh->close();
utime ($stored_time, $stored_time, $destfile)
or die "Couldn't touch $destfile: $!";
}
die "Error processing $file: $UnzipError $!\n"
if $status < 0 ;
# touch all the dirs that we created and that also had explicit directory
# entries in the archive
foreach my $dirpath (keys %dirs) {
my $stored_time = $dirs{$dirpath};
utime ($stored_time, $stored_time, $dirpath)
or die "Couldn't touch directory $dirpath: $!";
}
return;
}
1;
@mgx259
Copy link

mgx259 commented Mar 11, 2016

I found a better one for you that is shorter and works well, even on an older OSX system: https://gist.github.com/mshock/4156726

It's not really better as Archive::Zip library has problems with large files (more than 4 GB)

@tferic
Copy link

tferic commented Mar 30, 2016

I'm having an issue with this sub on Windows (7 & 8).
The sub does unzip a ZIP archive completely, but the resulting unzipped files are binary different compared to the files unzipped by right-click-unzip in Explorer.
i.e. unzipped exe files will not execute causing an error.
Also file properties look different.
Am I the only one having this issue?

@alambike
Copy link

alambike commented Apr 9, 2016

In line 56 I have to set slash optional to enter in conditional when it was extracting a directory:

        if ($name =~ m!/?$!) {

@blakeyjason
Copy link

Did some work on getting this to work on my Windows box... here's my take on this one. Thanks for the help getting it running!

use strict;
use warnings;
use IO::Compress::Zip;

use File::Spec::Functions qw(splitpath);
use IO::File;
use IO::Uncompress::Unzip qw($UnzipError);
use File::Path qw(mkpath);

...

sub unzipThisZipToThisDirectory {
my ($zipFilename, $destinationDir) = @_;

unless ($zipFilename and $destinationDir) {
	$! = "zipFilename and destinationDir must be supplied";
	return;
}

my $zipFile = IO::Uncompress::Unzip->new($zipFilename);

unless ($zipFile) {
	$! = "Could not open $zipFilename as zipFile";
	return;
}


# Status should always be 1 for each entry in the zip...
while (my $status = $zipFile->nextStream()) {

	my $zipEntryHeader = $zipFile->getHeaderInfo();
	my $pathAndFilename = $zipEntryHeader->{Name};

	# Returns from splitPath are volume, path, and filename...
	# We don't care about path (and it shouldn't be there, anyway)
    	my (undef, $path, $filename) = splitpath($pathAndFilename);

	# Make sure we have a filename...skip directories...
	$filename or next;

	my $fullOutputDir = $destinationDir . "/" . $path;
	mkpath($fullOutputDir);

	my $unzippedFilename = $fullOutputDir . $filename;

	my $outputFh;
	unless (open($outputFh, ">", $unzippedFilename)) {
		$! = "Could not open $unzippedFilename for writing";
		return;
	}

	binmode($outputFh);

	my $buffer;
    	while (($status = $zipFile->read($buffer))) {
		print $outputFh $buffer;
    	}

    	close($outputFh);

	print STDERR "Output $unzippedFilename " . -s $unzippedFilename, " bytes\n\n";
}

return 1;

}

@eqhmcow
Copy link
Author

eqhmcow commented Jul 31, 2018

security issue: added a "die" to catch path traversal issue as reported by https://snyk.io/research/zip-slip-vulnerability

@nazarekm
Copy link

Wow. This is the best thing to happen for me today. The unzip.pl worked like gem for me. Only additional check i added is [if ($name =~ m!/$! || $name eq "")] in line 56.

@nazarekm
Copy link

I'm having an issue with this sub on Windows (7 & 8).
The sub does unzip a ZIP archive completely, but the resulting unzipped files are binary different compared to the files unzipped by right-click-unzip in Explorer.
i.e. unzipped exe files will not execute causing an error.
Also file properties look different.
Am I the only one having this issue?

I am also facing the same issue on Windows. Is there a way out to get the original properties of the files ?

@eqhmcow
Copy link
Author

eqhmcow commented Sep 27, 2020

OK! People are still using this code I wrote 8 years ago! I guess that's the danger of contributing to open source :)

Many people reported bugs and supplied fixes. Much thanks to @volomike, @mgx259, @tferic, @alambike, @nazarekm,

and special thanks to @blakeyjason , @maikelsteneker and @tiobe for forking and improving the code. I've taken your improvements and now the code is cleaner and works better on Windows and Linux, and probably everywhere Perl 5.8 (or maybe even less) works. Perl will never die! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment