Skip to content

Instantly share code, notes, and snippets.

@searbe
Created August 7, 2012 09:48
Show Gist options
  • Save searbe/3284011 to your computer and use it in GitHub Desktop.
Save searbe/3284011 to your computer and use it in GitHub Desktop.
Parse simple XLSX in PHP with SimpleXML and ZipArchive
<?php
/**
* I had to parse an XLSX spreadsheet (which should damn well have been a CSV!)
* but the usual tools were hitting the memory limit pretty quick. I found that
* manually parsing the XML worked pretty well. Note that this, most likely,
* won't work if cells contain anything more than text or a number (so formulas,
* graphs, etc ..., I don't know what'd happen).
*/
$inputFile = '/path/to/spreadsheet.xlsx';
$dir = '/path/to/tmp/dir';
// Unzip
$zip = new ZipArchive();
$zip->open($inputFile);
$zip->extractTo($dir);
// Open up shared strings & the first worksheet
$strings = simplexml_load_file($dir . '/xl/sharedStrings.xml');
$sheet = simplexml_load_file($dir . '/xl/worksheets/sheet1.xml');
// Parse the rows
$xlrows = $sheet->sheetData->row;
foreach ($xlrows as $xlrow) {
$arr = array();
// In each row, grab it's value
foreach ($xlrow->c as $cell) {
$v = (string) $cell->v;
// If it has a "t" (type?) of "s" (string?), use the value to look up string value
if (isset($cell['t']) && $cell['t'] == 's') {
$s = array();
$si = $strings->si[(int) $v];
// Register & alias the default namespace or you'll get empty results in the xpath query
$si->registerXPathNamespace('n', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main');
// Cat together all of the 't' (text?) node values
foreach($si->xpath('.//n:t') as $t) {
$s[] = (string) $t;
}
$v = implode($s);
}
$arr[] = $v;
}
// Assuming the first row are headers, stick them in the headers array
if (count($headers) == 0) {
$headers = $arr;
} else {
// Combine the row with the headers - make sure we have the same column count
$values = array_pad($arr, count($headers), '');
$row = array_combine($headers, $values);
/**
* Here, do whatever you like with the [header => value] assoc array in $row.
* It might be useful just to run this script without any code here, to watch
* memory usage simply iterating over your spreadsheet.
*/
}
}
@unlink($dir);
@unlink($inputFile);
@lubosdz
Copy link

lubosdz commented Nov 8, 2016

@fabianledefyl

Dates in excel are numbers representing the number of days since 01.01.1900.
Thus 41263 in a date cell is something like 113 years and 18 days since 01.01.1900 = 18.01.2013.

Here are two PHP functions to convert Excel dates into days, months, years:

    // EXCEL serialDATE conversions based on Julian <-> Gregorian calendars
    protected static function excel_D2DMY($days) {
        if ($days < 1){
            return '';
        }
        if ($days == 60)  {
            return array('day'=>29, 'month'=>2, 'year'=>1900);
        } else {
            if ($days < 60)  {
                // Because of the 29-02-1900 bug, any serial date
                // under 60 is one off... Compensate.
                ++$days;
            }
            // Modified Julian to DMY calculation with an addition of 2415019
            $l = $days + 68569 + 2415019;
            $n = floor(( 4 * $l ) / 146097);
            $l = $l - floor(( 146097 * $n + 3 ) / 4);
            $i = floor(( 4000 * ( $l + 1 ) ) / 1461001);
            $l = $l - floor(( 1461 * $i ) / 4) + 31;
            $j = floor(( 80 * $l ) / 2447);
            $nDay = $l - floor(( 2447 * $j ) / 80);
            $l = floor($j / 11);
            $nMonth = $j + 2 - ( 12 * $l );
            $nYear = 100 * ( $n - 49 ) + $i + $l;
            $ret = array('day'=>$nDay, 'month'=>$nMonth, 'year'=>$nYear);
            return $ret;
        }
    }

    /**
    * @desc Returns Excel serialDate constant calculated from gregorian date
    * original javascript found @ http://wwwmacho.mcmaster.ca/JAVA/JD.html
    *
    * tested compatibility with EXCEL dates between 01/01/1900..31.12.2099,
    * year given eventually as 0 (=1900)..199 (=2099)
    * adds also days if supplied argument more than real number of days
    * in particular month (same applies to months)
    * as for years bellow 1900: algorythm is not exact, therefore limitation
    * for years 1900 - 2099
    */
    protected static function excel_DMY2D($d, $m, $y, $uh=0, $um=0, $us=0) {
        if($y < 1900){
            if($y<0 || $y>199){
                // Invalid year, must be between 1900 - 2099 or 0 - 199
                return false;
            }else{
                $y += 1900;
            }
        }
        $extra = 100.0*$y + $m - 190002.5;
        $rjd = 367.0*$y;
        $rjd -= floor(7.0*($y+floor(($m+9.0)/12.0))/4.0);
        $rjd += floor(275.0*$m/9.0);
        $rjd += $d;
        $rjd += ($uh + ($um + $us/60.0)/60.)/24.0;
        $rjd += 1721013.5;
        $rjd -= 0.5*$extra/abs($extra);
        $rjd += 0.5;
        $rjd -= 2415020.5; // JD correction constant for base 01/01/1900
        $rjd += ($rjd > 60) ? 2 : 1; // adjust to inheritet EXCEL/LOTUS bug (2000 was NOT leap year, but LOTUS/EXCEL treated it is if it was)
        return $rjd;
    }

@makantayebi
Copy link

nice work! On my version of PHP the $headers = array(); in the beginning is necessary. would be a tiny update to this. I suggest allowing pull requests.

@towony
Copy link

towony commented Jan 4, 2018

I want to be very grateful. I wanted a very simple one. You've already made it. I needed a little modification. thanks again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment