<?php
/**
 * Split PDF file
 *
 * <p>Split all of the pages from a larger PDF file into
 * single-page PDF files.</p>
 *
 * @package FPDF required http://www.fpdf.org/
 * @package FPDI required http://www.setasign.de/products/pdf-php-solutions/fpdi/
 * @param string $filename The filename of the PDF to split
 * @param string $end_directory The end directory for split PDFs (current working directory by default)
 * @return void
 */
function split_pdf($filename, $end_directory = false)
{
    require_once('fpdf/fpdf.php');
    require_once('fpdi/fpdi.php');

    // Default to the current directory and make sure the path ends with exactly one slash
    $end_directory = rtrim($end_directory ? $end_directory : '.', '/').'/';
    // Mirror the source file's sub-directory (if any) under the end directory
    $new_path = rtrim(preg_replace('/[\/]+/', '/', $end_directory.substr($filename, 0, (int) strrpos($filename, '/'))), '/');
    if (!is_dir($new_path))
    {
        // Will make directories under end directory that don't exist,
        // provided that end directory exists and has the right permissions
        mkdir($new_path, 0777, true);
    }

    $pdf = new FPDI();
    $pagecount = $pdf->setSourceFile($filename); // How many pages?

    // Split each page into a new PDF
    for ($i = 1; $i <= $pagecount; $i++) {
        try {
            $new_pdf = new FPDI();
            $new_pdf->AddPage();
            $new_pdf->setSourceFile($filename);
            $new_pdf->useTemplate($new_pdf->importPage($i));

            // e.g. split/filename_1.pdf
            $new_filename = $new_path.'/'.basename($filename, '.pdf').'_'.$i.'.pdf';
            $new_pdf->Output($new_filename, 'F');
            echo "Page ".$i." split into ".$new_filename."<br />\n";
        } catch (Exception $e) {
            echo 'Caught exception: ', $e->getMessage(), "\n";
        }
    }
}

// Create and check permissions on end directory!
split_pdf("filename.pdf", 'split/');
?>
You're wrong.
PDF example: https://www.camara.leg.br/internet/comissao/index/mista/orca/orcamento/or2020/proposta/Of745-2019-ME.pdf
I converted it to PDF 1.4 (the version the free FPDI supports); the converted file is 5.5 MB.
Generated files and their sizes:
Of745-2019-ME-1.4_10.pdf => 56K
Of745-2019-ME-1.4_11.pdf => 167K
Of745-2019-ME-1.4_12.pdf => 55K
Of745-2019-ME-1.4_13.pdf => 195K
Of745-2019-ME-1.4_14.pdf => 55K
Of745-2019-ME-1.4_15.pdf => 428K
Of745-2019-ME-1.4_16.pdf => 300K
Of745-2019-ME-1.4_17.pdf => 385K
Of745-2019-ME-1.4_18.pdf => 202K
Of745-2019-ME-1.4_19.pdf => 315K
Of745-2019-ME-1.4_1.pdf => 265K
Of745-2019-ME-1.4_20.pdf => 231K
Of745-2019-ME-1.4_21.pdf => 528K
Of745-2019-ME-1.4_22.pdf => 401K
Of745-2019-ME-1.4_23.pdf => 202K
Of745-2019-ME-1.4_24.pdf => 56K
Of745-2019-ME-1.4_2.pdf => 56K
Of745-2019-ME-1.4_3.pdf => 368K
Of745-2019-ME-1.4_4.pdf => 57K
Of745-2019-ME-1.4_5.pdf => 365K
Of745-2019-ME-1.4_6.pdf => 228K
Of745-2019-ME-1.4_7.pdf => 583K
Of745-2019-ME-1.4_8.pdf => 432K
Of745-2019-ME-1.4_9.pdf => 163K
The only difference in my test is that I used Composer and its autoloader:
{
"require": {
"setasign/fpdf": "^1.8",
"setasign/fpdi-fpdf": "^2.3"
}
}
require_once 'vendor/autoload.php';
use \setasign\Fpdi\Fpdi as FPDI;
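For reference, the gist's function adapted to this Composer setup might look like the sketch below. It assumes the FPDI 2 API (`setSourceFile()`, `importPage()`, `getTemplateSize()`, `useTemplate()`) provided by the packages above; `split_pdf_fpdi2()` is a hypothetical name, and sizing each new page from `getTemplateSize()` instead of FPDF's default A4 is an addition that the original code does not do.

```php
<?php
// Sketch only: assumes the Composer packages listed above are installed.
// The autoloader is loaded only if present, so the file still parses without it.
if (is_file(__DIR__ . '/vendor/autoload.php')) {
    require_once __DIR__ . '/vendor/autoload.php';
}

use setasign\Fpdi\Fpdi;

/**
 * Hypothetical helper: write one PDF per page as <end_directory>/<basename>_<n>.pdf.
 * Returns the list of files written.
 */
function split_pdf_fpdi2(string $filename, string $end_directory = '.'): array
{
    $end_directory = rtrim($end_directory, '/');
    if (!is_dir($end_directory) && !mkdir($end_directory, 0777, true)) {
        throw new RuntimeException("Cannot create $end_directory");
    }

    $written = [];
    $pagecount = (new Fpdi())->setSourceFile($filename); // How many pages?
    for ($i = 1; $i <= $pagecount; $i++) {
        $new_pdf = new Fpdi();
        $new_pdf->setSourceFile($filename);
        $tpl = $new_pdf->importPage($i);

        // Give the new page the imported page's size and orientation
        $size = $new_pdf->getTemplateSize($tpl);
        $new_pdf->AddPage($size['orientation'], [$size['width'], $size['height']]);
        $new_pdf->useTemplate($tpl);

        $new_filename = $end_directory . '/' . basename($filename, '.pdf') . '_' . $i . '.pdf';
        $new_pdf->Output('F', $new_filename); // FPDF 1.8 argument order: destination, name
        $written[] = $new_filename;
    }
    return $written;
}
```

Usage would be something like `split_pdf_fpdi2('Of745-2019-ME-1.4.pdf', 'split')`, which returns the paths of the generated files.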
It seems PDFs merged with FPDF/FPDI behave this way. When I split your original PDF, I get:
Splitting Of745-2019-ME.pdf (5.46 MB) to 24 separate files:
- Page 1 (235.76 KB)
- Page 2 (55.59 KB)
- Page 3 (352.19 KB)
- Page 4 (56.39 KB)
- Page 5 (332.42 KB)
- Page 6 (198.66 KB)
- Page 7 (552.97 KB)
- Page 8 (414.29 KB)
- Page 9 (135.28 KB)
- Page 10 (55.21 KB)
- Page 11 (149.83 KB)
- Page 12 (54.63 KB)
- Page 13 (164.17 KB)
- Page 14 (54.73 KB)
- Page 15 (400.54 KB)
- Page 16 (278.96 KB)
- Page 17 (350.73 KB)
- Page 18 (169.97 KB)
- Page 19 (287.66 KB)
- Page 20 (209.94 KB)
- Page 21 (503.8 KB)
- Page 22 (371.07 KB)
- Page 23 (172.54 KB)
- Page 24 (55.13 KB)
All good. Then I merged all those pages into a single PDF (with FPDF/FPDI), then tried to split that PDF again:
Splitting re-merged.pdf (5.48 MB) to 24 separate files:
- Page 1 (5.47 MB)
- Page 2 (5.47 MB)
- Page 3 (5.47 MB)
- Page 4 (5.47 MB)
- Page 5 (5.47 MB)
- Page 6 (5.47 MB)
- Page 7 (5.47 MB)
- Page 8 (5.47 MB)
- Page 9 (5.47 MB)
- Page 10 (5.47 MB)
- Page 11 (5.47 MB)
- Page 12 (5.47 MB)
- Page 13 (5.47 MB)
- Page 14 (5.47 MB)
- Page 15 (5.47 MB)
- Page 16 (5.47 MB)
- Page 17 (5.47 MB)
- Page 18 (5.47 MB)
- Page 19 (5.47 MB)
- Page 20 (5.47 MB)
- Page 21 (5.47 MB)
- Page 22 (5.47 MB)
- Page 23 (5.47 MB)
- Page 24 (5.47 MB)
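A merge step like the one described above can be sketched as follows, assuming the FPDI 2 API from the Composer packages listed earlier. `merge_pdfs()` is a hypothetical helper, not necessarily the exact code used for this test; the point is that every imported page is written into the merged file as a form XObject.

```php
<?php
// Sketch only: assumes setasign/fpdf and setasign/fpdi are installed via Composer.
if (is_file(__DIR__ . '/vendor/autoload.php')) {
    require_once __DIR__ . '/vendor/autoload.php';
}

use setasign\Fpdi\Fpdi;

/**
 * Hypothetical helper: append every page of every file in $files to $output.
 * Returns the number of pages written.
 */
function merge_pdfs(array $files, string $output): int
{
    $pdf = new Fpdi();
    $pages = 0;
    foreach ($files as $file) {
        $count = $pdf->setSourceFile($file);
        for ($i = 1; $i <= $count; $i++) {
            // Each imported page becomes a form XObject in the merged file
            $tpl = $pdf->importPage($i);
            $size = $pdf->getTemplateSize($tpl);
            $pdf->AddPage($size['orientation'], [$size['width'], $size['height']]);
            $pdf->useTemplate($tpl);
            $pages++;
        }
    }
    $pdf->Output('F', $output);
    return $pages;
}
```

For example, `merge_pdfs(glob('split/Of745-2019-ME_*.pdf'), 're-merged.pdf')` would rebuild a single file from the split pages (note that `glob()` sorts names as strings, so `_10` comes before `_2`).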
If I open each of the files, I see the correct page; all of them are OK. When doing a binary comparison of these files, I get something like this:
>fc /b re-merged_1.pdf re-merged_2.pdf
Comparing files re-merged_1.pdf and re-merged_2.pdf
0000039A: 54 52
000003A6: 6D 77
000003A8: B5 B6
Only 3 bytes differ.
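`fc /b` is Windows-only. A portable sketch of the same comparison in PHP, printing the offset and both byte values in the same hex format, could look like this; `compare_bytes()` is a hypothetical helper, and files of different length are only compared up to the shorter one.

```php
<?php
/**
 * Hypothetical stand-in for `fc /b`: list the offsets where two byte strings differ,
 * formatted like "0000039A: 54 52" (offset, byte in $a, byte in $b, all in hex).
 * Only the first min(strlen($a), strlen($b)) bytes are compared.
 */
function compare_bytes(string $a, string $b): array
{
    $diffs = [];
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        if ($a[$i] !== $b[$i]) {
            $diffs[] = sprintf('%08X: %02X %02X', $i, ord($a[$i]), ord($b[$i]));
        }
    }
    return $diffs;
}

// Usage with the files from the comparison above:
// print_r(compare_bytes(file_get_contents('re-merged_1.pdf'), file_get_contents('re-merged_2.pdf')));
```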
Crazy behaviour with merged files. I'll report this case to Setasign or the FPDF team.
The FPDF team replied:
What you see is a consequence of the way FPDI works. When FPDI imports a page, it transforms it into an object (called XObject). So, when you merge 24 PDFs, you end up with a PDF that contains 24 XObjects.
Then, when you import a page from that PDF, I suppose that FPDI imports the 24 XObjects (because it doesn't know which ones are used by the imported page, and which ones are not, so it imports them all).
Olivier
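One way to check this explanation is to count the form XObjects in one of the re-split files. The sketch below is a rough heuristic, not a PDF parser: it assumes object dictionaries are stored uncompressed (no object streams), which should hold for FPDF/FPDI output but not for every PDF 1.5+ file, and `count_form_xobjects()` is a hypothetical name. As far as I can tell from FPDF's source, FPDF also points every page at one shared `/Resources` dictionary, so each page of the merged file lists all 24 XObjects; that would explain why importing a single page drags them all along.

```php
<?php
/**
 * Hypothetical diagnostic: count form XObject dictionaries (/Subtype /Form) in raw PDF bytes.
 * Assumes uncompressed object dictionaries; forms already used by the original pages are
 * counted too, and a match inside an uncompressed content stream would inflate the count.
 */
function count_form_xobjects(string $pdfBytes): int
{
    return preg_match_all('~/Subtype\s*/Form\b~', $pdfBytes);
}

// Usage: a "single-page" file cut from the re-merged PDF that reports 24 or more here
// still carries the XObjects of every page.
// echo count_form_xobjects(file_get_contents('re-merged_1.pdf'));
```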
Makes sense.
Hmm, yes. This should be reported to Setasign as a bug; hopefully it will be fixed in the future.
This doesn't actually "split" the initial PDF; it just hides all the other pages (check the output files: they're exactly the same size and only a few bytes differ). At least, this is what happens for some PDFs.
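A quick way to spot this in a batch is to compare each output file's size with the source, as described above. The sketch below encodes that check; `looks_unsplit()` is a hypothetical helper and the 0.9 threshold is an arbitrary assumption to tune for your files.

```php
<?php
/**
 * Hypothetical heuristic: true if every "single-page" output is at least
 * $threshold times the size of the source PDF, i.e. the pages were probably
 * not really split. The 0.9 default is an arbitrary assumption.
 */
function looks_unsplit(int $sourceSize, array $pageSizes, float $threshold = 0.9): bool
{
    if ($sourceSize <= 0 || $pageSizes === []) {
        return false;
    }
    foreach ($pageSizes as $size) {
        if ($size < $threshold * $sourceSize) {
            return false; // at least one page is genuinely smaller than the source
        }
    }
    return true;
}

// Usage with real files (paths follow the re-merge test above):
// $pages = array_map('filesize', glob('split/re-merged_*.pdf') ?: []);
// var_dump(looks_unsplit(filesize('re-merged.pdf'), $pages));
```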