@danielmarschall
Created July 9, 2024 22:28
Convert plain PDF to PDF/A-3b
<?php

use horstoeko\zugferd\ZugferdPdfWriter;
use horstoeko\zugferd\ZugferdSettings;

/**
 * Converts a plain PDF to PDF/A-3b (without attachments).
 *
 * @param string $sourcePdf   Source PDF file name
 * @param string $destPdf     Destination PDF file name (can be the same as the source)
 * @param string $title       Title metadata
 * @param string $author      Author metadata
 * @param string $creatorTool Creator tool metadata
 * @return void
 */
function convert_pdf_to_pdfa(string $sourcePdf, string $destPdf, string $title, string $author, string $creatorTool = ''): void
{
    $pdfWriter = new ZugferdPdfWriter();

    // Copy pages from the original PDF
    $pageCount = $pdfWriter->setSourceFile($sourcePdf);
    for ($pageNumber = 1; $pageNumber <= $pageCount; ++$pageNumber) {
        $pageContent = $pdfWriter->importPage($pageNumber, '/MediaBox');
        $pdfWriter->AddPage();
        $pdfWriter->useTemplate($pageContent, 0, 0, null, null, true);
    }

    // Set PDF version 1.7 (ISO 32000-1), on which PDF/A-3 is based
    $pdfWriter->setPdfVersion('1.7', true);

    // Update metadata (e.g. author, producer, title).
    // gmdate() is used so that the time really is UTC, matching the "+00:00" suffix.
    $now = gmdate('Y-m-d\TH:i:s') . '+00:00';
    $pdfMetadata = array(
        'author' => $author,
        'keywords' => '',
        'title' => $title,
        'subject' => '',
        'createdDate' => $now,
        'modifiedDate' => $now,
    );
    $pdfWriter->setPdfMetadataInfos($pdfMetadata);

    // Load the XMP template shipped with horstoeko/zugferd and pick the parts needed for plain PDF/A-3b
    $xmp = simplexml_load_file(ZugferdSettings::getFullXmpMetaDataFilename());
    $descriptionNodes = $xmp->xpath('rdf:Description');

    // rdf:Description urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#
    // $descriptionNodes[0] not applicable

    // Factur-X PDFA Extension Schema http://www.aiim.org/pdfa/ns/extension/
    // $descriptionNodes[1] not applicable

    // rdf:Description http://www.aiim.org/pdfa/ns/id/
    // PDF/A-3b declaration
    $descPdfAid = $descriptionNodes[2];
    $pdfWriter->addMetadataDescriptionNode($descPdfAid->asXML());

    // rdf:Description http://purl.org/dc/elements/1.1/
    $descDc = $descriptionNodes[3];
    $descNodes = $descDc->children('dc', true);
    $descNodes->title->children('rdf', true)->Alt->li = $pdfMetadata['title'];
    $descNodes->creator->children('rdf', true)->Seq->li = $pdfMetadata['author'];
    $descNodes->description->children('rdf', true)->Alt->li = $pdfMetadata['subject'];
    $pdfWriter->addMetadataDescriptionNode($descDc->asXML());

    // rdf:Description http://ns.adobe.com/pdf/1.3/
    $descAdobe = $descriptionNodes[4];
    $descAdobe->children('pdf', true)->{'Producer'} = 'FPDF';
    $pdfWriter->addMetadataDescriptionNode($descAdobe->asXML());

    // rdf:Description http://ns.adobe.com/xap/1.0/
    $descXmp = $descriptionNodes[5];
    $xmpNodes = $descXmp->children('xmp', true);
    $xmpNodes->{'CreatorTool'} = $creatorTool;
    $xmpNodes->{'CreateDate'} = $pdfMetadata['createdDate'];
    $xmpNodes->{'ModifyDate'} = $pdfMetadata['modifiedDate'];
    $pdfWriter->addMetadataDescriptionNode($descXmp->asXML());

    // Save file
    $pdfWriter->Output($destPdf, 'F');
}
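
The metadata array in the function labels its timestamps with a fixed "+00:00" offset, so the time part has to be UTC for the label to be truthful. The following is a minimal sketch of a helper that makes this explicit. The name `xmp_utc_timestamp` is hypothetical and not part of horstoeko/zugferd; it only uses PHP's built-in `gmdate()` and `DateTimeInterface`.

```php
<?php
// Hypothetical helper (not part of horstoeko/zugferd): format a point in time
// as a timestamp whose "+00:00" suffix is guaranteed to be correct.
function xmp_utc_timestamp(DateTimeInterface $when): string
{
    // gmdate() always formats the Unix timestamp in UTC,
    // regardless of the server's configured default time zone.
    return gmdate('Y-m-d\TH:i:s', $when->getTimestamp()) . '+00:00';
}

// 22:28 local time in Berlin in July (UTC+2) is 20:28 UTC.
$local = new DateTimeImmutable('2024-07-09 22:28:00', new DateTimeZone('Europe/Berlin'));
echo xmp_utc_timestamp($local), PHP_EOL; // prints 2024-07-09T20:28:00+00:00
```

Under this assumption, the value could be used for both 'createdDate' and 'modifiedDate', for example `xmp_utc_timestamp(new DateTimeImmutable())`.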
@danielmarschall
Author

danielmarschall commented Jul 9, 2024

@andrex47

Wonderful! Is it also possible to save a PDF with attachments using this function?

@danielmarschall
Author

@andrex47 Not with this function as it stands, but it could be added. This code does it: https://github.com/horstoeko/zugferd/blob/master/src/ZugferdDocumentPdfBuilderAbstract.php#L331, so I think you could copy a few lines from there.

@Christoph-Damm

I have a running horstoeko/zugferd installation, so I'm able to merge a PDF/A with an XML file. Now I want to call the function "convert_pdf_to_pdfa" before merging, so that I can use a plain PDF for merging.
For testing, I put the file "pdfa_convert.php" in the folder "vendor/horstoeko/zugferd/src" and used this test script:

`<?php
use horstoeko\zugferd\pdfa_convert;
require __DIR__ . "/vendor/autoload.php";

// the parameters for the function convert_pdf_to_pdfa()

$sourcePdf = "uploads/" . basename($_FILES["fileToUpload"]["name"]);
$destPdf = "uploads/converted/" . basename($_FILES["fileToUpload"]["name"]);
$title = "Test";
$author = "Christoph Damm";
$creatorTool ='';

convert_pdf_to_pdfa();`

Why does it not work (no error, no output, no converted PDF)?

One more question:
Line 23 ($pageContent = $pdfWriter->importPage($pageNumber, '/MediaBox');) refers to a folder "/MediaBox".
Do I have to create a folder with that name? If so, in which directory should it go?

Thank you in advance for your help!

@danielmarschall
Author

danielmarschall commented Jan 30, 2025

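You need to pass the variables to the function as arguments.

<?php
require __DIR__ . "/vendor/autoload.php";
// Composer's class autoloader does not load plain functions,
// so the file that defines convert_pdf_to_pdfa() is included explicitly.
require_once __DIR__ . "/vendor/horstoeko/zugferd/src/pdfa_convert.php";

$sourcePdf = "uploads/" . basename($_FILES["fileToUpload"]["name"]);
$destPdf = "uploads/converted/" . basename($_FILES["fileToUpload"]["name"]);
$title = "Test";
$author = "Christoph Damm";
$creatorTool = '';

// Pass all five values; a call without arguments fails
convert_pdf_to_pdfa($sourcePdf, $destPdf, $title, $author, $creatorTool);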
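
To illustrate why the call without arguments fails, here is a small self-contained sketch. `convert_stub` and `try_call` are hypothetical stand-ins; the stub only copies the parameter list of `convert_pdf_to_pdfa` so the example runs without the library. On PHP 7.1 and later, calling a user-defined function with too few arguments throws an `ArgumentCountError`. If error display is switched off on the server, that may well show up as a blank page, which would match the "no error, no output" symptom.

```php
<?php
// Stand-in with the same parameter list as convert_pdf_to_pdfa().
// The body is a placeholder, not the real conversion.
function convert_stub(string $sourcePdf, string $destPdf, string $title, string $author, string $creatorTool = ''): string
{
    return "$sourcePdf -> $destPdf";
}

// Runs a callable and reports whether the argument count was accepted.
function try_call(callable $call): string
{
    try {
        return 'ok: ' . $call();
    } catch (ArgumentCountError $e) {
        return 'ArgumentCountError';
    }
}

// Four required parameters are missing here, so PHP throws.
$withoutArgs = try_call(function () {
    return convert_stub();
});

// All required parameters are supplied; $creatorTool falls back to ''.
$withArgs = try_call(function () {
    return convert_stub('in.pdf', 'out.pdf', 'Test', 'Christoph Damm');
});

echo $withoutArgs, PHP_EOL; // prints ArgumentCountError
echo $withArgs, PHP_EOL;    // prints ok: in.pdf -> out.pdf
```

While testing, putting `error_reporting(E_ALL);` and `ini_set('display_errors', '1');` at the top of the script makes such errors visible.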

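/MediaBox is not a directory. It is the name of a page box inside a PDF file. PDF files contain text tokens, somewhat like Rich Text (RTF) files, and names in them begin with a slash. Here it tells importPage() which box of the source page to use as the page boundary. So there is nothing to worry about with "/MediaBox".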
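
As a rough illustration of what such a name looks like inside a PDF, the sketch below reads the /MediaBox entry from a hand-written page object. The fragment and the helper `read_media_box` are made up for this example. Real PDFs often store their objects compressed, so a regular expression like this is not a substitute for a PDF parser.

```php
<?php
// Hand-written fragment resembling a PDF page object (illustrative only).
$pageObject = <<<'PDF'
3 0 obj
<< /Type /Page
   /Parent 2 0 R
   /MediaBox [0 0 595.28 841.89]
   /Contents 4 0 R
>>
endobj
PDF;

// Hypothetical helper: extract the four MediaBox numbers
// (lower-left x, lower-left y, upper-right x, upper-right y, in points).
function read_media_box(string $pdfObject): ?array
{
    $number = '([-+]?\d*\.?\d+)';
    $pattern = "~/MediaBox\s*\[\s*$number\s+$number\s+$number\s+$number\s*\]~";
    if (!preg_match($pattern, $pdfObject, $matches)) {
        return null; // no MediaBox entry found
    }
    return array_map('floatval', array_slice($matches, 1));
}

$box = read_media_box($pageObject);
print_r($box);
```

The four numbers describe the page rectangle in points; 595.28 by 841.89 corresponds to an A4 page.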

@Christoph-Damm

Thanks, Daniel, for your quick answer and the info!
I have it working now, but without wrapping the code in a function: I define the five variables (the string parameters) and run only the code inside the function (lines 16 to 81) as a script. In addition to line 1, "use horstoeko\zugferd\ZugferdPdfWriter;", the line "use horstoeko\zugferd\ZugferdSettings;" was required.
