@jeremy-code
Created May 5, 2026 02:08

Extract images with transparency from PDFs with sharp and pdf.js

`pdfimages` seems to extract images with transparency only as two separate files: the image itself and its mask. If you only have a couple of images, you can probably use something like ImageMagick to recombine each pair into a PNG with transparency, but since I had a lot of images, I used this script instead.
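Recombining an image with its mask amounts to using each mask byte as the alpha value of the matching pixel. A minimal sketch in plain JavaScript, assuming both files have already been decoded to raw, row-major 8-bit buffers of the same dimensions (RGB for the image, one grayscale byte per pixel for the soft mask); `mergeMask` is a hypothetical helper, not part of pdfimages, ImageMagick, or sharp:

```javascript
/**
 * Hypothetical helper: interleave an RGB buffer with a same-sized 8-bit
 * grayscale soft mask, producing RGBA where each mask byte becomes alpha.
 * Assumes raw, uncompressed, row-major 8-bit buffers.
 */
function mergeMask(rgb, mask, width, height) {
  const pixels = width * height;
  if (rgb.length !== pixels * 3 || mask.length !== pixels) {
    throw new RangeError("rgb/mask size does not match width x height");
  }
  const rgba = new Uint8Array(pixels * 4);
  for (let i = 0; i < pixels; i++) {
    rgba[i * 4] = rgb[i * 3]; // R
    rgba[i * 4 + 1] = rgb[i * 3 + 1]; // G
    rgba[i * 4 + 2] = rgb[i * 3 + 2]; // B
    rgba[i * 4 + 3] = mask[i]; // A: 0 = transparent, 255 = opaque
  }
  return rgba;
}

// A 2x1 image: a red pixel that stays opaque, a blue pixel made transparent.
const rgba = mergeMask(
  new Uint8Array([255, 0, 0, 0, 0, 255]),
  new Uint8Array([255, 0]),
  2,
  1,
);
console.log(rgba.join(", ")); // 255, 0, 0, 255, 0, 0, 255, 0
```

The resulting RGBA buffer is what a PNG encoder needs in order to write the image with transparency.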

The script uses pdf.js to read the decoded image data and sharp to write it out as PNG files. sharp is probably overkill, but it makes the export step a bit cleaner.
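For 8-bit data, sharp's raw input expects a flat buffer of exactly width × height × channels bytes, so it helps to know how much data pdf.js produces for each image kind. A sketch, assuming pdf.js's layout where RGB is 3 bytes per pixel, RGBA is 4, and 1-bit grayscale is packed 8 pixels per byte with each row padded to a whole byte; the string values in `Kind` are hypothetical stand-ins for the `ImageKind` constants:

```javascript
// Hypothetical stand-ins for pdf.js's ImageKind constants
// (real code imports ImageKind from pdfjs-dist).
const Kind = { GRAYSCALE_1BPP: "1bpp", RGB_24BPP: "rgb", RGBA_32BPP: "rgba" };

// Bytes assumed to be handed over by pdf.js for a decoded image of this kind.
function decodedByteLength(kind, width, height) {
  switch (kind) {
    case Kind.GRAYSCALE_1BPP:
      return Math.ceil(width / 8) * height; // rows padded to whole bytes
    case Kind.RGB_24BPP:
      return width * height * 3;
    case Kind.RGBA_32BPP:
      return width * height * 4;
    default:
      throw new Error(`unknown image kind: ${kind}`);
  }
}

// Bytes sharp's raw input expects: one byte per channel per pixel.
function rawByteLength(width, height, channels) {
  return width * height * channels;
}

// A 10x2 RGBA image lines up with { channels: 4 } ...
console.log(decodedByteLength(Kind.RGBA_32BPP, 10, 2), rawByteLength(10, 2, 4)); // 80 80
// ... but a 10x2 1-bit image is 4 bytes, not the 20 expected for { channels: 1 }.
console.log(decodedByteLength(Kind.GRAYSCALE_1BPP, 10, 2), rawByteLength(10, 2, 1)); // 4 20
```

The mismatch on the last line is why 1-bit images cannot be handed to sharp as-is with `channels: 1`; they need expanding to one byte per pixel first.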

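`index.js`

```javascript
import { mkdir } from "node:fs/promises";
import { getDocument, ImageKind, OPS } from "pdfjs-dist/legacy/build/pdf.mjs";
import sharp from "sharp";

const MY_PDF = "./test.pdf";
const OUTPUT_DIR = "output";

// sharp's toFile() does not create missing directories.
await mkdir(OUTPUT_DIR, { recursive: true });

const pdf = await getDocument(MY_PDF).promise;

// GRAYSCALE_1BPP data is packed 8 pixels per byte (rows padded to whole
// bytes, set bit = white); expand it to 1 byte per pixel for sharp.
const unpack1bpp = ({ data, width, height }) => {
  const rowBytes = (width + 7) >> 3;
  const out = new Uint8Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const byte = data[y * rowBytes + (x >> 3)];
      out[y * width + x] = byte & (0x80 >> (x & 7)) ? 255 : 0;
    }
  }
  return out;
};

// Map a pdf.js image object to the raw buffer and channel count sharp expects.
const toRaw = (imageObject) => {
  switch (imageObject.kind) {
    case ImageKind.GRAYSCALE_1BPP:
      return { data: unpack1bpp(imageObject), channels: 1 };
    case ImageKind.RGBA_32BPP:
      return { data: imageObject.data, channels: 4 };
    default: // ImageKind.RGB_24BPP
      return { data: imageObject.data, channels: 3 };
  }
};

const promises = Array.from({ length: pdf.numPages }, async (_, pageIndex) => {
  const page = await pdf.getPage(pageIndex + 1);
  const operatorList = await page.getOperatorList();

  // Object IDs of every image painted on the page; the Set drops repeats
  // when the same image is drawn more than once.
  const imageIds = new Set(
    operatorList.fnArray.flatMap((fn, index) =>
      fn === OPS.paintImageXObject ? [operatorList.argsArray[index][0]] : [],
    ),
  );

  // "g_" IDs live in the document-wide commonObjs cache, the rest in the
  // page's objs. Images pdf.js has not resolved yet (or that carry no raw
  // pixel data) are skipped.
  const imageObjects = [...imageIds]
    .map((id) => ({ id, store: id.startsWith("g_") ? page.commonObjs : page.objs }))
    .filter(({ id, store }) => store.has(id))
    .map(({ id, store }) => store.get(id))
    .filter((imageObject) => imageObject?.data);

  return Promise.all(
    imageObjects.map((imageObject, index) => {
      const { data, channels } = toRaw(imageObject);
      return sharp(data, {
        raw: { width: imageObject.width, height: imageObject.height, channels },
      }).toFile(`${OUTPUT_DIR}/out-${pageIndex}-${index}.png`);
    }),
  );
});

await Promise.all(promises);
```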
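The image lookup in the script treats a pdf.js operator list as two parallel arrays: `fnArray` holds operator codes and `argsArray` the matching arguments, and for `paintImageXObject` the first argument is the image's object ID. That step can be exercised without a PDF using a mocked operator list; the op codes and IDs below are made up for illustration (real code uses `OPS` from pdfjs-dist), and repeats are collapsed because the same image can be painted more than once:

```javascript
// Made-up op codes standing in for pdf.js's OPS enum (illustration only).
const OPS = { save: 1, restore: 2, paintImageXObject: 3 };

// Return the object ID (first argument) of every paintImageXObject op,
// in order of first appearance, with repeats removed.
function paintedImageIds({ fnArray, argsArray }) {
  const ids = new Set();
  fnArray.forEach((fn, index) => {
    if (fn === OPS.paintImageXObject) ids.add(argsArray[index][0]);
  });
  return [...ids];
}

// "g_" IDs are read from page.commonObjs, everything else from page.objs.
const storeFor = (id) => (id.startsWith("g_") ? "commonObjs" : "objs");

// Hypothetical operator list: one page image painted twice, one shared image.
const operatorList = {
  fnArray: [
    OPS.save,
    OPS.paintImageXObject,
    OPS.restore,
    OPS.paintImageXObject,
    OPS.paintImageXObject,
  ],
  argsArray: [[], ["img_p0_1"], [], ["g_d0_img_p0_2"], ["img_p0_1"]],
};

for (const id of paintedImageIds(operatorList)) {
  console.log(`${id} -> ${storeFor(id)}`);
}
// img_p0_1 -> objs
// g_d0_img_p0_2 -> commonObjs
```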

`package.json`

```json
{
  "name": "test",
  "version": "0.0.0",
  "type": "module",
  "description": "",
  "scripts": {
    "start": "node ./index.js"
  },
  "keywords": [],
  "author": "Jeremy Nguyen <nguyen.jeremyt@gmail.com> (https://jeremy.ng)",
  "license": "MIT",
  "packageManager": "pnpm@10.33.2",
  "dependencies": {
    "node-canvas": "^2.9.0",
    "pdfjs-dist": "^5.7.284",
    "sharp": "^0.34.5"
  }
}
```