Skip to content

Instantly share code, notes, and snippets.

@rfezzani
Last active September 27, 2023 07:00
Show Gist options
  • Save rfezzani/b4b8852c5a48a901c1e94e09feb34743 to your computer and use it in GitHub Desktop.
Save rfezzani/b4b8852c5a48a901c1e94e09feb34743 to your computer and use it in GitHub Desktop.
Crop extraction from tiled TIFF image file directory without whole page loading using tifffile
from tifffile import TiffFile
import numpy as np
def get_crop(page, i0, j0, h, w):
"""Extract a crop from a TIFF image file directory (IFD).
Only the tiles englobing the crop area are loaded and not the whole page.
This is usefull for large Whole slide images that can't fit int RAM.
Parameters
----------
page : TiffPage
TIFF image file directory (IFD) from which the crop must be extracted.
i0, j0: int
Coordinates of the top left corner of the desired crop.
h: int
Desired crop height.
w: int
Desired crop width.
Returns
-------
out : ndarray of shape (imagedepth, h, w, sampleperpixel)
Extracted crop.
"""
if not page.is_tiled:
raise ValueError("Input page must be tiled.")
im_width = page.imagewidth
im_height = page.imagelength
if h < 1 or w < 1:
raise ValueError("h and w must be strictly positive.")
if i0 < 0 or j0 < 0 or i0 + h >= im_height or j0 + w >= im_width:
raise ValueError("Requested crop area is out of image bounds.")
tile_width, tile_height = page.tilewidth, page.tilelength
i1, j1 = i0 + h, j0 + w
tile_i0, tile_j0 = i0 // tile_height, j0 // tile_width
tile_i1, tile_j1 = np.ceil([i1 / tile_height, j1 / tile_width]).astype(int)
tile_per_line = int(np.ceil(im_width / tile_width))
out = np.empty((page.imagedepth,
(tile_i1 - tile_i0) * tile_height,
(tile_j1 - tile_j0) * tile_width,
page.samplesperpixel), dtype=page.dtype)
fh = page.parent.filehandle
jpegtables = page.tags.get('JPEGTables', None)
if jpegtables is not None:
jpegtables = jpegtables.value
for i in range(tile_i0, tile_i1):
for j in range(tile_j0, tile_j1):
index = int(i * tile_per_line + j)
offset = page.dataoffsets[index]
bytecount = page.databytecounts[index]
fh.seek(offset)
data = fh.read(bytecount)
tile, indices, shape = page.decode(data, index, jpegtables,
jpegheader=page.jpegheader)
im_i = (i - tile_i0) * tile_height
im_j = (j - tile_j0) * tile_width
out[:, im_i: im_i + tile_height, im_j: im_j + tile_width, :] = tile
im_i0 = i0 - tile_i0 * tile_height
im_j0 = j0 - tile_j0 * tile_width
return out[:, im_i0: im_i0 + h, im_j0: im_j0 + w, :]
@HowcanoeWang
Copy link

Hi there,
sorry for the late reply
I run your code but still, I am getting an error as below.
ERROR:
Traceback (most recent call last):
File "/home/yuvi/PycharmProjects/task-1/VIdeo_frame_extraction/NDPI-ext/Temp-2-Tifffile.py", line 147, in
Img = get_untiled_crop(page=page.parent.pages[0], i0=0, j0=0, h=1024, w=1024)
File "/home/yuvi/PycharmProjects/task-1/VIdeo_frame_extraction/NDPI-ext/Temp-2-Tifffile.py", line 128, in get_untiled_crop
fh.seek(offset)
File "/home/yuvi/anaconda3/envs/task-1/lib/python3.7/site-packages/tifffile/tifffile.py", line 8305, in seek
self._fh.seek(offset, whence)
AttributeError: 'NoneType' object has no attribute 'seek'

Process finished with exit code 1

Does the page file which I am trying to pass is correct or not?
Img = get_untiled_crop(page=page.parent.pages[0], i0=0, j0=0, h=1024, w=1024)

why you use page.parent.pages[0] instead of simple page.pages[0]

@rfezzani
Copy link
Author

rfezzani commented Dec 5, 2020

why you use page.parent.pages[0] instead of simple page.pages[0]

👍 Good point @HowcanoeWang!

@Yuvi-416
Copy link

Yuvi-416 commented Dec 5, 2020

Hi,
Bcoz of this one,
Screenshot from 2020-12-05 18-12-32

Do you know how can I resample this kind of image?
I have an .NDPI image whose image size is around 186496 × 36608. I want to downsample this image as in the range of ((186496, 36608), (93248, 18304), (46624, 9152), (23312, 4576), (11656, 2288), (5828, 1144), (2914, 572), (1457, 286), (728, 143)). Is it possible?

@HowcanoeWang
Copy link

Hi,
Bcoz of this one,
Screenshot from 2020-12-05 18-12-32

Do you know how can I resample this kind of image?
I have an .NDPI image whose image size is around 186496 × 36608. I want to downsample this image as in the range of ((186496, 36608), (93248, 18304), (46624, 9152), (23312, 4576), (11656, 2288), (5828, 1144), (2914, 572), (1457, 286), (728, 143)). Is it possible?

This is a task like clipping raster geotiff by polygon, maybe other python library (install by conda is recommended) like GDAL, osgeo, fiona have a neat function to do this job, rather than using this basic package.

@adfoucart
Copy link

Hi,

FYI I tried using the method on a NDPI image, and it almost worked out of the box but I got an error on the page.decode in line 69 (Jpeg8Error), because the data didn't contain the JPEG headers.

Changing it to:

tile, indices, shape = page.decode(page.jpegheader+data, index, jpegtables)

and it works well.

In case anyone gets the same problem !

@rfezzani
Copy link
Author

Excellent! Thank you @adfoucart 🙏. Another option is to use the jpegheader argument:

tile, indices, shape = page.decode(data, index, jpegtables, jpegheader=page.jpegheader)

I will add this modification to the gist 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment