-
-
Save rfezzani/b4b8852c5a48a901c1e94e09feb34743 to your computer and use it in GitHub Desktop.
from tifffile import TiffFile | |
import numpy as np | |
def get_crop(page, i0, j0, h, w): | |
"""Extract a crop from a TIFF image file directory (IFD). | |
Only the tiles englobing the crop area are loaded and not the whole page. | |
This is usefull for large Whole slide images that can't fit int RAM. | |
Parameters | |
---------- | |
page : TiffPage | |
TIFF image file directory (IFD) from which the crop must be extracted. | |
i0, j0: int | |
Coordinates of the top left corner of the desired crop. | |
h: int | |
Desired crop height. | |
w: int | |
Desired crop width. | |
Returns | |
------- | |
out : ndarray of shape (imagedepth, h, w, sampleperpixel) | |
Extracted crop. | |
""" | |
if not page.is_tiled: | |
raise ValueError("Input page must be tiled.") | |
im_width = page.imagewidth | |
im_height = page.imagelength | |
if h < 1 or w < 1: | |
raise ValueError("h and w must be strictly positive.") | |
if i0 < 0 or j0 < 0 or i0 + h >= im_height or j0 + w >= im_width: | |
raise ValueError("Requested crop area is out of image bounds.") | |
tile_width, tile_height = page.tilewidth, page.tilelength | |
i1, j1 = i0 + h, j0 + w | |
tile_i0, tile_j0 = i0 // tile_height, j0 // tile_width | |
tile_i1, tile_j1 = np.ceil([i1 / tile_height, j1 / tile_width]).astype(int) | |
tile_per_line = int(np.ceil(im_width / tile_width)) | |
out = np.empty((page.imagedepth, | |
(tile_i1 - tile_i0) * tile_height, | |
(tile_j1 - tile_j0) * tile_width, | |
page.samplesperpixel), dtype=page.dtype) | |
fh = page.parent.filehandle | |
jpegtables = page.tags.get('JPEGTables', None) | |
if jpegtables is not None: | |
jpegtables = jpegtables.value | |
for i in range(tile_i0, tile_i1): | |
for j in range(tile_j0, tile_j1): | |
index = int(i * tile_per_line + j) | |
offset = page.dataoffsets[index] | |
bytecount = page.databytecounts[index] | |
fh.seek(offset) | |
data = fh.read(bytecount) | |
tile, indices, shape = page.decode(data, index, jpegtables, | |
jpegheader=page.jpegheader) | |
im_i = (i - tile_i0) * tile_height | |
im_j = (j - tile_j0) * tile_width | |
out[:, im_i: im_i + tile_height, im_j: im_j + tile_width, :] = tile | |
im_i0 = i0 - tile_i0 * tile_height | |
im_j0 = j0 - tile_j0 * tile_width | |
return out[:, im_i0: im_i0 + h, im_j0: im_j0 + w, :] |
Thanks for this function and works great for my big geotiff file.
However, for some smaller geotiff file, their is_tiled value is false, can we do the similar thing without loading full image to memeory?
Here is my code to specify this function, for those images without tiled, it reads one row once.
def get_untiled_crop(page, i0, j0, h, w):
if page.is_tiled:
raise ValueError("Input page must not be tiled")
im_width = page.imagewidth
im_height = page.imagelength
if h < 1 or w < 1:
raise ValueError("h and w must be strictly positive.")
i1, j1 = i0 + h, j0 + w
if i0 < 0 or j0 < 0 or i1 >= im_height or j1 >= im_width:
raise ValueError(f"Requested crop area is out of image bounds.{i0}_{i1}_{im_height}, {j0}_{j1}_{im_width}")
out = np.empty((page.imagedepth, h, w, page.samplesperpixel), dtype=np.uint8)
fh = page.parent.filehandle
for index in range(i0, i1):
offset = page.dataoffsets[index]
bytecount = page.databytecounts[index]
fh.seek(offset)
data = fh.read(bytecount)
tile, indices, shape = page.decode(data, index)
out[:,index-i0,:,:] = tile[:,:,j0:j1,:]
return out
2022.07.22 Updates:
I met with an untiled geotiff that read 2 rows per time (not I assumed one row per time), and I changed the code as follows:
UTokyo-FieldPhenomics-Lab/EasyIDP@f02cd19
The same function name: get_untiled_crop
(Line 285)
Thanks for this function and works great for my big geotiff file.
Thank you @HowcanoeWang for your feedback, I am glad that this piece of code was useful for you 😉.
I will investigate on your problem ASAP.
@HowcanoeWang, I think you need to play with the RowsPerStrip
, StripOffsets
and StripByteCounts
tags. See the Strips section in this documentation for example.
@HowcanoeWang, I think you need to play with the
RowsPerStrip
,StripOffsets
andStripByteCounts
tags. See the Strips section in this documentation for example.
Thank you for your guide, apologize I am a green hand of geotiff, digging the documentation to optimize my previous code may take a long time for me. But I have tested my previous code def get_untiled_crop
, it does get the same result on the untiled geotiff, may you help me test how it works on your images? And quite glad if you could optimize it by the tags you mentioned before.
I am really not a TIFF expert 😄. I just published this gist to help people that may face the same problem as me.
You may be can find more help for your specific problem on image.sc forum. BTW, I will try to investigate, but without any guaranty of success 😉
Hi there,
I try to run your code but I am getting an error as:
I am passing this command:
image = get_crop(page=page, i0=0, j0=1024, h=512, w=512)
raise ValueError("Input page must be tiled.")
ValueError: Input page must be tiled.
I can see in the above picture that is_tiled is false.
Does it create an error?
Thank you
Hi there,
I try to run your code but I am getting an error as:
I am passing this command:
image = get_crop(page=page, i0=0, j0=1024, h=512, w=512)raise ValueError("Input page must be tiled.")
ValueError: Input page must be tiled.I can see in the above picture that is_tiled is false.
Does it create an error?
Thank you
please check whether my previous get_untiled_crop
function helps or not
Hi there,
sorry for the late reply
I run your code but still, I am getting an error as below.
ERROR:
Traceback (most recent call last):
File "/home/yuvi/PycharmProjects/task-1/VIdeo_frame_extraction/NDPI-ext/Temp-2-Tifffile.py", line 147, in
Img = get_untiled_crop(page=page.parent.pages[0], i0=0, j0=0, h=1024, w=1024)
File "/home/yuvi/PycharmProjects/task-1/VIdeo_frame_extraction/NDPI-ext/Temp-2-Tifffile.py", line 128, in get_untiled_crop
fh.seek(offset)
File "/home/yuvi/anaconda3/envs/task-1/lib/python3.7/site-packages/tifffile/tifffile.py", line 8305, in seek
self._fh.seek(offset, whence)
AttributeError: 'NoneType' object has no attribute 'seek'
Process finished with exit code 1
Does the page file which I am trying to pass is correct or not?
Img = get_untiled_crop(page=page.parent.pages[0], i0=0, j0=0, h=1024, w=1024)
Hi there,
sorry for the late reply
I run your code but still, I am getting an error as below.
ERROR:
Traceback (most recent call last):
File "/home/yuvi/PycharmProjects/task-1/VIdeo_frame_extraction/NDPI-ext/Temp-2-Tifffile.py", line 147, in
Img = get_untiled_crop(page=page.parent.pages[0], i0=0, j0=0, h=1024, w=1024)
File "/home/yuvi/PycharmProjects/task-1/VIdeo_frame_extraction/NDPI-ext/Temp-2-Tifffile.py", line 128, in get_untiled_crop
fh.seek(offset)
File "/home/yuvi/anaconda3/envs/task-1/lib/python3.7/site-packages/tifffile/tifffile.py", line 8305, in seek
self._fh.seek(offset, whence)
AttributeError: 'NoneType' object has no attribute 'seek'Process finished with exit code 1
Does the page file which I am trying to pass is correct or not?
Img = get_untiled_crop(page=page.parent.pages[0], i0=0, j0=0, h=1024, w=1024)
why you use page.parent.pages[0]
instead of simple page.pages[0]
why you use page.parent.pages[0] instead of simple page.pages[0]
👍 Good point @HowcanoeWang!
Do you know how can I resample this kind of image?
I have an .NDPI image whose image size is around 186496 × 36608. I want to downsample this image as in the range of ((186496, 36608), (93248, 18304), (46624, 9152), (23312, 4576), (11656, 2288), (5828, 1144), (2914, 572), (1457, 286), (728, 143)). Is it possible?
Do you know how can I resample this kind of image?
I have an .NDPI image whose image size is around 186496 × 36608. I want to downsample this image as in the range of ((186496, 36608), (93248, 18304), (46624, 9152), (23312, 4576), (11656, 2288), (5828, 1144), (2914, 572), (1457, 286), (728, 143)). Is it possible?
This is a task like clipping raster geotiff by polygon
, maybe other python library (install by conda is recommended) like GDAL, osgeo, fiona have a neat function to do this job, rather than using this basic package.
Hi,
FYI I tried using the method on a NDPI image, and it almost worked out of the box but I got an error on the page.decode
in line 69 (Jpeg8Error), because the data didn't contain the JPEG headers.
Changing it to:
tile, indices, shape = page.decode(page.jpegheader+data, index, jpegtables)
and it works well.
In case anyone gets the same problem !
Excellent! Thank you @adfoucart 🙏. Another option is to use the jpegheader
argument:
tile, indices, shape = page.decode(data, index, jpegtables, jpegheader=page.jpegheader)
I will add this modification to the gist 😉
Thanks for this function and works great for my big geotiff file.
However, for some smaller geotiff file, their is_tiled value is false, can we do the similar thing without loading full image to memeory?