Big image data formats

Prerequisites

Before starting this lesson, you should be familiar with:

Learning Objectives

After completing this lesson, learners should be able to:
  • Understand the concepts of lazy-loading, chunking and scale pyramids

  • Understand some file formats that implement chunking and scale pyramids

Motivation

Modern microscopy frequently generates image data in the GB-TB range. Such data cannot be naively opened. First, the data may not fit into the working memory (RAM) of your computer. Second, loading all the data into memory would take a long time. Thus, it is important to know about dedicated concepts and implementations that enable swift interaction with such big image data.

Concept map

graph TD
  BIG("Big image data") --- RP("Resolution pyramids")
  BIG --- C("Chunking")
  C --- LL("Lazy loading")



Figure


Big image data formats typically support flexible chunking of the data and resolution pyramids. Chunking enables efficient loading of image subregions; resolution pyramids avoid loading unnecessary detail when the view is zoomed out.



Similarities of big microscopy data with Google maps

We can think of the data in Google Maps as one very big 2D image. Loading all the data in Google Maps onto your phone or computer is not possible, because it would take too long and your device would run out of memory.

Another important aspect is that if you are currently looking at a whole country, it is not useful to load very detailed data about individual houses in one city, because the monitor of your device would not have enough pixels to display this information.

Thus, to offer you a smooth browsing experience, Google Maps lazily loads only the part of the world (chunk) that you are currently looking at, at a resolution level that is appropriate for the number of pixels of your phone or computer monitor.
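The choice of resolution level can be sketched in a few lines of Python. This is a simplified illustration (the function name and all numbers are made up for the example), not how Google Maps is actually implemented:

```python
def choose_pyramid_level(full_size_px, view_extent_fraction, screen_px):
    """Return the pyramid level to load (0 = full resolution; each level
    downsamples by 2x), given the image width in pixels, the fraction of
    the image currently in view, and the screen width in pixels."""
    level = 0
    # image pixels covered by the current view at full resolution
    covered = full_size_px * view_extent_fraction
    # coarsen while the next level still offers >= 1 image pixel per screen pixel
    while covered / 2 >= screen_px:
        covered /= 2
        level += 1
    return level

# Zoomed out over a whole 1,000,000-pixel-wide map on a 1000-pixel screen:
print(choose_pyramid_level(1_000_000, 1.0, 1000))    # -> 9 (a coarse level)
# Zoomed in on 0.1% of the map: full resolution is needed:
print(choose_pyramid_level(1_000_000, 0.001, 1000))  # -> 0
```

The invariant is that the chosen level always provides at least one image pixel per screen pixel, so no visible detail is lost while as little data as possible is loaded.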

Chunking

The efficiency with which parts (chunks) of image data can be loaded from your hard disk into your computer's memory depends on how the image data is laid out (chunked) on the hard disk. What layout is optimal is a longer, very technical discussion, and probably also depends on the exact storage medium that you are using. Essentially, you want your chunks small enough that your hardware can load one chunk very fast, but you also want the chunks big enough to minimise the number of chunks that you need to load. The reason for the latter is that for each chunk your software has to tell your computer "please go and load this chunk", which in itself takes time, even if the chunk is very small. Thus, big image data formats typically let you choose the chunking such that you can optimise it for your hardware and access patterns.
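The benefit of chunking can be illustrated with a toy chunked store: a plain Python dictionary of NumPy blocks standing in for chunks on disk (real formats such as HDF5 or Zarr work analogously, but persist the chunks to storage). Only the chunks overlapping a requested region are ever touched:

```python
import numpy as np

CHUNK = 64  # chunk edge length in pixels; this is the tunable trade-off

def write_chunked(image, chunk=CHUNK):
    """Split a 2D image into chunk x chunk blocks keyed by grid position."""
    return {
        (y // chunk, x // chunk): image[y:y + chunk, x:x + chunk].copy()
        for y in range(0, image.shape[0], chunk)
        for x in range(0, image.shape[1], chunk)
    }

def read_region(store, y0, y1, x0, x1, chunk=CHUNK):
    """Assemble the subregion [y0:y1, x0:x1], loading only overlapping chunks."""
    out = np.zeros((y1 - y0, x1 - x0), dtype=next(iter(store.values())).dtype)
    n_loaded = 0
    for (gy, gx), block in store.items():
        oy, ox = gy * chunk, gx * chunk  # chunk origin in image coordinates
        if oy >= y1 or ox >= x1 or oy + block.shape[0] <= y0 or ox + block.shape[1] <= x0:
            continue  # no overlap: this chunk is never read
        n_loaded += 1
        # copy the overlapping part of the chunk into the output
        ya, yb = max(y0, oy), min(y1, oy + block.shape[0])
        xa, xb = max(x0, ox), min(x1, ox + block.shape[1])
        out[ya - y0:yb - y0, xa - x0:xb - x0] = block[ya - oy:yb - oy, xa - ox:xb - ox]
    return out, n_loaded

image = np.arange(256 * 256).reshape(256, 256)
store = write_chunked(image)                     # 4 x 4 = 16 chunks in total
region, n = read_region(store, 10, 70, 10, 70)
print(n)  # -> 4: only 4 of the 16 chunks were touched
```

With smaller chunks, less superfluous data is read per request; with bigger chunks, fewer "please go and load this chunk" requests are issued: exactly the trade-off described above.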

Resolution pyramids

A resolution pyramid stores the same image at several scales: the full-resolution data plus a series of progressively downsampled copies, typically by a factor of two per level. When you are zoomed out, the viewer loads a low-resolution level that roughly matches the number of pixels on your screen; as you zoom in, it fetches higher-resolution levels, but only for the region currently in view. Combined with chunking, this keeps loading time and memory consumption manageable, largely independent of the total data size.
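A scale pyramid can be built by repeated 2x downsampling. The minimal sketch below averages non-overlapping 2x2 blocks per level (real pyramid writers, e.g. for OME-Zarr, use comparable schemes, often with more sophisticated filtering):

```python
import numpy as np

def build_pyramid(image, levels):
    """Return [full, 1/2, 1/4, ...] resolution copies of a 2D image by
    2x2 block averaging. Assumes edge lengths divisible by 2**(levels - 1)."""
    pyramid = [image]
    for _ in range(levels - 1):
        a = pyramid[-1]
        # average each non-overlapping 2x2 block to halve both dimensions
        a = a.reshape(a.shape[0] // 2, 2, a.shape[1] // 2, 2).mean(axis=(1, 3))
        pyramid.append(a)
    return pyramid

pyramid = build_pyramid(np.ones((512, 512)), levels=4)
print([a.shape for a in pyramid])  # -> [(512, 512), (256, 256), (128, 128), (64, 64)]
```

Each level holds a quarter of the pixels of the previous one, so the whole pyramid costs only about one third more storage than the full-resolution image alone.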




Activities

Lazy load from a multi-plane TIFF file



ImageJ GUI

  • Check the image file size on disk
  • Compare this to your computer’s memory
  • Open Fiji
  • Use [ Edit > Options > Memory & Threads… ] to see how much memory is accessible to Fiji
  • Use [ Plugins > Utilities > Monitor Memory… ] to monitor how much memory is currently used
  • Use [ File > Open ] to open the entire TIFF stack
    • Observe that this takes time and that the memory fills up
  • Close the image and observe whether memory is freed
  • Use [ Plugins > Utilities > Collect Garbage ] to enforce freeing the memory
  • Use [ Plugins > Bio-Formats > Bio-Formats Importer ] to lazy open the TIFF stack
    • Open virtual (<= this is key!)
    • Observe that initial opening is faster and your memory is not filling up as much
  • Move up and down along the z-axis
    • Observe that this is a bit slow because it needs to fetch the data
    • Observe that your memory fills up while you move
  • Use [ Image > Stacks > Orthogonal Views ] to look at the data from the side
    • Observe that now it needs to load all data

Lazy load into BigDataViewer (BDV)

  • Close all images
  • Use [ Plugins > Utilities > Collect Garbage ] to free all memory
  • Make sure to still monitor the memory, using [ Plugins > Utilities > Monitor Memory… ]
  • Again, use [ Plugins > Bio-Formats > Bio-Formats Importer ] to lazy open the TIFF stack
    • Open virtual
  • Now, use [ Plugins > BigDataViewer > Open Current Image ] to view the TIFF stack in BDV
  • In BDV, use the down arrow key to zoom out (this is necessary for the following steps)
  • In BDV, use [ Shift + Y ] to view a XZ plane of the image
    • Observe that you immediately see something and how the planes are lazy loaded
    • Observe that not all planes are loaded, but just as many as needed for the current number of pixels of the viewer window and the current zoom level
    • Use the up arrow key to zoom in and observe how additional data is being loaded

Key points

  • “Bio-Formats Importer” with the “Open virtual” option allows you to lazy load image data into Fiji
  • “Bio-Formats Importer” only supports plane-wise lazy loading from a single resolution level
  • TIFF stacks are internally plane-wise chunked
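The plane-wise lazy access pattern can be mimicked with a raw binary file and NumPy memory mapping. This is a simplification of what Bio-Formats does (it additionally handles TIFF page offsets and metadata for you); the file name and array shape here are arbitrary:

```python
import os
import tempfile
import numpy as np

# Write a small 3D stack (20 planes of 128 x 128 pixels) to a raw binary file.
stack = np.random.randint(0, 255, (20, 128, 128), dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), 'stack.raw')
stack.tofile(path)

# Memory-map the file: no pixel data is read from disk yet (lazy).
lazy = np.memmap(path, dtype=np.uint8, mode='r', shape=(20, 128, 128))

# Indexing plane 10 reads only that plane's bytes from disk.
plane = np.asarray(lazy[10])
print(plane.shape)  # -> (128, 128)
```

Just as with the virtual stack in Fiji, the cost of opening the file is tiny, and data is only fetched when a plane is actually accessed.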

python bioio

# %%
# Open a TIFF image file
# Minimal conda env for this module:
# conda create -n ImageFileFormats python=3.10
# conda activate ImageFileFormats
# pip install bioio bioio-tifffile bioio-lif bioio-czi bioio-ome-tiff bioio-ome-zarr notebook
# Note: for dealing only with .tif, pip install bioio bioio-tifffile is sufficient


# %%
# Load .tif file
# - Observe that BioImage chooses the correct reader plugin
from bioio import BioImage
image_url = 'https://github.com/NEUBIAS/training-resources/raw/master/image_data/xy_8bit__nuclei_PLK1_control.tif'
bioimage = BioImage(image_url)
print(bioimage)
print(type(bioimage))

# %%
# Inspect dimension and shape of image
print(f'Image dimension: {bioimage.dims}')
print(f'Dimension order is: {bioimage.dims.order}')
print(f'Image shape: {bioimage.shape}')

# %%
# Extract image data (5D)
image_data = bioimage.data
print(f'Image type: {type(image_data)}')
print(f'Image array shape: {image_data.shape}')
# Extract specific image part
image_data = bioimage.get_image_data('YX')
print(f'Image type: {type(image_data)}')
print(f'Image array shape: {image_data.shape}')

# %%
# Read pixel size
print(f'Pixel size: {bioimage.physical_pixel_sizes}')
# Read metadata
print(f'Metadata type: {type(bioimage.metadata)}')
print(f'Metadata: {bioimage.metadata}')

# %%
# Load .tif file with extensive metadata
image_url = "https://github.com/NEUBIAS/training-resources/raw/master/image_data/xy_16bit__collagen.md.tif"
bioimage = BioImage(image_url)

# %%
# Read pixel size
print(f'Pixel size: {bioimage.physical_pixel_sizes}')
# Read metadata
print(f'Metadata type: {type(bioimage.metadata)}')
print(f'Metadata: {bioimage.metadata}')

# %%
# Load a large tif lazily
from pathlib import Path
image_path = Path().cwd()/'xyz_uint8__em_platy_raw_s4.tif'
bioimage = BioImage(image_path)
# lazy load
bioimage_data = bioimage.dask_data
print(bioimage_data)

# %%
# Load a specific z-plane (dimension order is TCZYX, so z is axis 2)
bioimage_data = bioimage.dask_data[:, :, 10, :, :].compute()



Lazy load from a BDV HDF5 image



h5ls

  • Inspect the XML file using your web browser, e.g.
    • Chrome: [ File > Open File… ]
    • Observe the relevant metadata such as
      • Image dimensions
      • Voxel size
  • Inspect the HDF5 file
    • Install the HDF5 command line tools, e.g. conda create -n hdf5 python=3.9 hdf5
    • h5ls xyz_uint8__em_platy_raw_s4.h5
      • Observe that the h5 file is like a file directory
    • Explore the content, e.g.,
      • h5ls -v -d xyz_uint8__em_platy_raw_s4.h5/s00/resolutions
      • h5ls -v -d xyz_uint8__em_platy_raw_s4.h5/s00/subdivisions
      • h5ls -v xyz_uint8__em_platy_raw_s4.h5/t00000/s00/0

ImageJ GUI Bio-Formats

  • Open Fiji
  • Use [ Plugins > Utilities > Monitor Memory… ] to keep an eye on the memory
  • Drag and drop the BDV XML file on the Fiji menu bar
  • The Bio-Formats plugin should automatically pop up
  • Choose one or multiple resolution layers to be opened

ImageJ GUI BDV

  • Open Fiji
  • Use [ Plugins > Utilities > Monitor Memory… ] to keep an eye on the memory
  • Use [ Plugins > BigDataViewer > Open XML/HDF5 ] to open the image
  • Use [ Shift + X/Y/Z ] to view orthogonal planes
  • Use the mouse wheel to move along the current axis
  • Use the arrow keys to zoom
  • Observe that memory is released from time to time
  • Observe that browsing the data is very smooth due to lazy loading the chunks

python bioio

# %%
# Open a BDV XML/HDF5 image file
# Minimal conda env for this module:
# conda create -n ImageFileFormats python=3.10
# conda activate ImageFileFormats
# pip install bioio bioio-tifffile bioio-lif bioio-czi bioio-ome-tiff bioio-ome-zarr notebook


# %%
# Load BDV file
# - Observe that BioImage chooses the correct reader plugin
from bioio import BioImage
from pathlib import Path
bioimage = BioImage(Path().cwd()/'xyz_uint8__em_platy_raw_s4.xml')
print(bioimage)
print(type(bioimage))

# %%
# Load the whole dataset into memory (eager)
image_data = bioimage.data

# %%
# Lazily load the data as a dask array
image_data = bioimage.dask_data
print(image_data)


# %%
# Load a specific z-plane (dimension order is TCZYX, so z is axis 2)
bioimage_data = bioimage.dask_data[:, :, 10, :, :].compute()






Assessment

Fill in the blanks

  1. Opening data piece-wise on demand is also called ___ .
  2. Storing data piece-wise is also called ___ .
  3. In order to enable fast inspection of spatial data at different scales (like on Google maps) one can use ___ .

Solution

  1. lazy-loading
  2. chunking
  3. resolution pyramids




Follow-up material

Recommended follow-up modules:

Learn more: