Image data formats

Prerequisites

Before starting this lesson, you should be familiar with:

Learning Objectives

After completing this lesson, learners should be able to:
  • Open and save various image file formats

  • Understand the difference between image voxel data and metadata

  • Understand that converting between image file formats likely leads to loss of information

Motivation

There are numerous ways to save image data on disk, and virtually every microscope vendor has its own file format. It is thus very important to understand how to open those files and inspect their content. Moreover, some software opens only specific image file formats, so it is sometimes necessary to re-save the data. Information can be lost during such file format conversions; it is important to be aware of this and to avoid such loss as much as possible.

Concept map

graph TD
  F("TIFF, JPEG, XML/HDF5, CZI, LIF, ...")
  F --> PD("Pixel data")
  PD --> Values
  PD --> Dimensions
  F --> MD("Metadata")
  MD --> IC("Image calibration")
  MD --> MS("Microscope settings")
  MD --> DS("Display settings")
  MD --> NA("...")



Figure: Image pixel data and metadata
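The split between pixel data and metadata shown in the concept map can be illustrated with a minimal sketch. The `Image` class, the `pixel_size_um` key and the `objective` entry are hypothetical names chosen for this example; they are not part of any file format or library. The point is only that dimensions follow from the pixel data itself, whereas physical sizes need calibration metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical container: pixel values plus a free-form metadata dict."""
    values: list                                   # nested lists: rows of pixel values
    metadata: dict = field(default_factory=dict)   # calibration, settings, ...

    @property
    def dimensions(self):
        # (rows, columns) are derived from the pixel data alone
        return (len(self.values), len(self.values[0]))

    def physical_size(self):
        # Physical size needs calibration metadata; None if it is missing
        pixel_size = self.metadata.get("pixel_size_um")
        if pixel_size is None:
            return None
        rows, columns = self.dimensions
        return (rows * pixel_size, columns * pixel_size)


values = [[0, 10, 20], [30, 40, 50]]
calibrated = Image(values, {"pixel_size_um": 0.5, "objective": "40x"})
stripped = Image(values)  # same pixel values, metadata lost

print(calibrated.dimensions)       # (2, 3)
print(calibrated.physical_size())  # (1.0, 1.5)
print(stripped.physical_size())    # None
```

Both objects hold identical pixel values, yet only the calibrated one can answer questions about physical size.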



Activities

Open CZI image data

Data


ImageJ GUI

  • Open the file mentioned in the activity using:
    • [Plugins > Bio-Formats > Bio-Formats Importer]
      • Display metadata
      • Display OME-XML Metadata
    • Press [OK]
    • Select both “Series”
    • Look at the images
    • Inspect the metadata

python BioIO

# %% 
# Open a CZI image file
# minimal conda env for this module
# conda create -n ImageFileFormats python=3.10
# conda activate ImageFileFormats
# pip install bioio bioio-tifffile bioio-lif bioio-czi bioio-ome-tiff bioio-ome-zarr notebook

# TODO
# - Change the below code to only open the CZI image
# - Implement that it opens both images that are contained in the file (see ImageJ GUI activity)

# %%
# Load .tif file with minimal metadata
# - Observe that BioImage chooses the correct reader plugin
# - Observe that the return object is not the image matrix
from bioio import BioImage
image_url = 'https://github.com/NEUBIAS/training-resources/raw/master/image_data/xy_8bit__nuclei_PLK1_control.tif'
bioimage = BioImage(image_url)
print(bioimage)
print(type(bioimage))

# %%
# Print some object attributes
# - Observe that the object is 5-dimensional, with most dimensions having a size of 1
# - Observe that the dimension order is always time, channel, z, y, x (TCZYX)
print(bioimage.dims)
print(bioimage.shape)
print(f'Dimension order is: {bioimage.dims.order}')
print(type(bioimage.dims.order))
print(f'Size of X dimension is: {bioimage.dims.X}')

# %%
# Extract image data
# - Observe that the returned numpy.array is still 5 dimensional
image_data = bioimage.data
print(type(image_data))
print(image_data)
print(image_data.shape)

# %%
# Extract specific part of image data
# - Observe that numpy.array is reduced to populated dimensions only
yx_image_data = bioimage.get_image_data('YX')
print(type(yx_image_data))
print(yx_image_data)
print(yx_image_data.shape)

# %%
# Access pixel size
import numpy as np
print(bioimage.physical_pixel_sizes)
print(f'A pixel has a length of {np.round(bioimage.physical_pixel_sizes.X,2)} microns in X dimension.')

# %%
# Access general metadata
print(type(bioimage.metadata))
print(bioimage.metadata)

# %%
# Load .tif file with extensive metadata
image_url = "https://github.com/NEUBIAS/training-resources/raw/master/image_data/xy_16bit__collagen.md.tif"
bioimage = BioImage(image_url)
print(bioimage)
print(type(bioimage))

# - Observe that the image is larger than the previous
print(bioimage.dims)

# %%
# Access image and reduce to only populated dimensions
yx_image_data = bioimage.data.squeeze()
print(type(yx_image_data))
print(yx_image_data)
print(yx_image_data.shape)

# %%
# Access pixel size
print(bioimage.physical_pixel_sizes)
print(f'A pixel has a length of {np.round(bioimage.physical_pixel_sizes.Y,2)} microns in Y dimension.')

# Access general metadata
# - Observe that metadata are more extensive than in the previous image
print(type(bioimage.metadata))
print(bioimage.metadata)

# %%
# Load .lif file
# - Observe that BioImage chooses the correct reader plugin
# - Observe that the return object has 4 different channels
# - Observe that the general metadata are an abstract element
image_url = "https://github.com/NEUBIAS/training-resources/raw/master/image_data/xy_xyc__two_images.lif"
bioimage = BioImage(image_url)
print(bioimage)
print(type(bioimage))
print(bioimage.dims)
print(bioimage.metadata)
print(type(bioimage.metadata))

# %%
# Access channel information
print(bioimage.channel_names)

# %%
# Access image data for all channels
img_4channel = bioimage.data.squeeze()

# Alternative
img_4channel = bioimage.get_image_data('CYX')

# - Observe that numpy.array shape is 3 dimensional representing channel,y,x
print(img_4channel.shape)

# Access only one channel
img_1channel = bioimage.get_image_data('YX',C=0)

# Alternative
img_1channel = img_4channel[0]

# - Observe that numpy.array shape is 2 dimensional representing y,x
print(img_1channel.shape)

# %%
# Access different images in one image file (scenes)
# - Observe that one image file can contain several scenes
# - Observe that they can be different in various aspects
print(bioimage.scenes)
print(f'Current scene: {bioimage.current_scene}')

# - Observe that the image in the current scene has 4 channels and that its Y/X dimensions have a size of 1024
print(bioimage.dims)
print(bioimage.physical_pixel_sizes)

# Switch to second scene
# - Observe that the image in the other scene has only one channel and that its Y/X dimensions are half as large as in the first scene
# - Observe that the pixel sizes are doubled
bioimage.set_scene(1)
print(bioimage.dims)
print(bioimage.physical_pixel_sizes)

# %%
# Load .czi file
# The file first needs to be downloaded from https://github.com/NEUBIAS/training-resources/raw/master/image_data/xyz__multiple_images.czi
# Save the file in the same directory as this notebook
# - Observe that BioImage chooses the correct reader plugin
# - Observe that the return object has a z dimension
bioimage = BioImage('xyz__multiple_images.czi')
print(bioimage)
print(type(bioimage))

# %%
# Little exercise in between
# Access image dimensions
print(bioimage.dims)

# Access general metadata
# - Observe that metadata are abstract
print(bioimage.metadata)
print(type(bioimage.metadata))

# Access pixel size
print(bioimage.physical_pixel_sizes)

# Access image data for all z-planes
img_3d = bioimage.data.squeeze()

# Alternative
img_3d = bioimage.get_image_data('ZYX')

# - Observe that numpy.array shape is 3 dimensional representing z,y,x
print(img_3d.shape)

# Access only one z-plane
img_2d = bioimage.get_image_data('YX',Z=0)

# Alternative
img_2d = img_3d[0]

# - Observe that numpy.array shape is 2 dimensional representing y,x
print(img_2d.shape)

# %%
# Little exercise:
# participants should try to open one of their own files with Python
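
The scene handling used in the script above (`scenes`, `set_scene`, `shape`) can be wrapped in a small loop that visits every image contained in one file. The sketch below is an assumption-laden illustration: `summarize_scenes` is a hypothetical helper, and `StubImage` is a stand-in with made-up shapes so that the example runs without BioIO or any image file. It only relies on the three attributes already used in the script.

```python
def summarize_scenes(image):
    """Return {scene name: shape} for every scene of a multi-image file.

    `image` is any object offering `scenes`, `set_scene(index)` and `shape`,
    the attributes used on the BioImage object in the script above.
    """
    summary = {}
    for index, name in enumerate(image.scenes):
        image.set_scene(index)        # make this scene the active one
        summary[name] = image.shape   # shape of the now-active scene
    image.set_scene(0)                # leave the object on the first scene
    return summary


class StubImage:
    """Stand-in for a multi-scene image object (hypothetical, for the demo only)."""

    def __init__(self, shapes):
        self._shapes = shapes         # dict: scene name -> shape tuple
        self.scenes = tuple(shapes)   # scene names in file order
        self._current = 0

    def set_scene(self, index):
        self._current = index

    @property
    def shape(self):
        return self._shapes[self.scenes[self._current]]


# Made-up shapes in TCZYX order, only to exercise the loop
stub = StubImage({"scene-a": (1, 4, 1, 1024, 1024), "scene-b": (1, 1, 1, 512, 512)})
scene_shapes = summarize_scenes(stub)
print(scene_shapes)
```

With a real file, passing the `bioimage` object instead of the stub should give one entry per image stored in the file, which is the Python counterpart of selecting both "Series" in the ImageJ GUI activity.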



Open volume EM TIFF series

Data




Explore various image file formats

Example image data


ImageJ GUI

  • Open the files mentioned in the activity:
    • [Plugins > Bio-Formats > Bio-Formats Importer]
      • Display metadata
      • Display OME-XML Metadata
        • This should show the same information as "Display metadata", but as XML (it is sometimes more accurate)
  • For ICS/IDS and XML/HDF5:
    • The ICS and XML files are the entry points that should be opened (the accompanying IDS or HDF5 file is then read automatically).
    • Also inspect the ICS and XML files in a simple text editor.
  • Saving 8 bit single channel image as TIFF:
    • Open xy_8bit__nuclei_PLK1_control.tif
    • [Image > Adjust > Brightness/Contrast] such that cells appear saturated
    • [File > Save As > TIFF…]
      • Open with Fiji
        • LUT metadata has changed, but pixel values and calibration metadata are preserved
      • Open with a web browser
        • It may not open at all
  • Saving 8 bit single channel image as JPEG:
    • Open xy_8bit__nuclei_PLK1_control.tif
    • [Image > Adjust > Brightness/Contrast] such that cells appear saturated
    • [File > Save As > JPEG…]
      • Open with Fiji
        • Pixel values have changed
        • Calibration metadata is gone
      • Open with a web browser
        • It should look the same as when you saved it
  • Saving 16 bit two channel movie as JPEG: xyzct_16bit__mitosis.tif
    • Select a timepoint in the middle of the movie
    • [File > Save As > JPEG…]
      • Open JPEG with Fiji
      • Image dimensions, data type, pixel values, and metadata have changed
  • Saving 8 bit single channel movie as GIF: xyt_8bit__mitocheck_incenp.tif
    • [Image > Adjust > Brightness/Contrast] such that cells appear saturated
    • [File > Save As > GIF…]
      • Open with Fiji
        • Pixel values have changed
      • Open with a web browser
        • Movie plays and looks the same as when you saved it
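
Inspecting an XML metadata file in a text editor, as suggested above, has a programmatic counterpart. The sketch below parses a small, hand-written XML snippet with Python's standard library. The snippet is a simplified assumption modelled on the `Pixels` element of OME-XML (real files carry XML namespaces and many more elements), and the numbers are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written metadata (assumption: no namespaces, one image)
xml_text = """
<Image Name="example">
  <Pixels DimensionOrder="XYCZT" Type="uint8"
          SizeX="512" SizeY="256" SizeC="1" SizeZ="1" SizeT="1"
          PhysicalSizeX="0.25" PhysicalSizeY="0.25"/>
</Image>
""".strip()

root = ET.fromstring(xml_text)
pixels = root.find("Pixels")

# Attribute values are stored as text and must be converted explicitly
size_x = int(pixels.get("SizeX"))
pixel_size_x = float(pixels.get("PhysicalSizeX"))
width_um = size_x * pixel_size_x

print(f"Data type: {pixels.get('Type')}")
print(f"Image width: {size_x} pixels = {width_um} microns")
```

Because the metadata is plain text, the same values can be read by eye in an editor or extracted by a few lines of code, without touching the pixel data.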




Resave images in various file formats

Resaving images in different file formats very often leads to a loss of metadata or distortion of the pixel values. It is critical to be aware of this!

Checks to be done after each resaving
Resave 8 bit single channel image as TIFF
Resave 8 bit single channel image as JPEG
Resave 16 bit two channel movie as JPEG
Resave 8 bit single channel movie as GIF
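
The checks after resaving can be sketched in code as a comparison of the original and the resaved image. Everything here is a hypothetical illustration: images are plain dictionaries with nested lists, `check_resaved` is an invented helper, and `crude_8bit_export` is a deliberately crude stand-in for a lossy export (it is not how ImageJ or any particular format converts data).

```python
def shape(pixels):
    """Return (rows, columns) of a nested-list image."""
    return (len(pixels), len(pixels[0]))


def check_resaved(original, resaved):
    """Compare dimensions, data type, pixel values and calibration."""
    return {
        "same_dimensions": shape(original["pixels"]) == shape(resaved["pixels"]),
        "same_data_type": original["dtype"] == resaved["dtype"],
        "same_pixel_values": original["pixels"] == resaved["pixels"],
        "same_calibration": original.get("pixel_size_um") == resaved.get("pixel_size_um"),
    }


def crude_8bit_export(image):
    """Stand-in for a lossy export: keep the top 8 bits, drop the calibration."""
    return {
        "pixels": [[value >> 8 for value in row] for row in image["pixels"]],
        "dtype": "uint8",
        # calibration metadata is deliberately not copied
    }


original = {
    "pixels": [[0, 256, 65535], [1000, 2000, 3000]],
    "dtype": "uint16",
    "pixel_size_um": 0.1,
}
lossless_copy = {
    "pixels": [row[:] for row in original["pixels"]],
    "dtype": "uint16",
    "pixel_size_um": 0.1,
}

report_copy = check_resaved(original, lossless_copy)
report_export = check_resaved(original, crude_8bit_export(original))
print(report_copy)
print(report_export)
```

In this toy case the lossless copy passes every check, while the 8-bit export keeps only the dimensions; data type, pixel values and calibration all differ.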







Assessment

True or false

  1. One could use Excel’s XLSX file format for saving image data.

Solution

  1. One could use Excel’s XLSX file format for saving image data. True. The cell matrix of each sheet could represent one image plane, and the first sheet could store the metadata together with the mapping of each sheet (image plane) to its z, c, t coordinates, e.g. sheet 12: c = 2, z = 3, t = 1.
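
The sheet-to-coordinate mapping described in this solution could look like the following sketch. The function name `build_sheet_index` and the plane ordering (channel varies fastest, then z, then time) are arbitrary assumptions made for illustration; any fixed ordering works as long as it is recorded in the metadata sheet.

```python
from itertools import product


def build_sheet_index(size_c, size_z, size_t, first_sheet=2):
    """Map each (c, z, t) plane to a sheet number.

    Sheet 1 is assumed to be reserved for metadata, so planes start at sheet 2.
    Coordinates are 1-based; the channel index varies fastest.
    """
    mapping = {}
    sheet = first_sheet
    for t, z, c in product(
        range(1, size_t + 1), range(1, size_z + 1), range(1, size_c + 1)
    ):
        mapping[(c, z, t)] = sheet
        sheet += 1
    return mapping


index = build_sheet_index(size_c=2, size_z=3, size_t=2)
print(len(index))        # 12 image planes
print(index[(1, 1, 1)])  # 2, the first sheet after the metadata sheet
print(index[(2, 3, 2)])  # 13, the last plane
```

Every plane gets exactly one sheet, so the table can be inverted to find the coordinates of any given sheet.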

Discuss

  1. What are the pros and cons of converting an image into another format?
  2. What are the pros and cons of splitting metadata and image pixel data into separate files?
  3. Do you know any good file formats for image metadata?

Solution

  1. (A) Sometimes it is necessary to convert to another format to be able to open the image in a specific software. (B) Converting an image to another format typically loses information, e.g. because the file format that you are saving to cannot represent all the metadata of the original image file. Thus, it is generally recommended to keep the original image file. (C) Converting to a file format with good compression may save you considerable disk space.
  2. (A) Metadata is typically much smaller than the pixel data. Thus, it can be a good idea to keep metadata in a separate file that can be readily inspected (inspecting the potentially TB-sized pixel data files can be tricky). (B) The best file formats for metadata and pixel data can be very different due to the nature of the data, thus splitting can make sense. (C) Having separate files always bears the risk that you lose one of them, e.g. you may forget to copy both to a new folder.
  3. TXT, XML, and JSON are good formats for image metadata, because they are human-readable standard formats that can be opened with any text editor.
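
As a small illustration of keeping metadata in a separate, human-readable file, the sketch below writes a metadata dictionary to a JSON file and reads it back. The keys and values and the file name `image.metadata.json` are hypothetical; no standard metadata schema is implied.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical metadata for one image (keys chosen for this example only)
metadata = {
    "pixel_size_um": {"x": 0.25, "y": 0.25},
    "channels": ["DNA"],
    "dimension_order": "TCZYX",
}

with tempfile.TemporaryDirectory() as folder:
    sidecar = Path(folder) / "image.metadata.json"
    sidecar.write_text(json.dumps(metadata, indent=2))  # human-readable text file

    text = sidecar.read_text()
    restored = json.loads(text)  # round trip back to a dictionary

print(text)
print(restored == metadata)
```

The file content is readable in any text editor, and the round trip restores the original dictionary. The risk named in point 2 (C) still applies: the sidecar file must travel together with the pixel data file.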

Explanations




Follow-up material

Recommended follow-up modules:

Learn more: