OME-Zarr
Prerequisites
Before starting this lesson, you should be familiar with:
Learning Objectives
After completing this lesson, learners should be able to:
Understand the OME-Zarr image file format
Render cloud (S3 object store) hosted OME-Zarr image data
Access the pixel values of cloud hosted OME-Zarr image data
Apply basic image processing on cloud hosted OME-Zarr image data
Motivation
Storing TB-sized image data locally and in multiple copies is either not possible or inefficient. Cloud storage enables efficient concurrent access to the same image data by multiple clients (scientists). OME-Zarr is the emerging community standard image file format for cloud (S3 object store) compatible image data storage. Thus it is important to know how to access S3 hosted OME-Zarr in various image analysis and visualisation platforms.
Concept map
Figure
Activities
Open OME-Zarr
Open OME-Zarr data using various tools.
Inspect OME-Zarr datasets using minio-client
Connect to the s3 bucket using credentials:
mc alias set s3 https://s3.embl.de T0XMlxMdq8C6rSxurrdqMqHNrhyhC4f0 dRFXoR852egFtp3lC9NJPYjpPaCBNRa8
Check out what we have at our s3 bucket:
mc tree -d 2 s3/ome-zarr-course/
mc ls s3/ome-zarr-course/data/MFF/
mc ls s3/ome-zarr-course/data/JPEG/
mc ls s3/ome-zarr-course/data/ZARR/common/
Check out the multiscales metadata for one of the OME-Zarr datasets we created:
mc cat s3/ome-zarr-course/data/ZARR/common/13457537T.zarr/.zattrs
Check out the array metadata for the highest resolution array:
mc cat s3/ome-zarr-course/data/ZARR/common/13457537T.zarr/0/.zarray
Configure mc for anonymous access to public s3 buckets:
mc alias set s3pub https://s3.embl.de
Have a look at the metadata for a big OME-Zarr data:
mc cat s3pub/i2k-2020/platy-raw.ome.zarr/.zattrs
mc cat s3pub/i2k-2020/platy-raw.ome.zarr/s0/.zarray
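The `.zattrs` file printed by `mc cat` is plain JSON, so its multiscales metadata can be summarised with a few lines of Python. The sketch below parses a small hand-written stand-in for such a file. The JSON values (axes, paths, scales) and the helper name `summarize_multiscales` are illustrative assumptions, not the content of the course datasets.

```python
import json

# Hypothetical, abbreviated stand-in for the output of `mc cat .../.zattrs`.
zattrs_text = """
{
  "multiscales": [
    {
      "version": "0.4",
      "axes": [
        {"name": "z", "type": "space", "unit": "micrometer"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"}
      ],
      "datasets": [
        {"path": "0",
         "coordinateTransformations": [{"type": "scale", "scale": [1.0, 0.5, 0.5]}]},
        {"path": "1",
         "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.0, 1.0]}]}
      ]
    }
  ]
}
"""


def summarize_multiscales(text):
    """Return the axis order and a {path: scale} mapping from .zattrs JSON text."""
    multiscales = json.loads(text)["multiscales"][0]
    axis_order = "".join(axis["name"] for axis in multiscales["axes"])
    scales = {}
    for dataset in multiscales["datasets"]:
        for transform in dataset["coordinateTransformations"]:
            if transform["type"] == "scale":
                scales[dataset["path"]] = transform["scale"]
    return axis_order, scales


axis_order, scales = summarize_multiscales(zattrs_text)
print(f"Axis order: {axis_order}")
for path, scale in scales.items():
    print(f"Level {path}: voxel size {scale}")
```

To try this on real metadata, the output of `mc cat` could be redirected to a file and its content passed to the function instead of `zattrs_text`.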
Inspect OME-Zarr from IDR using minio-client
Create a project directory and cd into it:
mkdir ~/ome_zarr_course && cd ~/ome_zarr_course
mkdir data
Connect to the EBI server:
mc alias set uk1s3 https://uk1s3.embassy.ebi.ac.uk
No need to provide access and secret keys for this public resource. When requested to supply credentials, simply press Enter.
Check out the contents of the IDR bucket dedicated to OME-Zarr data:
mc tree -d 1 uk1s3/idr/
mc ls uk1s3/idr/zarr/v0.4/idr0062A/6001240.zarr
Check out the multiscales metadata for an example OME-Zarr dataset:
mc cat uk1s3/idr/zarr/v0.4/idr0062A/6001240.zarr/.zattrs
Check out the array metadata for the highest resolution array:
mc cat uk1s3/idr/zarr/v0.4/idr0062A/6001240.zarr/0/.zarray
Download the example data for local use:
mc mirror uk1s3/idr/zarr/v0.4/idr0062A/6001240.zarr ~/ome_zarr_course/data/zarr/6001240.zarr
Inspect OME-Zarr datasets using ome_zarr_py client
Use the ome_zarr tool for the inspection:
ome_zarr info https://s3.embl.de/ome-zarr-course/data/ZARR/common/13457537T.zarr
ome_zarr info https://s3.embl.de/i2k-2020/platy-raw.ome.zarr
Inspect and validate OME-Zarr in Python using ome-zarr-validator
Open the OME-Zarr validator with local data using ome-zarr-py from the command line:
ome_zarr view ~/ome_zarr_course/data/zarr/6001240.zarr
The validator will open in a web browser and demonstrate various metadata fields of the OME-Zarr dataset.
- Find out the metadata fields such as axes, units and scales.
- Check the array and chunk shapes and bytes per resolution level.
- Visualize a single chunk.
Now do the same but with remote data:
ome_zarr view https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr
Note that with the remote url it is possible to copy the link from your browser and share it with your colleagues.
Take ome-zarr-py out of the loop and use the web browser directly:
Enter the following into your browser:
https://ome.github.io/ome-ngff-validator/?source=
Then paste the dataset URL after the equals sign to construct the full link.
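The link construction described here is a plain string concatenation, which can be sketched in Python as follows. The helper name `validator_link` is hypothetical, and the choice of the IDR example dataset as the source is an assumption based on the remote URL used earlier in this activity.

```python
VALIDATOR_PREFIX = "https://ome.github.io/ome-ngff-validator/?source="


def validator_link(dataset_url):
    """Append the dataset URL after the '=' sign of the validator prefix."""
    return VALIDATOR_PREFIX + dataset_url


# Assumption: the IDR example dataset used earlier in this activity.
dataset_url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"
link = validator_link(dataset_url)
print(link)
```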
Open local OME-Zarr in Python using zarr-python
Import the necessary tools:
import zarr, os, pprint
import numpy as np
Open a local OME-Zarr using zarr.open_group:
path = f"{os.path.expanduser('~')}/ome_zarr_course/data/zarr/6001240.zarr"
dataset = zarr.open_group(path, mode = 'r')
print(f"Type of the dataset: {type(dataset)}")
Summarize group-level metadata:
dataset.info
Note the store type, the number of arrays and groups.
Note also the group named ‘labels’.
Print the full metadata:
pprint.pprint(dict(dataset.attrs))
Get multiscales metadata:
meta = dict(dataset.attrs['multiscales'][0])
Print the axis ordering and the units:
pprint.pprint(meta['axes'])
axis_order = ''.join(item['name'] for item in meta['axes'])
print(f"Axis order is {axis_order}")
Print the voxel scaling for each resolution level:
for idx, transform in enumerate(meta['datasets']):
    print(f"\033[1mVoxel transform for the level {idx}:\033[0m")
    pprint.pprint(transform)
Get the top resolution array:
zarr_array0 = dataset[0]
print(f"Array type: {type(zarr_array0)}")
print(f"Shape of the top-level array: {zarr_array0.shape}")
Get a downscaled array:
zarr_array1 = dataset[1]
print(f"Array type: {type(zarr_array1)}")
print(f"Shape of the first-level downscaled array: {zarr_array1.shape}")
Summarize array-level metadata:
zarr_array0.info
zarr_array1.info
Print chunk size for the top layer:
print(f"Chunk size: {zarr_array0.chunks}")
Convert the zarr array to a numpy array:
numpy_array0 = zarr_array0[:]
print(f"Array type: {type(numpy_array0)}")
# or use numpy directly
numpy_array0 = np.array(zarr_array0)
print(f"Array type: {type(numpy_array0)}")
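The array shape and the chunk shape together determine how many chunks make up an array and how large each chunk is before compression. The sketch below works this out with the standard library only. The shape, chunk shape and 2-byte data type are hypothetical placeholders; in practice one would substitute `zarr_array0.shape`, `zarr_array0.chunks` and `zarr_array0.dtype.itemsize`.

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each axis (partial edge chunks count as chunks)."""
    return tuple(math.ceil(size / chunk) for size, chunk in zip(shape, chunks))


# Hypothetical values; substitute zarr_array0.shape and zarr_array0.chunks.
shape = (2, 236, 275, 271)
chunks = (1, 1, 275, 271)
itemsize = 2  # bytes per voxel, assuming a 16-bit data type

grid = chunk_grid(shape, chunks)
n_chunks = math.prod(grid)
chunk_bytes = math.prod(chunks) * itemsize

print(f"Chunk grid: {grid}")
print(f"Total number of chunks: {n_chunks}")
print(f"Uncompressed bytes per chunk: {chunk_bytes}")
```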
Open remote OME-Zarr in Python using zarr-python
Import the necessary tools:
import zarr, s3fs, os, pprint
import numpy as np
Open a remote OME-Zarr using zarr.open_group:
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"
dataset = zarr.open_group(url, mode = 'r')
print(f"Type of the dataset: {type(dataset)}")
Summarize group-level metadata:
dataset.info
Note the store type, the number of arrays and groups.
Note also the group named ‘labels’.
Print the full metadata:
pprint.pprint(dict(dataset.attrs))
Get multiscales metadata:
meta = dict(dataset.attrs['multiscales'][0])
Print the axis ordering and the units:
pprint.pprint(meta['axes'])
axis_order = ''.join(item['name'] for item in meta['axes'])
print(f"Axis order is {axis_order}")
Print the voxel scaling for each resolution level:
for idx, transform in enumerate(meta['datasets']):
    print(f"\033[1mVoxel transform for the level {idx}:\033[0m")
    pprint.pprint(transform)
Get the top resolution array:
zarr_array0 = dataset[0]
print(f"Array type: {type(zarr_array0)}")
print(f"Shape of the top-level array: {zarr_array0.shape}")
Get a downscaled array:
zarr_array1 = dataset[1]
print(f"Array type: {type(zarr_array1)}")
print(f"Shape of the first-level downscaled array: {zarr_array1.shape}")
Summarize array-level metadata:
zarr_array0.info
zarr_array1.info
Print chunk size for the top layer:
print(f"Chunk size: {zarr_array0.chunks}")
Convert the zarr array to a numpy array:
numpy_array0 = zarr_array0[:]
print(f"Array type: {type(numpy_array0)}")
# or use numpy directly
numpy_array0 = np.array(zarr_array0)
print(f"Array type: {type(numpy_array0)}")
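With remote data, converting the whole array downloads every chunk, whereas indexing a sub-region only needs the chunks that overlap the selection. The sketch below counts the chunks involved in each case. It assumes contiguous selections given as (start, stop) pairs. The shape and chunk shape are hypothetical placeholders for `zarr_array0.shape` and `zarr_array0.chunks`, and the helper `chunks_touched` is not part of zarr.

```python
def chunks_touched(selection, chunks):
    """Count chunks overlapped by a selection of (start, stop) pairs, one per axis."""
    count = 1
    for (start, stop), chunk in zip(selection, chunks):
        first = start // chunk       # index of the first chunk touched
        last = (stop - 1) // chunk   # index of the last chunk touched
        count *= last - first + 1
    return count


# Hypothetical shape and chunking; substitute zarr_array0.shape / .chunks.
shape = (2, 236, 275, 271)
chunks = (1, 1, 275, 271)

whole_array = [(0, size) for size in shape]
single_plane = [(0, 1), (100, 101), (0, 275), (0, 271)]

n_whole = chunks_touched(whole_array, chunks)
n_plane = chunks_touched(single_plane, chunks)
print(f"Whole array:  {n_whole} chunks")
print(f"Single plane: {n_plane} chunk(s)")
```

With this chunking, the single-plane selection corresponds to an expression such as `zarr_array0[0, 100]`, which is a much smaller request than `zarr_array0[:]`.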
Open OME-Zarr in Python using ome-zarr-py
Import the relevant tools:
import ome_zarr, zarr, pprint, os
from ome_zarr.reader import Reader
from ome_zarr.io import parse_url
Read remote OME-Zarr:
# local_path = f"{os.path.expanduser('~')}/ome_zarr_course/data/zarr/6001240.zarr"
remote_path = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"
reader = Reader(parse_url(remote_path))
# Note here that 'parse_url' can parse both remote and local urls.
# No need for explicit use of s3fs.
Note that ome-zarr-py uses the term ‘node’ for different zarr groups and reads them in a flat list.
Print the node information per resolution level:
nodes = list(reader())
for idx, node in enumerate(nodes):
    print(f"The node at the level {idx} is {node}")
Get the data and metadata of the top-level node:
dataset = nodes[0].data
meta = nodes[0].metadata
Check the ‘data’ instance to examine the array shape and the chunks for each resolution layer:
for idx, array in enumerate(dataset):
    print(f"The array {idx} is a {type(array)} and has shape {array.shape} and has chunks with shape {array.chunksize}")
Print the axis types and units of the arrays using the metadata instance:
print("Axis properties of the dataset:")
pprint.pprint(meta['axes'])
Print the voxel sizes per resolution level (and any other voxel transforms that may exist):
for idx, transforms in enumerate(meta['coordinateTransformations']):
    print(f"\033[1mThe transform metadata for the level {idx}:\033[0m")
    print(f"{transforms}")
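The per-level scale vectors also reveal how strongly each axis is downsampled from one resolution level to the next. The sketch below derives these factors from a list shaped like the transform metadata printed above. The numeric values and the helper names are hypothetical, chosen only to illustrate the calculation.

```python
# Hypothetical per-level transforms (axis order assumed to be c, z, y, x).
transforms_per_level = [
    [{"type": "scale", "scale": [1.0, 0.5, 0.36, 0.36]}],
    [{"type": "scale", "scale": [1.0, 0.5, 0.72, 0.72]}],
    [{"type": "scale", "scale": [1.0, 0.5, 1.44, 1.44]}],
]


def scale_of(transforms):
    """Return the 'scale' vector from one level's list of transforms."""
    return next(t["scale"] for t in transforms if t["type"] == "scale")


def downsampling_factors(levels):
    """Per-axis ratio of each level's voxel size to that of the previous level."""
    scales = [scale_of(transforms) for transforms in levels]
    factors = []
    for previous, current in zip(scales, scales[1:]):
        factors.append([round(c / p, 6) for p, c in zip(previous, current)])
    return factors


factors = downsampling_factors(transforms_per_level)
for level, factor in enumerate(factors, start=1):
    print(f"Level {level - 1} -> {level}: downsampled by {factor}")
```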
Open OME-Zarr in Fiji using MoBIE
- Open Fiji with MoBIE
- [ Plugins > MoBIE > Open > Open OME ZARR… ]
- Image URI: https://s3.embl.de/i2k-2020/platy-raw.ome.zarr
- ( Labels URI ): https://s3.embl.de/i2k-2020/platy-raw.ome.zarr/labels/cells
- ( Labels Table URI ): TODO
Open OME-Zarr in Fiji using n5-ij
Open a remote OME-Zarr in Fiji
- Open the n5-ij in Fiji via:
[ File > Import > HDF5/N5/Zarr/OME-NGFF ... ]
- In the window that opens, paste the following path in the uri space:
https://s3.embl.de/ome-zarr-course/data/commons/xyz_8bit_calibrated__fib_sem_crop.ome.zarr
Then click the Detect datasets button as shown below:
The tool will display a multiscales schema with two datasets in the dialog box. Select one of the datasets as shown below and click OK:
- This will open the dataset in Fiji as a normal Fiji image (see below).
Open a subset of a remote OME-Zarr in Fiji
Follow the same steps as above to select a dataset, but instead of opening the dataset directly, click the crop button in the window before clicking OK as shown below:
In the window that opens, select the indices of the subset as shown below:
When you click OK, the specified subset of the image will be opened as shown below:
Open OME-Zarr in Fiji using n5-viewer
Open a remote OME-Zarr in BigDataViewer
Now let’s imagine the dataset you want to open is too large to fit the RAM of your machine.
- Open the n5-viewer in Fiji via:
[ Plugins > BigDataViewer > HDF5/N5/Zarr/OME-NGFF Viewer ]
- In the window that opens, paste the following path in the uri space:
https://s3.embl.de/i2k-2020/platy-raw.ome.zarr
Then click the Detect datasets button as shown below:
The tool will display a multiscales schema with 9 datasets in the dialog box. In this case, one can either open the individual datasets or the entire pyramid. To do the latter, click on the multiscale object and then click OK as shown below:
This will open the multiscales object in BDV as shown below:
This is a huge (terabyte-scale) image, which is not amenable to processing as a whole in Fiji. It is possible, however, to extract subsets of it to Fiji and continue with processing. To do so, follow the steps below:
- In the BDV window, open the cropping window via:
[ Tools > Extract to ImageJ ]
(also see below)
- In the cropping window that opens, select the indices of the subset as shown below:
Note that this step may require incremental rotation of the image and adjustment of the bounding box until the desired region of interest is obtained. It is also important to check the size of the cropped volume at the top of the cropping window to make sure that it is not larger than the available memory. Once you are fine with the settings, click OK.
- The output is a standard Fiji image as shown below:
Note that this image has been loaded into the RAM; as such, it can be processed like any other Fiji image and saved to any desired file format.
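The memory check mentioned for the cropping step is simple arithmetic: number of voxels times bytes per voxel. The sketch below illustrates it in Python. The crop shape, the 8 GiB of available memory and the safety factor of 2 (to leave room for copies made during processing) are assumptions for illustration only.

```python
import math


def volume_bytes(shape, bytes_per_voxel):
    """Uncompressed size of a volume in bytes."""
    return math.prod(shape) * bytes_per_voxel


def fits_in_memory(shape, bytes_per_voxel, available_bytes, safety_factor=2.0):
    """True if the crop, times a margin for processing copies, fits in memory."""
    return volume_bytes(shape, bytes_per_voxel) * safety_factor <= available_bytes


# Hypothetical 8-bit crop of 500 x 1000 x 1000 voxels on a machine with 8 GiB free.
crop_shape = (500, 1000, 1000)
available = 8 * 1024**3

size = volume_bytes(crop_shape, bytes_per_voxel=1)
fits = fits_in_memory(crop_shape, 1, available)
print(f"Crop size: {size / 1024**3:.2f} GiB")
print(f"Fits in memory: {fits}")
```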
Open OME-Zarr in napari
Visualise the remote data using Napari together with the napari-ome-zarr plugin.
napari --plugin napari-ome-zarr https://s3.embl.de/ome-zarr-course/data/ZARR/$USER/xyzct_8bit__mitosis.ome.zarr
napari --plugin napari-ome-zarr https://s3.embl.de/ome-zarr-course/data/ZARR/$USER/xyz_8bit_calibrated__fib_sem_crop.ome.zarr
Optional: visualise local OME-Zarr data in the same way:
napari --plugin napari-ome-zarr ~/data/ZARR/xyzct_8bit__mitosis.ome.zarr
Optional: visualise big remote OME-Zarr data:
napari --plugin napari-ome-zarr https://s3.embl.de/i2k-2020/platy-raw.ome.zarr
Note that compared to BigDataViewer, there are more delays with Napari.
Open OME-Zarr in vizarr
Open Google Chrome on BAND (for some reason vizarr does not work with Firefox on BAND). Google Chrome can be found under the Applications menu at the top left corner of the screen:
[Applications > internet > Google Chrome]
- To visualise a self-created OME-Zarr via vizarr, replace $USER in the following link with your user name, copy-paste the link into Google Chrome’s address bar and press enter:
https://hms-dbmi.github.io/vizarr/?source=https://s3.embl.de/ome-zarr-course/data/ZARR/$USER/xyzct_8bit__mitosis.ome.zarr
- Note: you can find your user name by entering echo $USER in the BAND terminal.
- Optional: visualise the following in the same way:
- 3D EM data: https://hms-dbmi.github.io/vizarr/?source=https://s3.embl.de/ome-zarr-course/data/ZARR/$USER/xyz_8bit_calibrated__fib_sem_crop.ome.zarr
- A well from an HCS plate: https://hms-dbmi.github.io/vizarr/?source=https://s3.embl.de/eosc-future/EUOS/testdata.zarr/A/1
Save OME-Zarr
Save data to OME-Zarr
Save OME-Zarr in Python using ome-zarr-py
Import the relevant tools:
import zarr, os
import numcodecs
from ome_zarr import writer, scale
from ome_zarr.io import parse_url
from skimage.data import astronaut
Create fake data:
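# astronaut() has shape (y, x, c); move the channel axis first to get (c, y, x),
# matching the axis order declared further below.
data = astronaut().transpose(2, 0, 1)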
Create a zarr store to write:
For the sake of simplicity, here we demonstrate how to write to a local store. It is also possible to write to a remote location by simply specifying a remote url as input to the parse_url function.
# Specify the path where you want to write
output_path = f"{os.path.expanduser('~')}/ome_zarr_course/data/zarr/outputs/astronaut.zarr"
# Parse the url as a zarr store. Note that "mode = 'w'" enables writing to this store.
store = parse_url(output_path, mode = 'w').store
root = zarr.open_group(store)
Specify a scaler:
In order to create an image pyramid, one has to instantiate a scaler. The scaler takes three parameters: the downscaling factor, the number of downscaling steps and the downscaling method.
scaler = scale.Scaler(
    downscale=2,          # Downscaling factor for x and y axes
    max_layer=4,          # Number of downscalings
    method = 'nearest'    # Downscaling method
)
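To get a feeling for what this scaler produces, the expected shape of each pyramid level can be worked out by hand. The sketch below does so under the assumption that every downscaling step divides the y and x sizes by the downscale factor using integer division, and leaves the other axes unchanged. The helper `pyramid_shapes` is hypothetical and independent of ome-zarr-py, whose rounding for sizes that are not evenly divisible may differ.

```python
def pyramid_shapes(shape, downscale=2, max_layer=4, spatial_axes=2):
    """Shapes of all levels when the last `spatial_axes` axes shrink per level."""
    shapes = [tuple(shape)]
    for _ in range(max_layer):
        previous = shapes[-1]
        leading = previous[:-spatial_axes]
        trailing = tuple(max(1, size // downscale) for size in previous[-spatial_axes:])
        shapes.append(leading + trailing)
    return shapes


# A 3-channel, 512 x 512 image such as the astronaut example.
shapes = pyramid_shapes((3, 512, 512), downscale=2, max_layer=4)
for level, level_shape in enumerate(shapes):
    print(f"Level {level}: {level_shape}")
```

Note that four downscaling steps yield five resolution levels in total, including the original array.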
Specify the axis identities and the corresponding units:
This list of dictionaries defines the axis order and the unit corresponding to each axis.
axes = [
    dict(name = 'c', type = 'channel'),
    dict(name = 'y', type = 'space', unit = 'micrometer'),
    dict(name = 'x', type = 'space', unit = 'micrometer'),
]
Specify the voxel sizes for each resolution level:
This is a list of lists, where the length of the outer list must match the number of resolution levels. The inner lists contain dictionaries for different types of coordinate transforms. Each inner list must contain a scaling transform, a dictionary that takes scale as key and an iterable of voxel sizes as value.
coordinate_transforms = [
    [{'scale': [1, 0.2, 0.2], 'type': 'scale'}],
    [{'scale': [1, 0.4, 0.4], 'type': 'scale'}],
    [{'scale': [1, 0.8, 0.8], 'type': 'scale'}],
    [{'scale': [1, 1.6, 1.6], 'type': 'scale'}],
    [{'scale': [1, 3.2, 3.2], 'type': 'scale'}]
]
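Since the voxel size doubles along y and x at every level, such a list can also be generated rather than typed out. The sketch below is one possible way to do this. The helper `make_scale_transforms` is hypothetical, and it assumes that only the last two (spatial) axes are downscaled.

```python
def make_scale_transforms(base_scale, n_levels, downscale=2, spatial_axes=2):
    """Build one scaling transform per level, scaling only the trailing axes."""
    n_leading = len(base_scale) - spatial_axes
    transforms = []
    for level in range(n_levels):
        factor = downscale ** level
        scale = [
            value if index < n_leading else value * factor
            for index, value in enumerate(base_scale)
        ]
        transforms.append([{"scale": scale, "type": "scale"}])
    return transforms


# Five levels: the original resolution plus four downscalings.
generated = make_scale_transforms([1, 0.2, 0.2], n_levels=5)
for level, transform in enumerate(generated):
    print(f"Level {level}: {transform}")
```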
Specify zarr storage options:
The most important zarr storage options are the chunks and the compression parameters. The chunks parameter is simply a tuple of integers corresponding to each axis. The compression parameter requires a compressor object from the Numcodecs package, which is a dependency of zarr-python.
storage_options = dict(
    chunks=(1, 64, 64),                # Output chunk shape
    compression = numcodecs.Zlib(),    # Compressor to be used, defaults to numcodecs.Blosc()
    overwrite = True                   # Overwrite the output path
)
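Compression is applied to each chunk separately, and how much a chunk shrinks depends on its content. The sketch below illustrates this with Python's built-in `zlib` module, used here as a stand-in for the Numcodecs compressor. The two synthetic chunks and the compression level of 1 are assumptions for illustration.

```python
import math
import zlib

CHUNK_SHAPE = (1, 64, 64)            # Same chunk shape as in the storage options
n_bytes = math.prod(CHUNK_SHAPE)     # One byte per pixel for 8-bit data

# Two synthetic chunks: a flat background and a repeating gradient.
flat_chunk = bytes(n_bytes)
gradient_chunk = bytes(i % 256 for i in range(n_bytes))

for name, raw in [("flat", flat_chunk), ("gradient", gradient_chunk)]:
    compressed = zlib.compress(raw, 1)
    restored = zlib.decompress(compressed)
    assert restored == raw           # The compression is lossless
    print(f"{name}: {len(raw)} bytes -> {len(compressed)} bytes")
```

Smaller chunks give finer-grained access but more objects to store and request, which matters for S3-hosted data.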
Save the array:
Here we use the ome_zarr.writer.write_image function to save the array. This function takes the parameters specified above as input, downscales the array accordingly and writes the resulting pyramid to the specified zarr group.
writer.write_image(
    image = data,    # In this case, a numpy array
    group = root,
    axes = axes,     # Dimensionality order
    scaler = scaler,
    coordinate_transformations = coordinate_transforms,
    storage_options = storage_options
)
Update the rendering metadata using zarr and ome-zarr-py
Import the necessary modules
import zarr, os
from ome_zarr.io import parse_url
import matplotlib.colors as mcolors
At this stage inspect the image using the OME-Zarr validator:
ome_zarr view ~/ome_zarr_course/data/zarr/outputs/astronaut.zarr
Define a utility function to get the hex color code by simple color names
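def get_color_code(color_name):
    try:
        # CSS4_COLORS values look like '#00FFFF'; the 'omero' metadata stores
        # colours as plain 'RRGGBB' strings, so the leading '#' is removed.
        color_code = mcolors.CSS4_COLORS[color_name.lower()]
        return color_code.lstrip('#')
    except KeyError:
        return f"Color '{color_name}' not found."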
Now add rendering metadata
path = f"{os.path.expanduser('~')}/ome_zarr_course/data/zarr/outputs/astronaut.zarr"
# Create a zarr store to save the data. Note that this can also be an s3 object store.
store = parse_url(path, mode = 'w').store
root = zarr.open_group(store=store)
root.attrs["omero"] = {
    "channels": [
        {
            "color": get_color_code('cyan'),
            "window": {"start": 0, "end": 255, "min": 0, "max": 255},
            "label": "ch0",
            "active": True,
        },
        {
            "color": get_color_code('magenta'),
            "window": {"start": 0, "end": 255, "min": 0, "max": 255},
            "label": "ch1",
            "active": True,
        },
        {
            "color": get_color_code('yellow'),
            "window": {"start": 0, "end": 255, "min": 0, "max": 255},
            "label": "ch2",
            "active": True,
        },
    ]
}
It is important to know here that not all OME-Zarr readers recognize each of these settings.
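Because the three channel entries differ only in colour and label, the rendering metadata can also be assembled with a small helper. The sketch below is one possible way to do this. The function `make_channels` is hypothetical, and writing the colours as plain RRGGBB hex strings follows the examples in the OME-Zarr specification as far as we are aware.

```python
def make_channels(labels, hex_colors, value_range=(0, 255)):
    """Build one 'omero' channel entry per (label, colour) pair."""
    if len(labels) != len(hex_colors):
        raise ValueError("labels and hex_colors must have the same length")
    low, high = value_range
    return [
        {
            "color": color,
            "window": {"start": low, "end": high, "min": low, "max": high},
            "label": label,
            "active": True,
        }
        for label, color in zip(labels, hex_colors)
    ]


# Cyan, magenta and yellow written as plain RRGGBB hex strings.
omero = {"channels": make_channels(["ch0", "ch1", "ch2"], ["00FFFF", "FF00FF", "FFFF00"])}
for channel in omero["channels"]:
    print(channel)
```

The resulting dictionary could then be assigned to the `omero` attribute of the root group, in the same way as the hand-written version.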
Apply the validator again to the data to see the changes:
ome_zarr view ~/ome_zarr_course/data/zarr/outputs/astronaut.zarr
As the data looks valid, now visualize using different viewers to see if the rendering is working.
Convert an image file to OME-Zarr
Convert monolithic file formats to the OME-Zarr format.
BatchConvert
First check out what data we have at the s3 end:
mc tree -d 2 s3/ome-zarr-course/
There are multiple conversion modes. Let’s try each of them.
Perform parallelised, independent conversion:
batchconvert omezarr -st s3 -dt s3 --drop_series data/MFF data/ZARR/$USER;
This command maps each input file in the data/MFF folder to a single OME-Zarr series, which is then transferred to a user-specific folder. Note that the -st s3 option makes sure that the input path is searched for in the s3 bucket, while -dt s3 triggers the output files to be transferred to the s3 bucket under the output path.
Perform grouped conversion:
batchconvert omezarr -st s3 -dt s3 --drop_series --merge_files --concatenation_order t data/JPEG data/ZARR/$USER;
This conversion mode assumes that the input files are part of the same series and thus will merge them along a specific axis during the conversion process. The --merge_files flag specifies the grouped conversion option and the --concatenation_order t option allows the files to be merged along the time axis.
Check what has changed at the s3 end after the conversion:
mc tree -d 2 s3/ome-zarr-course/
mc ls s3/ome-zarr-course/data/ZARR/$USER/
Optional: Copy the converted Zarr data to the home folder:
mc mirror s3/ome-zarr-course/data/ZARR/$USER ~/data/ZARR;
Assessment
Follow-up material
Recommended follow-up modules:
Learn more: