From KitwarePublic

HDF5 file format and library

HDF5 is both a file format and a library dedicated to reading and writing files in that format.

According to Wikipedia, "HDF5 include[s] only two major types of object:

  • Datasets, which are multidimensional arrays of a homogeneous type
  • Groups, which are container structures which can hold datasets and other groups

This results in a truly hierarchical, filesystem-like data format. In fact, resources in an HDF5 file are even accessed using the POSIX-like syntax /path/to/resource. Metadata is stored in the form of user-defined, named attributes attached to groups and datasets. More complex storage APIs representing images and tables can then be built up using datasets, groups and attributes. In addition to these advances in the file format, HDF5 includes an improved type system, and dataspace objects which represent selections over dataset regions. The API is also object-oriented with respect to datasets, groups, attributes, types, dataspaces and property lists. Because it uses B-trees to index table objects, HDF5 works well for time series data such as stock price series, network monitoring data, and 3D meteorological data. The bulk of the data goes into straightforward arrays (the table objects) that can be accessed much more quickly than the rows of a SQL database, but B-tree access is available for non-array data. The HDF5 data storage mechanism can be simpler and faster than an SQL star schema."
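To make the group/dataset/attribute model concrete, here is a minimal Python sketch of a hierarchy addressed by POSIX-like paths. It is a conceptual model only, not the HDF5 C API or h5py: the classes `Group` and `Dataset`, the function `resolve`, and the example names (`experiment1`, `image`, the `spacing` attribute, and the (T, C, Z, Y, X) shape) are all hypothetical. In real code, h5py exposes the same idea through path indexing such as `f["/experiment1/image"]`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple, Union

# Conceptual model only: these classes are NOT the HDF5 or h5py API.


@dataclass
class Dataset:
    """A multidimensional array whose elements all share one type."""
    shape: Tuple[int, ...]
    dtype: str  # e.g. "uint8" or "uint16"
    attrs: Dict[str, Any] = field(default_factory=dict)  # named metadata


@dataclass
class Group:
    """A container that can hold datasets and other groups."""
    children: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)  # named metadata

    def create_group(self, name: str) -> "Group":
        group = Group()
        self.children[name] = group
        return group

    def create_dataset(self, name: str, shape: Tuple[int, ...], dtype: str) -> Dataset:
        dataset = Dataset(tuple(shape), dtype)
        self.children[name] = dataset
        return dataset


def resolve(root: Group, path: str) -> Union[Group, Dataset]:
    """Follow a POSIX-like path such as '/path/to/resource' from the root group."""
    node: Union[Group, Dataset] = root
    for part in path.strip("/").split("/"):
        if not part:  # "/" alone (or "//") stays at the current group
            continue
        if not isinstance(node, Group):
            raise KeyError(f"{part!r} requested below a dataset")
        node = node.children[part]  # KeyError if the name does not exist
    return node


# Usage: build /experiment1/image and look it up by path.
root = Group()
experiment = root.create_group("experiment1")
image = experiment.create_dataset("image", (1000, 2, 75, 1024, 1024), "uint16")
image.attrs["spacing"] = (1.0, 1.0, 2.0)  # hypothetical attribute
assert resolve(root, "/experiment1/image") is image
```

Keeping metadata in `attrs` on both classes mirrors the point above that attributes can be attached to groups as well as datasets.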

It is available under a BSD-like license.

Use cases

  • Chunking (streaming)
  • Multi-resolution
  • Multi-channel images
  • Large datasets (size > 4 GB)
      ◦ Single experiment images of size 1024 x 1024 x 75 (XYZ), 2 channels, 1000 time points
      ◦ 8-bit and 16-bit
      ◦ Images stored as 2D PNGs with filenames giving the location
  • Need to support optimized reading (image streaming) of a sub-volume
      ◦ E.g., box filtering using a kernel of size 5x5x1x1x3
      ◦ Cyclic buffer optimization in the ITK reader that keeps overlapping data and only reads new data
  • Multi-resolution images for hierarchical registration of multiple experimental sets
  • Compression is not as important in the short term but will be needed in the long term
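The numbers in the use cases above can be checked with a little arithmetic, and the sub-volume reading requirement maps naturally onto chunked storage, where a read only touches the chunks that overlap the requested region. The Python sketch below assumes an axis order of (T, C, Z, Y, X), reads the 5x5x1x1x3 kernel as X, Y, Z, C, T (so 3 time points, since there are only 2 channels), and uses a hypothetical chunk shape of one 256 x 256 tile per chunk; none of these choices come from the original requirements. In HDF5 itself the chunk shape is fixed at dataset creation, for example with `H5Pset_chunk` in the C API or the `chunks=` argument of h5py's `create_dataset`.

```python
from itertools import product
from math import prod
from typing import List, Sequence, Tuple

# Assumed axis order (hypothetical): (T, C, Z, Y, X).
SHAPE = (1000, 2, 75, 1024, 1024)  # 1000 time points, 2 channels, 75 x 1024 x 1024
KERNEL = (3, 1, 1, 5, 5)           # the 5x5x1x1x3 box kernel, reordered to (T, C, Z, Y, X)
CHUNK = (1, 1, 1, 256, 256)        # hypothetical chunk: one 256 x 256 tile
FOUR_GIB = 4 * 1024**3             # the 4 GB mark (2**32 bytes)


def raw_size_bytes(shape: Sequence[int], bytes_per_voxel: int) -> int:
    """Uncompressed size of a dataset in bytes."""
    return prod(shape) * bytes_per_voxel


def padded_region(start: Sequence[int], count: Sequence[int],
                  kernel: Sequence[int], shape: Sequence[int]
                  ) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Grow an output region by the kernel half-width, clipped to the dataset bounds."""
    new_start, new_count = [], []
    for s, n, k, dim in zip(start, count, kernel, shape):
        half = k // 2
        lo = max(s - half, 0)
        hi = min(s + n + half, dim)
        new_start.append(lo)
        new_count.append(hi - lo)
    return tuple(new_start), tuple(new_count)


def chunks_touched(start: Sequence[int], count: Sequence[int],
                   chunk: Sequence[int]) -> List[Tuple[int, ...]]:
    """Grid indices of every chunk overlapping [start, start + count); counts must be > 0."""
    per_axis = [range(s // c, (s + n - 1) // c + 1) for s, n, c in zip(start, count, chunk)]
    return list(product(*per_axis))


# Usage: a 128 x 128 output tile at t=10, channel 0, z=30.
start, count = padded_region((10, 0, 30, 256, 256), (1, 1, 1, 128, 128), KERNEL, SHAPE)
tiles = chunks_touched(start, count, CHUNK)
print(raw_size_bytes(SHAPE, 2) > FOUR_GIB, start, count, len(tiles))
# True (9, 0, 30, 254, 254) (3, 1, 1, 132, 132) 12
```

At 16 bits per voxel the full experiment is 314,572,800,000 bytes (about 293 GiB), far past the 4 GB mark, while a single time point with both channels is about 300 MiB. Under these assumptions the kernel halo turns a 128 x 128 request into a 132 x 132 read over 3 time points that touches 12 tiles.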
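The cyclic buffer optimization listed above can be sketched as a ring buffer over the time axis: when a temporal window slides forward by one step, only the newest frame is read and the oldest one is discarded. This is a generic Python illustration assuming an odd window size, not the ITK reader's actual implementation; `read_frame` is a hypothetical stand-in for whatever call reads one time point from the file.

```python
from collections import deque
from typing import Callable, Deque, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def sliding_time_windows(
    read_frame: Callable[[int], Frame], n_frames: int, window: int
) -> Iterator[Tuple[int, List[Frame]]]:
    """Yield (center_time, frames) for every full temporal window.

    Each frame is read from storage exactly once; overlapping frames are
    reused from the ring buffer instead of being re-read. `window` is assumed odd.
    """
    buffer: Deque[Frame] = deque(maxlen=window)  # oldest frame falls out automatically
    for t in range(n_frames):
        buffer.append(read_frame(t))  # the only read per step: the new frame
        if len(buffer) == window:
            yield t - window // 2, list(buffer)


# Usage with a stand-in reader that records which time points were read.
reads: List[int] = []


def fake_read(t: int) -> str:
    reads.append(t)
    return f"frame{t}"


windows = list(sliding_time_windows(fake_read, n_frames=6, window=3))
print(len(windows), reads)  # 4 [0, 1, 2, 3, 4, 5]
```

With 6 frames and a window of 3, the sketch produces 4 windows from 6 reads, whereas re-reading every window from scratch would need 12.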
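For the multi-resolution use case, one common approach is an image pyramid in which each level halves the in-plane resolution. The Python sketch below is an assumption rather than a stated design: it halves only X and Y (the 75-slice Z axis is already coarse), rounds odd sizes up, and builds each level with a 2x2 block mean. Each level could then be stored as its own dataset inside a group, for example under hypothetical paths such as `/experiment1/level0` and `/experiment1/level1`.

```python
from typing import List, Tuple


def pyramid_shapes(shape_zyx: Tuple[int, int, int], levels: int) -> List[Tuple[int, int, int]]:
    """Shapes of each pyramid level: X and Y halve (rounding up), Z is kept."""
    z, y, x = shape_zyx
    shapes = []
    for _ in range(levels):
        shapes.append((z, y, x))
        y, x = (y + 1) // 2, (x + 1) // 2  # never drops below 1
    return shapes


def downsample2d(img: List[List[float]]) -> List[List[float]]:
    """One pyramid step on a 2D slice: mean of each 2x2 block.

    Blocks on an odd trailing row or column are averaged over the pixels that exist.
    """
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            block = [img[a][b] for a in range(i, min(i + 2, h)) for b in range(j, min(j + 2, w))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


print(pyramid_shapes((75, 1024, 1024), 4))
# [(75, 1024, 1024), (75, 512, 512), (75, 256, 256), (75, 128, 128)]
print(downsample2d([[1, 2, 3, 4], [5, 6, 7, 8]]))  # [[3.5, 5.5]]
```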

Atomic objects

Compound objects