Using HDF5 (.h5) Data

PINNICLE supports reading observational or model data stored in the HDF5 (.h5) format, a widely used standard in Earth science and machine learning for managing large, hierarchical datasets.

Overview

HDF5 files store data in a hierarchical format using datasets and groups. PINNICLE can navigate these structures and extract relevant fields for use in physics-informed neural networks. It supports both structured gridded data and scattered point data stored in .h5 containers.
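
Before writing a configuration, it can help to list the dataset names a file actually contains, since those are the names the mappings refer to. The following is a minimal sketch that uses the h5py package directly, not PINNICLE's own reader. The helper list_datasets, the throwaway demo file, and the nested "aux/bed" dataset are assumptions made purely for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Return {dataset_path: shape} for every dataset in an HDF5 file."""
    found = {}

    def visit(name, obj):
        # visititems walks groups and datasets; keep only the datasets.
        if isinstance(obj, h5py.Dataset):
            found[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


# Build a tiny throwaway file so the sketch runs on its own (hypothetical data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("surf_x", data=np.linspace(0.0, 1.0, 5))
    f.create_dataset("surf_y", data=np.linspace(0.0, 2.0, 5))
    f.create_dataset("surf_vx", data=np.zeros(5))
    # A dataset inside a group, to show the hierarchical naming.
    f.create_dataset("aux/bed", data=np.ones(5))

datasets = list_datasets(demo_path)
for name, shape in sorted(datasets.items()):
    print(name, shape)
```

Running list_datasets on your own file shows which names are available to use as values in the coordinate and variable mappings.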

Configuration

To use .h5 data in PINNICLE, add a data source with "source": "h5" to your configuration and set the name mappings for both the coordinates and the variables:

hp["data"] = {
    "MyH5": {
        "data_path": h5path,
        "X_map": {"x": "surf_x", "y": "surf_y"},
        "name_map": {"s": "surf_elv", "u": "surf_vx", "v": "surf_vy", "a": "surf_SMB", "b": "bed_BedMachine"},
        "data_size": {"u": 5000, "v": 5000, "H": 5000, "s": 5000},
        "source": "h5"
    }
}
  • "data_path": Path to the .h5 file

  • "X_map": Maps each PINNICLE coordinate name (key) to the name of the corresponding dataset in the .h5 file (value)

  • "name_map": Maps each PINNICLE variable name (key) to the name of the corresponding dataset in the .h5 file (value)

  • "data_size": Number of data points to randomly sample for each variable

  • Set a variable's entry in "data_size" to "None" to indicate that it is used only as a Dirichlet boundary condition

  • If a key is not listed in "data_size", the corresponding field will not use data from this file
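
The "data_size" rules above can be checked before training with a small helper. This is a sketch under stated assumptions, not part of PINNICLE: the function classify_variables is hypothetical, and it accepts either Python None or the string "None" because the text above does not say which form is expected. Keys that appear in "data_size" but not in "name_map" are only reported; how PINNICLE treats them is not covered here.

```python
def classify_variables(name_map, data_size):
    """Group variable names according to the documented data_size rules."""
    sampled = {}         # variable -> number of points to sample
    dirichlet_only = []  # variables whose size is None / "None"
    unmapped = []        # in data_size but absent from name_map

    for var, size in data_size.items():
        if var not in name_map:
            unmapped.append(var)
        if size is None or size == "None":
            dirichlet_only.append(var)
        else:
            sampled[var] = int(size)

    # Mapped variables with no data_size entry take no data from this file.
    not_used = [var for var in name_map if var not in data_size]

    return {
        "sampled": sampled,
        "dirichlet_only": dirichlet_only,
        "not_used": not_used,
        "unmapped": unmapped,
    }


# The mappings from the configuration shown above.
name_map = {"s": "surf_elv", "u": "surf_vx", "v": "surf_vy",
            "a": "surf_SMB", "b": "bed_BedMachine"}
data_size = {"u": 5000, "v": 5000, "H": 5000, "s": 5000}

report = classify_variables(name_map, data_size)
for group, members in report.items():
    print(group, members)
```

For the configuration above, the report places "a" and "b" under not_used, because they are mapped but have no "data_size" entry, and flags "H" as unmapped.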

See the Examples section for workflows that incorporate .h5 datasets.