OMSI Dataformat Package

Main module for specification of the OpenMSI HDF5-based data format. The module contains various sub-modules whose main goal is to organize different categories of data.

Naming conventions for objects inside the HDF5 file are defined in the omsi_file.format module. The manager API classes then use these conventions to implement the specific format.

The basic idea behind the design of the OpenMSI file format is the concept of managed objects. Managed objects are objects in an HDF5 file (usually HDF5 Groups, which are similar to directories, just within a file) that have a corresponding interface class in the API. These classes in the API always start with the prefix omsi_file_ and inherit from omsi_file.common.omsi_file_common.

To make it easy to nest different objects, we also have the concept of manager helper classes, which encapsulate common functionality for creating and interacting with objects that are contained in another object. Manager helper classes follow the naming convention omsi_<objectname>_manager, where objectname is the name of the object to be managed. E.g., omsi_instrument_manager is used to help place instrument groups inside another managed object. This is done by inheriting from the given manager helper class.

This means that multiple inheritance is used in order to nest managed objects within other managed objects. This allows us to encapsulate common interaction features in centralized locations and to construct more complex containers simply via inheritance.

The use of multiple inheritance and super can be tricky in Python. To simplify their use and ensure stability, we use the following conventions:

  • All omsi_file_* manager classes must accept the h5py.Group object they manage as input and call super(...).__init__(managed_group) with the managed group as parameter in their __init__.
  • All omsi_<objectname>_manager manager helper classes must accept as input the h5py.Group that contains the object(s) to be managed by the helper class, and call super(...).__init__(managed_group) with the managed group as parameter in their __init__.
  • All omsi_file_* manager classes must inherit from omsi_file.common.omsi_file_common.
  • All omsi_<objectname>_manager manager helper classes must inherit from object (i.e., we use new-style classes).
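The conventions above can be illustrated with a minimal, self-contained sketch. All class names below (FileCommonSketch, MethodsManagerSketch, InstrumentManagerSketch, FileExperimentSketch, FakeGroup) are hypothetical stand-ins for omsi_file_common, two manager helper classes, and a managed omsi_file_* class, and a plain object with a name attribute replaces the h5py.Group. The sketch shows how each __init__ forwards a single group argument along the method resolution order (MRO) until the common base class, listed last, ends the chain by calling object.__init__() without arguments.

```python
class FileCommonSketch(object):
    """Hypothetical stand-in for omsi_file_common; always the last base class."""

    def __init__(self, managed_group):
        # object.__init__ accepts no arguments, so the forwarding chain ends here.
        super(FileCommonSketch, self).__init__()
        self.managed_group = managed_group
        self.name = managed_group.name


class MethodsManagerSketch(object):
    """Hypothetical stand-in for a manager helper such as omsi_methods_manager."""

    def __init__(self, methods_parent):
        # Forward the single group argument to the next class in the MRO.
        super(MethodsManagerSketch, self).__init__(methods_parent)
        self.methods_parent = methods_parent


class InstrumentManagerSketch(object):
    """Hypothetical stand-in for omsi_instrument_manager."""

    def __init__(self, instrument_parent):
        super(InstrumentManagerSketch, self).__init__(instrument_parent)
        self.instrument_parent = instrument_parent


class FileExperimentSketch(MethodsManagerSketch, InstrumentManagerSketch, FileCommonSketch):
    """Hypothetical managed class: helper classes first, the common base class last."""

    def __init__(self, exp_group):
        super(FileExperimentSketch, self).__init__(exp_group)


class FakeGroup(object):
    """Placeholder for an h5py.Group; only the name attribute is used here."""

    def __init__(self, name):
        self.name = name


group = FakeGroup("/entry_0")
experiment = FileExperimentSketch(group)
print([cls.__name__ for cls in FileExperimentSketch.__mro__])
# ['FileExperimentSketch', 'MethodsManagerSketch', 'InstrumentManagerSketch', 'FileCommonSketch', 'object']
print(experiment.name, experiment.methods_parent is group, experiment.instrument_parent is group)
# /entry_0 True True
```

Because the common base class comes last in the MRO, it is the only __init__ in the chain that must not forward the group argument; every other class can forward it without knowing which class follows it.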

omsi_file Package

Module for specification of the OpenMSI file API.

format Module

This module defines the basic format for storing mass spectrometry imaging data, metadata, and analysis in HDF5 in compliance with the OpenMSI file format.

class omsi.dataformat.omsi_file.format.omsi_format_analysis

Bases: omsi.dataformat.omsi_file.format.omsi_format_common

Specification for storing analysis related data.

Variables:
  • analysis_groupname – 'analysis_' : Group with additional analysis results
  • analysis_identifier – 'analysis_identifier' : Identifier for the analysis to enable look-up by analysis id
  • analysis_type – 'analysis_type' : Dataset used to store the analysis type descriptor string
  • analysis_parameter_group – Group for storing analysis parameters. Dependent parameters are stored separately using an omsi_format_dependencies group.
  • analysis_runinfo_group – Group for storing run information, e.g., where was the analysis run, how long did it take etc.
analysis_groupname = 'analysis_'
analysis_identifier = 'analysis_identifier'
analysis_parameter_group = 'parameter'
analysis_parameter_help_attr = 'help'
analysis_runinfo_group = 'runinfo'
analysis_type = 'analysis_type'
current_version = '0.2'
class omsi.dataformat.omsi_file.format.omsi_format_common

Bases: object

Specification of common attributes, and names for the file format.

Variables:
  • str_type – h5py.new_vlen(str) : Datatype used for storing strings in HDF5
  • type_attribute – Name of the optional type attribute indicating which omsi_file_* class should be used to interact with a given group.
current_version = '0.1'
str_type = dtype('O')
str_type_unicode = True
timestamp_attribute = 'timestamp'
type_attribute = 'omsi_type'
version_attribute = 'version'
class omsi.dataformat.omsi_file.format.omsi_format_data

Bases: omsi.dataformat.omsi_file.format.omsi_format_common

Specification for storing raw data information.

Variables:
  • data_groupname – The base name for the hdf5 group containing the imaging data
  • dataset_name – The base name for storing raw data. In the case of MSI data, this is the 3D data cube stored as 3D (full_cube), 2D (partial_cube) or 1D (partial_spectra) dataset, depending on the format_type.
  • data_dependency_group – Optional group for storing data dependencies
current_version = '0.1'
data_groupname = 'data_'
dataset_name = 'data_'
class omsi.dataformat.omsi_file.format.omsi_format_dependencies

Bases: omsi.dataformat.omsi_file.format.omsi_format_common

Specification for the management of a collection of dependencies.

Variables: dependencies_groupname – 'dependency' : Name of the group in which the dependencies are stored.
current_version = '0.1'
dependencies_groupname = 'dependency'
class omsi.dataformat.omsi_file.format.omsi_format_dependencydata

Bases: omsi.dataformat.omsi_file.format.omsi_format_common

Specification for the storage of a single dependency.

This type of group does not have a specific name, which allows the user to specify a link_name that eases retrieval of the data.

Variables:
  • dependency_parameter – 'parameter_name' : Name of the string dataset used to store the name of the dependent parameter
  • dependency_selection – 'selection' : Name of the string dataset used to store a selection string if needed
  • dependency_mainname – 'main_name' : Name of the string dataset used to store the description of the link to the object that this depends on
  • dependency_datasetname – 'data_name' : Name of the string dataset used to store the name of the dataset within the main_name high-level object
current_version = '0.3'
dependency_datasetname = 'data_name'
dependency_mainname = 'main_name'
dependency_parameter = 'parameter_name'
dependency_parameter_help_attr = 'help'
dependency_selection = 'selection'
dependency_typename = 'dependency_type'
class omsi.dataformat.omsi_file.format.omsi_format_experiment

Bases: omsi.dataformat.omsi_file.format.omsi_format_common

Specification of file-format-specific naming conventions for experiments

Variables:
  • exp_groupname – 'entry_' : The base name for a group containing data about an experiment
  • exp_identifier_name – 'experiment_identifier' : The identifier dataset for an experiment
current_version = '0.1'
exp_groupname = 'entry_'
exp_identifier_name = 'experiment_identifier'
class omsi.dataformat.omsi_file.format.omsi_format_file

Bases: omsi.dataformat.omsi_file.format.omsi_format_common

Specification of naming conventions specific to the main file

current_version = '0.1'
class omsi.dataformat.omsi_file.format.omsi_format_instrument

Bases: omsi.dataformat.omsi_file.format.omsi_format_metadata_collection

Specification for storing instrument related information

current_version = '0.2'
instrument_groupname = 'instrument'
instrument_mz_name = 'mz'
instrument_name = 'name'
class omsi.dataformat.omsi_file.format.omsi_format_metadata_collection

Bases: omsi.dataformat.omsi_file.format.omsi_format_common

Specification of the basic format for a general-purpose metadata storage

current_version = '0.1'
description_value_attribute = 'description'
metadata_collection_groupname_default = 'metadata'
ontology_value_attribute = 'ontology'
unit_value_attribute = 'unit'
class omsi.dataformat.omsi_file.format.omsi_format_methods

Bases: omsi.dataformat.omsi_file.format.omsi_format_metadata_collection

Specification of the basic format for storing method-related information

Variables:
  • methods_groupname – 'methods' : The group storing all the information about the method
  • methods_old_groupname – 'sample' : The old name of the group, which was refactored to methods. To ensure that old files can still be read, this variable was added and is checked as well if needed.
  • methods_name – 'name' : The dataset with the name of the method
current_version = '0.3'
methods_groupname = 'methods'
methods_name = 'name'
methods_old_groupname = 'sample'
class omsi.dataformat.omsi_file.format.omsi_format_msidata

Bases: omsi.dataformat.omsi_file.format.omsi_format_data

Specification of the basic format for storing an MSI dataset consisting of a complete 3D cube (or a 3D cube completed with 0s for missing data)

Variables:
  • format_types – Data layout types supported for storing MSI data.
  • mzdata_name – Global mz axis for the MSI data cube.
  • format_name – 'format' : Dataset in HDF5 with the format_type descriptor.
current_version = '0.1'
format_name = 'format'
format_types = {'full_cube': 1, 'partial_cube': 2, 'partial_spectra': 3}
mzdata_name = 'mz'
class omsi.dataformat.omsi_file.format.omsi_format_msidata_partial_cube

Bases: omsi.dataformat.omsi_file.format.omsi_format_msidata

Specification of the basic format for storing MSI datasets that define a partial cube with full spectra

Variables:
  • xy_index_name – 2D dataset indicating for each spectrum its start location in the main dataset
  • inv_xy_index_name – 2D dataset with n rows and 2 columns indicating for each spectrum i the (x,y) pixel index the spectrum belongs to. This index is stored for convenience but is not actually needed for data access.
  • shape_name – Simple dataset of length 3 indicating the true image size in x, y, and mz
inv_xy_index_name = 'inv_xy_index'
shape_name = 'shape'
xy_index_name = 'xyindex'
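A partial_cube stores only the pixels that have spectra, so xy_index is needed to find the row of a pixel's spectrum in the main dataset. The sketch below shows this lookup and how an inv_xy_index-style list can be derived from xy_index. It is a sketch under assumptions, not the library's code: plain Python lists stand in for the HDF5 datasets, the function names are hypothetical, and the MISSING marker (-1) for pixels without a spectrum is an assumed convention.

```python
MISSING = -1  # Hypothetical marker for pixels without a stored spectrum.


def get_spectrum_partial_cube(data, xy_index, x, y, num_mz):
    """Return the spectrum of pixel (x, y) from a partial_cube layout.

    data:     list of rows; each row is one full spectrum of length num_mz
    xy_index: 2D list; xy_index[x][y] is the row of pixel (x, y) in data
    """
    row = xy_index[x][y]
    if row == MISSING:
        # Treat empty pixels as all zeros, like a zero-filled full cube.
        return [0] * num_mz
    return list(data[row])


def inverse_index(xy_index):
    """Return the (x, y) pixel of each stored spectrum, ordered by row."""
    pixel_of_row = {}
    for x, column in enumerate(xy_index):
        for y, row in enumerate(column):
            if row != MISSING:
                pixel_of_row[row] = (x, y)
    return [pixel_of_row[row] for row in sorted(pixel_of_row)]


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # three stored spectra, 3 m/z bins each
xy_index = [[0, MISSING], [1, 2]]         # 2 x 2 image; pixel (0, 1) is empty
print(get_spectrum_partial_cube(data, xy_index, 1, 0, num_mz=3))  # [4, 5, 6]
print(get_spectrum_partial_cube(data, xy_index, 0, 1, num_mz=3))  # [0, 0, 0]
print(inverse_index(xy_index))                                   # [(0, 0), (1, 0), (1, 1)]
```

As the format description notes, the inverse index is a convenience: everything needed for data access is already in xy_index.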
class omsi.dataformat.omsi_file.format.omsi_format_msidata_partial_spectra

Bases: omsi.dataformat.omsi_file.format.omsi_format_msidata_partial_cube

Specification of the basic format for storing an MSI dataset of a full or partial cube with partial spectra

Variables:
  • mz_index_name – 1D dataset of the same size as the spectrum data, indicating the indices into the global m/z list
  • xy_index_end_name – 2D dataset indicating for each spectrum its end location (index not included) in the main dataset
mz_index_name = 'mz_index'
xy_index_end_name = 'xyindexend'
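For partial_spectra, the values of pixel (x, y) occupy the half-open range [xy_index[x][y], xy_index_end[x][y]) of the 1D main dataset, and mz_index maps each stored value to a bin of the global m/z axis. A minimal sketch of rebuilding one full spectrum from these pieces, assuming plain Python lists in place of the HDF5 datasets, a hypothetical function name, and zero-filling for m/z bins that were not stored:

```python
def get_spectrum_partial_spectra(data, mz_index, xy_index, xy_index_end, x, y, num_mz):
    """Rebuild the full spectrum of pixel (x, y) from a partial_spectra layout.

    data:         1D list of all stored intensity values
    mz_index:     1D list, same length as data; index into the global m/z axis
    xy_index:     2D list; start of pixel (x, y)'s values in data
    xy_index_end: 2D list; end (exclusive) of pixel (x, y)'s values in data
    num_mz:       length of the global m/z axis
    """
    spectrum = [0] * num_mz  # m/z bins without a stored value stay zero
    start, end = xy_index[x][y], xy_index_end[x][y]
    for k in range(start, end):
        spectrum[mz_index[k]] = data[k]
    return spectrum


data = [10, 20, 30]       # all stored values, pixel after pixel
mz_index = [0, 2, 3]      # m/z bin of each stored value
xy_index = [[0, 2]]       # 1 x 2 image: start offsets
xy_index_end = [[2, 3]]   # end offsets (exclusive)
print(get_spectrum_partial_spectra(data, mz_index, xy_index, xy_index_end, 0, 0, num_mz=4))  # [10, 0, 20, 0]
print(get_spectrum_partial_spectra(data, mz_index, xy_index, xy_index_end, 0, 1, num_mz=4))  # [0, 0, 0, 30]
```

A pixel without any stored values simply has equal start and end offsets, so the loop body never runs and an all-zero spectrum is returned.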

common Module

Module for common data format classes and functionality.

class omsi.dataformat.omsi_file.common.omsi_file_common(managed_group)

Bases: object

Base class for definition of file format modules for the OpenMSI data format.

Use of super()

This class inherits only from object and calls super in the __init__ without parameters. In the standard design pattern of the omsi.dataformat.omsi_file module, it is, therefore, the last class we inherit from in the case of multiple inheritance.

Multiple inheritance is used in the omsi.dataformat.omsi_file module when a class contains other managed objects and uses the manager classes (e.g., omsi_instrument_manager) to get all the features needed to manage those objects.

All child classes of omsi_file_common also call super(...).__init__(managed_group), using a single input parameter indicating the h5py.Group object they manage.

Variables:
  • managed_group – The h5py.Group object managed by the class
  • name – The path to the object in the hdf5 file. Same as managed_group.name
  • file – The h5py.File object the managed_group is associated with. Same as managed_group.file
static create_path_string(filename, objectname)

Given the name of the file and the object path within the file, create a string describing the external reference to the data

Parameters:
  • filename – The full or relative path to the file
  • objectname – The object path in the HDF5 file
Returns:

String describing the path to the object

classmethod get_h5py_object(omsi_object, resolve_dependencies=False)

This class method is a convenience function used to retrieve the corresponding h5py interface object for any omsi file API object.

Parameters:
  • omsi_object – omsi file API input object for which the corresponding h5py.Group, h5py.File, or h5py.Dataset object should be retrieved. The omsi_object input may itself also be a h5py.Group, h5py.File, or h5py.Dataset, in which case omsi_object itself is returned by the function.
  • resolve_dependencies – Set to True if omsi_file_dependencydata objects should be resolved to retrieve the dependent object the dependency is pointing to. Dependencies are resolved recursively, i.e., if a dependency points to another dependency then that one will be resolved as well. Default value is False, i.e., the omsi_file_dependencydata object itself is returned.
Returns:

h5py.Group, h5py.File, or h5py.Dataset corresponding to the given omsi_object.

Raises: A ValueError is raised in case an unsupported omsi_object is given, i.e., the input object is neither an omsi_file API object nor an h5py Group, File, or Dataset object.

get_managed_group()

Return the h5py object managed by this object.

The returned object can be used to read data directly from the HDF5 file. Write operations to the managed group can be performed only if the associated omsi_file was opened with write permissions.

Returns: h5py object for the managed group.
classmethod get_num_items(file_group, basename='')

Get the number of objects with the given basename at the given path

Parameters:
  • file_group – The h5py object to be examined
  • basename – The name that should be searched for.
Returns:

Number of objects with the given basename at the given path
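A minimal sketch of the counting that get_num_items describes, assuming the count is a simple prefix match on the names of the group's members (an assumption; the actual implementation may apply additional checks). A plain dict stands in for the h5py.Group, which likewise exposes keys().

```python
def get_num_items(file_group, basename=""):
    """Count members of file_group whose name starts with basename.

    file_group can be any mapping-like object with keys(), such as an
    h5py.Group; an empty basename counts all members.
    """
    return sum(1 for name in file_group.keys() if name.startswith(basename))


group = {"entry_0": None, "entry_1": None, "metadata": None}
print(get_num_items(group, "entry_"))  # 2
print(get_num_items(group))            # 3
```

With the experiment base name 'entry_', this count is what a function such as get_num_experiments would need to report.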

classmethod get_omsi_object(h5py_object, resolve_dependencies=False)

This class method is a convenience function used to retrieve the corresponding interface class for a given h5py group object.

Parameters:
  • h5py_object – h5py object for which the corresponding omsi_file API object should be generated. This may also be a string describing the requested object based on a combination of the path to the file and a path to the object <filename.h5>:<object_path>
  • resolve_dependencies – Set to True if omsi_file_dependencydata objects should be resolved to retrieve the dependent object the dependency is pointing to. Dependencies are resolved recursively, i.e., if a dependency points to another dependency then that one will be resolved as well. Default value is False, i.e., the omsi_file_dependencydata object itself is returned.
Returns:

None in case no corresponding object was found. Otherwise an instance of:

  • omsi_file : If the given object is a h5py.File object
  • omsi_file_experiment : If the given object is an experiment group
  • omsi_file_methods : If the given object is a method group
  • omsi_file_instrument : If the given object is an instrument group
  • omsi_file_analysis : If the given object is an analysis group
  • omsi_file_msidata : If the given object is an MSI data group
  • omsi_file_dependencydata : If the given object is a dependency group
  • The input h5py_object: If the given object is a h5py.Dataset or h5py.Group
  • None: In case that an unknown type is given.

get_timestamp()

Get the timestamp when the managed group was created in the HDF5 file.

Returns: Python timestamp string generated using time.ctime(). None may be returned in case the timestamp does not exist or cannot be retrieved from the file for some reason.
get_version()

Get the omsi version for the representation of this object in the HDF5 file

classmethod is_managed(in_object)

Check whether the given object is managed by any omsi API class.

Parameters: in_object (Any omsi_file API object or h5py.Dataset or h5py.Group or h5py.File object.) – The object to be checked
items()

Get the list of items associated with the h5py.Group object managed by this object

static parse_path_string(path)

Given a string of the form <filename.h5>:<object_path> retrieve the name of the file and the object path.

Parameters: path – The string defining the file and object path.
Returns: Tuple with the filename and the object path. Both may be None depending on whether an object_path is given and whether the path string is valid.
Raises: ValueError in case that an invalid string is given
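The '<filename.h5>:<object_path>' convention used by create_path_string and parse_path_string can be sketched as follows. This is not the library's code: splitting at the last ':' and treating a string without ':' as a bare filename are assumptions made for illustration. Splitting at the last ':' keeps a Windows drive letter intact when an object path is present, but a Windows path with a drive letter and no object path would be split incorrectly.

```python
def create_path_string(filename, objectname):
    """Join file and object path as '<filename>:<object_path>'."""
    return "%s:%s" % (filename, objectname)


def parse_path_string(path):
    """Split '<filename>:<object_path>' into (filename, object_path).

    Assumptions of this sketch: the object path is everything after the
    last ':', and a string without ':' names only a file.
    """
    if not isinstance(path, str) or not path:
        raise ValueError("Invalid path string: %r" % (path,))
    filename, sep, object_path = path.rpartition(":")
    if not sep:
        # No object path given; the whole string is the filename.
        return path, None
    if not filename:
        raise ValueError("Missing filename in path string: %r" % (path,))
    return filename, object_path or None


ref = create_path_string("test.h5", "/entry_0/data_0")
print(ref)                     # test.h5:/entry_0/data_0
print(parse_path_string(ref))  # ('test.h5', '/entry_0/data_0')
```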
static same_file(filename1, filename2)

Check whether two files are the same.

This function uses the os.path.samefile(...) method to compare files and falls back to comparing the absolute paths of files if samefile should fail or cannot be imported.

Parameters:
  • filename1 – The name of the first file
  • filename2 – The name of the second file
Returns: Boolean indicating whether the two files are the same.

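The documented fallback behavior of same_file can be sketched with the standard library alone. In this sketch, catching AttributeError (os.path.samefile was unavailable on Windows in older Python versions) and OSError (for example, a file that does not exist) is an interpretation of "fail or cannot be imported", not necessarily what the library does.

```python
import os


def same_file(filename1, filename2):
    """Return True if both names refer to the same file.

    Tries os.path.samefile first and falls back to comparing absolute,
    normalized paths if samefile is unavailable or raises an error.
    """
    try:
        return os.path.samefile(filename1, filename2)
    except (AttributeError, OSError):
        return os.path.abspath(filename1) == os.path.abspath(filename2)


print(same_file(".", os.getcwd()))                         # True
print(same_file("missing.h5", "./missing.h5"))             # True (fallback path)
print(same_file("missing_a.h5", "missing_b.h5"))           # False (fallback path)
```

The fallback only compares path strings, so unlike samefile it cannot detect hard links or symbolic links to the same file.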
class omsi.dataformat.omsi_file.common.omsi_file_object_manager(*args, **kwargs)

Bases: object

Base class used to define manager helper classes used to manage contained managed objects. Managed objects are HDF5 Groups (or Datasets) with a corresponding manager API class and may be nested within other managed objects.

What is a manager helper class?

Manager classes are used in the design of omsi.dataformat.omsi_file to encapsulate functionality needed for the management of other managed objects. The expected use of this class, hence, is through multiple inheritance where the main base class is omsi.dataformat.omsi_file.common.omsi_file_common. This is important due to the use of super to accommodate multiple inheritance, which allows an object to manage an arbitrary number of other objects and to inherit from other classes as well.

Use of super()

This class inherits only from object but calls super in the __init__(manager_group) with the manager_group as the only input parameter, in the expectation that this class is used via multiple inheritance with omsi_file_common as the main base class.

Multiple inheritance is used in the omsi.dataformat.omsi_file module when a class contains other managed objects and uses the manager classes (such as this one) to get all the features needed to manage those objects.

All child classes of omsi_file_common call super(..).__init__(manager_group) and all manager helper classes (such as this one) use a single input parameter indicating the manager h5py.Group object that contains the given object.

main_file Module

Module for managing OpenMSI HDF5 data files.

class omsi.dataformat.omsi_file.main_file.omsi_file(filename, mode='a', **kwargs)

Bases: omsi.dataformat.omsi_file.experiment.omsi_experiment_manager, omsi.dataformat.omsi_file.common.omsi_file_common

API for creating and managing a single OpenMSI data file.

Use of super()

This class inherits from omsi.dataformat.omsi_file.common.omsi_file_common. Consistent with the design pattern for multiple inheritance of the omsi.dataformat.omsi_file module, the __init__ function calls super(...).__init__(manager_group) with a single parameter indicating the parent group.

Inherited Instance Variables

Variables:
  • managed_group – The group that is managed by this object
  • name – Name of the managed group

Open the given file or create it if it does not exist.

The creation of the object may fail if the file does not exist and the selected mode is 'r' or 'r+'.

Keyword arguments:

Parameters:
  • filename – string indicating the name+path of the OpenMSI data file. Alternatively this may also be an h5py.File instance (or an h5py.Group, h5py.Dataset instance from which we can get the file)
  • mode

    Read/write mode. One of:

    r = read-only, file must exist.

    r+ = read/write, file must exist.

    w = create file, truncate if exists.

    w- = create file, fail if exists.

    a = read/write if exists, create otherwise (default)

  • **kwargs

    Other keyword arguments to be used for opening the file using h5py. See the h5py.File documentation for details. For example, to use parallel HDF5, the following additional parameters can be given: driver='mpio', comm=MPI.COMM_WORLD.
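To make the mode list concrete, here is a hypothetical pre-flight helper, can_open, that encodes which of the documented modes require an existing file and which forbid one. It is not part of the omsi API and only checks os.path.exists; h5py performs the actual open and may still fail for other reasons (permissions, a file that is not valid HDF5, and so on).

```python
import os

# Documented modes and their requirement on an existing file (hypothetical table).
_MODE_RULES = {
    "r": "must_exist",       # read-only
    "r+": "must_exist",      # read/write
    "w": "any",              # create, truncating an existing file
    "w-": "must_not_exist",  # create, fail if the file exists
    "a": "any",              # read/write if it exists, create otherwise
}


def can_open(filename, mode="a"):
    """Return whether the existence rule of the given mode is satisfied."""
    rule = _MODE_RULES.get(mode)
    if rule is None:
        raise ValueError("Unsupported mode: %r" % (mode,))
    exists = os.path.exists(filename)
    if rule == "must_exist":
        return exists
    if rule == "must_not_exist":
        return not exists
    return True


missing = os.path.join(os.getcwd(), "no_such_file_sketch.h5")
print(can_open(missing, "r"), can_open(missing, "a"), can_open(missing, "w-"))  # False True True
```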

close_file()

Close the msi data file

flush()

Flush all I/O

get_filename()

Get the name of the omsi file

Returns: String indicating the filename (possibly including the full path, depending on how the object has been initialized)
get_h5py_file()

Get the h5py object for the omsi file

Returns: h5py reference to the HDF5 file
classmethod is_valid_dataset(name)

Perform basic checks for the given filename, whether it is a valid OMSI file.

Parameters: name – Name of the file to be checked.
Returns: Boolean indicating whether the file is valid
write_xdmf_header(xdmf_filename)

Write XDMF header file for the current HDF5 datafile

Parameters: xdmf_filename – The name of the xdmf XML header file to be created for the HDF5 file.

experiment Module

OMSI file module for management of experiment data.

class omsi.dataformat.omsi_file.experiment.omsi_experiment_manager(experiment_parent)

Bases: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Experiment manager helper class used to define common functionality needed for experiment-related data. Usually, a class that defines a format that contains an omsi_file_experiment object will inherit from this class (in addition to omsi_file_common) to acquire the common features.

For more details see: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Variables: experiment_parent – The h5py.Group parent object containing the experiment object to be managed.
create_experiment(exp_identifier=None, flush_io=True)

Create a new group in the file for a new experiment and return the omsi_file_experiment object for the new experiment.

Parameters:
  • exp_identifier (string or None (default)) – The string used to identify the experiment
  • flush_io – Call flush on the HDF5 file to ensure all HDF5 buffers are flushed so that all data has been written to file.
Returns:

omsi_file_experiment object for the newly created group for the experiment

get_experiment(exp_index)

Get the omsi_format_experiment object for the experiment with the given index

Parameters: exp_index (uint) – The index of the requested experiment
Returns: h5py reference to the experiment with the given index. Returns None in case the experiment does not exist.
get_experiment_by_identifier(exp_identifier_string)

Get the omsi_format_experiment object for the experiment with the given identifier.

Parameters: exp_identifier_string (string) – The string used to identify the experiment
Returns: Returns h5py object of the experiment group or None in case the experiment is not found.
static get_experiment_path(exp_index=None)

Based on the index of the experiment return the full path to the hdf5 group containing the data for an experiment.

Parameters: exp_index – The index of the experiment.
Returns: String indicating the path to the experiment.
get_num_experiments()

Get the number of experiments in this file.

Returns: Integer indicating the number of experiments.
class omsi.dataformat.omsi_file.experiment.omsi_file_experiment(exp_group)

Bases: omsi.dataformat.omsi_file.methods.omsi_methods_manager, omsi.dataformat.omsi_file.instrument.omsi_instrument_manager, omsi.dataformat.omsi_file.analysis.omsi_analysis_manager, omsi.dataformat.omsi_file.msidata.omsi_msidata_manager, omsi.dataformat.omsi_file.common.omsi_file_common

Class for managing experiment specific data

Use of super():

This class inherits from omsi.dataformat.omsi_file.common.omsi_file_common. Consistent with the design pattern for multiple inheritance of the omsi.dataformat.omsi_file module, the __init__ function calls super(...).__init__(manager_group) with a single parameter indicating the parent group.

Inherited Instance Variables

Variables:
  • managed_group – The group that is managed by this object
  • methods_parent – The parent group containing the methods object (same as managed_group)
  • instrument_parent – The parent group containing the instrument object (same as managed_group)
  • name – Name of the managed group

Initialize the experiment object given the h5py object of the experiment group

Parameters: exp_group – The h5py object with the experiment group of the omsi hdf5 file.
get_experiment_identifier()

Get the HDF5 dataset with the identifier description for the experiment.

Returns: h5py object of the experiment identifier or None in case not present
get_experiment_index()

Determine the index of the experiment based on the name of the group.

Returns: Integer index of the experiment
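The relationship between get_experiment_path and get_experiment_index can be sketched as a round trip on the exp_groupname base name 'entry_'. The sketch assumes that experiment groups live directly under the file root and that the index is the integer suffix of the group name; both are assumptions, and the sketch does not handle the exp_index=None case of the documented signature.

```python
EXP_GROUPNAME = "entry_"  # value of omsi_format_experiment.exp_groupname


def get_experiment_path(exp_index):
    """Return the full HDF5 path of the experiment group with the given index."""
    return "/" + EXP_GROUPNAME + str(int(exp_index))


def get_experiment_index(group_name):
    """Parse the experiment index from a group name such as '/entry_3'."""
    base = group_name.rsplit("/", 1)[-1]  # drop any leading path components
    if not base.startswith(EXP_GROUPNAME):
        raise ValueError("Not an experiment group: %r" % (group_name,))
    return int(base[len(EXP_GROUPNAME):])


print(get_experiment_path(3))           # /entry_3
print(get_experiment_index("/entry_3"))  # 3
```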

get_instrument_info(check_parent=False)

Inherited from the omsi_instrument_manager parent class. Overridden here to change the default parameter setting for check_parent. See omsi.dataformat.omsi_file.instrument.omsi_instrument_manager for details.

get_method_info(check_parent=False)

Inherited from the omsi_methods_manager parent class. Overridden here to change the default parameter setting for check_parent. See omsi.dataformat.omsi_file.methods.omsi_methods_manager for details.

set_experiment_identifier(identifier)

Overwrite the current identifier string for the experiment with the given string

Parameters: identifier – The new experiment identifier string.

metadata_collection Module

Module for management of general metadata storage entities. These are often specialized (e.g., omsi_file_instrument, omsi_file_sample) to store specific metadata and add more functionality.

class omsi.dataformat.omsi_file.metadata_collection.omsi_file_metadata_collection(metadata_group)

Bases: omsi.dataformat.omsi_file.common.omsi_file_common

Class for managing a general-purpose metadata collection.

Use of super():

This class inherits from omsi.dataformat.omsi_file.common.omsi_file_common. Consistent with the design pattern for multiple inheritance of the omsi.dataformat.omsi_file module, the __init__ function calls super(...).__init__(manager_group) with a single parameter indicating the parent group.

Inherited Instance Variables

Variables:
  • managed_group – The group that is managed by this object
  • name – Name of the managed group

Initialize the metadata collection object given the h5py object of the metadata collection

Parameters: metadata_group – The h5py object with the metadata collection group of the omsi hdf5 file.
add_metadata(metadata)

Add a new metadata entry

Parameters: metadata – Instance of omsi.shared.metadata_data.metadata_value or omsi.shared.metadata_data.metadata_dict describing the metadata to be added.
get_metadata(key=None)

Get dict with the full description of the metadata for the given key or all metadata if no key is given.

Returns: omsi.shared.metadata_data.metadata_value object if a key is given or an omsi.shared.metadata_data.metadata_dict with all metadata if no key is specified.
Raises: KeyError is raised in case that the specified key does not exist
keys()

Get a list of all metadata keys

Returns: List of strings with the metadata keys
values()

Convenience function returning a list of all metadata objects. This is equivalent to get_metadata(key=None).values(); however, for consistency with other dict-like interfaces, this function returns a list of omsi.shared.metadata_data.metadata_value objects rather than the omsi.shared.metadata_data.metadata_dict.

Returns: List of omsi.shared.metadata_data.metadata_value with all metadata
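The dict-like interface of a metadata collection (add_metadata, get_metadata, keys, values) can be sketched in memory. MetadataValue and MetadataCollectionSketch are hypothetical stand-ins: the real classes are omsi.shared.metadata_data.metadata_value and omsi_file_metadata_collection, which store entries in HDF5. The description, unit, and ontology fields mirror the attribute names defined in omsi_format_metadata_collection, and a plain dict stands in for metadata_dict.

```python
from collections import namedtuple

# Hypothetical stand-in for omsi.shared.metadata_data.metadata_value.
MetadataValue = namedtuple("MetadataValue", ["name", "value", "description", "unit", "ontology"])


class MetadataCollectionSketch(object):
    """In-memory sketch of the dict-like metadata collection interface."""

    def __init__(self):
        self._entries = {}

    def add_metadata(self, metadata):
        # Accept either a single value or a dict of values.
        if isinstance(metadata, dict):
            for entry in metadata.values():
                self._entries[entry.name] = entry
        else:
            self._entries[metadata.name] = metadata

    def get_metadata(self, key=None):
        # No key: return all metadata; unknown key: raise KeyError.
        if key is None:
            return dict(self._entries)
        if key not in self._entries:
            raise KeyError(key)
        return self._entries[key]

    def keys(self):
        return list(self._entries.keys())

    def values(self):
        # A list of values rather than a dict, like the documented values().
        return list(self._entries.values())


collection = MetadataCollectionSketch()
collection.add_metadata(MetadataValue("laser_power", 30, "Laser power setting", "mW", None))
print(collection.keys())                            # ['laser_power']
print(collection.get_metadata("laser_power").unit)  # mW
```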
class omsi.dataformat.omsi_file.metadata_collection.omsi_metadata_collection_manager(metadata_parent=None)

Bases: omsi.dataformat.omsi_file.common.omsi_file_object_manager

This is a file format manager helper class used to define common functionality needed for management of metadata-related data. Usually, a class that defines a format that contains an omsi_file_metadata_collection object will inherit from this class (in addition to omsi_file_common) to acquire the common features.

For more details see: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Variables: metadata_parent – The parent h5py.Group object containing the metadata collection object to be managed
create_metadata_collection(group_name=None, metadata=None, flush_io=True)

Add a new group for managing metadata

Parameters:
  • group_name (str, None) – Optional name of the new metadata group. If None is given then the omsi_format_metadata_collection.metadata_collection_groupname_default will be used
  • metadata (None, omsi.shared.metadata_data.metadata_value, omsi.shared.metadata_data.metadata_dict) – Additional metadata to be added to the collection after creation
  • flush_io – Call flush on the HDF5 file to ensure all HDF5 buffers are flushed so that all data has been written to file
Returns:

omsi_file_metadata_collection object

get_default_metadata_collection(omsi_object=None)

Get the default metadata collection object if it exists

Parameters: omsi_object – The omsi file API object or h5py.Group object that we should check. If set to None (default) then the self.metadata_parent will be used
Returns: None, omsi_file_metadata_collection
get_metadata_collections(omsi_object=None, name=None)

Get all metadata_collections defined for given OpenMSI file API object or h5py.Group.

Parameters:
  • omsi_object – The omsi file API object or h5py.Group object that we should check. If set to None (default) then the self.metadata_parent will be used
  • name – If name is specified, then only retrieve collections with the given name
Returns:

List of omsi_file_metadata_collection objects for the requested group. The function returns None in case that the h5py.Group for the omsi_object could not be determined.

has_default_metadata_collection(omsi_object)

Check whether the omsi API object (or h5py.Group) contains a metadata collection with the default name.

Returns: bool
has_metadata_collections(omsi_object=None)

Check whether the given omsi API object (or h5py.Group) contains any metadata collections

Parameters: omsi_object – The omsi file API object or h5py.Group object that we should check. If set to None (default) then the self.metadata_parent will be used
Returns: Boolean indicating whether metadata collections were found

instrument Module

Module for managing instrument related data in OMSI files.

class omsi.dataformat.omsi_file.instrument.omsi_file_instrument(instrument_group)

Bases: omsi.dataformat.omsi_file.metadata_collection.omsi_file_metadata_collection

Class for managing instrument specific data

Use of super():

This class inherits (via omsi.dataformat.omsi_file.metadata_collection.omsi_file_metadata_collection) from omsi.dataformat.omsi_file.common.omsi_file_common. Consistent with the design pattern for multiple inheritance of the omsi.dataformat.omsi_file module, the __init__ function calls super(...).__init__(manager_group) with a single parameter indicating the parent group.

Inherited Instance Variables

Variables:
  • managed_group – The group that is managed by this object
  • name – Name of the managed group

Initialize the instrument object given the h5py object of the instrument group

Parameters: instrument_group – The h5py object with the instrument group of the omsi hdf5 file.
get_instrument_mz()

Get the HDF5 dataset with the mz data for the instrument.

To get the numpy array of the full mz data use: get_instrument_mz()[:]

Returns: Returns the h5py object with the instrument mz data. Returns None in case no mz data was found for the instrument.
get_instrument_name()

Get the HDF5 dataset with the name of the instrument.

To get the string of the instrument name use: get_instrument_name()[...]

Returns: h5py object to the dataset with the instrument name. Returns None in case no instrument name is found.
has_instrument_name()

Check whether a name has been saved for the instrument

Returns: bool
set_instrument_name(name)

Overwrite the current name of the instrument with the given string.

Parameters: name (string) – The new instrument name.
class omsi.dataformat.omsi_file.instrument.omsi_instrument_manager(instrument_parent)

Bases: omsi.dataformat.omsi_file.metadata_collection.omsi_metadata_collection_manager

Instrument manager helper class used to define common functionality needed for instrument-related data. Usually, a class that defines a format that contains an omsi_file_instrument object will inherit from this class (in addition to omsi_file_common) to acquire the common features.

For more details see: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Variables: instrument_parent – The h5py.Group parent object containing the instrument object to be managed.
create_instrument_info(instrument_name=None, mzdata=None, flush_io=True)

Add information about the instrument used for creating the images for this experiment.

Parameters:
  • instrument_name (string, None) – The name of the instrument
  • mzdata (numpy array or None) – Numpy array of the mz data values of the instrument
  • flush_io – Call flush on the HDF5 file to ensure all HDF5 buffers are flushed so that all data has been written to file
Returns:

The function returns the h5py HDF5 handle to the instrument info group created for the experiment.

get_instrument_info(check_parent=True)

Get the HDF5 group object with the instrument information.

Parameters:check_parent – If no instrument group is available for this dataset, should we check whether the parent object (i.e., the experiment group containing the dataset) has information about the instrument? (default=True)
Returns:omsi_file_instrument object for the requested instrument info. The function returns None in case no instrument information was found for the experiment.
has_instrument_info(check_parent=False)

Check whether custom instrument information is available for this dataset.

Parameters:check_parent – If no instrument group is available for this dataset should we check whether the parent object (i.e., the experiment group containing the dataset) has information about the instrument. (default=False)
Returns:Boolean indicating whether instrument info is available.
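The check_parent option amounts to a one-level fallback lookup, and get_instrument_info and has_instrument_info use different defaults for it. The pure-Python sketch below shows that logic. Group is a hypothetical in-memory stand-in for h5py.Group, and the child name "instrument" is an assumption (the real names are defined in omsi_file.format). The sketch also assumes that only the direct parent is consulted, as the parameter description suggests.

```python
from typing import Dict, Optional


class Group:
    """Hypothetical in-memory stand-in for an h5py.Group: named children plus a parent link."""

    def __init__(self, parent: Optional["Group"] = None) -> None:
        self.children: Dict[str, object] = {}
        self.parent = parent


INSTRUMENT = "instrument"  # hypothetical child name; real names live in omsi_file.format


def get_instrument_info(group: Group, check_parent: bool = True) -> Optional[object]:
    """Return the instrument entry of `group`, falling back to its direct parent if allowed."""
    if INSTRUMENT in group.children:
        return group.children[INSTRUMENT]
    if check_parent and group.parent is not None:
        return group.parent.children.get(INSTRUMENT)
    return None


def has_instrument_info(group: Group, check_parent: bool = False) -> bool:
    """Different default: only the group itself is checked unless asked otherwise."""
    return get_instrument_info(group, check_parent=check_parent) is not None
```

With these defaults, a dataset without its own instrument group reports has_instrument_info() as False, yet get_instrument_info() still returns the experiment's entry. omsi_methods_manager documents the same defaults for get_method_info and has_method_info.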

methods Module

Module for management of method specific data in OMSI data files

class omsi.dataformat.omsi_file.methods.omsi_file_methods(method_group)

Bases: omsi.dataformat.omsi_file.metadata_collection.omsi_file_metadata_collection

Class for managing method specific data.

Use of super():

This class inherits from omsi.dataformat.omsi_file.common.omsi_file_common. Consistent with the design pattern for multiple inheritance of the omsi.dataformat.omsi_file module, the __init__ function calls super(...).__init__(manager_group) with a single parameter indicating the parent group.

Inherited Instance Variables

Variables:
  • managed_group – The group that is managed by this object
  • name – Name of the managed group

Initialize the method object given the h5py object of the method group

Parameters:method_group – The h5py object with the method group of the omsi hdf5 file.
get_method_name()

Get the HDF5 dataset with the name of the method.

To retrieve the name string use get_method_name()[...]

Returns:h5py object where the method name is stored. Returns None in case no method name is found.
has_method_name()

Check whether an object has a method name

Returns:bool
set_method_name(name_string)

Overwrite the name string for the method with the given name string

Parameters:name_string (string) – The new method name.
class omsi.dataformat.omsi_file.methods.omsi_methods_manager(methods_parent=None)

Bases: omsi.dataformat.omsi_file.metadata_collection.omsi_metadata_collection_manager

This is a file format manager helper class used to define common functionality needed for methods-related data. Usually, a class that defines a format that contains an omsi_file_methods object will inherit from this class (in addition to omsi_file_common) to acquire the common features.

For more details see: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Variables:methods_parent – The parent h5py.Group object containing the method object to be managed
create_method_info(method_name=None, metadata=None, flush_io=True)

Add information about the method imaged to the experiment. Note, if a methods group already exists, then that group will be used. If method_name is not None, then the existing name will be overwritten by the new value.

Parameters:
  • method_name (str, None) – Optional name of the method
  • metadata (metadata_value, metadata_dict) – Additional metadata to be stored with the methods
  • flush_io – Call flush on the HDF5 file to ensure all HDF5 buffers are flushed so that all data has been written to file
Returns:

h5py object of the newly created method group.
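The reuse-or-create rule described for create_method_info can be sketched with plain dicts. The function below is hypothetical: a dict stands in for the parent h5py.Group, the key "methods" is an assumed group name, and the metadata argument is omitted. The sketch only shows that an existing group is reused and that its name is overwritten only when a non-None method_name is passed.

```python
from typing import Any, Dict, Optional


def create_method_info(parent: Dict[str, Any], method_name: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical sketch of the reuse-or-create rule described for create_method_info."""
    # Reuse the existing methods entry if there is one; otherwise create an empty one.
    group = parent.setdefault("methods", {"name": None})
    # Only a non-None name overwrites what is already stored.
    if method_name is not None:
        group["name"] = method_name
    return group
```

Because the group is reused, calling the function a second time without a name leaves the stored name intact. That is also why the returned object may not be new, despite the wording "newly created".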

get_method_info(check_parent=True)

Get the omsi_file_methods object with the method information.

Parameters:check_parent – If no method group is available for this dataset should we check whether the parent object (i.e., the experiment group containing the dataset) has information about the method. (default=True)
Returns:omsi_file_methods object for the requested method info. The function returns None in case no method information was found for the experiment
has_method_info(check_parent=False)

Check whether custom method information is available for this dataset.

Parameters:check_parent – If no method group is available for this dataset should we check whether the parent object (i.e., the experiment group containing the dataset) has information about the method. (default=False)
Returns:Boolean indicating whether method info is available.

msidata Module

Module for managing MSI data in OMSI data files

class omsi.dataformat.omsi_file.msidata.omsi_file_msidata(data_group, fill_space=True, fill_spectra=True, preload_mz=False, preload_xy_index=False)

Bases: omsi.dataformat.omsi_file.dependencies.omsi_dependencies_manager, omsi.dataformat.omsi_file.methods.omsi_methods_manager, omsi.dataformat.omsi_file.instrument.omsi_instrument_manager, omsi.dataformat.omsi_file.metadata_collection.omsi_metadata_collection_manager, omsi.dataformat.omsi_file.common.omsi_file_common

Interface for interacting with mass spectrometry imaging datasets stored in omsi HDF5 files. The interface allows users to interact with the data as if it were a 3D cube, even if data is missing. Full spectra may be missing in cases where only a region of interest in space has been imaged. Spectra may further be pre-processed so that each spectrum contains only its peaks, in which case each spectrum has its own mz-axis.

To load data, use standard array syntax; e.g., [1,1,:] retrieves the spectrum at location (1,1).

Use of super():

This class inherits from omsi.dataformat.omsi_file.common.omsi_file_common. Consistent with the design pattern for multiple inheritance of the omsi.dataformat.omsi_file module, the __init__ function calls super(...).__init__(manager_group) with a single parameter indicating the parent group.

Current limitations:

  • The estimates in def __best_dataset__(self,keys) are fairly crude at this point
  • The __getitem__ function for the partial_spectra case is not implemented yet.
  • The __setitem__ function for the partial spectra case is not implemented yet (Note, it should also support dynamic expansion of the cube by adding previously missing spectra).
  • For the partial cube case, assignment using the __setitem__ function is only supported for valid spectra, i.e., spectra that were specified as occupied during the initial creation process.

Public object variables:

Variables:
  • shape – Define the full 3D shape of the dataset (i.e., even if the data is stored in sparse manner)
  • dtype – The numpy datatype of the main MSI data. This is the same as dataset.dtype
  • name – The name of the corresponding group in the HDF5 file. Used to generate hard-links to the group.
  • format_type – Define according to which standard the data is stored in the file
  • datasets – List of h5py objects containing possibly multiple different versions of the same MSI data (spectra). There may be multiple versions stored with different layouts in order to optimize the selection process.
  • mz – dataset with the global mz axis information. If preload_mz is set in the constructor, then this is a numpy array with the preloaded data. Otherwise, this is the h5py dataset pointing to the data on disk.
  • xy_index – None if format_type is ‘full_cube’. Otherwise, this is the 2D array indicating for each x/y location the index of the spectrum in dataset. If preload_xy_index is set in the constructor, then this is a numpy array with the preloaded data. Otherwise, this is the h5py dataset pointing to the data on disk. Negative (-1) entries indicate that no spectrum has been recorded for the given pixel.
  • inv_xy_index – 2D dataset with n rows and 2 columns indicating for each spectrum i the (x,y) pixel index the spectrum belongs to. This index is stored for convenience purposes but is not actually needed for data access.
  • mz_index – None if format_type is not ‘partial_spectra’. Otherwise this is a dataset of the same size as the spectra data stored in dataset. Each entry indicates an index into the mz dataset to determine the mz value for a spectrum. This means mz[mz_index] gives the true mz value.
  • xy_index_end – None if format_type is not ‘partial_spectra’. Otherwise this is a 2D array indicating for each x/y location the index where the given spectrum ends in the dataset. If preload_xy_index is set in the constructor, then this is a numpy array with the preloaded data. Otherwise, this is the h5py dataset pointing to the data on disk. Negative (-1) entries indicate that no spectrum has been recorded for the given pixel.
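For the partial-cube layout, xy_index and inv_xy_index (as described above) are enough to present stored spectra as a 3D cube. The sketch below is pure Python over nested lists rather than h5py datasets. read_spectrum, read_image and build_inv_xy_index are hypothetical names. The zero padding of unrecorded pixels mirrors the fill_space=True behaviour, but this is an illustration, not the class's actual __getitem__ code.

```python
from typing import List, Optional, Sequence, Tuple

Rows = Sequence[Sequence[float]]
Index2D = Sequence[Sequence[int]]


def read_spectrum(data: Rows, xy_index: Index2D, x: int, y: int, n_mz: int) -> List[float]:
    """Spectrum at pixel (x, y); unrecorded pixels (-1) come back as zeros."""
    row = xy_index[x][y]
    return list(data[row]) if row >= 0 else [0.0] * n_mz


def read_image(data: Rows, xy_index: Index2D, mz: int, fill_value: float = 0.0) -> List[List[float]]:
    """Ion image at one m/z index; unrecorded pixels are filled with fill_value."""
    return [[data[r][mz] if r >= 0 else fill_value for r in row] for row in xy_index]


def build_inv_xy_index(xy_index: Index2D, n_spectra: int) -> List[Optional[Tuple[int, int]]]:
    """For each stored spectrum i, the (x, y) pixel it belongs to."""
    inv: List[Optional[Tuple[int, int]]] = [None] * n_spectra
    for x, row in enumerate(xy_index):
        for y, i in enumerate(row):
            if i >= 0:
                inv[i] = (x, y)
    return inv
```

Only the stored rows are ever read. The cube shape exists purely through the index, which is why inv_xy_index is a convenience rather than a requirement for data access.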

Private object variables:

Variables:
  • _data_group – Store the pointer to the HDF5 group with all the data
  • _fill_xy – Define whether the data should be reconstructed as a full image cube. Set using the set_fill_space(..) function.
  • _fill_mz – Define whether spectra should be remapped onto a global m/z axis. Set using the set_fill_spectra(..) function.

Initialize the omsi_msidata object.

The fill options are provided to enable more convenient access to the data, independent of how the data is stored in the file. If the fill options are enabled, then the user can interact with the data as if it were a 3D cube, with missing data filled in by the given fill value.

The preload options provided here refer to generally smaller parts of the data for which it may be more efficient to load the data once and keep it in memory rather than doing repeated reads. If the object is used only for a single read and destroyed afterwards, then disabling the preload options may give a slight advantage, but in most cases enabling the preload options should be fine (get_msidata enables them by default).

Parameters:
  • data_group – The h5py object for the group with the omsi_msidata.
  • fill_space – Define whether the data should be padded in space (filled with 0’s) when accessing the data using the [..] operator so that the data behaves like a 3D cube.
  • fill_spectra – Define whether the spectra should be completed by padding with 0’s so that all spectra retrieved via the [..] operator have the full length. This option is provided to ease extension of the class to cases where only partial spectra are stored in the file but is not used at this point.
  • preload_mz – Should the data for the mz axis be kept in memory or loaded on the fly when needed.
  • preload_xy_index – Should the xy index (if available) be preloaded into memory or should the required data be loaded on the fly when needed.
copy_dataset(source, destination, print_status=False)

Helper function used to copy a source msi dataset one chunk at a time to the destination dataset. The data copy is done one destination chunk at a time to achieve chunk-aligned writes.

Parameters:
  • source – The source h5py dataset
  • destination – The destination h5py dataset.
  • print_status – Should the function print the status of the conversion process to the command line?
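Copying one destination chunk at a time comes down to enumerating selections that line up with the destination's chunk grid. The sketch below shows one way to enumerate them. chunk_slices is a hypothetical helper, not part of the OpenMSI API, and the loop order is an assumption. Each selection covers exactly one chunk, and edge chunks are clipped to the dataset shape.

```python
import itertools
from typing import Iterator, Sequence, Tuple


def chunk_slices(shape: Sequence[int], chunks: Sequence[int]) -> Iterator[Tuple[slice, ...]]:
    """Yield one tuple of slices per destination chunk, covering `shape` exactly once."""
    # Start offsets of the chunk grid along each dimension.
    ranges = [range(0, n, c) for n, c in zip(shape, chunks)]
    for starts in itertools.product(*ranges):
        # Clip the last chunk in each dimension to the dataset extent.
        yield tuple(slice(s, min(s + c, n)) for s, c, n in zip(starts, chunks, shape))
```

With h5py, a chunk-aligned copy could then be written as `for sel in chunk_slices(destination.shape, destination.chunks): destination[sel] = source[sel]`, provided the destination was created with chunking (h5py reports `chunks` as None otherwise).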
create_optimized_chunking(chunks=None, compression=None, compression_opts=None, copy_data=True, print_status=False, flush_io=True)

Helper function to allow one to create optimized copies of the dataset with different internal data layouts to speed up selections. The function expects that the original data has already been written to the data group.

Parameters:
  • chunks – Specify whether chunking should be used (True,False), or specify the chunk sizes to be used explicitly.
  • compression – h5py compression option. Compression strategy. Legal values are ‘gzip’, ‘szip’, ‘lzf’. Can also use an integer in range(10) indicating gzip.
  • compression_opts – h5py compression settings. This is an integer for gzip, a 2-tuple for szip, etc. For gzip (H5 deflate filter) this is the aggression parameter. The aggression parameter is a number between zero and nine (inclusive) to indicate the tradeoff between speed and compression ratio (zero is fastest, nine is best ratio).
  • copy_data – Should the MSI data be copied by this function to the new dataset or not. If False, then it is up to the user of the function to copy the appropriate data into the returned h5py dataset (not recommended but may be useful for performance optimization).
  • print_status – Should the function print the status of the conversion process to the command line?
  • flush_io – Call flush on the HDF5 file to ensure all HDF5 buffers are flushed so that all data has been written to file
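The compression and compression_opts rules listed above can be summarized in a small validation helper. normalize_compression is hypothetical. h5py performs its own checks when the dataset is created, and this sketch only mirrors the rules stated in the parameter description: an integer in range(10) means gzip at that level, and gzip aggression must be between 0 and 9. It does not validate szip options.

```python
from typing import Optional, Tuple, Union

LEGAL_FILTERS = ("gzip", "szip", "lzf")


def normalize_compression(compression: Union[str, int, None],
                          compression_opts: object = None) -> Tuple[Optional[str], object]:
    """Hypothetical helper: resolve a compression/compression_opts pair as described above."""
    if compression is None:
        return None, None
    # An integer in range(10) is shorthand for gzip at that aggression level.
    if isinstance(compression, int) and not isinstance(compression, bool):
        if compression not in range(10):
            raise ValueError("integer compression must be in range(10)")
        return "gzip", compression
    if compression not in LEGAL_FILTERS:
        raise ValueError("unknown compression filter: %r" % (compression,))
    if compression == "gzip" and compression_opts is not None and compression_opts not in range(10):
        raise ValueError("gzip aggression must be between 0 and 9")
    return compression, compression_opts
```

The same rules apply to the compression parameters of the create_msidata_* functions of omsi_msidata_manager.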
Returns:

h5py dataset with the new copy of the data

get_h5py_datasets(index=0)

Get the h5py dataset object for the given dataset.

Parameters:index – The index of the dataset.
Returns:h5py object for the requested dataset.
Raises:IndexError in case an invalid index is given.
get_h5py_mzdata()

Get the h5py object for the mz dataset.

Returns:h5py object of the requested mz dataset.
set_fill_space(fill_space)

Define whether spatial selection should be filled with 0’s to retrieve full image slices

Parameters:fill_space – Boolean indicating whether images should be filled with 0’s
set_fill_spectra(fill_spectra)

Define whether spectra should be filled with 0’s to map them to the global mz axis when retrieved.

Parameters:fill_spectra – Define whether m/z values should be filled with 0’s.
class omsi.dataformat.omsi_file.msidata.omsi_msidata_manager(msidata_parent)

Bases: omsi.dataformat.omsi_file.common.omsi_file_object_manager

MSI-data manager helper class used to define common functionality needed for msidata-related data. Usually, a class that defines a format that contains an omsi_file_msidata object will inherit from this class (in addition to omsi_file_common) to acquire the common features.

For more details see: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Variables:msidata_parent – The h5py.Group parent object containing the msidata object to be managed.

Initialize the manager object.

Parameters:msidata_parent – The h5py.Group parent object for the msi data.
create_msidata_full_cube(data_shape, data_type='f', mzdata_type='f', chunks=None, compression=None, compression_opts=None, flush_io=True)

Create a new mass spectrometry imaging dataset for the given experiment written as a full 3D cube.

Parameters:
  • data_shape – Shape of the dataset. E.g., shape=(10,10,10) creates a 3D dataset with 10 entries per dimension
  • data_type – numpy style datatype to be used for the dataset.
  • mzdata_type – numpy style datatype to be used for the mz data array.
  • chunks – Specify whether chunking should be used (True,False), or specify the chunk sizes to be used in x,y, and m/z explicitly.
  • compression – h5py compression option. Compression strategy. Legal values are ‘gzip’, ‘szip’, ‘lzf’. Can also use an integer in range(10) indicating gzip.
  • compression_opts – h5py compression settings. This is an integer for gzip, a 2-tuple for szip, etc. For gzip (H5 deflate filter) this is the aggression parameter. The aggression parameter is a number between zero and nine (inclusive) to indicate the tradeoff between speed and compression ratio (zero is fastest, nine is best ratio).
  • flush_io – Call flush on the HDF5 file to ensure all HDF5 buffers are flushed so that all data has been written to file
Returns:

The following two empty (but appropriately sized) h5py datasets are returned in order to be filled with data:

  • data_dataset : Primary h5py dataset for the MSI data with shape data_shape and dtype data_type.
  • mz_dataset : h5py dataset for the mz axis data with shape [data_shape[2]] and dtype mzdata_type.

Returns:

data_group : The h5py object with the group in the HDF5 file where the data should be stored.

create_msidata_partial_cube(data_shape, mask, data_type='f', mzdata_type='f', chunks=None, compression=None, compression_opts=None, flush_io=True)

Create a new mass spectrometry imaging dataset for the given experiment written as a partial 3D cube of complete spectra.

Parameters:
  • data_shape – Shape of the dataset. E.g., shape=(10,10,10) creates a 3D dataset with 10 entries per dimension
  • mask – 2D boolean NumPy array used as mask to indicate which (x,y) locations have spectra associated with them.
  • data_type – numpy style datatype to be used for the dataset.
  • mzdata_type – numpy style datatype to be used for the mz data array.
  • chunks – Specify whether chunking should be used (True,False), or specify the chunk sizes to be used in x,y, and m/z explicitly.
  • compression – h5py compression option. Compression strategy. Legal values are ‘gzip’, ‘szip’, ‘lzf’. Can also use an integer in range(10) indicating gzip.
  • compression_opts – h5py compression settings. This is an integer for gzip, a 2-tuple for szip, etc. For gzip (H5 deflate filter) this is the aggression parameter. The aggression parameter is a number between zero and nine (inclusive) to indicate the tradeoff between speed and compression ratio (zero is fastest, nine is best ratio).
  • flush_io – Call flush on the HDF5 file to ensure all HDF5 buffers are flushed so that all data has been written to file
Returns:

The following two empty (but appropriately sized) h5py datasets are returned in order to be filled with data:

  • data_dataset : Primary h5py dataset for the MSI data with shape data_shape and dtype data_type.
  • mz_dataset : h5py dataset for the mz axis data with shape [data_shape[2]] and dtype mzdata_type.

Returns:

The following already complete dataset

  • xy_index_dataset : This dataset indicates for each xy location the index in data_dataset to which the location corresponds. This dataset is needed to identify where spectra need to be written.

Returns:

data_group : The h5py object with the group in the HDF5 file where the data should be stored.
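The xy_index_dataset that create_msidata_partial_cube returns already filled can be thought of as a numbering of the pixels selected by mask. The sketch below builds such an index in pure Python. build_xy_index is hypothetical, and the x-outer, y-inner numbering order is an assumption for illustration, not a statement of how OpenMSI orders spectra.

```python
from typing import List, Sequence, Tuple


def build_xy_index(mask: Sequence[Sequence[bool]]) -> Tuple[List[List[int]], int]:
    """Number the occupied pixels 0..n-1 (x outer, y inner); unoccupied pixels get -1."""
    xy_index: List[List[int]] = []
    n_spectra = 0
    for row in mask:
        indices: List[int] = []
        for occupied in row:
            if occupied:
                indices.append(n_spectra)
                n_spectra += 1
            else:
                indices.append(-1)
        xy_index.append(indices)
    return xy_index, n_spectra
```

With such an index, the spectrum for pixel (x, y) belongs at position xy_index[x][y], and -1 marks pixels that have no stored spectrum. This matches the convention described for the xy_index variable of omsi_file_msidata.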

create_msidata_partial_spectra(spectra_length, len_global_mz, data_type='f', mzdata_type='f', chunks=None, compression=None, compression_opts=None, flush_io=True)

Create a new mass spectrometry imaging dataset for the given experiment written as a partial 3D cube of partial spectra.

Parameters:
  • spectra_length – 2D NumPy array indicating for each (x,y) location the length of the corresponding partial spectrum.
  • len_global_mz – The total number of m/z values in the global m/z axis for the full 3D cube
  • data_type – The dtype for the MSI dataset
  • mzdata_type – numpy style datatype to be used for the mz dataset.
  • chunks – Specify whether chunking should be used (True,False), or specify the chunk sizes to be used in x,y, and m/z explicitly.
  • compression – h5py compression option. Compression strategy. Legal values are ‘gzip’, ‘szip’, ‘lzf’. Can also use an integer in range(10) indicating gzip.
  • compression_opts – h5py compression settings. This is an integer for gzip, a 2-tuple for szip, etc. For gzip (H5 deflate filter) this is the aggression parameter. The aggression parameter is a number between zero and nine (inclusive) to indicate the tradeoff between speed and compression ratio (zero is fastest, nine is best ratio).
  • flush_io – Call flush on the HDF5 file to ensure all HDF5 buffers are flushed so that all data has been written to file
Returns:

The following three empty (but appropriately sized) h5py datasets are returned in order to be filled with data:

  • data_dataset : The primary h5py dataset for the MSI data with shape data_shape and dtype data_type.
  • mz_index_dataset : h5py dataset with the mz_index values
  • mz_dataset : h5py dataset for the mz axis data with shape [data_shape[2]] and dtype mzdata_type.

Returns:

The following already complete dataset

  • xy_index_dataset : This dataset indicates for each xy location at which index in data_dataset the corresponding spectrum starts. This dataset is needed to identify where spectra need to be written.

  • xy_index_end_dataset : This dataset indicates for each xy location at which index in data_dataset the corresponding spectrum ends (excluding the given value). This dataset is needed to identify where spectra need to be written.

Returns:

data_group : The h5py object with the group in the HDF5 file where the data should be stored.
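The partial-spectra layout combines three indices: start offsets (xy_index_dataset), exclusive end offsets (xy_index_end_dataset), and per-value positions on the global m/z axis (mz_index_dataset, where mz[mz_index] gives the true mz value). The pure-Python sketch below illustrates how they fit together. Both function names are hypothetical. The sketch assumes that spectra are stored back to back in x-outer, y-inner order and that zero-length pixels are marked with -1. The zero padding in fill_spectrum mirrors the intent of the fill_spectra option, which the class documentation says is not yet implemented for this layout.

```python
from typing import List, Sequence, Tuple

Index2D = List[List[int]]


def build_partial_spectra_index(spectra_length: Sequence[Sequence[int]]) -> Tuple[Index2D, Index2D, int]:
    """Start/end offsets for spectra stored back to back; -1 marks pixels without a spectrum."""
    xy_index: Index2D = []
    xy_index_end: Index2D = []
    offset = 0
    for row in spectra_length:
        starts: List[int] = []
        ends: List[int] = []
        for length in row:
            if length > 0:
                starts.append(offset)
                offset += length
                ends.append(offset)  # exclusive end
            else:
                starts.append(-1)
                ends.append(-1)
        xy_index.append(starts)
        xy_index_end.append(ends)
    return xy_index, xy_index_end, offset  # offset == total number of stored values


def fill_spectrum(values: Sequence[float], mz_index: Sequence[int], xy_index: Index2D,
                  xy_index_end: Index2D, x: int, y: int, len_global_mz: int) -> List[float]:
    """Place the partial spectrum of pixel (x, y) onto the global m/z axis, zeros elsewhere."""
    full = [0.0] * len_global_mz
    start, end = xy_index[x][y], xy_index_end[x][y]
    if start < 0:
        return full
    for k in range(start, end):
        full[mz_index[k]] = values[k]
    return full
```

The total returned by build_partial_spectra_index is the length that the 1D data and mz_index datasets would need under these assumptions.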

get_msidata(data_index, fill_space=True, fill_spectra=True, preload_mz=True, preload_xy_index=True)

Get the dataset with the given index for the given experiment.

For more detailed information about the use of the fill_space, fill_spectra, preload_mz and preload_xy_index options, see the init function of omsi.dataformat.omsi_file.msidata.omsi_file_msidata.

Parameters:
  • data_index (unsigned int) – Index of the dataset.
  • fill_space – Define whether the data should be padded in space (filled with 0’s) when accessing the data using the [..] operator so that the data behaves like a 3D cube.
  • fill_spectra – Define whether the spectra should be completed by padding with 0’s so that all spectra retrieved via the [..] operator have the full length.
  • preload_mz – Should the data for the mz axis be kept in memory or loaded on the fly when needed.
  • preload_xy_index – Should the xy index (if available) be preloaded into memory or should the required data be loaded on the fly when needed.
Returns:

omsi_file_msidata object for the given data_index or None in case the data with given index does not exist or the access failed for any other reason.

get_msidata_by_name(data_name)

Get the h5py data object for the msidata with the given name.

Parameters:data_name (string) – The name of the dataset
Returns:h5py object of the dataset or None in case the dataset is not found.
get_num_msidata()

Get the number of raw mass spectrometry images stored for a given experiment

Returns:Integer indicating the number of msi datasets available for the experiment.

analysis Module

Module for managing custom analysis data in OMSI HDF5 files.

class omsi.dataformat.omsi_file.analysis.omsi_analysis_manager(analysis_parent)

Bases: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Analysis manager helper class used to define common functionality needed for analysis-related data. Usually, a class that defines a format that contains an omsi_file_analysis object will inherit from this class (in addition to omsi_file_common) to acquire the common features.

For more details see: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Variables:analysis_parent – The h5py.Group parent object containing the analysis object to be managed.
create_analysis(analysis, flush_io=True, force_save=False, save_unsaved_dependencies=True, mpi_root=0, mpi_comm=None)

Add a new group for storing derived analysis results for the current experiment

Create the analysis group using omsi_file_analysis.__create__, which in turn uses omsi_file_analysis.__populate_analysis__(...) to populate the group with the appropriate data.

NOTE: Dependencies are generally resolved to point to file objects. However, if save_unsaved_dependencies is set to False and a given in-memory dependency has not been saved yet, then the value associated with that dependency will be saved instead as part of the parameters. Hence, only the value of the dependency is preserved in that case, not the full dependency chain.

NOTE: Dependencies that exist only in memory are saved recursively unless save_unsaved_dependencies is set to False. I.e., calling create_analysis may result in the creation of multiple other dependent analyses if they have not been saved before.

Parameters:
  • analysis (omsi.analysis.analysis_base) – Instance of omsi.analysis.analysis_base defining the analysis
  • flush_io (bool) – Call flush on the HDF5 file to ensure all HDF5 buffers are flushed so that all data has been written to file
  • force_save (bool) – Should we save the analysis even if it has been saved in the same location before? If force_save is False (default) and the self.omsi_analysis_storage parameter of the analysis object contains a matching storage location (i.e., same file and experiment), then the analysis will not be saved again; instead, the object will only be retrieved from file. If force_save is True, then the analysis will be saved either way and the self.omsi_analysis_storage parameter will be extended.
  • save_unsaved_dependencies (bool) – If there are unsaved (in-memory) dependencies, should those be saved to file as well? The default value is True, i.e., by default all in-memory dependencies that have not been saved yet (i.e., for which the self.omsi_analysis_storage of the corresponding omsi_analysis_base object is empty) are saved as well. If in-memory dependencies have been saved before, then a link to those dependencies will be established rather than re-saving the dependency.
  • mpi_root – The root MPI process that should perform the writing. This allows all analyses to call the function, with the communication handled by the analysis.write_analysis_data function.
  • mpi_comm – The MPI communicator to be used. None if the default should be used (i.e., MPI.COMM_WORLD).
Returns:

The omsi_file_analysis object for the newly created analysis group and the integer index of the analysis. NOTE: if force_save is False (default), then the group returned may not be new but may be simply the first entry in the list of existing storage locations for the given analysis. NOTE: If we are in MPI parallel and we are on a core that does not write any data, then None is returned instead.
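The interplay of force_save and save_unsaved_dependencies can be sketched with in-memory stand-ins. Everything below is hypothetical. Analysis is a minimal class whose storage list plays the role of omsi_analysis_storage, an experiment is a dict with an "id" and an "analyses" list, and the return value is only the integer index (the real method also returns an omsi_file_analysis object). The sketch covers three behaviours: reuse of an earlier save in the same experiment, recursive saving of unsaved dependencies, and the value-only fallback.

```python
from typing import Any, Dict, List, Tuple


class Analysis:
    """Hypothetical in-memory analysis: a name, a value, its dependencies, and where it is stored."""

    def __init__(self, name: str, value: Any = None, dependencies=()) -> None:
        self.name = name
        self.value = value
        self.dependencies: List["Analysis"] = list(dependencies)
        self.storage: List[Tuple[str, int]] = []  # stand-in for omsi_analysis_storage


def create_analysis(experiment: Dict[str, Any], analysis: Analysis,
                    force_save: bool = False, save_unsaved_dependencies: bool = True) -> int:
    """Store `analysis` in `experiment` and return its index."""
    if not force_save:
        for exp_id, index in analysis.storage:
            if exp_id == experiment["id"]:
                return index  # already saved here: reuse instead of writing again
    deps: List[Tuple[str, Any]] = []
    for dep in analysis.dependencies:
        if dep.storage:
            deps.append(("link", dep.storage[0]))  # link to the earlier save
        elif save_unsaved_dependencies:
            # Recursively save the in-memory dependency first, then link to it.
            dep_index = create_analysis(experiment, dep, save_unsaved_dependencies=True)
            deps.append(("link", (experiment["id"], dep_index)))
        else:
            deps.append(("value", dep.value))  # only the value survives, not the chain
    experiment["analyses"].append({"name": analysis.name, "dependencies": deps})
    index = len(experiment["analyses"]) - 1
    analysis.storage.append((experiment["id"], index))
    return index
```

Saving a derived analysis with an unsaved input therefore writes two entries, and a repeated call without force_save returns the existing index instead of writing a third.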

static create_analysis_static(analysis_parent, analysis, flush_io=True, force_save=False, save_unsaved_dependencies=True, mpi_root=0, mpi_comm=None)

Same as create_analysis(...), but instead of relying on object-level state, this function allows additional parameters (specifically the analysis_parent) to be provided as input rather than being determined from self.

Parameters:
  • analysis_parent – The h5py.Group object or omsi.dataformat.omsi_file.common.omsi_file_common object where the analysis should be created
  • kwargs – Additional keyword arguments for create_analysis(...). See create_analysis(...) for details.
Returns:

The output of create_analysis

get_analysis(analysis_index)

Get the omsi_file_analysis object for the experiment with the given index.

Parameters:analysis_index (Unsigned integer) – The index of the analysis
Returns:omsi_file_analysis object for the requested analysis. The function returns None in case the analysis object was not found.
get_analysis_by_identifier(analysis_identifier_string)

Get the omsi_file_analysis object for the analysis with the given identifier.

Parameters:analysis_identifier_string (string) – The string used as identifier for the analysis.
Returns:h5py object of the analysis or None in case the analysis is not found.
get_analysis_identifiers()

Get a list of the identifiers of all analyses stored for the experiment.

Returns:List of strings of analysis identifiers.
get_num_analysis()

Get the number of analyses stored for a given experiment.

Returns:Integer indicating the number of analyses available for the experiment.
class omsi.dataformat.omsi_file.analysis.omsi_file_analysis(analysis_group)

Bases: omsi.dataformat.omsi_file.dependencies.omsi_dependencies_manager, omsi.dataformat.omsi_file.common.omsi_file_common

Class for managing analysis specific data in omsi hdf5 files

Initialize the analysis object given the h5py object of the analysis group.

Parameters:analysis_group – The h5py object with the analysis group of the omsi hdf5 file.
get_all_analysis_data(load_data=False)

Get all analysis data associated with the analysis.

Parameters:load_data – Should the data be loaded or just the h5py objects be stored in the dictionary.
Returns:List of analysis_data objects with the names and h5py or numpy objects. Access using [index][‘name’] and [index][‘data’].
get_all_parameter_data(load_data=False, exclude_dependencies=False)

Get all parameter data associated with the analysis.

Parameters:load_data – Should the data be loaded or just the h5py objects be stored in the dictionary.
Returns:List of parameter_data objects with names and h5py or numpy object. Access using [index][‘name’] and [index][‘data’].
get_all_runinfo_data(load_data=False)

Get a dict of all runtime information stored in the file

Returns:omsi.shared.run_info_data.run_info_dict type python dict with the runtime information restored.
get_analysis_data_names()

This function returns all dataset names (and groups) that are custom to the analysis, i.e., that are not part of the omsi file standard.

Returns:List of analysis-specific dataset names.
get_analysis_data_shapes_and_types()

This function returns two dictionaries with all dataset names (and groups) that are custom to the analysis, i.e., that are not part of the omsi file standard, and identifies the shape and dtype of the analysis data objects.

Returns:Dictionary indicating for each analysis-specific dataset its name (key) and shape (value), and a second dictionary indicating the name (key) and dtype (value) of each dataset.
get_analysis_identifier()

Get the identifier name of the analysis.

Use get_analysis_identifier()[...] to retrieve the identifier string.

Returns:h5py object for the dataset with the identifier string. Returns None in case no identifier exists. This should not be the case for a valid OpenMSI file.
get_analysis_index()

Based on the name of the group, get the index of the analysis.

Returns:Integer index of the analysis in the file.
get_analysis_type()

Get the type for the analysis.

Use get_analysis_type()[...] to retrieve the type string.

Returns:h5py object with the dataset of the analysis type string. Returns None in case no analysis type exists. This should not be the case in a valid omsi file.
recreate_analysis(**kwargs)

Load an analysis from file and re-execute it. This is equivalent to omsi_analysis.base.restore_analysis().execute()

Parameters:kwargs – Additional keyword arguments to be passed to the execute function of the analysis
Returns:Instance of the specific analysis object (e.g., omsi_nmf) that inherits from omsi.analysis.analysis_base with the input parameters and dependencies restored from file. The output, however, is the result from re-executing the analysis. None is returned in case the analysis object cannot be created.
restore_analysis(load_data=True, load_parameters=True, load_runtime_data=True, dependencies_omsi_format=True)

Load an analysis from file and create an instance of the appropriate analysis object defined by the analysis type (i.e., a derived class of omsi.analysis.analysis_base)

Parameters:
  • load_data – Should the analysis data be loaded from file (default) or just stored as h5py data objects
  • load_parameters – Should parameters be loaded from file (default) or just stored as h5py data objects.
  • load_runtime_data – Should runtime data be loaded from file (default) or just stored as h5py data objects.
  • dependencies_omsi_format – Should dependencies be loaded as omsi_file API objects (default) or just as h5py objects.
Returns:

Instance of the specific analysis object (e.g., omsi_nmf) that inherits from omsi.analysis.analysis_base with the input parameters, output result, and dependencies restored. We can call execute(..) on the returned object to rerun an analysis. May return analysis_generic in case the specific analysis is not known.
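The fallback to analysis_generic can be pictured as a lookup from the stored analysis type string to a class, with a generic class as the default. The sketch below uses a dictionary registry as the lookup. That registry is a swapped-in technique for illustration, and how OpenMSI actually resolves the type is not stated here. AnalysisBase, AnalysisGeneric, OmsiNmf and ANALYSIS_TYPES are hypothetical stand-ins.

```python
from typing import Any, Dict, Type


class AnalysisBase:
    """Hypothetical stand-in for omsi.analysis.analysis_base."""

    def __init__(self) -> None:
        self.parameters: Dict[str, Any] = {}


class AnalysisGeneric(AnalysisBase):
    """Fallback used when the stored analysis type is not known."""


class OmsiNmf(AnalysisBase):
    """Hypothetical stand-in for a concrete analysis such as omsi_nmf."""


# Hypothetical lookup table from stored type strings to analysis classes.
ANALYSIS_TYPES: Dict[str, Type[AnalysisBase]] = {"omsi_nmf": OmsiNmf}


def restore_analysis(analysis_type: str, parameters: Dict[str, Any]) -> AnalysisBase:
    """Instantiate the class named by analysis_type and restore its parameters."""
    cls = ANALYSIS_TYPES.get(analysis_type, AnalysisGeneric)
    analysis = cls()
    analysis.parameters.update(parameters)
    return analysis
```

An unknown type string still yields a usable object holding the restored parameters, which is the behaviour described for analysis_generic.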

dependencies Module

Base module for managing of dependencies between data in OpenMSI HDF5 files

class omsi.dataformat.omsi_file.dependencies.omsi_dependencies_manager(dependencies_parent)

Bases: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Dependencies manager helper class used to define common functionality needed for managing dependencies. Usually, a class that defines a format that contains an omsi_file_dependencies object will inherit from this class (in addition to omsi_file_common) to acquire the common features.

For more details see: omsi.dataformat.omsi_file.common.omsi_file_object_manager

Variables:
  • dependencies_parent – h5py.Group object containing the dependencies object(s) to be managed
  • dependencies – omsi_file_dependencies object managed by this object or None
Parameters:

dependencies_parent – Parent group containing the dependencies object to be managed
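The inheritance pattern described above (a format class inheriting from a manager helper in addition to omsi_file_common) relies on cooperative super() calls in which every __init__ forwards the same group. The following is a minimal sketch of that pattern, not the library's code: all class names are hypothetical stand-ins for omsi_file_common, omsi_file_object_manager and omsi_dependencies_manager, plain dicts play the role of h5py.Group objects, and the shared root class that ends the super() chain is an assumption made so that the sketch runs.

```python
class _GroupRoot:
    """Hypothetical shared root: ends the cooperative super() chain at object."""

    def __init__(self, group):
        super().__init__()  # object.__init__ takes no further arguments


class FileCommon(_GroupRoot):
    """Stand-in for omsi_file_common: records the group the object manages."""

    def __init__(self, managed_group):
        super().__init__(managed_group)
        self.managed_group = managed_group


class ObjectManager(_GroupRoot):
    """Stand-in for omsi_file_object_manager: records the parent group."""

    def __init__(self, parent_group):
        super().__init__(parent_group)
        self.parent_group = parent_group


class DependenciesManager(ObjectManager):
    """Stand-in for omsi_dependencies_manager."""

    def __init__(self, dependencies_parent):
        super().__init__(dependencies_parent)
        self.dependencies_parent = dependencies_parent
        # Managed dependencies object, or None if the parent has none yet.
        self.dependencies = dependencies_parent.get("dependencies")

    def has_dependencies(self):
        return bool(self.dependencies)


class FileAnalysis(DependenciesManager, FileCommon):
    """Stand-in format class: its own group is also the dependencies parent."""

    def __init__(self, analysis_group):
        # One call runs every __init__ along the MRO, each receiving the group.
        super().__init__(analysis_group)


analysis = FileAnalysis({"dependencies": ["ms1_data"]})
print([cls.__name__ for cls in FileAnalysis.__mro__])
# ['FileAnalysis', 'DependenciesManager', 'ObjectManager', 'FileCommon', '_GroupRoot', 'object']
print(analysis.has_dependencies())  # True
```

Because each constructor takes exactly one group and passes it on, adding or reordering helper base classes does not require changing any constructor signature.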

add_dependency(dependency, flush_io=True)

Create a new dependency for this dataset

Parameters:
  • dependency – omsi.shared.dependency_dict object describing the data dependency
  • flush_io – Call flush on the HDF5 file to ensure that all HDF5 buffers are flushed and all data has been written to file.
Returns:

omsi_file_dependencydata object with the dependency data or None in case that an error occurred and the dependency has not been generated.

create_dependencies(dependencies_data_list=None)

Create a managed group for storing data dependencies if none exists and store the given set of dependencies in it. If a self.dependencies object already exists, then the given dependencies will be added.

This is effectively a shortcut to omsi_file_dependencies.__create__(...) with specific settings for the current dependencies object managed by self.

Parameters: dependencies_data_list – List of dependency_dict objects to be stored as dependencies. Default is None, which is mapped to an empty list [].
Returns: omsi_file_dependencies object created by the function.

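The create-or-extend behaviour of create_dependencies can be sketched in plain Python. This is an illustration under assumptions, not the library implementation: DependencyCollection and Owner are hypothetical in-memory stand-ins for omsi_file_dependencies and a managing object, and each dependency is modelled as a dict with a ‘name’ key. The KeyError on duplicate names mirrors what is documented for omsi_file_dependencies.add_dependency.

```python
class DependencyCollection:
    """Hypothetical in-memory stand-in for omsi_file_dependencies."""

    def __init__(self):
        self._by_name = {}

    def add_dependency(self, dependency_data):
        name = dependency_data["name"]
        if name in self._by_name:
            # Duplicate names are rejected rather than silently overwritten.
            raise KeyError("dependency %r already exists" % name)
        self._by_name[name] = dependency_data
        return dependency_data

    def names(self):
        return sorted(self._by_name)


class Owner:
    """Hypothetical object that manages an optional dependencies collection."""

    def __init__(self):
        self.dependencies = None  # no dependencies group created yet

    def create_dependencies(self, dependencies_data_list=None):
        # None is mapped to an empty list.
        if dependencies_data_list is None:
            dependencies_data_list = []
        # Create the collection only if it does not exist yet; otherwise extend it.
        if self.dependencies is None:
            self.dependencies = DependencyCollection()
        for dependency_data in dependencies_data_list:
            self.dependencies.add_dependency(dependency_data)
        return self.dependencies


owner = Owner()
first = owner.create_dependencies([{"name": "peak_cube"}])
second = owner.create_dependencies([{"name": "msidata"}])
print(first is second, owner.dependencies.names())  # True ['msidata', 'peak_cube']
```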
get_all_dependency_data(omsi_dependency_format=True)

Get all direct dependencies associated with the data object.

This is a convenience function providing access to self.dependencies.get_all_dependency_data(...), which is a function of the omsi_file_dependencies class.

Parameters: omsi_dependency_format – Should the dependencies be retrieved as dependency_dict objects (True) or as omsi_file_dependencydata objects (False).
Returns: List of dependency_dict objects containing either omsi file API objects or h5py objects for the dependencies. Access using [index][‘name’] and [index][‘data’].

get_all_dependency_data_graph(include_omsi_dependency=False, include_omsi_file_dependencydata=False, recursive=True, level=0, name_key='name', prev_nodes=None, prev_links=None, parent_index=None, metadata_generator=None, metadata_generator_kwargs=None)

Get all direct and indirect dependencies associated with the data object in the form of a graph describing all nodes and links in the provenance hierarchy.

This is a convenience function providing access to self.dependencies.get_all_dependency_data_graph(...), which is a function of the omsi_file_dependencies class.

Parameters:
  • include_omsi_dependency – Should the dependency_dict object be included in the entries in the nodes dict?
  • include_omsi_file_dependencydata – Should the omsi_file_dependencydata object be included in the entries in the nodes dict?
  • recursive – Should we trace dependencies recursively to construct the full graph, or only the direct dependencies. Default is True (i.e., trace recursively).
  • name_key – Which key should be used in the dicts to indicate the name of the object? Default value is ‘name’
  • level – Integer used to indicate the recursion level. Default value is 0.
  • prev_nodes – List of nodes that have been previously generated. Note that this list will be modified by the call. Each node is represented by a dict which is expected to have at least the following keys defined: path, name_key, and level (name_key refers to the key defined by the input parameter name_key).
  • prev_links – Previously established links in the list of nodes. Note that this list will be modified by the call.
  • parent_index – Index of the parent node in the list of prev_nodes for this call.
  • metadata_generator

    Optional parameter. Pass in a function that generates additional metadata about a given omsi API object. Note that the keys level, path, and name (i.e., name_key) are already set by this function. The metadata_generator may overwrite these keys; however, the path has to be unique, as it is used to identify duplicate nodes. Overwriting the path with a non-unique value will hence lead to errors (missing entries) when generating the graph. The metadata_generator function needs to support the following keyword arguments:

    • in_dict : The dictionary to which the metadata should be added.
    • obj : The omsi file API object for which metadata should be generated
    • name : A qualifying name for the object
    • name_key : The key to be used for storing the name
  • metadata_generator_kwargs – Dictionary of additional keyword arguments that should be passed to the metadata_generator function.
Returns:

Dictionary containing two lists: 1) nodes : List of dictionaries describing the elements in the dependency graph. 2) links : List of tuples with the links in the graph. Each tuple consists of two integer indices into the nodes list. For each node the following entries are given:

  • dependency_dict : Optional key used to store the corresponding dependency_dict object. Used only if include_omsi_dependency is True.

  • omsi_file_dependencydata : Optional key used to store the corresponding omsi_file_dependencydata object. Used only if include_omsi_file_dependencydata is True.

  • name : Name of the dependency. The actual key is specified by name_key.

  • level : The recursion level at which the object occurs.

  • ... : Any other key/value pairs from the dependency_dict dict.
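As an illustration of the returned structure, the following sketch turns a nodes/links dictionary into an adjacency mapping keyed by each node’s path (which the parameter description above requires to be unique). The graph literal is made-up sample data, and reading each link tuple as (parent_index, child_index) is an assumption; the direction of the tuples is not stated above.

```python
def children_by_path(graph):
    """Map each node's path to the paths of the nodes it links to."""
    nodes, links = graph["nodes"], graph["links"]
    adjacency = {node["path"]: [] for node in nodes}
    for source, target in links:
        # Assumed orientation: (index of parent node, index of dependency node).
        adjacency[nodes[source]["path"]].append(nodes[target]["path"])
    return adjacency


# Hypothetical sample in the documented shape.
graph = {
    "nodes": [
        {"name": "nmf", "level": 0, "path": "/entry_0/analysis_0"},
        {"name": "msidata", "level": 1, "path": "/entry_0/data_0"},
        {"name": "peaks", "level": 1, "path": "/entry_0/analysis_1"},
    ],
    "links": [(0, 1), (0, 2), (2, 1)],
}
print(children_by_path(graph))
```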

get_all_dependency_data_recursive(omsi_dependency_format=True, omsi_main_parent=None, dependency_list=None)

Get all direct and indirect dependencies associated with the data object.

This is a convenience function providing access to self.dependencies.get_all_dependency_data_recursive(...), which is a function of the omsi_file_dependencies class.

NOTE: omsi_main_parent and dependency_list are used primarily to ensure that circular dependencies are handled properly. Circular dependencies may occur in the case of semantic dependencies (rather than pure use dependencies); e.g., two datasets of related modalities may reference each other, such as MS1 data pointing to related MS2 data and the MS2 datasets referencing the corresponding MS1 datasets.

Parameters:
  • omsi_dependency_format – Should the dependencies be retrieved as dependency_dict objects (True) or as omsi_file_dependencydata objects (False).
  • omsi_main_parent – The main parent for which the dependencies are calculated. This is needed to avoid recursing back into the main parent for which we are computing dependencies and to prevent it from being added as a dependency of itself. If set to None, then our own self.dependencies_parent object is used.
  • dependency_list – List of previously visited/created dependencies. This is needed only to avoid deep recursion and duplication due to circular dependencies.
Returns:

List of analysis_data objects containing either omsi file API interface objects or h5py objects for the dependencies. Access using [index][‘name’] and [index][‘data’].
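The role of omsi_main_parent and dependency_list in breaking cycles can be illustrated with a small recursive traversal. This is a sketch under assumptions rather than the library code: dependencies are modelled as a dict that maps an object name to the names of its direct dependencies, and the MS1/MS2 sample data is hypothetical.

```python
def all_dependencies_recursive(obj, direct_deps, main_parent=None, dependency_list=None):
    """Collect direct and indirect dependencies of obj without infinite recursion.

    direct_deps maps an object name to the names of its direct dependencies.
    """
    if main_parent is None:
        main_parent = obj  # analogous to falling back to our own parent object
    if dependency_list is None:
        dependency_list = []
    for dep in direct_deps.get(obj, []):
        # Never add the main parent to its own dependencies, and never revisit
        # a dependency that has already been collected (this breaks cycles).
        if dep == main_parent or dep in dependency_list:
            continue
        dependency_list.append(dep)
        all_dependencies_recursive(dep, direct_deps, main_parent, dependency_list)
    return dependency_list


# MS1 and MS2 reference each other; MS2 additionally depends on a peak list.
direct_deps = {"ms1": ["ms2"], "ms2": ["ms1", "peaks"], "peaks": []}
print(all_dependencies_recursive("ms1", direct_deps))  # ['ms2', 'peaks']
```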

has_dependencies()

Check whether any dependencies exist for this dataset.

class omsi.dataformat.omsi_file.dependencies.omsi_file_dependencies(dependencies_group)

Bases: omsi.dataformat.omsi_file.common.omsi_file_common

Class for managing collections of dependencies.

Use of super()

This class inherits from omsi.dataformat.omsi_file.common.omsi_file_common. Consistent with the design pattern for multiple inheritance of the omsi.dataformat.omsi_file module, the __init__ function calls super(...).__init__(managed_group) with a single parameter indicating the managed group.

static _omsi_file_dependencies__create_dependency_graph_node(level, name, path, dependency_object, omsi_object, include_omsi_dependency=False, include_omsi_file_dependencydata=False, name_key='name', metadata_generator=None, metadata_generator_kwargs=None)

Internal helper function used to create a new node in the graph

Parameters:
  • level – The recursion level at which the node exists
  • name – The name of the node
  • path – The path of the node
  • dependency_object – The omsi_file_dependencydata object. May be None in case a node to a specific object is set
  • omsi_object – The OpenMSI file API object. This is required and may NOT be None.
  • include_omsi_dependency – Should the dependency_dict object be included in the entries in the nodes dict?
  • include_omsi_file_dependencydata – Should the omsi_file_dependencydata object be included in the entries in the nodes dict?
  • name_key – Which key should be used in the dicts to indicate the name of the object? Default value is ‘name’
  • metadata_generator

    Optional parameter. Pass in a function that generates additional metadata about a given omsi API object. Note that the keys level, path, and name (i.e., name_key) are already set by this function. The metadata_generator may overwrite these keys; however, the path has to be unique, as it is used to identify duplicate nodes. Overwriting the path with a non-unique value will hence lead to errors (missing entries) when generating the graph. The metadata_generator function needs to support the following keyword arguments:

    • in_dict : The dictionary to which the metadata should be added to.
    • obj : The omsi file API object for which metadata should be generated
    • name : A qualifying name for the object
    • name_key : The key to be used for storing the name
  • metadata_generator_kwargs – Dictionary of additional keyword arguments that should be passed to the metadata_generator function.
Returns:

Dict describing the new node, containing the ‘name’, ‘level’, and ‘path’ and optionally ‘dependency_dict’ and/or ‘omsi_file_dependencydata’ and any additional data generated by the metadata_generator function
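A minimal sketch of how such a node could be assembled and handed to a metadata_generator, assuming plain dicts for nodes. create_graph_node and add_type_info are hypothetical names, and the include_omsi_dependency option is omitted for brevity. The generator receives the documented keyword arguments (in_dict, obj, name, name_key) plus anything passed via metadata_generator_kwargs.

```python
def create_graph_node(level, name, path, dependency_object, omsi_object,
                      include_omsi_file_dependencydata=False, name_key="name",
                      metadata_generator=None, metadata_generator_kwargs=None):
    """Hypothetical helper mirroring the documented node layout."""
    if omsi_object is None:
        raise ValueError("omsi_object is required and may not be None")
    # The keys level, path and name (under name_key) are always set first.
    node = {name_key: name, "level": level, "path": path}
    if include_omsi_file_dependencydata and dependency_object is not None:
        node["omsi_file_dependencydata"] = dependency_object
    if metadata_generator is not None:
        extra_kwargs = metadata_generator_kwargs or {}
        metadata_generator(in_dict=node, obj=omsi_object, name=name,
                           name_key=name_key, **extra_kwargs)
    return node


def add_type_info(in_dict, obj, name, name_key, prefix=""):
    """Example metadata_generator: record the Python class name of obj.

    It leaves 'path' untouched, since paths must stay unique.
    """
    in_dict["type"] = prefix + type(obj).__name__


node = create_graph_node(1, "msidata", "/entry_0/data_0", None, object(),
                         metadata_generator=add_type_info,
                         metadata_generator_kwargs={"prefix": "py:"})
print(node)  # {'name': 'msidata', 'level': 1, 'path': '/entry_0/data_0', 'type': 'py:object'}
```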

add_dependency(dependency_data)

Add a new dependency to the collection.

Parameters: dependency_data (omsi.shared.omsi_dependency_data) – The analysis dependency specification.
Returns: The newly created omsi_file_dependencydata object.
Raises: KeyError in case a dependency with the same name already exists.

get_all_dependency_data(omsi_dependency_format=True)

Get all direct dependencies associated with the analysis.

Parameters: omsi_dependency_format – Should the dependencies be retrieved as dependency_dict objects (True) or as omsi_file_dependencydata objects (False).
Returns: List of dependency_dict objects containing either omsi file API objects or h5py objects for the dependencies. Access using [index][‘name’] and [index][‘data’].

get_all_dependency_data_graph(include_omsi_dependency=False, include_omsi_file_dependencydata=False, recursive=True, level=0, name_key='name', prev_nodes=None, prev_links=None, parent_index=None, metadata_generator=None, metadata_generator_kwargs=None)

Get all direct and indirect dependencies associated with the analysis in the form of a graph describing all nodes and links in the provenance hierarchy.

Parameters:
  • include_omsi_dependency – Should the dependency_dict object be included in the entries in the nodes dict?
  • include_omsi_file_dependencydata – Should the omsi_file_dependencydata object be included in the entries in the nodes dict?
  • recursive – Should we trace dependencies recursively to construct the full graph, or only the direct dependencies. Default is True (i.e., trace recursively).
  • name_key – Which key should be used in the dicts to indicate the name of the object? Default value is ‘name’
  • level – Integer used to indicate the recursion level. Default value is 0.
  • prev_nodes – List of nodes that have been previously generated. Note that this list will be modified by the call. Each node is represented by a dict which is expected to have at least the following keys defined: path, name_key, and level (name_key refers to the key defined by the input parameter name_key).
  • prev_links – Previously established links in the list of nodes. Note, this list will be modified by the call.
  • parent_index – Index of the parent node in the list of prev_nodes for this call. May be None in case the parent we are calling this function for is not yet in the list. If None, then we will add our own parent that contains the dependencies to the list.
  • metadata_generator

    Optional parameter. Pass in a function that generates additional metadata about a given omsi API object. Note that the keys level, path, and name (i.e., name_key) are already set by this function. The metadata_generator may overwrite these keys; however, the path has to be unique, as it is used to identify duplicate nodes. Overwriting the path with a non-unique value will hence lead to errors (missing entries) when generating the graph. The metadata_generator function needs to support the following keyword arguments:

    • in_dict : The dictionary to which the metadata should be added to.
    • obj : The omsi file API object for which metadata should be generated
    • name : A qualifying name for the object
    • name_key : The key to be used for storing the name
  • metadata_generator_kwargs – Dictionary of additional keyword arguments that should be passed to the metadata_generator function.
Returns:

Dictionary containing two lists: 1) nodes : List of dictionaries describing the elements in the dependency graph. 2) links : List of tuples with the links in the graph. Each tuple consists of two integer indices into the nodes list. For each node the following entries are given:

  • dependency_dict : Optional key used to store the corresponding dependency_dict object. Used only if include_omsi_dependency is True.

  • omsi_file_dependencydata : Optional key used to store the corresponding omsi_file_dependencydata object. Used only if include_omsi_file_dependencydata is True.

  • name : Name of the dependency. The actual key is specified by name_key.

  • level : The recursion level at which the object occurs.

  • path : The full path to the object.

  • filename : The full path to the file.

  • ... : Any other key/value pairs from the dependency_dict dict.
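The node/link layout and the role of unique paths can be illustrated with a small depth-first construction. This is a sketch under assumptions, not the library's traversal: dependencies are modelled as a dict mapping a path to (name, target_path) pairs, links are written as (parent_index, child_index), and the root node uses its path as its name. A node is created only the first time its path is seen; later references, including cycles, only add a link.

```python
def dependency_graph(root_path, dependencies, recursive=True, name_key="name"):
    """Build {'nodes': [...], 'links': [...]} starting from root_path.

    dependencies maps a path to a list of (name, target_path) tuples.
    """
    nodes = [{name_key: root_path, "level": 0, "path": root_path}]
    links = []
    index_of = {root_path: 0}  # path -> node index; paths identify duplicates

    def visit(parent_index, parent_path, level):
        for name, target_path in dependencies.get(parent_path, []):
            is_new = target_path not in index_of
            if is_new:
                index_of[target_path] = len(nodes)
                nodes.append({name_key: name, "level": level, "path": target_path})
            links.append((parent_index, index_of[target_path]))
            # Only descend into nodes seen for the first time (stops cycles).
            if recursive and is_new:
                visit(index_of[target_path], target_path, level + 1)

    visit(0, root_path, 1)
    return {"nodes": nodes, "links": links}


deps = {
    "/a": [("b", "/b"), ("c", "/c")],
    "/b": [("c", "/c")],
    "/c": [("a", "/a")],  # cycle back to the root
}
graph = dependency_graph("/a", deps)
print([n["path"] for n in graph["nodes"]], graph["links"])
# ['/a', '/b', '/c'] [(0, 1), (1, 2), (2, 0), (0, 2)]
```

Because this traversal is depth-first, a node reachable along several routes is recorded at the level where it is first reached (here /c at level 2).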

get_all_dependency_data_recursive(omsi_dependency_format=True, omsi_main_parent=None, dependency_list=None)

Get all direct and indirect dependencies associated with the analysis.

NOTE: omsi_main_parent and dependency_list are used primarily to ensure that circular dependencies are handled properly. Circular dependencies may occur in the case of semantic dependencies (rather than pure use dependencies); e.g., two datasets of related modalities may reference each other, such as MS1 data pointing to related MS2 data and the MS2 datasets referencing the corresponding MS1 datasets.

Parameters:
  • omsi_dependency_format – Should the dependencies be retrieved as dependency_dict objects (True) or as omsi_file_dependencydata objects (False).
  • omsi_main_parent – The main parent for which the dependencies are calculated. This is needed to avoid recursing back into the main parent for which we are computing dependencies and to prevent it from being added as a dependency of itself. If set to None, then the omsi_object associated with the parent group of the dependency group is used.
  • dependency_list – List of previously visited/created dependencies. This is needed only to avoid deep recursion and duplication due to circular dependencies.
Returns:

List of analysis_data objects containing either omsi file API interface objects or h5py objects for the dependencies. Access using [index][‘name’] and [index][‘data’].

get_dependency_omsiobject(name, recursive=True)
Get the omsi file API object corresponding to the object the dependency is pointing to.
Parameters:
  • name – Name of the dependency object to be loaded.
  • recursive – Should the dependency be resolved recursively (i.e., if it points to another dependency)? Default=True.
Returns:

An omsi file API object (e.g., omsi_file_analysis or omsi_file_msidata) if the link points to a group or the h5py.Dataset the link is pointing to.
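The recursive flag can be pictured as following a chain of links until something that is not itself a dependency is reached. The sketch below assumes an in-memory store in which a hypothetical Link class marks dependency entries; the check for circular link chains is an addition of this sketch, not documented behaviour.

```python
class Link:
    """Hypothetical stand-in for a dependency entry pointing at another key."""

    def __init__(self, target):
        self.target = target


def get_dependency_object(store, name, recursive=True):
    """Return what dependency `name` points to, following links if recursive."""
    target = store[name].target
    visited = {name}
    while recursive and isinstance(store[target], Link):
        if target in visited:
            raise ValueError("circular dependency chain at %r" % target)
        visited.add(target)
        target = store[target].target
    return store[target]


store = {
    "dep_a": Link("dep_b"),    # dependency pointing to another dependency
    "dep_b": Link("msidata"),
    "msidata": "MSI dataset",  # placeholder for the actual data object
}
print(get_dependency_object(store, "dep_a"))  # MSI dataset
```

With recursive=False the sketch stops after one hop and returns the intermediate Link for dep_b.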

get_omsi_file_dependencydata(name)

Retrieve the omsi_file_dependencydata object for the dependency with the given name.

class omsi.dataformat.omsi_file.dependencies.omsi_file_dependencydata(dependency_group)

Bases: omsi.dataformat.omsi_file.common.omsi_file_common

Class for managing data groups used for storing data dependencies.

Create a new omsi_file_dependencydata object for the given h5py.Group

Parameters: dependency_group (h5py.Group with a corresponding omsi type) – h5py.Group object with the dependency data

get_dataset_name()

Get the string indicating the name of the dataset. This may be empty, as it is only used if the dependency points to an object within a managed omsi API object.

Returns: String indicating the name of the optional dataset.

get_dependency_objecttype(recursive=True)

Indicate the type of the object the dependency is pointing to.

Parameters: recursive – Should dependencies be resolved recursively (i.e., if the dependency points to another dependency)? Default=True.
Returns: String indicating the name of the omsi file API class that is suited to manage the dependency link, or the name of the corresponding h5py class.

get_dependency_omsiobject(recursive=True, external_mode=None)

Get the omsi file API object corresponding to the object the dependency is pointing to.

Parameters:
  • recursive – Should dependencies be resolved recursively (i.e., if the dependency points to another dependency)? Default=True.
  • external_mode – The file open mode (e.g., ‘r’, ‘a’) to be used when we encounter external dependencies, i.e., dependencies that are stored in external files. By default this is set to None, indicating that the same mode should be used in which this file (i.e., the current file describing the dependency) was opened. Allowed modes are ‘r’, ‘r+’, and ‘a’. The modes ‘w’, ‘w+’, and ‘x’ are prohibited to ensure that we do not break external files.
Returns:

An omsi file API object (e.g., omsi_file_analysis or omsi_file_msidata) if the link points to a group or the h5py.Dataset the link is pointing to.
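The rule for external_mode can be written as a small validation step. resolve_external_mode is a hypothetical helper, and raising ValueError for a disallowed mode is an assumption of this sketch. Note that h5py’s File.mode attribute only ever reports 'r' or 'r+', so falling back to the mode of an open h5py.File always yields an allowed value.

```python
ALLOWED_EXTERNAL_MODES = ("r", "r+", "a")
PROHIBITED_EXTERNAL_MODES = ("w", "w+", "x")  # modes that create or truncate files


def resolve_external_mode(external_mode, current_mode):
    """Choose the mode used to open a file holding an external dependency.

    external_mode=None means: reuse the mode of the file describing the dependency.
    """
    mode = current_mode if external_mode is None else external_mode
    if mode in PROHIBITED_EXTERNAL_MODES:
        raise ValueError("mode %r could break the external file" % mode)
    if mode not in ALLOWED_EXTERNAL_MODES:
        raise ValueError("unsupported mode %r" % mode)
    return mode


print(resolve_external_mode(None, "r+"))  # r+
print(resolve_external_mode("r", "r+"))   # r
```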

get_dependency_type()

Get the string describing the type of the dependency.

NOTE: If the type is missing in the file but a parameter name is specified, then the default type ‘parameter’ is returned; otherwise None is returned.

Returns: String indicating the type of the dependency or None if the type is not known.
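The fallback described in the note amounts to the following. The attribute names dependency_type and param_name used here are hypothetical placeholders, not the actual keys stored in the HDF5 file.

```python
def get_dependency_type(attrs):
    """Return the stored type, defaulting to 'parameter' if a parameter name exists."""
    dependency_type = attrs.get("dependency_type")
    if dependency_type is not None:
        return dependency_type
    # Type missing in the file: infer 'parameter' only if a parameter name is given.
    if attrs.get("param_name"):
        return "parameter"
    return None


print(get_dependency_type({"param_name": "msidata"}))  # parameter
print(get_dependency_type({}))                        # None
```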

Get the name of the dependency link

Returns: String indicating the name of the dependency link.

get_mainname()

Get the main name string describing the name of the object (and possibly the path of the file, if the object is external).

Returns: String indicating the main name of the object that we link to.

get_omsi_dependency()

Get the dependency information as an omsi.shared.dependency_dict object (as defined in the omsi.shared.dependency_dict module).

Returns: dependency_dict object with all the dependency data.

get_parameter_help()

Get the help string for the parameter name if available.

get_parameter_name()

Get the string indicating the name of the dependent parameter of the analysis.

Returns: String of the parameter name that has the dependency.

get_selection_string()

String indicating the applied selection. This is an empty string in case no selection was applied.

Returns: Selection string. See omsi.shared.omsi_data_selection for helper functions to deal with selection strings.