analysis Package

omsi.analysis Package containing the base classes that facilitate the integration of new analyses with the BASTet software stack (e.g., the file format) and a collection of specific analysis functionality.
omsi.analysis.base Module specifying the base analysis API for integrating new analysis with the toolkit and the OpenMSI science gateway.
omsi.analysis.generic Generic analysis class used to represent analyses of unknown type, e.g., when loading a custom user-defined analysis from file for which the indicated class may not be available with the local installation.
omsi.analysis.analysis_views Helper module with functions and classes for interfacing with different analysis algorithms.
omsi.analysis.compound_stats Package containing shared third-party code modules included here to reduce the need for external dependencies when only small parts of external code are used.
omsi.analysis.compound_stats.omsi_score_compounds
omsi.analysis.findpeaks Package of peak-finding related analysis modules.
omsi.analysis.msi_filtering Module with third-party modules, functions, classes used by some of the analysis modules in the containing package.
omsi.analysis.multivariate_stats Multivariate statistics analysis

analysis Package

Package containing the base classes that facilitate the integration of new analyses with the BASTet software stack (e.g., the file format) and a collection of specific analysis functionality.

class omsi.analysis.analysis_data(name='undefined', data=None, dtype='float32')

Bases: dict

Define an output dataset for the analysis that should be written to the omsi HDF5 file

The class can be used like a dictionary but restricts the set of keys to the following required keys, which should be provided during initialization.

Required Keyword Arguments:

Parameters:
  • name – The name for the dataset in the HDF5 format
  • data – The numpy array to be written to HDF5. The data write function omsi_file_experiment.create_analysis used for writing the data to file can in principle also handle other primitive data types by explicitly converting them to numpy. However, in this case the dtype is determined based on the numpy conversion and correct behavior is not guaranteed. Therefore, even single scalars should be stored as a 1D numpy array here. The default value is None, which is mapped to np.empty(shape=(0), dtype=dtype) in __init__.
  • dtype

    The data type to be used during writing. For standard numpy data types this is just the dtype of the dataset, i.e., [‘data’].dtype. Other allowed datatypes are:

    • For string: omsi_format.str_type (omsi_format is located in omsi.dataformat.omsi_file )
    • To generate data links: ana_hdf5link (analysis_data)
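The restricted-key behavior described above can be illustrated with a small stand-in class. This is a minimal sketch and not the BASTet implementation: RestrictedData and ALLOWED_KEYS are hypothetical names, a plain Python list stands in for the numpy array, and rejecting unknown keys with a KeyError is an assumption.

```python
class RestrictedData(dict):
    """Hypothetical stand-in for a dict that only accepts a fixed set of keys."""

    ALLOWED_KEYS = ('name', 'data', 'dtype')

    def __init__(self, name='undefined', data=None, dtype='float32'):
        super().__init__()
        self['name'] = name
        # The documented default maps None to an empty array; a list is used here.
        self['data'] = [] if data is None else data
        self['dtype'] = dtype

    def __setitem__(self, key, value):
        # Assumption: unknown keys are rejected with a KeyError.
        if key not in self.ALLOWED_KEYS:
            raise KeyError("%r is not one of %r" % (key, self.ALLOWED_KEYS))
        super().__setitem__(key, value)


peaks = RestrictedData(name='peak_values', data=[1.0, 2.5], dtype='float32')
```

Reading works like any dictionary (for example peaks['name']), while assigning to a key outside the allowed set fails in this sketch.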
class omsi.analysis.analysis_base

Bases: omsi.datastructures.analysis_data.parameter_manager

Base class for omsi analysis functionality. The class provides a large set of functionality designed to facilitate storage of analysis data in the omsi HDF5 file format. The class also provides a set of functions to enable easy integration of new analyses with the OpenMSI web-based viewer (see Viewer functions below for details).

Slicing:

This class supports basic slicing to access data stored in the main member variables. By default the data is retrieved from __data_list, and the __getitem__(key) function, which implements the [..] operator, returns __data_list[key][‘data’]. The key is a string indicating the name of the parameter to be retrieved. If the key is not found in __data_list, then the function will try to retrieve the data from the self.parameters list instead. By using the prefix “parameter/key” or “dependency/key” one may also explicitly retrieve values from the parameters.
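The lookup order described above can be sketched with a stand-in class. This is an illustration under stated assumptions, not the analysis_base code: SliceLookup is a hypothetical name, plain dicts replace __data_list and the parameters list, and both prefixes are resolved against the same parameter store.

```python
class SliceLookup:
    """Hypothetical stand-in mimicking the documented [..] lookup order."""

    def __init__(self, outputs, parameters):
        self._outputs = outputs        # plays the role of __data_list
        self._parameters = parameters  # plays the role of self.parameters

    def __getitem__(self, key):
        # An explicit prefix bypasses the output lookup.
        for prefix in ('parameter/', 'dependency/'):
            if key.startswith(prefix):
                return self._parameters[key[len(prefix):]]
        # Outputs are searched first, parameters second.
        if key in self._outputs:
            return self._outputs[key]
        return self._parameters[key]


ana = SliceLookup(outputs={'peaks': [3, 7]},
                  parameters={'peaks': 'raw', 'sigma': 2})
```

Here ana['peaks'] resolves to the output, ana['sigma'] falls back to the parameters, and ana['parameter/peaks'] selects the parameter of the same name explicitly.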

Instance Variables:

Variables:
  • analysis_identifier – Define the name for the analysis used as key in search operations
  • __data_list – List of analysis_data to be written to the HDF5 file. Derived classes need to add all data that should be saved for the analysis in the omsi HDF5 file to this dictionary. See omsi.analysis.analysis_data for details.
  • parameters – List of parameter_data objects of all analysis parameters (including those that may have dependencies).
  • data_names – List of strings of all names of analysis output datasets. These are the target keys for __data_list.
  • profile_time_and_usage – Boolean indicating whether we should profile the execute_analysis(...) function when called as part of the execute(...) function. The default value is False. Use the enable_time_and_usage_profiling(..) function to determine which profiling should be performed. The time_and_usage profile uses Python's cProfile (or Profile) to monitor how often and for how long particular parts of the analysis code are executed.
  • profile_memory – Boolean indicating whether we should monitor memory usage (line-by-line) when executing the execute_analysis(...) function. The default value is False. Use the enable_memory_profiling(..) function to determine which profiling should be performed.
  • omsi_analysis_storage – List of omsi_file_analysis object where the analysis is stored. The list may be empty.
  • mpi_comm – In case we are running with MPI, this is the MPI communicator used for running the analysis. Default is MPI.Comm_world.
  • mpi_root – In case we are running with MPI, this is the root rank where data is collected to (e.g., runtime data and analysis results)
  • update_analysis – If the value is True, then we should execute the analysis before using the outputs. If False, then the analysis has been executed with the current parameter settings.
  • driver – Workflow driver to be used when executing multiple analyses, e.g., via execute_recursive or execute_all. Default value is None in which case a new default driver will be used each time we execute a workflow.

Execution Functions:

  • execute : The main function the user needs to call in order to execute the analysis
  • execute_analysis : This function needs to be implemented by child classes of analysis_base to implement the specifics of executing the analysis.

I/O functions:

These functions can be optionally overwritten to control how the analysis data should be written/read from the omsi HDF5 file. Default implementations are provided here, which should be sufficient for most cases.

  • add_custom_data_to_omsi_file: The default implementation is empty as the default data write is managed by the omsi_file_experiment.create_analysis() function. Overwrite this function in case the analysis needs to write data to the HDF5 omsi file beyond what the default omsi data API does.
  • read_from_omsi_file: The default implementation tries to reconstruct the original data as far as possible. However, in particular when a custom add_custom_data_to_omsi_file function has been implemented, the default implementation may not be sufficient. The default implementation reconstructs i) analysis_identifier and reads all custom data into ii) __data_list. Note, an error will be raised in case that the analysis type specified in the HDF5 file does not match the analysis type specified by get_analysis_type(). This function can be optionally overwritten to implement a custom data read.

Viewer functions:

Several convenience functions are used to allow the OpenMSI online viewer to interact with the analysis and to visualize it. The default implementations provided here simply indicate that the analysis does not support the data access operations required by the online viewer. Overwrite these functions in the derived analysis classes in order to interface them with the viewer. All viewer-related functions start with v_.

NOTE: The default implementations of the viewer functions defined in analysis_base are designed to take care of the common requirement of providing viewer access to data from all dependencies of an analysis. In many cases, the default implementation is still called at the end of custom viewer functions.

NOTE: The viewer functions typically support a viewer_option parameter. viewer_option=0 is expected to refer to the analysis itself.

  • v_qslice: Retrieve/compute data slices as requested via qslice URL requests. The corresponding view of the DJANGO data access server already translates all input parameters and takes care of generating images/plots if needed. This function is only responsible for retrieving the data.
  • v_qspectrum: Retrieve/compute spectra as requested via qspectrum URL requests. The corresponding view of the DJANGO data access server already translates all input parameters and takes care of generating images/plots if needed. This function is only responsible for retrieving the data.
  • v_qmz: Define the m/z axes for image slices and spectra as requested by qspectrum URL requests.
  • v_qspectrum_viewer_options: Define a list of strings, describing the different viewer options available for the analysis for qspectrum requests (i.e., v_qspectrum). This feature allows the analysis developer to define multiple different visualization modes for the analysis. For example, when performing a data reduction (e.g., PCA or NMF) one may want to show the raw spectra or the loadings vector of the projection in the spectrum view (v_qspectrum). By providing different viewer options we allow the user to decide which option they are most interested in.
  • v_qslice_viewer_options: Define a list of strings, describing the different viewer options available for the analysis for qslice requests (i.e., v_qslice). This feature allows the analysis developer to define multiple different visualization modes for the analysis. For example, when performing a data reduction (e.g., PCA or NMF) one may want to show the raw spectra or the loadings vector of the projection in the spectrum view (v_qspectrum). By providing different viewer options we allow the user to decide which option they are most interested in.

Initialize the basic data members

add_custom_data_to_omsi_file(analysis_group)

This function can be optionally overwritten to implement a custom data write function for the analysis to be used by the omsi_file API.

Note, this function should be used only to add additional data to the analysis group. The data that is written by default is still written by the omsi_file_experiment.create_analysis() function, i.e., the following data is written by default: i) analysis_identifier, ii) get_analysis_type, iii) __data_list, iv) parameters, v) runinfo. Since the omsi_file.experiment.create_analysis() function takes care of setting up the basic structure of the analysis storage (including the subgroups for storing parameters and data dependencies), this setup can generally be assumed to exist before this function is called. This function is called automatically at the end of omsi_file.experiment.create_analysis() (i.e., actually omsi_file_analysis.__populate_analysis__(..)), so that this function typically does not need to be called explicitly.

Parameters:analysis_group – The h5py.Group object where the analysis is stored.
add_parameter(name, help, dtype=<type 'unicode'>, required=False, default=None, choices=None, data=None, group=None)

Add a new parameter for the analysis. This function is typically used in the constructor of a derived analysis to specify the parameters of the analysis.

Parameters:
  • name – The name of the parameter
  • help – Help string describing the parameter
  • dtype – Optional data type. Default is string (unicode).
  • required – Boolean indicating whether the parameter is required (True) or optional (False). Default False.
  • default – Optional default value for the parameter. Default None.
  • choices – Optional list of choices with allowed data values. Default None, indicating no choices set.
  • data – The data assigned to the parameter. None by default.
  • group – Optional group string used to organize parameters. Default None, indicating that parameters are automatically organized by driver class (e.g. in required and optional parameters)
Raises:

ValueError is raised if the parameter with the given name already exists.
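The duplicate-name check can be sketched with a stand-in registry. This is a simplified illustration rather than the parameter_manager code: ParameterRegistry is a hypothetical name, parameters are stored as plain dicts instead of parameter_data objects, and str replaces the Python 2 unicode default.

```python
class ParameterRegistry:
    """Hypothetical stand-in that stores parameter specifications."""

    def __init__(self):
        self.parameters = []

    def add_parameter(self, name, help, dtype=str, required=False,
                      default=None, choices=None, data=None, group=None):
        # Reject a second parameter with the same name.
        if any(param['name'] == name for param in self.parameters):
            raise ValueError("A parameter named %r already exists" % name)
        self.parameters.append(dict(name=name, help=help, dtype=dtype,
                                    required=required, default=default,
                                    choices=choices, data=data, group=group))


registry = ParameterRegistry()
registry.add_parameter(name='sigma', help='Smoothing width', dtype=float,
                       required=True, default=2.0)
```

Calling registry.add_parameter(name='sigma', ...) a second time raises ValueError in this sketch, which matches the documented failure mode.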

analysis_identifier_defined()

Check whether the analysis identifier is defined by the user, i.e., set to a value different from undefined.

Returns:bool

check_ready_to_execute()

Check if all inputs are ready to determine if the analysis is ready to run.

Returns:List of omsi_analysis_parameter objects that are not ready. If the returned list is empty, then the analysis is ready to run.
clear_analysis()

Clear all analysis data, i.e., parameter, dependency data, output results, runtime data

clear_analysis_data()

Clear the list of analysis data

clear_and_restore(analysis_manager=None, resave=False)

Clear all analysis data and restore the results from file

Parameters:
  • analysis_manager – Instance of omsi_analysis_manager (e.g., an omsi_file_experiment) where the analysis should be saved.
  • resave – Boolean indicating whether the analysis should be saved again, even if it has been saved before. This parameter only has effect if analysis_manager is given.
Returns:

self, i.e., the updated analysis object with all data replaced with HDF5 references

clear_parameter_data()

Clear the list of parameter data

clear_run_info_data()

Clear the runtime information data

define_missing_parameters()

Called by the execute function before self.update_analysis_parameters to set any required parameters that have not been defined to their respective default values.

This function may be overwritten in child classes to customize the definition of default parameter values and to apply any modifications (or checks) of parameters before the analysis is executed. Any changes applied here will be recorded in the parameter of the analysis.
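The default behavior described above can be sketched as follows. This is an illustration under assumptions, not the library code: parameters are modeled as plain dicts with hypothetical 'required', 'default', and 'data' keys, and a parameter counts as undefined when its data is None.

```python
def define_missing_parameters(parameters):
    """Set required parameters that have no data to their default value."""
    for param in parameters:
        if param['required'] and param['data'] is None:
            param['data'] = param['default']
    return parameters


params = [
    {'name': 'sigma', 'required': True, 'default': 2.0, 'data': None},
    {'name': 'mode', 'required': True, 'default': 'fast', 'data': 'exact'},
    {'name': 'label', 'required': False, 'default': 'none', 'data': None},
]
define_missing_parameters(params)
```

Only sigma is changed: mode already has a value, and label is optional, so both are left alone.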

enable_memory_profiling(enable=True)

Enable or disable line-by-line profiling of memory usage of execute_analysis.

Parameters:enable (bool) – Enable (True) or disable (False) line-by-line profiling of memory usage
Raises:ImportError is raised if a required package for profiling is not available.
enable_time_and_usage_profiling(enable=True)

Enable or disable profiling of time and usage of code parts of execute_analysis.

Parameters:enable (bool) – Enable (True) or disable (False) profiling
Raises:ImportError is raised if a required package for profiling is not available.
execute(**kwargs)

Use this function to run the analysis.

Parameters:kwargs – Parameters to be used for the analysis. Parameters may also be set using the __setitem__ mechanism or as batches using the set_parameter_values function.
Returns:This function returns the output of the execute_analysis function.
Raises:AnalysisReadyError in case that the analysis is not ready to be executed. This may be the case, e.g., when a dependent input parameter is not ready to be used.
classmethod execute_all(force_update=False, executor=None)

Execute all analysis instances that are currently defined.

Parameters:
  • force_update – Boolean indicating whether we should force that all analyses are executed again, even if they have already been run with the same settings before. False by default.
  • executor – Optional workflow executor to be used for the execution of all analyses. The executor will be cleared and then all analyses will be added to executor. Default value is None, in which case the function creates a default executor to be used.
Returns:

The workflow executor used

execute_analysis()

Implement this function to implement the execution of the actual analysis.

This function must not require any input parameters. All input parameters are recorded in the parameters and dependencies lists and should be retrieved from there, e.g., using basic slicing self[ paramName ]

Input parameters may be added for internal use ONLY. E.g., we may add parameters that are used internally to help with parallelization of the execute_analysis function. Such parameters are not recorded and must be strictly optional so that analysis_base.execute(...) can call the function.

Returns:This function may return any developer-defined data. Note, all output that should be recorded must be put into the data list.
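The split between execute and execute_analysis can be sketched with a stand-in base class. This is a minimal illustration, not the analysis_base implementation: SketchAnalysis and ScaleAnalysis are hypothetical names, parameters and outputs are kept in plain dicts, and the output key 'result' is an assumption.

```python
class SketchAnalysis:
    """Hypothetical stand-in for the execute()/execute_analysis() split."""

    def __init__(self):
        self.parameters = {}
        self.data = {}

    def __getitem__(self, key):
        return self.parameters[key]

    def execute(self, **kwargs):
        self.parameters.update(kwargs)     # record the inputs
        output = self.execute_analysis()   # called without arguments
        self.data['result'] = output       # record the output
        return output

    def execute_analysis(self):
        raise NotImplementedError("Child classes implement the analysis here")


class ScaleAnalysis(SketchAnalysis):
    def execute_analysis(self):
        # Inputs come from the recorded parameters, not from arguments.
        return [value * self['factor'] for value in self['values']]


analysis = ScaleAnalysis()
result = analysis.execute(values=[1, 2, 3], factor=2)
```

The child class only implements the computation and reads its inputs through slicing, while the base class handles recording inputs and outputs.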
execute_recursive(**kwargs)

Recursively execute this analysis and all its dependencies if necessary

We use a workflow driver to control the execution. To define the workflow driver we can set the self.driver variable. If no workflow driver is given (i.e., self.driver==None), then the default driver will be created. To change the default driver, see omsi.workflow.base.workflow_executor_base.DEFAULT_EXECUTOR_CLASS

Parameters:kwargs – Parameters to be used for the analysis. Parameters may also be set using the __setitem__ mechanism or as batches using the set_parameter_values function.
Returns:Same as execute
get_all_analysis_data()

Get the complete list of all analysis datasets to be written to the HDF5 file

get_all_dependency_data()

Get the complete list of all direct dependencies to be written to the HDF5 file

NOTE: These are only the direct dependencies as specified by the analysis itself. Use get_all_dependency_data_recursive(..) to also get the indirect dependencies of the analysis due to dependencies of the dependencies themselves.

Returns:List of parameter_data objects that define dependencies.
get_all_parameter_data(exclude_dependencies=False)

Get the complete list of all parameter datasets to be written to the HDF5 file

Parameters:exclude_dependencies – Boolean indicating whether we should exclude parameters that define dependencies from the list
get_all_run_info()

Get the dict with the complete info about the last run of the analysis

get_analysis_data(index)

Given the index return the associated dataset to be written to the HDF5 file

Parameters:index – Return the index entry of the private member __data_list.

get_analysis_data_by_name(dataname)

Given the key name of the data return the associated analysis_data object.

Parameters:dataname – Name of the analysis data requested from the private __data_list member.
Returns:The analysis_data object or None if not found.
get_analysis_data_names()

Get a list of all analysis dataset names.

get_analysis_identifier()

Return the name of the analysis used as key when searching for a particular analysis

classmethod get_analysis_instances()

Generator function used to iterate through all instances of analysis_base. The function creates references for all weak references stored in cls._analysis_instances, returns each reference if it exists, and cleans up any invalid references after the iteration is complete.

Returns:References to analysis_base objects
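The weak-reference pattern described above can be sketched with the standard weakref module. This is an illustration of the technique rather than the analysis_base code: Tracked, _instances, and get_instances are hypothetical names.

```python
import weakref


class Tracked:
    """Hypothetical stand-in for a class that tracks its own instances."""

    _instances = []  # weak references, so tracking does not keep objects alive

    def __init__(self, name):
        self.name = name
        Tracked._instances.append(weakref.ref(self))

    @classmethod
    def get_instances(cls):
        dead = []
        for ref in cls._instances:
            obj = ref()  # dereference; None if the object no longer exists
            if obj is not None:
                yield obj
            else:
                dead.append(ref)
        # Clean up invalid references once the iteration is complete.
        for ref in dead:
            cls._instances.remove(ref)


a = Tracked('a')
b = Tracked('b')
del b  # under CPython reference counting the object is released here
names = [obj.name for obj in Tracked.get_instances()]
```

Because only weak references are stored, deleting an instance elsewhere is enough for it to disappear from later iterations.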

get_analysis_type()

Return a string indicating the type of analysis performed

static get_default_dtypes()

Get a list of available default dtypes used for analyses. Same as data_dtypes.get_dtypes().

static get_default_parameter_groups()

Get a list of commonly used parameter groups and associated descriptions.

Use of default groups provides consistency and allows other systems to design custom behavior around the semantics of parameter groups

Returns:Dictionary where the keys are the short names of the groups and the values are dicts with following keys:value pairs: ‘name’ , ‘description’. Use the ‘name’ to define the group to be used.
get_help_string()

Get a string describing the analysis.

Returns:Help string describing the analysis and its parameters
get_memory_profile_info()

Based on the memory profile of the execute_analysis(..) function get the string describing the line-by-line memory usage.

Returns:String describing the memory usage profile. None is returned in case that no memory profiling data is available.
get_num_analysis_data()

Return the number of analysis datasets to be written to the HDF5 file

get_num_dependency_data()

Return the number of dependencies to be written to the HDF5 file

get_num_parameter_data()

Return the number of parameter datasets to be written to the HDF5 file

get_omsi_analysis_storage()

Get a list of known locations where this analysis has been saved.

Returns:List of omsi.dataformat.omsi_file.analysis.omsi_file_analysis objects where the analysis is saved.
get_parameter_data(index)

Given the index return the associated dataset to be written to the HDF5 file

Parameters:index – Return the index entry of the private member parameters.

get_parameter_data_by_name(dataname)

Given the key name of the data return the associated parameter_data object.

Parameters:dataname – Name of the parameter requested from the parameters member.
Returns:The parameter_data object or None if not found
get_parameter_names()

Get a list of all parameter dataset names (including those that may define dependencies).

get_profile_stats_object(consolidate=True, stream=None)

Based on the execution profile of the execute_analysis(..) function get pstats.Stats object to help with the interpretation of the data.

Parameters:
  • consolidate – Boolean flag indicating whether multiple stats (e.g., from multiple cores) should be consolidated into a single stats object. Default is True.
  • stream – The optional stream parameter to be used for the pstats.Stats object.
Returns:

A single pstats.Stats object if consolidate is True. Otherwise the function returns a list of pstats.Stats objects, one per recorded statistic. None is returned in case that the stats objects cannot be created or no profiling data is available.
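For readers unfamiliar with pstats.Stats, the following standalone sketch shows how a cProfile run is turned into a stats object with a custom stream. It uses only the Python standard library and does not call BASTet; the function work is a hypothetical placeholder for execute_analysis.

```python
import cProfile
import io
import pstats


def work():
    """Placeholder workload standing in for an analysis."""
    return sum(i * i for i in range(10000))


# Profile a single call of the workload.
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# The stream argument redirects the printed report, here into a string buffer.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
```

A stats object returned by get_profile_stats_object can presumably be sorted and printed in the same way, with the stream argument deciding where the report is written.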

has_omsi_analysis_storage()

Check whether a storage location is known where the analysis has been saved.

Returns:Boolean indicating whether self.omsi_analysis_storage is not empty
keys()

Get a list of all valid keys, i.e., a combination of all input parameter and output names.

Returns:List of strings with all input parameter and output names.
classmethod locate_analysis(data_object, include_parameters=False)

Given a data_object try to locate the analysis that creates the object as an output of its execution (and optionally analyses that have the object as an input).

Parameters:
  • data_object – The data object of interest.
  • include_parameters – Boolean indicating whether also input parameters should be considered in the search in addition to the outputs of an analysis
Returns:

dependency_dict pointing to the relevant object or None in case the object was not found.

read_from_omsi_file(analysis_object, load_data=True, load_parameters=True, load_runtime_data=True, dependencies_omsi_format=True, ignore_type_conflict=False)

This function can be optionally overwritten to implement a custom data read.

The default implementation tries to reconstruct the original data as far as possible. However, in particular when a custom add_custom_data_to_omsi_file function has been implemented, the default implementation may not be sufficient. The default implementation reconstructs i) analysis_identifier and reads all custom data into ii) __data_list. Note, an error will be raised in case that the analysis type specified in the HDF5 file does not match the analysis type specified by get_analysis_type().

Parameters:
  • analysis_object – The omsi_file_analysis object associated with the hdf5 data group with the analysis data_list
  • load_data – Should the analysis data be loaded from file (default) or just stored as h5py data objects
  • load_parameters – Should parameters be loaded from file (default) or just stored as h5py data objects.
  • load_runtime_data – Should runtime data be loaded from file (default) or just stored as h5py data objects
  • dependencies_omsi_format – Should dependencies be loaded as omsi_file API objects (default) or just as h5py objects.
  • ignore_type_conflict – Set to True to allow the analysis to be loaded into the current analysis object even if the type indicated in the file does not match the class. Default value is False. This behavior can be useful when different analyses have compatible data structures or when we want to load the data into a generic analysis container, e.g., analysis_generic.
Returns:

Boolean indicating whether the data was read successfully

Raises:

TypeError is raised in case that the analysis type specified by the file does not match the analysis type provided by self.get_analysis_type()
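The interaction between the type check and ignore_type_conflict can be sketched as a small helper. This is a hedged illustration only: check_analysis_type is a hypothetical function, and the type strings used below are made-up examples rather than values taken from the library.

```python
def check_analysis_type(stored_type, expected_type, ignore_type_conflict=False):
    """Return True if loading may proceed, otherwise raise TypeError."""
    if stored_type != expected_type and not ignore_type_conflict:
        raise TypeError("The file stores analysis type %r but the object "
                        "expects %r" % (stored_type, expected_type))
    return True


# Matching types always pass.
same = check_analysis_type('example_type_a', 'example_type_a')
# A mismatch passes only when the conflict is explicitly ignored.
forced = check_analysis_type('example_type_a', 'example_type_b',
                             ignore_type_conflict=True)
```

Without ignore_type_conflict=True the second call would raise TypeError, which mirrors the documented default.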

record_execute_analysis_outputs(analysis_output)

Function used internally by execute to record the output of the custom execute_analysis(...) function to the __data_list.

This function may be overwritten in child classes in order to customize the behavior for recording data outputs. E.g., for some analyses one may only want to record a particular set of outputs, rather than all outputs generated by the analysis.

Parameters:analysis_output – The output of the execute_analysis(...) function to be recorded
results_ready()

Check whether the results of the analysis are ready to be used.

Returns:Boolean

set_analysis_identifier(identifier)

Set the name of the analysis to identifier

Side Effects: This function modifies self.analysis_identifier

Parameters:identifier (str) – The new analysis identifier string to be used (should be unique)
set_parameter_values(**kwargs)

Set all parameters given as input to the function. The inputs are placed in the self.parameters list. If the parameter refers to an existing h5py.Dataset, h5py.Group, managed h5py object, or is an instance of an existing omsi_analysis_base object, then a dependency_dict will be created and stored as the value instead.

Parameters:kwargs – Dictionary of keyword arguments. All keys are expected to be strings. All values are expected to be either i) numpy arrays, ii) int, float, str or unicode variables, iii) h5py.Dataset or h5py.Group, or iv) any of the omsi_file API class objects. For iii) and iv) one may provide a tuple consisting of the data object t[0] and an additional selection string t[1].
update_analysis_parameters(**kwargs)

Record the analysis parameters passed to the execute() function.

The default implementation simply calls the set_parameter_values(...) function. This function may be overwritten to customize the behavior of how parameters are recorded by the execute function.

Parameters:kwargs – Dictionary of keyword arguments with the parameters passed to the execute(..) function
classmethod v_qmz(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Get the mz axes for the analysis

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • qslice_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qslice URL pattern.
  • qspectrum_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qspectrum URL pattern.
Returns:

The following values are returned by the analysis:

  • mzSpectra : Array with the static mz values for the spectra.
  • labelSpectra : Label for the spectral mz axis
  • mzSlice : Array of the static mz values for the slices or None if identical to the mzSpectra.
  • labelSlice : Label for the slice mz axis or None if identical to labelSpectra.
  • values_x: The values for the x axis of the image (or None)
  • label_x: Label for the x axis of the image
  • values_y: The values for the y axis of the image (or None)
  • label_y: Label for the y axis of the image
  • values_z: The values for the z axis of the image (or None)
  • label_z: Label for the z axis of the image

classmethod v_qslice(analysis_object, z, viewer_option=0)

Get 3D analysis dataset for which z-slices should be extracted for presentation in the OMSI viewer

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • z – Selection string indicating which z values should be selected.
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

numpy array with the data to be displayed in the image slice viewer. Slicing will be performed typically like [:,:,zmin:zmax].

Raises:

NotImplementedError in case that v_qslice is not supported by the analysis.

classmethod v_qslice_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for the analysis for qslice. The default implementation tries to take care of handling the spectra retrieval for all the dependencies but can naturally not decide how the qspectrum should be handled by a derived class. However, this implementation is often called at the end of custom implementations to also allow access to data from other dependencies.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed. For most cases this is not needed here as the support for slice operations is usually a static decision based on the class type, however, in some cases additional checks may be needed (e.g., ensure that the required data is available).
Returns:List of strings indicating the different available viewer options. The list should be empty if the analysis does not support qslice requests (i.e., v_qslice(...) is not available).
classmethod v_qspectrum(analysis_object, x, y, viewer_option=0)

Get the 3D analysis dataset from which spectra in x/y should be extracted for presentation in the OMSI viewer

Developer Note: h5py currently supports only a single index list. If the user provides an index-list for both x and y, then we need to construct the proper merged list and load the data manually, or if the data is small enough, one can load the full data into a numpy array which supports multiple lists in the selection.

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • x – x selection string
  • y – y selection string
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

The following two elements are expected to be returned by this function :

  1. 1D, 2D or 3D numpy array of the requested spectra. NOTE: The mass (m/z) axis must be the last axis. For index selection x=1,y=1 a 1D array is usually expected. For indexList selections x=[0]&y=[1] usually a 2D array is expected. For range selections x=0:1&y=1:2 one usually expects a 3D array.
  2. None in case that the spectra axis returned by v_qmz are valid for the returned spectrum. Otherwise, return a 1D numpy array with the m/z values for the spectrum (i.e., if custom m/z values are needed for interpretation of the returned spectrum). This may be needed, e.g., in cases where a per-spectrum peak analysis is performed and the peaks for each spectrum appear at different m/z values.

classmethod v_qspectrum_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for the analysis for qspectrum. The default implementation tries to take care of handling the spectra retrieval for all the dependencies but can naturally not decide how the qspectrum should be handled by a derived class. However, this implementation is often called at the end of custom implementations to also allow access to data from other dependencies.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed. For most cases this is not needed here as the support for slice operations is usually a static decision based on the class type, however, in some cases additional checks may be needed (e.g., ensure that the required data is available).
Returns:List of strings indicating the different available viewer options. The list should be empty if the analysis does not support qspectrum requests (i.e., v_qspectrum(...) is not available).
write_analysis_data(analysis_group=None)

This function is used to write the actual analysis data to file. If not implemented, then the omsi_file_analysis API’s default behavior is used instead.

Parameters:analysis_group – The h5py.Group object where the analysis is stored. May be None on cores that do not perform any writing but which need to participate in communication, e.g., to collect data for writing.
exception omsi.analysis.AnalysisReadyError(value, params=None)

Bases: exceptions.Exception

Custom exception used to indicate that an analysis is not ready to execute.

Initialize the AnalysisReadyError

Parameters:
  • value – Error message string
  • params – Optional list of dependent parameters that are not ready to be used.
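
The following is a minimal sketch of how an exception with this signature can carry the list of parameters that are not ready. The class below is a local stand-in that only mirrors the documented signature (value, params=None); the attribute names and the run_if_ready helper are assumptions made for illustration and are not taken from the omsi package.

```python
class AnalysisReadyError(Exception):
    """Local stand-in mirroring the documented signature (value, params=None)."""

    def __init__(self, value, params=None):
        super().__init__(value)
        self.value = value
        # Parameters that are not ready; an empty list if none were given.
        self.params = params if params is not None else []


def run_if_ready(pending):
    """Hypothetical helper: refuse to run while parameters are pending."""
    if pending:
        raise AnalysisReadyError("The analysis is not ready.", params=pending)
    return "executed"


try:
    run_if_ready(["msidata"])
except AnalysisReadyError as err:
    # The caller can inspect which inputs blocked the execution.
    not_ready = err.params
```
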
class omsi.analysis.analysis_generic(name_key='undefined')

Bases: omsi.analysis.base.analysis_base

This analysis class is used if the specific analysis type is unknown, e.g., when loading custom user-defined analysis data for which the analysis class may not be available in the standard omsi package used.

Initialize the basic data members

Parameters:name_key – The name for the analysis
DEFAULT_OUTPUT_PREFIX = 'output_'
execute(**kwargs)

Overwrite the default implementation of execute to update parameter specifications/types when wrapping functions where the types are not known a priori.

Parameters:kwargs – Custom analysis parameters
Returns:The result of execute_analysis()
execute_analysis()

Nothing to do here.

classmethod from_function(analysis_function, output_names=None, parameter_specs=None, name_key='undefined')

Create a generic analysis class for a given analysis function.

This functionality is useful to ease quick scripting on analyses but should not be used in production.

NOTE: __analysis_function is a reserved parameter name used to store the analysis function and may not be used as an input parameter for the analysis function.

Parameters:
  • analysis_function – The analysis function to be wrapped for provenance tracking and storage
  • output_names – Optionally, define a list of the names of the outputs
  • parameter_specs – Optional list of omsi.datastructures.analysis_data.parameter_data with additional information about the parameters of the function.
  • name_key – The name for the analysis, i.e., the analysis identifier
Returns:

A new generic analysis class
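
The wrapping idea can be sketched without the omsi package as follows. This is an illustration only, not the BASTet implementation: wrap_function and the record layout are hypothetical, and the rule used to build default output names (the documented 'output_' prefix followed by the output index) is an assumption.

```python
DEFAULT_OUTPUT_PREFIX = 'output_'  # same value as the class attribute documented above


def wrap_function(analysis_function, output_names=None):
    """Hypothetical stand-in: record inputs and named outputs of a function."""
    def execute(**kwargs):
        result = analysis_function(**kwargs)
        # Treat a single return value like a one-element tuple of outputs.
        outputs = result if isinstance(result, tuple) else (result,)
        # Assumed naming rule: prefix plus output index when no names are given.
        names = output_names or [DEFAULT_OUTPUT_PREFIX + str(i)
                                 for i in range(len(outputs))]
        return {'parameters': dict(kwargs),
                'outputs': dict(zip(names, outputs))}
    return execute


def add(a, b):
    return a + b


record = wrap_function(add)(a=1, b=2)
named = wrap_function(add, output_names=['total'])(a=1, b=2)
```

The point of the design is that the wrapped call keeps the parameters next to the results, which is what makes provenance tracking possible for ad hoc functions.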

classmethod get_analysis_type()

Return a string indicating the type of analysis performed

get_real_analysis_type()

This class is designed to handle generic (including unknown) types of analysis. In cases where, e.g., this class is used to store analysis data from an HDF5 file, the actual analysis type may be available even if the corresponding analysis class is not available in the current installation.

read_from_omsi_file(analysis_object, load_data=True, load_parameters=True, load_runtime_data=True, dependencies_omsi_format=True, ignore_type_conflict=False)

See omsi.analysis.analysis_base.read_from_omsi_file(...) for details. The function is overwritten here mainly to initialize the self.real_analysis_type instance variable but otherwise uses the default behavior.

classmethod v_qmz(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Implement support for qmz URL requests for the viewer

classmethod v_qslice(analysis_object, z, viewer_option=0)

Implement support for qslice URL requests for the viewer

classmethod v_qslice_viewer_options(analysis_object)

Define which viewer_options are supported for qslice URLs

classmethod v_qspectrum(analysis_object, x, y, viewer_option=0)

Implement support for qspectrum URL requests for the viewer

classmethod v_qspectrum_viewer_options(analysis_object)

Define which viewer_options are supported for qspectrum URLs

write_analysis_data(analysis_group=None)

This function is used to write the actual analysis data to file. If not implemented, then the omsi_file_analysis API’s default behavior is used instead.

Parameters:analysis_group – The h5py.Group object where the analysis is stored. May be None on cores that do not perform any writing but which need to participate in communication, e.g., to collect data for writing.
class omsi.analysis.omsi_findpeaks_global(name_key='undefined')

Bases: omsi.analysis.base.analysis_base

Basic global peak detection analysis. The default implementation computes the peaks on the average spectrum and then computes the peak-cube data, i.e., the values for the detected peaks at each pixel.

TODO: The current version assumes 2D data
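
A simplified sketch of the two steps described above (peaks on the average spectrum, then the peak cube), assuming a toy numpy array with the m/z axis last. The peak picking used here (strict local maxima above a fixed threshold) is a stand-in chosen for brevity; the actual peak detector used by the class is not documented in this section.

```python
import numpy as np

# Toy MSI cube with shape (x, y, m/z) and two artificial peaks.
rng = np.random.default_rng(0)
msidata = rng.random((4, 5, 50)) * 0.1
msidata[:, :, 10] += 5.0
msidata[:, :, 30] += 3.0
mzdata = np.linspace(100.0, 200.0, 50)

# Step 1: average spectrum over all pixels.
mean_spectrum = msidata.reshape(-1, msidata.shape[-1]).mean(axis=0)

# Step 2: stand-in peak picking, strict local maxima above a threshold.
threshold = 1.0
interior = mean_spectrum[1:-1]
is_peak = ((interior > mean_spectrum[:-2]) &
           (interior > mean_spectrum[2:]) &
           (interior > threshold))
peak_indices = np.nonzero(is_peak)[0] + 1  # +1 undoes the interior offset

# Step 3: peak cube, i.e., the value of every detected peak at each pixel.
peak_cube = msidata[:, :, peak_indices]
peak_mz = mzdata[peak_indices]
```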

Initialize the basic data members

execute_analysis()

Execute the global peak finding for the given msidata and mzdata.

classmethod v_qmz(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Implement support for qmz URL requests for the viewer

classmethod v_qslice(analysis_object, z, viewer_option=0)

Implement support for qslice URL requests for the viewer

classmethod v_qslice_viewer_options(analysis_object)

Define which viewer_options are supported for qslice URLs

classmethod v_qspectrum(analysis_object, x, y, viewer_option=0)

Implement support for qspectrum URL requests for the viewer

classmethod v_qspectrum_viewer_options(analysis_object)

Define which viewer_options are supported for qspectrum URLs

class omsi.analysis.omsi_findpeaks_local(name_key='undefined')

Bases: omsi.analysis.base.analysis_base

Class defining a basic global peak finding analysis. The default implementation computes the peaks on the average spectrum and then computes the peak-cube data, i.e., the values for the detected peaks at each pixel.

TODO: The current version assumes 2D data

Initialize the basic data members

execute_analysis(msidata_subblock=None)

Execute the local peak finder for the given msidata.

Parameters:msidata_subblock – Optional input parameter used for parallel execution of the analysis only. If msidata_subblock is set, then the given subblock will be processed in SERIAL instead of processing self['msidata'] in PARALLEL (if available). This parameter is strictly optional and intended for internal use only.
classmethod v_qmz(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Implement support for qmz URL requests for the viewer

classmethod v_qslice(analysis_object, z, viewer_option=0)

Implement support for qslice URL requests for the viewer

classmethod v_qslice_viewer_options(analysis_object)

Define which viewer_options are supported for qslice URLs

classmethod v_qspectrum(analysis_object, x, y, viewer_option=0)

Implement support for qspectrum URL requests for the viewer

classmethod v_qspectrum_viewer_options(analysis_object)

Define which viewer_options are supported for qspectrum URLs

write_analysis_data(analysis_group=None)

This function is used to write the actual analysis data to file. If not implemented, then the omsi_file_analysis API’s default behavior is used instead.

Parameters:analysis_group – The h5py.Group object where the analysis is stored.
class omsi.analysis.omsi_nmf(name_key='undefined')

Bases: omsi.analysis.base.analysis_base

Class defining a basic nmf analysis.

The function has primarily been tested with MSI datasets but should support arbitrary n-D arrays (n>=2). The last dimension of the input array must be the spectrum dimension.

Initialize the basic data members

execute_analysis()

Execute the nmf for the given msidata

classmethod v_qmz(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Implement support for qmz URL requests for the viewer

classmethod v_qslice(analysis_object, z, viewer_option=0)

Implement support for qslice URL requests for the viewer

classmethod v_qslice_viewer_options(analysis_object)

Define which viewer_options are supported for qslice URLs

classmethod v_qspectrum(analysis_object, x, y, viewer_option=0)

Implement support for qspectrum URL requests for the viewer

classmethod v_qspectrum_viewer_options(analysis_object)

Define which viewer_options are supported for qspectrum URLs

class omsi.analysis.omsi_cx(name_key='undefined')

Bases: omsi.analysis.base.analysis_base

Class used to implement CX factorization on MSI data.

Initialize the basic data members

classmethod comp_lev_exact(A, k, axis)

This function computes the column or row leverage scores of the input matrix.

Parameters:
  • A – n-by-d matrix
  • k – rank parameter, k <= min(n,d)
  • axis – 0: compute row leverage scores; 1: compute column leverage scores.
Returns:

1D array of leverage scores. If axis = 0, the length of lev is n; otherwise, the length of lev is d.
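
A minimal sketch of rank-k leverage scores, assuming the usual SVD-based definition (squared norms taken over the top-k singular vectors). The function name leverage_scores is hypothetical, and whether comp_lev_exact applies any additional normalization is not stated in this section.

```python
import numpy as np


def leverage_scores(A, k, axis):
    """Rank-k leverage scores from the SVD (a sketch, not the BASTet code)."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    if axis == 0:
        # Row scores: squared norms of the rows of the top-k left singular vectors.
        return np.sum(U[:, :k] ** 2, axis=1)
    # Column scores: squared norms of the columns of the top-k right singular vectors.
    return np.sum(Vt[:k, :] ** 2, axis=0)


A = np.random.default_rng(0).random((6, 4))  # n=6 rows, d=4 columns
row_lev = leverage_scores(A, k=2, axis=0)    # length n
col_lev = leverage_scores(A, k=2, axis=1)    # length d
```

With this definition the scores along either axis sum to k, because the selected singular vectors are orthonormal.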

dimension_index = {'pixelDim': 1, 'imageDim': 0}
execute_analysis()

EDIT_ME:

Replace this text with the appropriate documentation for the analysis. Describe what your analysis does and how a user can use it. Note, a user will call the function execute(...) which takes care of storing parameters, collecting execution data etc., so that you only need to implement your analysis, the rest is taken care of by analysis_base. omsi uses Sphinx syntax for the documentation.

Keyword Arguments:

Parameters:mydata

...

classmethod v_qmz(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Get the mz axes for the analysis

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • qslice_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qslice URL pattern.
  • qspectrum_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qspectrum URL pattern.
Returns:

The following four arrays are returned by the analysis:

  • mz_spectra : Array with the static mz values for the spectra.
  • label_spectra : Label for the spectral mz axis
  • mz_slice : Array of the static mz values for the slices or None if identical to the mz_spectra.
  • label_slice : Label for the slice mz axis or None if identical to label_spectra.

classmethod v_qslice(analysis_object, z, viewer_option=0)

Get 3D analysis dataset for which z-slices should be extracted for presentation in the OMSI viewer

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • z – Selection string indicating which z values should be selected.
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

Numpy array with the data to be displayed in the image slice viewer. Slicing is typically performed as [:,:,zmin:zmax].

classmethod v_qslice_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for the analysis for qslice. The default implementation tries to take care of handling the spectra retrieval for all the dependencies but can naturally not decide how the qspectrum should be handled by a derived class. However, this implementation is often called at the end of custom implementations to also allow access to data from other dependencies.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed. For most cases this is not needed here as the support for slice operations is usually a static decision based on the class type, however, in some cases additional checks may be needed (e.g., ensure that the required data is available).
Returns:List of strings indicating the different available viewer options. The list should be empty if the analysis does not support qslice requests (i.e., v_qslice(...) is not available).
classmethod v_qspectrum(analysis_object, x, y, viewer_option=0)

Get the 3D analysis dataset from which spectra in x/y should be extracted for presentation in the OMSI viewer

Developer Note: h5py currently supports only a single index list. If the user provides an index list for both x and y, then we need to construct the proper merged list and load the data manually, or if the data is small enough, one can load the full data into a numpy array which supports multiple lists in the selection.
Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • x – x selection string
  • y – y selection string
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

The following two elements are expected to be returned by this function:

  1. 1D, 2D or 3D numpy array of the requested spectra. NOTE: The mass (m/z) axis must be the last axis. For index selections (x=1,y=1) a 1D array is usually expected. For indexList selections (x=[0]&y=[1]) a 2D array is usually expected. For range selections (x=0:1&y=1:2) a 3D array is usually expected.
  2. None if the spectra axes returned by v_qmz are valid for the returned spectrum. Otherwise, a 1D numpy array with the m/z values for the spectrum (i.e., if custom m/z values are needed for interpretation of the returned spectrum). This may be needed, e.g., in cases where a per-spectrum peak analysis is performed and the peaks for each spectrum appear at different m/z values.
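
The expected array dimensionality for the three selection types can be illustrated with plain numpy indexing. This is a sketch on a toy array, assuming the selection strings have already been parsed into Python indices, lists and slices; the parsing itself is not shown.

```python
import numpy as np

# Toy analysis dataset with shape (x, y, m/z); the m/z axis is the last axis.
data = np.arange(3 * 4 * 5, dtype='float32').reshape(3, 4, 5)

# Index selection (x=1, y=1): a single spectrum, i.e., a 1D array.
single = data[1, 1, :]

# Index-list selection (x=[0] & y=[1]): a list of spectra, i.e., a 2D array.
listed = data[[0], [1], :]

# Range selection (x=0:1 & y=1:2): a block of spectra, i.e., a 3D array.
block = data[0:1, 1:2, :]
```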

classmethod v_qspectrum_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for the analysis for qspectrum. The default implementation tries to take care of handling the spectra retrieval for all the dependencies but can naturally not decide how the qspectrum should be handled by a derived class. However, this implementation is often called at the end of custom implementations to also allow access to data from other dependencies.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed. For most cases this is not needed here as the support for slice operations is usually a static decision based on the class type, however, in some cases additional checks may be needed (e.g., ensure that the required data is available).
Returns:List of strings indicating the different available viewer options. The list should be empty if the analysis does not support qspectrum requests (i.e., v_qspectrum(...) is not available).
class omsi.analysis.omsi_kmeans(name_key='undefined')

Bases: omsi.analysis.base.analysis_base

Class defining a basic k-means clustering analysis for a 2D MSI data file or slice of the data

Initialize the basic data members

execute_analysis()

Execute the kmeans clustering for the given msidata

class omsi.analysis.omsi_tic_norm(name_key='undefined')

Bases: omsi.analysis.base.analysis_base

TIC Normalization analysis.

Initialize the basic data members

execute_analysis()

Normalize the data based on the total intensity of a spectrum or the intensities of a select set of ions.

Calculations are performed using a memory map approach to avoid loading all data into memory. TIC normalization can as such be performed even on large files (assuming sufficient disk space).

Keyword Arguments:

Parameters:
  • msidata (h5py.dataset or numpy array (3D)) – The input MSI dataset
  • mzdata – The m/z axis of the dataset
  • maxCount

    ...

  • mzTol

    ...

  • infIons – List of informative ions
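
The normalization itself can be sketched in a few lines of numpy. This is an in-memory illustration only: the memory-mapped processing described above is not reproduced, the maxCount and mzTol parameters are not used because their meaning is not given here, and the function name tic_normalize and its ion_indices argument (indices of the selected ions along the m/z axis) are assumptions.

```python
import numpy as np


def tic_normalize(msidata, ion_indices=None):
    """Divide each spectrum by its total (or selected-ion) intensity (sketch)."""
    data = np.asarray(msidata, dtype='float64')
    # Use all ions, or only the selected ones, to compute the per-spectrum total.
    selected = data if ion_indices is None else data[..., ion_indices]
    totals = selected.sum(axis=-1, keepdims=True)
    # Spectra with zero total intensity are left unchanged (all zeros).
    safe_totals = np.where(totals > 0, totals, 1.0)
    return data / safe_totals


cube = np.array([[[1.0, 3.0], [2.0, 2.0]],
                 [[0.0, 0.0], [5.0, 15.0]]])
normalized = tic_normalize(cube)
by_first_ion = tic_normalize(cube, ion_indices=[0])
```
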
record_execute_analysis_outputs(analysis_output)

We are not returning any outputs here, but we are going to record them manually.

Parameters:analysis_output – The output of the execute_analysis(...) function.

classmethod v_qmz(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Get the mz axes for the analysis

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • qslice_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qslice URL pattern.
  • qspectrum_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qspectrum URL pattern.
Returns:

The following four arrays are returned by the analysis:

  • mz_spectra : Array with the static mz values for the spectra.
  • label_spectra : Label for the spectral mz axis
  • mz_slice : Array of the static mz values for the slices or None if identical to the mz_spectra.
  • label_slice : Label for the slice mz axis or None if identical to label_spectra.

classmethod v_qslice(analysis_object, z, viewer_option=0)

Get 3D analysis dataset for which z-slices should be extracted for presentation in the OMSI viewer

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • z – Selection string indicating which z values should be selected.
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

Numpy array with the data to be displayed in the image slice viewer. Slicing is typically performed as [:,:,zmin:zmax].

classmethod v_qslice_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for the analysis for qslice. The default implementation tries to take care of handling the spectra retrieval for all the dependencies but can naturally not decide how the qspectrum should be handled by a derived class. However, this implementation is often called at the end of custom implementations to also allow access to data from other dependencies.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed. For most cases this is not needed here as the support for slice operations is usually a static decision based on the class type, however, in some cases additional checks may be needed (e.g., ensure that the required data is available).
Returns:List of strings indicating the different available viewer options. The list should be empty if the analysis does not support qslice requests (i.e., v_qslice(...) is not available).
classmethod v_qspectrum(analysis_object, x, y, viewer_option=0)

Get the 3D analysis dataset from which spectra in x/y should be extracted for presentation in the OMSI viewer

Developer Note: h5py currently supports only a single index list. If the user provides an index list for both x and y, then we need to construct the proper merged list and load the data manually, or if the data is small enough, one can load the full data into a numpy array which supports multiple lists in the selection.
Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • x – x selection string
  • y – y selection string
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

The following two elements are expected to be returned by this function:

  1. 1D, 2D or 3D numpy array of the requested spectra. NOTE: The mass (m/z) axis must be the last axis. For index selections (x=1,y=1) a 1D array is usually expected. For indexList selections (x=[0]&y=[1]) a 2D array is usually expected. For range selections (x=0:1&y=1:2) a 3D array is usually expected.
  2. None if the spectra axes returned by v_qmz are valid for the returned spectrum. Otherwise, a 1D numpy array with the m/z values for the spectrum (i.e., if custom m/z values are needed for interpretation of the returned spectrum). This may be needed, e.g., in cases where a per-spectrum peak analysis is performed and the peaks for each spectrum appear at different m/z values.
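
The second workaround from the developer note (load the full data into numpy, which accepts several index lists at once) might look as follows. The numpy array stands in for a dataset that has already been read completely from file, and the element-wise pairing of the x and y lists is an assumption about the intended selection semantics.

```python
import numpy as np

# Stand-in for a dataset that was loaded completely into memory.
full = np.arange(3 * 4 * 5, dtype='float32').reshape(3, 4, 5)

x_list = [0, 2]
y_list = [1, 3]

# numpy pairs the two lists element-wise: pixels (0, 1) and (2, 3).
paired = full[x_list, y_list, :]

# Equivalent manual construction, one spectrum per (x, y) pair.
manual = np.stack([full[x, y, :] for x, y in zip(x_list, y_list)])
```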

classmethod v_qspectrum_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for the analysis for qspectrum. The default implementation tries to take care of handling the spectra retrieval for all the dependencies but can naturally not decide how the qspectrum should be handled by a derived class. However, this implementation is often called at the end of custom implementations to also allow access to data from other dependencies.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed. For most cases this is not needed here as the support for slice operations is usually a static decision based on the class type, however, in some cases additional checks may be needed (e.g., ensure that the required data is available).
Returns:List of strings indicating the different available viewer options. The list should be empty if the analysis does not support qspectrum requests (i.e., v_qspectrum(...) is not available).

base Module

Module specifying the base analysis API for integrating new analysis with the toolkit and the OpenMSI science gateway.

exception omsi.analysis.base.AnalysisReadyError(value, params=None)

Bases: exceptions.Exception

Custom exception used to indicate that an analysis is not ready to execute.

Initialize the AnalysisReadyError

Parameters:
  • value – Error message string
  • params – Optional list of dependent parameters that are not ready to be used.
class omsi.analysis.base.analysis_base

Bases: omsi.datastructures.analysis_data.parameter_manager

Base class for omsi analysis functionality. The class provides a large set of functionality designed to facilitate storage of analysis data in the omsi HDF5 file format. The class also provides a set of functions to enable easy integration of new analysis with the OpenMSI web-based viewer (see Viewer functions below for details).

Slicing:

This class supports basic slicing to access data stored in the main member variables. By default the data is retrieved from __data_list, and the __getitem__(key) function, which implements the [..] operator, returns __data_list[key]['data']. The key is a string indicating the name of the parameter to be retrieved. If the key is not found in __data_list, then the function will try to retrieve the data from the self.parameters list instead. By using “parameter/key” or “dependency/key” as the key, one may also explicitly retrieve values from the parameters.
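
The lookup order can be sketched with a small stand-in class. This is not the analysis_base implementation: plain dictionaries replace the internal lists, and the class and attribute names are hypothetical.

```python
class SlicingSketch:
    """Hypothetical stand-in for the lookup order described above."""

    def __init__(self):
        self.data_list = {}    # output name -> {'data': value}
        self.parameters = {}   # parameter name -> value

    def __getitem__(self, key):
        # Explicit prefixes go straight to the parameters.
        for prefix in ('parameter/', 'dependency/'):
            if key.startswith(prefix):
                return self.parameters[key[len(prefix):]]
        # Default: output data first, parameters as the fallback.
        if key in self.data_list:
            return self.data_list[key]['data']
        return self.parameters[key]


sketch = SlicingSketch()
sketch.data_list['peaks'] = {'data': [1, 2, 3]}
sketch.parameters['peaks'] = 'shadowed parameter'
sketch.parameters['threshold'] = 0.5
```

Here sketch['peaks'] resolves to the output data, sketch['threshold'] falls back to the parameters, and sketch['parameter/peaks'] reaches the parameter that the output of the same name would otherwise hide.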

Instance Variables:

Variables:
  • analysis_identifier – Define the name for the analysis used as key in search operations
  • __data_list – List of analysis_data to be written to the HDF5 file. Derived classes need to add all data that should be saved for the analysis in the omsi HDF5 file to this dictionary. See omsi.analysis.analysis_data for details.
  • parameters – List of parameter_data objects of all analysis parameters (including those that may have dependencies).
  • data_names – List of strings of all names of analysis output datasets. These are the target keys for __data_list.
  • profile_time_and_usage – Boolean indicating whether we should profile the execute_analysis(...) function when called as part of the execute(...) function. The default value is false. Use the enable_time_and_usage_profiling(..) function to determine which profiling should be performed. The time_and_usage profile uses Python's cProfile (or Profile) to monitor how often and for how long particular parts of the analysis code are executed.
  • profile_memory – Boolean indicating whether we should monitor memory usage (line-by-line) when executing the execute_analysis(...) function. The default value is false. Use the enable_memory_profiling(..) function to enable or disable memory profiling.
  • omsi_analysis_storage – List of omsi_file_analysis objects where the analysis is stored. The list may be empty.
  • mpi_comm – In case we are running with MPI, this is the MPI communicator used for running the analysis. Default is MPI.Comm_world.
  • mpi_root – In case we are running with MPI, this is the root rank where data is collected to (e.g., runtime data and analysis results)
  • update_analysis – If the value is True, then we should execute the analysis before using the outputs. If False, then the analysis has been executed with the current parameter settings.
  • driver – Workflow driver to be used when executing multiple analyses, e.g., via execute_recursive or execute_all. Default value is None in which case a new default driver will be used each time we execute a workflow.

Execution Functions:

  • execute : The main function the user needs to call in order to execute the analysis
  • execute_analysis : This function needs to be implemented by child classes of analysis_base to implement the specifics of executing the analysis.

I/O functions:

These functions can be optionally overwritten to control how the analysis data should be written/read from the omsi HDF5 file. Default implementations are provided here, which should be sufficient for most cases.

  • add_custom_data_to_omsi_file: The default implementation is empty as the default data write is managed by the omsi_file_experiment.create_analysis() function. Overwrite this function if the analysis needs to write data to the HDF5 omsi file beyond what the default omsi data API does.
  • read_from_omsi_file: The default implementation tries to reconstruct the original data as far as possible; however, in particular if a custom add_custom_data_to_omsi_file function has been implemented, the default implementation may not be sufficient. The default implementation reconstructs i) analysis_identifier and reads all custom data into ii) __data_list. Note, an error will be raised if the analysis type specified in the HDF5 file does not match the analysis type specified by get_analysis_type(). This function can be optionally overwritten to implement a custom data read.

Viewer functions:

Several convenience functions are used to allow the OpenMSI online viewer to interact with the analysis and to visualize it. The default implementations provided here simply indicate that the analysis does not support the data access operations required by the online viewer. Overwrite these functions in the derived analysis classes in order to interface them with the viewer. All viewer-related functions start with v_.

NOTE: The default implementations of the viewer functions defined in analysis_base are designed to take care of the common requirement for providing viewer access to data from all dependencies of an analysis. In many cases, the default implementation is still called at the end of custom viewer functions.

NOTE: The viewer functions typically support a viewer_option parameter. viewer_option=0 is expected to refer to the analysis itself.

  • v_qslice: Retrieve/compute data slices as requested via qslice URL requests. The corresponding view of the DJANGO data access server already translates all input parameters and takes care of generating images/plots if needed. This function is only responsible for retrieving the data.
  • v_qspectrum: Retrieve/compute spectra as requested via qspectrum URL requests. The corresponding view of the DJANGO data access server already translates all input parameters and takes care of generating images/plots if needed. This function is only responsible for retrieving the data.
  • v_qmz: Define the m/z axes for image slices and spectra as requested by qspectrum URL requests.
  • v_qspectrum_viewer_options: Define a list of strings, describing the different viewer options available for the analysis for qspectrum requests (i.e., v_qspectrum). This feature allows the analysis developer to define multiple different visualization modes for the analysis. For example, when performing a data reduction (e.g., PCA or NMF) one may want to show the raw spectra or the loadings vector of the projection in the spectrum view (v_qspectrum). By providing different viewer options we allow the user to decide which option they are most interested in.
  • v_qslice_viewer_options: Define a list of strings, describing the different viewer options available for the analysis for qslice requests (i.e., v_qslice). This feature allows the analysis developer to define multiple different visualization modes for the analysis. For example, when performing a data reduction (e.g., PCA or NMF) one may want to show the raw spectra or the loadings vector of the projection in the spectrum view (v_qspectrum). By providing different viewer options we allow the user to decide which option they are most interested in.

Initialize the basic data members

add_custom_data_to_omsi_file(analysis_group)

This function can be optionally overwritten to implement a custom data write function for the analysis to be used by the omsi_file API.

Note, this function should be used only to add additional data to the analysis group. The data that is written by default is still written by the omsi_file_experiment.create_analysis() function, i.e., the following data is written by default: i) analysis_identifier, ii) get_analysis_type, iii) __data_list, iv) parameters, v) runinfo. Since the omsi_file.experiment.create_analysis() function takes care of setting up the basic structure of the analysis storage (including the subgroups for storing parameters and data dependencies), this setup can generally be assumed to exist before this function is called. This function is called automatically at the end of omsi_file.experiment.create_analysis() (i.e., actually omsi_file_analysis.__populate_analysis__(..)), so that this function typically does not need to be called explicitly.

Parameters:analysis_group – The h5py.Group object where the analysis is stored.
add_parameter(name, help, dtype=<type 'unicode'>, required=False, default=None, choices=None, data=None, group=None)

Add a new parameter for the analysis. This function is typically used in the constructor of a derived analysis to specify the parameters of the analysis.

Parameters:
  • name – The name of the parameter
  • help – Help string describing the parameter
  • dtype – Optional type. Default is string.
  • required – Boolean indicating whether the parameter is required (True) or optional (False). Default False.
  • default – Optional default value for the parameter. Default None.
  • choices – Optional list of choices with allowed data values. Default None, indicating no choices set.
  • data – The data assigned to the parameter. None by default.
  • group – Optional group string used to organize parameters. Default None, indicating that parameters are automatically organized by driver class (e.g. in required and optional parameters)
Raises:

ValueError is raised if the parameter with the given name already exists.
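
The duplicate-name check can be illustrated with a small stand-in registry. The class below only mirrors the documented argument names; it is not the parameter_manager implementation, and str is used as the default dtype in place of the unicode type shown in the signature.

```python
class ParameterRegistry:
    """Hypothetical stand-in that mirrors the documented add_parameter arguments."""

    def __init__(self):
        self.parameters = {}

    def add_parameter(self, name, help, dtype=str, required=False,
                      default=None, choices=None, data=None, group=None):
        # Parameter names must be unique within one analysis.
        if name in self.parameters:
            raise ValueError("A parameter named %r already exists." % name)
        self.parameters[name] = {
            'help': help, 'dtype': dtype, 'required': required,
            'default': default, 'choices': choices, 'data': data,
            'group': group,
        }


registry = ParameterRegistry()
registry.add_parameter('msidata', 'The input MSI dataset', required=True)

try:
    registry.add_parameter('msidata', 'Registered a second time')
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```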

analysis_identifier_defined()

Check whether the analysis identifier is defined by the user, i.e., set to a value different from undefined

Returns:bool

check_ready_to_execute()

Check if all inputs are ready to determine if the analysis is ready to run.

Returns:List of omsi_analysis_parameter objects that are not ready. If the returned list is empty, then the analysis is ready to run.
clear_analysis()

Clear all analysis data, i.e., parameter, dependency data, output results, and runtime data

clear_analysis_data()

Clear the list of analysis data

clear_and_restore(analysis_manager=None, resave=False)

Clear all analysis data and restore the results from file

Parameters:
  • analysis_manager – Instance of omsi_analysis_manager (e.g., an omsi_file_experiment) where the analysis should be saved.
  • resave – Boolean indicating whether the analysis should be saved again, even if it has been saved before. This parameter only has effect if analysis_manager is given.
Returns:

self, i.e., the updated analysis object with all data replaced with HDF5 references

clear_parameter_data()

Clear the list of parameter data

clear_run_info_data()

Clear the runtime information data

define_missing_parameters()

Called by the execute function before self.update_analysis_parameters to set any required parameters that have not been defined to their respective default values.

This function may be overwritten in child classes to customize the definition of default parameter values and to apply any modifications (or checks) of parameters before the analysis is executed. Any changes applied here will be recorded in the parameters of the analysis.
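
The default-filling step described above can be sketched as follows. The dictionary layout and the parameter names are hypothetical; only the rule itself (required parameters without a value receive their default) is taken from the description.

```python
def define_missing_parameters(parameters):
    """Set required parameters that have no value to their default (sketch)."""
    for spec in parameters.values():
        if spec['required'] and spec['data'] is None:
            spec['data'] = spec['default']
    return parameters


params = {
    'msidata': {'required': True, 'default': None, 'data': [[1.0, 2.0]]},
    'numComponents': {'required': True, 'default': 20, 'data': None},
    'label': {'required': False, 'default': 'none', 'data': None},
}
define_missing_parameters(params)
```

Only numComponents changes in this example: msidata already has a value, and label is optional and therefore left untouched.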

enable_memory_profiling(enable=True)

Enable or disable line-by-line profiling of memory usage of execute_analysis.

Parameters:enable (bool) – Enable (True) or disable (False) line-by-line profiling of memory usage
Raises:ImportError is raised if a required package for profiling is not available.
enable_time_and_usage_profiling(enable=True)

Enable or disable profiling of time and usage of code parts of execute_analysis.

Parameters:enable (bool) – Enable (True) or disable (False) profiling
Raises:ImportError is raised if a required package for profiling is not available.
execute(**kwargs)

Use this function to run the analysis.

Parameters:kwargs – Parameters to be used for the analysis. Parameters may also be set using the __setitem__ mechanism or as batches using the set_parameter_values function.
Returns:This function returns the output of the execute analysis function.
Raises:AnalysisReadyError in case that the analysis is not ready to be executed. This may be the case, e.g., when a dependent input parameter is not ready to be used.
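The call sequence described for execute can be pictured with a small stand-in class. This is a minimal sketch and not the BASTet implementation: ToyAnalysis, its attributes, and the local AnalysisReadyError are hypothetical names that only mirror the method names documented here. The order of the steps follows the description of define_missing_parameters, and the parameter names values and scale are made up for illustration.

```python
class AnalysisReadyError(Exception):
    """Hypothetical stand-in for the error raised when inputs are not ready."""


class ToyAnalysis(dict):
    """Hypothetical stand-in mirroring the documented execute() call sequence."""

    defaults = {"scale": 2}          # assumed default parameter values
    required = ("values", "scale")   # assumed required parameter names

    def __init__(self):
        super().__init__()
        self.data = {}  # recorded outputs

    def define_missing_parameters(self):
        # Fill in defaults for parameters that were not set explicitly.
        for name, value in self.defaults.items():
            self.setdefault(name, value)

    def update_analysis_parameters(self, **kwargs):
        # Record the parameters passed to execute().
        self.update(kwargs)

    def check_ready_to_execute(self):
        # Return the names of required parameters that are still missing.
        return [name for name in self.required if name not in self]

    def execute_analysis(self):
        # Parameters are read from the object itself, not passed as arguments.
        return [v * self["scale"] for v in self["values"]]

    def record_execute_analysis_outputs(self, analysis_output):
        # Assumed output name; only outputs stored here count as recorded.
        self.data["output_0"] = analysis_output

    def execute(self, **kwargs):
        self.define_missing_parameters()
        self.update_analysis_parameters(**kwargs)
        pending = self.check_ready_to_execute()
        if pending:
            raise AnalysisReadyError("Parameters not ready: %s" % pending)
        result = self.execute_analysis()
        self.record_execute_analysis_outputs(result)
        return result


toy = ToyAnalysis()
toy["scale"] = 3                     # set via the __setitem__ mechanism
result = toy.execute(values=[1, 2])  # remaining parameters as keyword arguments
```

In this sketch an empty list from check_ready_to_execute means the analysis can run, matching the return convention documented for that function.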
classmethod execute_all(force_update=False, executor=None)

Execute all analysis instances that are currently defined.

Parameters:
  • force_update – Boolean indicating whether we should force that all analyses are executed again, even if they have already been run with the same settings before. False by default.
  • executor – Optional workflow executor to be used for the execution of all analyses. The executor will be cleared and then all analyses will be added to executor. Default value is None, in which case the function creates a default executor to be used.
Returns:

The workflow executor used

execute_analysis()

Implement this function to implement the execution of the actual analysis.

This function must not require any input parameters. All input parameters are recorded in the parameters and dependencies lists and should be retrieved from there, e.g., using basic slicing self[ paramName ]

Input parameters may be added for internal use ONLY. E.g., we may add parameters that are used internally to help with parallelization of the execute_analysis function. Such parameters are not recorded and must be strictly optional so that analysis_base.execute(...) can call the function.

Returns:This function may return any developer-defined data. Note, all output that should be recorded must be put into the data list.
execute_recursive(**kwargs)

Recursively execute this analysis and all its dependencies if necessary

We use a workflow driver to control the execution. To define the workflow driver we can set the self.driver variable. If no workflow driver is given (i.e., self.driver==None), then the default driver will be created. To change the default driver, see omsi.workflow.base.workflow_executor_base.DEFAULT_EXECUTOR_CLASS

Parameters:kwargs – Parameters to be used for the analysis. Parameters may also be set using the __setitem__ mechanism or as batches using the set_parameter_values function.
Returns:Same as execute
get_all_analysis_data()

Get the complete list of all analysis datasets to be written to the HDF5 file

get_all_dependency_data()

Get the complete list of all direct dependencies to be written to the HDF5 file

NOTE: These are only the direct dependencies as specified by the analysis itself. Use get_all_dependency_data_recursive(..) to also get the indirect dependencies of the analysis due to dependencies of the dependencies themselves.

Returns:List of parameter_data objects that define dependencies.
get_all_parameter_data(exclude_dependencies=False)

Get the complete list of all parameter datasets to be written to the HDF5 file

Parameters:exclude_dependencies – Boolean indicating whether we should exclude parameters that define dependencies from the list
get_all_run_info()

Get the dict with the complete info about the last run of the analysis

get_analysis_data(index)

Given the index return the associated dataset to be written to the HDF5 file

Parameters:index – Return the index entry of the private member __data_list.

get_analysis_data_by_name(dataname)

Given the key name of the data return the associated analysis_data object.

Parameters:dataname – Name of the analysis data requested from the private __data_list member.
Returns:The analysis_data object or None if not found.
get_analysis_data_names()

Get a list of all analysis dataset names.

get_analysis_identifier()

Return the name of the analysis used as key when searching for a particular analysis

classmethod get_analysis_instances()

Generator function used to iterate through all instances of analysis_base. The function creates references for all weak references stored in cls._analysis_instances, returns each reference if the object still exists, and cleans up any invalid references after the iteration is complete.

Returns:References to analysis_base objects
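The weak-reference pattern described for get_analysis_instances can be sketched with the standard weakref module. This is an illustration under assumptions, not the toolkit's code: TrackedBase, _instances, and get_instances are hypothetical names chosen for the example.

```python
import gc
import weakref


class TrackedBase:
    """Hypothetical class that tracks its instances through weak references."""

    _instances = []  # list of weakref.ref objects, one per created instance

    def __init__(self, name):
        self.name = name
        type(self)._instances.append(weakref.ref(self))

    @classmethod
    def get_instances(cls):
        """Yield every instance that is still alive, then drop dead references."""
        for ref in list(cls._instances):
            obj = ref()  # dereference; None if the object was collected
            if obj is not None:
                yield obj
        # Clean-up runs only after the iteration has completed.
        cls._instances[:] = [ref for ref in cls._instances if ref() is not None]


a = TrackedBase("a")
b = TrackedBase("b")
del b          # drop the only strong reference to the second instance
gc.collect()   # make collection explicit for interpreters without refcounting
names = [obj.name for obj in TrackedBase.get_instances()]
```

Because the registry holds only weak references, tracking an instance does not keep it alive. The clean-up step runs only if the caller exhausts the generator.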

get_analysis_type()

Return a string indicating the type of analysis performed

static get_default_dtypes()

Get a list of available default dtypes used for analyses. Same as data_dtypes.get_dtypes().

static get_default_parameter_groups()

Get a list of commonly used parameter groups and associated descriptions.

Use of default groups provides consistency and allows other systems to design custom behavior around the semantics of parameter groups

Returns:Dictionary where the keys are the short names of the groups and the values are dicts with the following key:value pairs: ‘name’, ‘description’. Use the ‘name’ to define the group to be used.
get_help_string()

Get a string describing the analysis.

Returns:Help string describing the analysis and its parameters
get_memory_profile_info()

Based on the memory profile of the execute_analysis(..) function get the string describing the line-by-line memory usage.

Returns:String describing the memory usage profile. None is returned in case that no memory profiling data is available.
get_num_analysis_data()

Return the number of analysis datasets to be written to the HDF5 file

get_num_dependency_data()

Return the number of dependencies to be written to the HDF5 file

get_num_parameter_data()

Return the number of parameter datasets to be written to the HDF5 file

get_omsi_analysis_storage()

Get a list of known locations where this analysis has been saved.

Returns:List of omsi.dataformat.omsi_file.analysis.omsi_file_analysis objects where the analysis is saved.
get_parameter_data(index)

Given the index return the associated dataset to be written to the HDF5 file

Parameters:index – Return the index entry of the private member parameters.

get_parameter_data_by_name(dataname)

Given the key name of the data return the associated parameter_data object.

Parameters:dataname – Name of the parameter requested from the parameters member.
Returns:The parameter_data object or None if not found
get_parameter_names()

Get a list of all parameter dataset names (including those that may define dependencies).

get_profile_stats_object(consolidate=True, stream=None)

Based on the execution profile of the execute_analysis(..) function get pstats.Stats object to help with the interpretation of the data.

Parameters:
  • consolidate – Boolean flag indicating whether multiple stats (e.g., from multiple cores) should be consolidated into a single stats object. Default is True.
  • stream – The optional stream parameter to be used for the pstats.Stats object.
Returns:

A single pstats.Stats object if consolidate is True. Otherwise the function returns a list of pstats.Stats objects, one per recorded statistic. None is returned in case that the stats objects cannot be created or no profiling data is available.
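The difference between a consolidated pstats.Stats object and a list of them can be shown with the standard cProfile and pstats modules alone. This is a minimal sketch, assuming two separately recorded profiles that stand in for profiles from multiple cores. The helper names work and profile_call are hypothetical, and the sketch does not show how the toolkit stores or retrieves its profiling data.

```python
import cProfile
import io
import pstats


def work(n):
    """Small function to profile."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile and return the populated profiler."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    return profiler


# Two separate recordings, standing in for profiles collected on two cores.
profiles = [profile_call(work, 1000), profile_call(work, 2000)]

stream = io.StringIO()  # report text is written here instead of stdout

# Analogue of consolidate=True: merge all recordings into one Stats object.
merged = pstats.Stats(profiles[0], stream=stream)
for extra in profiles[1:]:
    merged.add(pstats.Stats(extra))

# Analogue of consolidate=False: one Stats object per recording.
separate = [pstats.Stats(p, stream=stream) for p in profiles]

merged.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

Passing a stream keeps the report out of standard output, which is convenient when the text should be stored or displayed elsewhere.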

has_omsi_analysis_storage()

Check whether a storage location is known where the analysis has been saved.

Returns:Boolean indicating whether self.omsi_analysis_storage is not empty
keys()

Get a list of all valid keys, i.e., a combination of all input parameter and output names.

Returns:List of strings with all input parameter and output names.
classmethod locate_analysis(data_object, include_parameters=False)

Given a data_object try to locate the analysis that creates the object as an output of its execution (and optionally analyses that have the object as an input).

Parameters:
  • data_object – The data object of interest.
  • include_parameters – Boolean indicating whether also input parameters should be considered in the search in addition to the outputs of an analysis
Returns:

dependency_dict pointing to the relevant object or None in case the object was not found.

read_from_omsi_file(analysis_object, load_data=True, load_parameters=True, load_runtime_data=True, dependencies_omsi_format=True, ignore_type_conflict=False)

This function can be optionally overwritten to implement a custom data read.

The default implementation tries to reconstruct the original data as far as possible. However, in particular when a custom add_custom_data_to_omsi_file function has been implemented, the default implementation may not be sufficient. The default implementation reconstructs the analysis_identifier and reads all custom data into __data_list. Note, an error will be raised in case that the analysis type specified in the HDF5 file does not match the analysis type specified by get_analysis_type().

Parameters:
  • analysis_object – The omsi_file_analysis object associated with the hdf5 data group with the analysis data_list
  • load_data – Should the analysis data be loaded from file (default) or just stored as h5py data objects
  • load_parameters – Should parameters be loaded from file (default) or just stored as h5py data objects.
  • load_runtime_data – Should runtime data be loaded from file (default) or just stored as h5py data objects
  • dependencies_omsi_format – Should dependencies be loaded as omsi_file API objects (default) or just as h5py objects.
  • ignore_type_conflict – Set to True to allow the analysis to be loaded into the current analysis object even if the type indicated in the file does not match the class. Default value is False. This behavior can be useful when different analyses have compatible data structures or when we want to load the data into a generic analysis container, e.g., analysis_generic.
Returns:

Boolean indicating whether the data was read successfully

Raises:

TypeError in case that the analysis type specified by the file does not match the analysis type provided by self.get_analysis_type()

record_execute_analysis_outputs(analysis_output)

Function used internally by execute to record the output of the custom execute_analysis(...) function to the __data_list.

This function may be overwritten in child classes in order to customize the behavior for recording data outputs. E.g., for some analyses one may only want to record a particular set of outputs, rather than all outputs generated by the analysis.

Parameters:analysis_output – The output of the execute_analysis(...) function to be recorded
results_ready()

Check whether the results of the analysis are ready to be used.

Returns:Boolean

set_analysis_identifier(identifier)

Set the name of the analysis to identifier

Side Effects: This function modifies self.analysis_identifier

Parameters:identifier (str) – The new analysis identifier string to be used (should be unique)
set_parameter_values(**kwargs)

Set all parameters given as input to the function. The inputs are placed in the self.parameters list. If the parameter refers to an existing h5py.Dataset, h5py.Group, managed h5py object, or is an instance of an existing analysis_base object, then a dependency_dict will be created and stored as value instead.

Parameters:kwargs – Dictionary of keyword arguments. All keys are expected to be strings. All values are expected to be either i) numpy arrays, ii) int, float, str or unicode variables, iii) h5py.Dataset or h5py.Group, or iv) any of the omsi_file API class objects. For iii) and iv) one may provide a tuple consisting of the dataobject t[0] and an additional selection string t[1].
update_analysis_parameters(**kwargs)

Record the analysis parameters passed to the execute() function.

The default implementation simply calls the set_parameter_values(...) function. This function may be overwritten to customize the behavior of how parameters are recorded by the execute function.

Parameters:kwargs – Dictionary of keyword arguments with the parameters passed to the execute(..) function
classmethod v_qmz(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Get the mz axes for the analysis

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • qslice_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qslice URL pattern.
  • qspectrum_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qspectrum URL pattern.
Returns:

The following values are returned by the analysis:

  • mzSpectra : Array with the static mz values for the spectra.
  • labelSpectra : Label for the spectral mz axis
  • mzSlice : Array of the static mz values for the slices or None if identical to the mzSpectra.
  • labelSlice : Label for the slice mz axis or None if identical to labelSpectra.
  • values_x: The values for the x axis of the image (or None)
  • label_x: Label for the x axis of the image
  • values_y: The values for the y axis of the image (or None)
  • label_y: Label for the y axis of the image
  • values_z: The values for the z axis of the image (or None)
  • label_z: Label for the z axis of the image

classmethod v_qslice(analysis_object, z, viewer_option=0)

Get 3D analysis dataset for which z-slices should be extracted for presentation in the OMSI viewer

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • z – Selection string indicating which z values should be selected.
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

numpy array with the data to be displayed in the image slice viewer. Slicing will be performed typically like [:,:,zmin:zmax].

Raises:

NotImplementedError in case that v_qslice is not supported by the analysis.

classmethod v_qslice_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for the analysis for qslice. The default implementation tries to take care of handling the slice retrieval for all the dependencies but can naturally not decide how the qslice should be handled by a derived class. However, this implementation is often called at the end of custom implementations to also allow access to data from other dependencies.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed. For most cases this is not needed here as the support for slice operations is usually a static decision based on the class type, however, in some cases additional checks may be needed (e.g., ensure that the required data is available).
Returns:List of strings indicating the different available viewer options. The list should be empty if the analysis does not support qslice requests (i.e., v_qslice(...) is not available).
classmethod v_qspectrum(analysis_object, x, y, viewer_option=0)

Get the 3D analysis dataset from which spectra in x/y should be extracted for presentation in the OMSI viewer

Developer Note: h5py currently supports only a single index list. If the user provides an index-list for both x and y, then we need to construct the proper merged list and load the data manually, or if the data is small enough, one can load the full data into a numpy array which supports multiple lists in the selection.

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • x – x selection string
  • y – y selection string
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

The following two elements are expected to be returned by this function :

  1. 1D, 2D or 3D numpy array of the requested spectra. NOTE: The mass (m/z) axis must be the last axis. For index selection x=1,y=1 a 1D array is usually expected. For indexList selections x=[0]&y=[1] a 2D array is usually expected. For range selections x=0:1&y=1:2 a 3D array is usually expected.
  2. None in case that the spectra axis returned by v_qmz is valid for the returned spectrum. Otherwise, return a 1D numpy array with the m/z values for the spectrum (i.e., if custom m/z values are needed for interpretation of the returned spectrum). This may be needed, e.g., in cases where a per-spectrum peak analysis is performed and the peaks for each spectrum appear at different m/z values.
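The three selection forms named in the list above (index, index list, range) can be told apart with a few lines of standard Python. The helper parse_selection below is hypothetical and only illustrates the string forms. It is not the parser used by the toolkit, and it assumes well-formed integer selections.

```python
import ast


def parse_selection(text):
    """Hypothetical parser for the three selection forms.

    "1"     -> int          (index selection)
    "[0,2]" -> list of int  (index-list selection)
    "0:2"   -> slice        (range selection)
    """
    text = text.strip()
    if text.startswith("["):
        # Index list, evaluated safely as a Python literal.
        return [int(v) for v in ast.literal_eval(text)]
    if ":" in text:
        # Range; empty fields such as ":" or "2:" map to open-ended bounds.
        parts = [int(p) if p.strip() else None for p in text.split(":")]
        return slice(*parts)
    return int(text)


rows = list(range(10))
picked_range = rows[parse_selection("0:2")]
picked_index = rows[parse_selection("1")]
picked_list = [rows[i] for i in parse_selection("[0,2]")]
```

The parsed type (int, list, or slice) is what determines whether a 1D, 2D, or 3D result is usually expected, as described in item 1.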

classmethod v_qspectrum_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for the analysis for qspectrum. The default implementation tries to take care of handling the spectra retrieval for all the dependencies but can naturally not decide how the qspectrum should be handled by a derived class. However, this implementation is often called at the end of custom implementations to also allow access to data from other dependencies.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed. For most cases this is not needed here as the support for slice operations is usually a static decision based on the class type, however, in some cases additional checks may be needed (e.g., ensure that the required data is available).
Returns:List of strings indicating the different available viewer options. The list should be empty if the analysis does not support qspectrum requests (i.e., v_qspectrum(...) is not available).
write_analysis_data(analysis_group=None)

This function is used to write the actual analysis data to file. If not implemented, then the omsi_file_analysis API’s default behavior is used instead.

Parameters:analysis_group – The h5py.Group object where the analysis is stored. May be None on cores that do not perform any writing but which need to participate in communication, e.g., to collect data for writing.

generic Module

Generic analysis class used to represent analyses of unknown type, e.g., when loading a custom user-defined analysis from file for which the indicated class may not be available with the local installation. In this case we want to at least be able to load and investigate the data.

class omsi.analysis.generic.analysis_generic(name_key='undefined')

Bases: omsi.analysis.base.analysis_base

This analysis class is used if the specific analysis type is unknown, e.g., when loading custom user-defined analysis data that may not be available in the standard omsi package used.

Initialize the basic data members

Parameters:name_key – The name for the analysis
DEFAULT_OUTPUT_PREFIX = 'output_'
execute(**kwargs)

Overwrite the default implementation of execute to update parameter specifications/types when wrapping functions where the types are not known a priori.

Parameters:kwargs – Custom analysis parameters
Returns:The result of execute_analysis()
execute_analysis()

Nothing to do here.

classmethod from_function(analysis_function, output_names=None, parameter_specs=None, name_key='undefined')

Create a generic analysis class for a given analysis function.

This functionality is useful to ease quick scripting on analyses but should not be used in production.

NOTE: __analysis_function is a reserved parameter name used to store the analysis function and may not be used as an input parameter for the analysis function.

Parameters:
  • analysis_function – The analysis function to be wrapped for provenance tracking and storage
  • output_names – Optionally, define a list of the names of the outputs
  • parameter_specs – Optional list of omsi.datastructures.analysis_data.parameter_data with additional information about the parameters of the function.
  • name_key – The name for the analysis, i.e., the analysis identifier
Returns:

A new generic analysis class

classmethod get_analysis_type()

Return a string indicating the type of analysis performed

get_real_analysis_type()

This class is designed to handle generic (including unknown) types of analysis. In cases where, e.g., this class is used to store analysis data from an HDF5 file, we may have an actual analysis type available even if the corresponding analysis class is not available in the current installation.

read_from_omsi_file(analysis_object, load_data=True, load_parameters=True, load_runtime_data=True, dependencies_omsi_format=True, ignore_type_conflict=False)

See omsi.analysis.analysis_base.read_from_omsi_file(...) for details. The function is overwritten here mainly to initialize the self.real_analysis_type instance variable but otherwise uses the default behavior.

classmethod v_qmz(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Implement support for qmz URL requests for the viewer

classmethod v_qslice(analysis_object, z, viewer_option=0)

Implement support for qslice URL requests for the viewer

classmethod v_qslice_viewer_options(analysis_object)

Define which viewer_options are supported for qslice URLs

classmethod v_qspectrum(analysis_object, x, y, viewer_option=0)

Implement support for qspectrum URL requests for the viewer

classmethod v_qspectrum_viewer_options(analysis_object)

Define which viewer_options are supported for qspectrum URLs

write_analysis_data(analysis_group=None)

This function is used to write the actual analysis data to file. If not implemented, then the omsi_file_analysis API’s default behavior is used instead.

Parameters:analysis_group – The h5py.Group object where the analysis is stored. May be None on cores that do not perform any writing but which need to participate in communication, e.g., to collect data for writing.
omsi.analysis.generic.bastet_analysis(output_names=None, parameter_specs=None, name_key='undefined')

Decorator used to wrap a function and replace it with an analysis_generic object that behaves like a function but adds the ability for saving the analysis to file and tracking provenance

This is essentially the same as analysis_generic.from_function(....).

Parameters:
  • func – The function to be wrapped
  • output_names – Optional list of strings with the names of the outputs
  • parameter_specs – Optional list of omsi.datastructures.analysis_data.parameter_data with additional information about the parameters of the function.
  • name_key – Optional name for the analysis, i.e., the analysis identifier
Returns:

analysis_generic instance for the wrapped function

analysis_views Module

Helper module with functions and classes for interfacing with different analysis algorithms. Many of these functions are used to ease interaction with the analysis module in a generic fashion, without having to explicitly know about all the different available modules, e.g., we can just look up modules by name and interact with them directly.

class omsi.analysis.analysis_views.analysis_views

Bases: object

Helper class for interfacing different analysis algorithms with the web-based viewer

Nothing to do here.

classmethod analysis_name_to_class(class_name)

Convert the given string indicating the class to a python class.

Parameters:class_name – Name of the analysis class. This may be a fully qualified name, e.g., omsi.analysis.multivariate_stat.omsi_nmf or a name relative to the omsi.analysis module, e.g., multivariate_stat.omsi_nmf.
Raises:AttributeError in case that the class cannot be restored.
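Resolving a dotted name to a class, first as given and then relative to a default package, can be sketched with importlib. The helper name_to_class and its default_package argument are hypothetical, and standard-library classes stand in for analysis classes so that the example runs without the toolkit.

```python
import importlib


def name_to_class(class_name, default_package=None):
    """Hypothetical helper: resolve "package.module.ClassName" to a class.

    If the name cannot be resolved as given and default_package is set,
    the lookup is retried relative to that package.
    """
    candidates = [class_name]
    if default_package:
        candidates.append(default_package + "." + class_name)
    for name in candidates:
        module_name, _, attribute = name.rpartition(".")
        if not module_name:
            continue  # a bare class name has no module to import
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise AttributeError("Could not resolve class %r" % class_name)


# Fully qualified name.
ordered_dict_class = name_to_class("collections.OrderedDict")
# Name relative to a default package.
decoder_class = name_to_class("decoder.JSONDecoder", default_package="json")
```

As in the documented function, a name that cannot be resolved results in an AttributeError.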
classmethod available_analysis()

Get all available analyses, i.e., all analyses that are subclasses of analysis_base.

Returns:Dictionary where the dict-keys are the fully qualified name of the module and the values are the analysis class corresponding to that module.
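One common way to discover all subclasses of a base class is to walk __subclasses__() recursively. The sketch below assumes that approach; whether the toolkit discovers analyses this way is not stated here. ToyBase and its subclasses are hypothetical, and the keys combine module and class name, which is a choice made for this example.

```python
def all_subclasses(base):
    """Return every direct and indirect subclass of base."""
    found = []
    for sub in base.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


def available(base):
    """Map "module.ClassName" to the class object for all subclasses of base."""
    return {cls.__module__ + "." + cls.__name__: cls for cls in all_subclasses(base)}


class ToyBase:  # hypothetical stand-in for analysis_base
    pass


class ToyPeaks(ToyBase):  # hypothetical analysis classes
    pass


class ToyLocalPeaks(ToyPeaks):
    pass


registry = available(ToyBase)
```

Only classes whose modules have already been imported are visible to __subclasses__(), so a discovery step like this depends on the relevant modules being loaded first.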
classmethod available_analysis_descriptions()

Get all available analyses, i.e., all analyses that are subclasses of analysis_base. For each analysis compile the list of input parameters, outputs, the corresponding class etc.

Returns:Dictionary where the dict-keys are the fully qualified name of the module and the values are dicts with class, list of analysis parameter names, list of analysis outputs.
classmethod get_axes(analysis_object, qslice_viewer_option=0, qspectrum_viewer_option=0)

Get the mz axes for the analysis

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • qslice_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qslice URL pattern.
  • qspectrum_viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them for the qspectrum URL pattern.
Returns:

The following values are returned by the analysis:

  • mz_spectra : Array with the static mz values for the spectra.
  • label_spectra : Label for the spectral mz axis
  • mz_slice : Array of the static mz values for the slices or None if identical to the mz_spectra.
  • label_slice : Label for the slice mz axis or None if identical to label_spectra.
  • values_x: The values for the x axis of the image (or None)
  • label_x: Label for the x axis of the image
  • values_y: The values for the y axis of the image (or None)
  • label_y: Label for the y axis of the image
  • values_z: The values for the z axis of the image (or None)
  • label_z: Label for the z axis of the image

classmethod get_qslice_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for qslice.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed.
Returns:Array of strings indicating the different available viewer options. The array may be empty if no viewer_options are available, i.e., get_slice and get_spectrum are undefined for the given analysis.
classmethod get_qspectrum_viewer_options(analysis_object)

Get a list of strings describing the different default viewer options for qspectrum.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed.
Returns:Array of strings indicating the different available viewer options. The array may be empty if no viewer_options are available, i.e., get_slice and get_spectrum are undefined for the given analysis.
classmethod get_slice(analysis_object, z, operations=None, viewer_option=0)

Get 3D analysis dataset for which z-slices should be extracted for presentation in the OMSI viewer

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • z – Selection string indicating which z values should be selected.
  • operations – JSON string with list of dictionaries or a python list of dictionaries. Each dict specifies a single data transformation or data reduction that are applied in order. See omsi.shared.omsi_data_selection.transform_and_reduce_data(...) for details.
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

numpy array with the data to be displayed in the image slice viewer. Slicing will be performed typically like [:,:,zmin:zmax].

classmethod get_spectra(analysis_object, x, y, operations=None, viewer_option=0)

Get the 3D analysis dataset from which spectra in x/y should be extracted for presentation in the OMSI viewer

Developer Note: h5py currently supports only a single index list. If the user provides an index-list for both x and y, then we need to construct the proper merged list and load the data manually, or if the data is small enough, one can load the full data into a numpy array which supports multiple lists in the selection.

Parameters:
  • analysis_object – The omsi_file_analysis object for which slicing should be performed
  • x – x selection string
  • y – y selection string
  • operations – JSON string with list of dictionaries or a python list of dictionaries. Each dict specifies a single data transformation or data reduction that are applied in order. See omsi.shared.omsi_data_selection.transform_and_reduce_data(...) for details.
  • viewer_option – If multiple default viewer behaviors are available for a given analysis then this option is used to switch between them.
Returns:

2D or 3D numpy array of the requested spectra. The mass (m/z) axis must be the last axis.

classmethod supports_slice(analysis_object)

Get whether a default slice selection behavior is defined for the analysis.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed
Returns:Boolean indicating whether get_slice(...) is defined for the analysis object.
classmethod supports_spectra(analysis_object)

Get whether a default spectra selection behavior is defined for the analysis.

Parameters:analysis_object – The omsi_file_analysis object for which slicing should be performed.
Returns:Boolean indicating whether get_spectra(...) is defined for the analysis object.