shared Package

omsi.shared Package used to implement shared functionality and helper functions.
omsi.shared.data_selection Module for defining and processing data selections.
omsi.shared.log Module providing functionality for logging based on the python logging module.
omsi.shared.mpi_helper Module used to ease the use of MPI and distributed parallel implementations using MPI.
omsi.shared.omsi_web_helper Module with helper functions for interactions with the OpenMSI web infrastructure, e.g.
omsi.shared.spectrum_layout This module provides capabilities for computing different layouts for spectra
omsi.shared.third_party Package containing shared third-party code modules included here to reduce the need for external dependencies when only small parts of external code are used.
omsi.shared.third_party.cloudpickle This class is defined to override standard pickle functionality. Its goals are to: serialize lambdas and nested functions to compiled byte code, deal with the main module correctly, and deal with other non-serializable objects. It does not include an unpickler, as standard python unpickling suffices.

data_selection Module

Module for defining and processing data selections. This includes the definition of selections using strings as well as transformation and reduction of data.

TODO: We may want to expose some of the following numpy functions currently not yet
supported through the transform and reduce data operations:
  • array2string
  • array_equal
  • array_equiv
  • array_repr
  • array_split
  • array_str
  • asanyarray
  • asarray
  • asarray_chkfinite
  • ascontiguousarray
  • asfarray
  • asfortranarray
  • asmatrix
  • asscalar
  • atleast_1d
  • atleast_2d
  • atleast_3d
  • binary_repr
  • convolve
  • conjugate
  • cross
  • dot
  • extract
  • fft.*
  • histogram, histogram2d, histogramdd
  • kron
  • linalg.*
  • swapaxes(a, axis1, axis2)
  • transpose

#Simple data transformation and reduction example
from omsi.shared.omsi_data_selection import *
import numpy as np
import json
t = [{'transformation': 'threshold', 'threshold': 60},
     {'reduction': 'max', 'axis': 2}]
tj = json.dumps(t)
print(tj)
a = np.arange(125).reshape((5, 5, 5))
apro = transform_and_reduce_data(data=a, operations=tj, http_error=True)
print(apro)

#Another simple example
from omsi.shared.omsi_data_selection import *
import numpy as np
import json
a = np.arange(10) + 5
print(a)
# 1) Subtract the minimum
# 2) Divide by the maximum value, with the maximum value converted to float
# NOTE: The conversion to float avoids integer division, i.e.,
#       5/10 = 0, whereas 5/float(10) = 0.5
# NOTE: The specification of 'x1':'data' can be omitted as this is the default.
#       'x1':'data' simply explicitly specifies that the input data should be
#       assigned to the first operand of the arithmetic operation.
t = [{'transformation': 'dualDataTransform', 'operation': 'subtract',
      'x1': 'data', 'x2': [{'reduction': 'min'}]},
     {'transformation': 'dualDataTransform', 'operation': 'divide',
      'x1': 'data', 'x2': [{'reduction': 'max'},
                           {'transformation': 'astype', 'dtype': 'float'}]}]
b = transform_and_reduce_data(data=a, operations=t)
print(b)
t = [{'transformation': 'threshold', 'threshold': [{'reduction': 'median'}]}]
print(t)
c = transform_and_reduce_data(data=a, operations=t)
print(c)

#Construct a JSON description of a transformation/reduction
from omsi.shared.omsi_data_selection import *
# Construct the different pieces of the transformation and reduction pipeline
# 1) Compute the maximum data value and convert it to float
# 1.1) Compute the maximum value
max_value = construct_reduce_dict(reduction_type='max', axis=None)
# 1.2) Convert the data to float
value_as_float = construct_transform_dict(trans_type='astype', dtype='float')
# 1.3) Merge the two steps to compute the maximum data value as float
max_value_as_float = construct_transform_reduce_list(max_value, value_as_float)
# 2) Normalize the data by dividing by the maximum value
divide_by_max_value = construct_transform_dict(trans_type='dualDataTransform',
                                               operation='divide',
                                               axes=None,
                                               x2=max_value_as_float)
# 3) Project along the last axis (i.e., the m/z axis) to compute a maximum projection image
max_projection = construct_reduce_dict(reduction_type='max', axis=-1)
# 4) Merge the different steps and construct the JSON string
json_string = transform_reduce_description_to_json(divide_by_max_value, max_projection)
# Just copy the result of the following print statement as your JSON description
print(json_string)

omsi.shared.data_selection.check_selection_string(selection_string)

Check whether the given selection string is valid, and indicate which type of selection the string defined. Checking the selection string is meant as a safeguard to prevent attackers from being able to insert malicious code.

Parameters:selection_string – String given by the user with the desired selection
Returns:String indicating the type of selection as defined in selection_type:
  • ‘indexlist’ : Selection of the form [1,2,3]
  • ‘all’ : Selection of the form ‘:’
  • ‘range’ : Selection of the form ‘a:b’
  • ‘index’ : A single index selection, e.g., ‘1’
  • ‘invalid’: An unsupported selection
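
For example, a minimal usage sketch (the strings in the comments are the expected return values from the list above):

from omsi.shared.data_selection import check_selection_string
print(check_selection_string('[1,2,3]'))    # expected: 'indexlist'
print(check_selection_string(':'))          # expected: 'all'
print(check_selection_string('1:5'))        # expected: 'range'
print(check_selection_string('7'))          # expected: 'index'
print(check_selection_string('import os'))  # expected: 'invalid'
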
omsi.shared.data_selection.construct_reduce_dict(reduction_type, **kwargs)

Helper function used to construct reduction dictionary.

Required Keyword arguments:

Parameters:reduction_type – The reduction type to be used.

Optional Keyword arguments:

Parameters:
  • axis – Some reduction functions support the axis parameters, describing along which axis the reduction should be performed.
  • x1 – By default the reductions are performed on the output of the previous data operation (x1=’data’). We may reference the output of, e.g., the fifth data operation by setting x1=’data5’. x1 itself may also specify a separate data transformation and reduction pipeline that operates on ‘data’.
  • min_dim – Minimum number of dimensions the input data should have in order for the reduction to be applied.
Returns:

Dictionary with the description of the reduction operation.
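
For example, a minimal sketch using only the keyword arguments documented above:

from omsi.shared.data_selection import construct_reduce_dict
# Describe a maximum projection along the last axis; the result is a plain
# dict that can be placed into a transformation/reduction pipeline list
max_projection = construct_reduce_dict(reduction_type='max', axis=-1)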

omsi.shared.data_selection.construct_transform_dict(trans_type, axes=None, **kwargs)

Helper function used to construct a dictionary describing a data transformation.

Parameters:
  • trans_type – The transformation type to be used. See transformation_type dict.
  • axes – The axes along which the data should be split. Default is None.
  • kwargs – Additional keyword parameters for the transformation functions.
Returns:

Dictionary with the description of the transformation.

Raises:

A KeyError is raised in case a required parameter is missing. A ValueError is raised in case a given parameter value is invalid.

omsi.shared.data_selection.construct_transform_reduce_list(*args)

Merge a series of transformations and reductions into a single list describing a pipeline of transformation and reduction operations to be performed.

Args:Ordered series of dictionaries describing transformation and reduction operations.
Returns:List of all transformation and reduction operations
omsi.shared.data_selection.evaluate_transform_parameter(parameter, data=None, secondary_data=None)

Evaluate the given query parameter. This function is used to enable the use of data transformations and reductions as part of transformation parameters. E.g., a user may want to subtract the minimum or divide by the maximum, etc.

Parameters:
  • parameter – The parameter to be evaluated. This may be a JSON string or a list/dictionary-based description of a data transformation, or any other valid data parameter. If the parameter describes a data reduction or transformation, then the transformation will be evaluated and the result is returned; otherwise the parameter itself is returned.
  • data – The input numpy array that should be transformed.
  • secondary_data – Other data from previous data iterations a user may reference.
Returns:

The evaluated parameter result.

omsi.shared.data_selection.is_transform_or_reduce(parameter)

Check if the given parameter defines a description of a data transformation or data reduction

Parameters:parameter (JSON string, dict or list of dicts with transformation parameter.) – The parameter to be checked.
Returns:Boolean
omsi.shared.data_selection.json_to_transform_reduce_description(json_string)

Convert the json string to the transformation/reduction dict.

Parameters:json_string – The json string to be converted.
Returns:Python list or dict with the description
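
A short round-trip sketch using the companion transform_reduce_description_to_json function:

from omsi.shared.data_selection import (json_to_transform_reduce_description,
                                        transform_reduce_description_to_json)
ops = [{'transformation': 'threshold', 'threshold': 60},
       {'reduction': 'max', 'axis': 2}]
json_string = transform_reduce_description_to_json(*ops)
ops_restored = json_to_transform_reduce_description(json_string)
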
omsi.shared.data_selection.perform_reduction(data, reduction, secondary_data, min_dim=None, http_error=False, **kwargs)

Helper function used to reduce the data of a given numpy array.

Parameters:
  • data – The input numpy array that should be reduced
  • reduction (String) – Data reduction to be applied to the input data. Reduction operations are defined as strings indicating the numpy function to be used for reduction. Valid reduction operations include, e.g.: min, max, mean, median, std, var, etc.
  • axis – The axis along which the reduction should be applied
  • secondary_data – Other data from previous data iterations a user may reference.
  • http_error – Define which type of error message the function should return. If false then None is returned in case of error. Otherwise a DJANGO HttpResponse is returned.
  • min_dim – Minimum number of dimensions the input data must have in order for the reduction to be applied.
  • kwargs – Additional optional keyword arguments.
Returns:

Reduced numpy data array or, in case of error, None or an HttpResponse with a description of the error that occurred (see the http_error option).
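
A minimal sketch of a direct call (most users will go through transform_and_reduce_data instead); passing an empty dict for secondary_data when no prior pipeline outputs are referenced is an assumption of this sketch:

import numpy as np
from omsi.shared.data_selection import perform_reduction
a = np.arange(125).reshape((5, 5, 5))
# Maximum projection along the last axis; axis is forwarded to numpy.max via kwargs
out = perform_reduction(data=a, reduction='max', secondary_data={}, axis=-1)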

omsi.shared.data_selection.reduction_allowed_numpy_function = ['all', 'alltrue', 'amax', 'amin', 'angle', 'any', 'append', 'argmax', 'argmin', 'argwhere', 'average', 'bincount', 'corrcoef', 'cumprod', 'cumproduct', 'cumsum', 'count_nonzero', 'diag', 'diag_indices', 'diagflat', 'diagonal', 'diff', 'max', 'min', 'median', 'mean', 'percentile', 'product', 'prod', 'ptp', 'select_values', 'squeeze', 'std', 'var', 'transpose', 'sum']

List of allowed numpy data reduction operations. Reduction operations are any single data operations that may change the shape of the data. NOTE: Some operations may have additional optional or required keyword arguments. HELP: For full documentation of the different functions see the numpy documentation.

Additional input parameters are often:

  • ‘x1’ : The data operand specifying the data the reduction should be performed on.

    The input data will be used by default if x1 is not specified. You may also specify ‘data’ to explicitly indicate that the input data should be assigned to x1. You may specify data0 to indicate that the output of another data operation should be used. Note, data0 here refers to the input to the full data operation pipeline. Data from other parts of the pipeline are then indexed using 1-based indices, e.g., to access the output of the first data operation set x1=’data0’.

  • ‘axis’ : Integer indicating the axis along which the data should be reduced.

    The default behavior, if axis is not specified, depends on the behavior of the corresponding numpy function. However, in most cases (if not all cases) the data operation will be applied to the full input data if no axis is specified.

  • ‘min_dim’ : Integer specifying the minimum number of data dimensions the input data must have in order for the reduction operation to be applied.

Here is the list of allowed data reduction operations:

  • ‘all’ : out = numpy.all(data, axis)
  • ‘amax’ : out = numpy.amax(data, axis)
  • ‘amin’ : out = numpy.amin(data, axis)
  • ‘alltrue’ : out = numpy.alltrue(data, axis)
  • ‘angle’ : out = numpy.angle(z, deg)
  • ‘any’ : out = numpy.any(data, axis)
  • ‘append’ : out = numpy.append(data, values, axis)
  • ‘argmax’ : out = numpy.argmax(data, axis)
  • ‘argmin’ : out = numpy.argmin(data, axis)
  • ‘argwhere’ : out = numpy.argwhere(data)
  • ‘average’ : out = numpy.average(data, axis)
  • ‘bincount’ : out = numpy.bincount(x, weights=None, minlength=None)
  • ‘corrcoef’ : out = numpy.corrcoef(data)
  • ‘count_nonzero’ : out = numpy.count_nonzero(data)
  • ‘cumprod’ : out = numpy.cumprod(data,axis)
  • ‘cumproduct’: out = numpy.cumproduct(data,axis)
  • ‘cumsum’ : out = numpy.cumsum(data,axis)
  • ‘diag’ : out = numpy.diag(data,k=0)
  • ‘diag_indices’ : out = numpy.diag_indices(data, ndim=2)
  • ‘diagflat’ : out = numpy.diagflat(data, k=0)
  • ‘diagonal’ : out = numpy.diagonal(data, offset=0, axis1=0, axis2=1)
  • ‘diff’ : out = numpy.diff(a, n=1, axis=-1)
  • ‘max’ : out = numpy.max(data, axis)
  • ‘min’ : out = numpy.min(data, axis)
  • ‘median’ : out = numpy.median(data, axis)
  • ‘mean’ : out = numpy.mean(data, axis)
  • ‘percentile’: out = numpy.percentile(data, q, axis)
  • ‘product’ : out = numpy.product(data, axis)
  • ‘prod’ : out = numpy.prod(data,axis)
  • ‘ptp’ : out = numpy.ptp(data,axis)
  • ‘squeeze’ : out = numpy.squeeze(data)
  • ‘std’ : out = numpy.std(data, axis)
  • ‘swapaxes’ : out = numpy.swapaxes(x1, axis1, axis2)
  • ‘var’ : out = numpy.var(data, axis)
  • ‘transpose’ : out = numpy.transpose(data)
  • ‘sum’ : out = numpy.sum(data, axis)

Non-numpy data reduction operations:

  • ‘select_values’ : out = data[ selection ]
omsi.shared.data_selection.selection_string_to_object(selection_string, list_to_index=False)

Convert the given selection string to a python selection object, i.e., either a slice, list or integer index.

Parameters:
  • selection_string – A selection string of the type indexlist
  • list_to_index – Should we turn the list into an index if the list contains only a single value. Default value is False, i.e., the list is not modified.
Returns:

  • An integer index if an index selection is specified
  • A python list of indices if a list is specified in the string
  • A python slice object if a slice operation is specified by the string
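
A minimal sketch of the expected behavior:

from omsi.shared.data_selection import selection_string_to_object
sel_range = selection_string_to_object('1:5')      # expected: slice(1, 5)
sel_list = selection_string_to_object('[1,2,3]')   # expected: [1, 2, 3]
sel_index = selection_string_to_object('7')        # expected: 7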

omsi.shared.data_selection.selection_to_indexlist(selection_string, axis_size=0)

Parse the indexlist selection string and return a python list of indices

Parameters:
  • selection_string – A selection string of the type indexlist
  • axis_size – Size of the dimensions for which the selection is defined. Only needed in case that a range selection is given. This should be a list of sizes, in case that a multiaxis selection is given.
Returns:

  • A python list of point indices for the selection.
  • None in case the list is empty or in case an error occurred.

omsi.shared.data_selection.selection_to_string(selection)

Convert the given selection, which may be either an int, a list of ints, a slice object, or a tuple of the mentioned types used to define a selection along multiple axes.

Parameters:selection (int, list, slice, or a tuple of int, list, slice objects) – The selection to be converted to a string
Returns:The selection string
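
For example, a small sketch of the inverse direction (the exact output strings are illustrative):

from omsi.shared.data_selection import selection_to_string
range_string = selection_to_string(slice(1, 5))     # e.g., '1:5'
multi_string = selection_to_string(([1, 2, 3], 4))  # a multiaxis selection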

omsi.shared.data_selection.selection_type = {'index': 0, 'all': 3, 'indexlist': 2, 'invalid': -1, 'range': 4, 'multiaxis': 5}

This is an extended list of the selection types indicated by the check_selection_string function. Indices <0 are assumed to be invalid selections.

omsi.shared.data_selection.transform_and_reduce_data(data, operations, secondary_data=None, http_error=False)

Helper function used to apply a series of potentially multiple operations to a given numpy dataset. This function uses the transform_data_single(...) function to apply each indicated transformation to the data. This function uses the perform_reduction function to perform data reduction operations.

Parameters:
  • data – The input numpy array that should be transformed.
  • operations

    JSON string with list of dictionaries or a python list of dictionaries. Each dict specifies a single data transformation or data reduction. The operations are applied in order, i.e., operations[0] is applied first, then operations[1] and so on. The dicts must be structured according to one of the following specifications:

    • {‘transformation’:<op>} : Single transformation applied to all data at once.
    • {‘transformation’:<op>, ‘axes’:[..]} : Apply a single transformation to data chunks defined by the axes parameter. The data is split into chunks along the dimensions defined by the axes parameter. E.g., if we have a 3D MSI dataset and we want to apply the operation to ion images independently, then we need to set axes=[2]. Accordingly, if we want to apply the operation to spectra individually, then we need to split the two image dimensions into chunks by setting axes=[0,1].
    • {‘reduction’:<reduction>, ‘axis’:int} : Define the reduction operation to be applied and the axis along which the data should be reduced. If the reduction should be applied along all axes, then set axis to None (in python) or null (in JSON).
  • secondary_data – Other data from previous data iterations a user may reference.
  • http_error – Define which type of error message the function should return. If false then None is returned in case of error. Otherwise a DJANGO HttpResponse is returned.
Returns:

Reduced numpy data array or HttpResponse with a description of the error that occurred.

omsi.shared.data_selection.transform_data_single(data, transformation='minusMinDivideMax', axes=None, secondary_data=None, http_error=False, transform_kwargs=None)

Helper function used to transform data of a numpy array. The function potentially splits the array into independent chunks that are normalized separately (depending on how the axes parameter is defined). The actual data transformations are implemented by transform_datachunk(...).

Parameters:
  • data – The input numpy array that should be transformed.
  • transformation – Data transformation option to be used. Available options are: ‘minusMinDivideMax’ ,...
  • axes – List of data axes that should be split into chunks that are treated independently during the transformation. By default the transformation is applied based on the full dataset (axes=None). E.g., if the transformation should be performed on a per-image basis, then we need to split the m/z dimension into individual chunks and set axes=[2]. If we want to transform spectra individually, then we need to split the two image dimensions into chunks by setting axes=[0,1].
  • secondary_data – Other data from previous data iterations a user may reference.
  • http_error – Define which type of error message the function should return. If false then None is returned in case of error. Otherwise a DJANGO HttpResponse is returned.
  • transform_kwargs – Dictionary of additional keyword arguments to be passed to the transform_datachunk(...) function.
Returns:

Reduced numpy data array or HttpResponse with a description of the error that occurred.

omsi.shared.data_selection.transform_datachunk(data, transformation='minusMinDivideMax', secondary_data=None, **kwargs)

Helper function used to transform a given data chunk. In contrast to transform_data, this function applies the transformation directly to the data provided, without consideration of axis information. This function is used by transform_data(...) to implement the actual normalization for independent data chunks that need to be normalized.

Required keyword arguments:

Parameters:
  • data – The input numpy array that should be transformed.
  • transformation – Data transformation option to be used. For available options see the transformation_type dictionary.
  • secondary_data – Other data from previous data iterations a user may reference.

Additional transformation-dependent keyword arguments:

Parameters:kwargs

Additional keyword arguments that are specific to the different data transformations. Below is a list of the additional keyword arguments used for the different transformation options:

  • transformation: ‘threshold’
    • ‘threshold’ : The threshold parameter to be used for the threshold operation. If threshold is not specified, then the 5th percentile will be used as the threshold value instead, i.e., the bottom 5% of the data are set to 0.
Returns:This function returns the normalized data array. If an unsupported transformation option is given, then the function simply returns the unmodified input array.
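
A minimal sketch of a direct call (transform_data_single is typically used instead):

import numpy as np
from omsi.shared.data_selection import transform_datachunk
a = np.arange(10, dtype='float')
# Default transformation: subtract the minimum, then divide by the (shifted) maximum
normalized = transform_datachunk(data=a)
# Threshold with an explicit value instead of the default 5th-percentile behavior
thresholded = transform_datachunk(data=a, transformation='threshold', threshold=5)
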
omsi.shared.data_selection.transform_reduce_description_to_json(*args)

Convert the dictionary describing the transformation/reduction operations to a JSON string.

Parameters:args – The list or dictionaries with the description of the transformation and reduction operations.
Returns:JSON string
omsi.shared.data_selection.transformation_allowed_numpy_dual_data = ['add', 'arctan2', 'bitwise_and', 'bitwise_not', 'bitwise_or', 'bitwise_xor', 'corrcoef', 'cov', 'divide', 'equal', 'fmax', 'fmin', 'fmod', 'greater', 'greater_equal', 'left_shift', 'less', 'less_equal', 'logical_and', 'logical_or', 'logical_xor', 'mod', 'multiply', 'not_equal', 'power', 'right_shift', 'subtract']

List of allowed dual data transformations. Dual data transformations are operations that operate on two input datasets but do not change the shape of the data. Below is a list of the available numpy function options. NOTE: Some operations may have additional optional or required keyword arguments. HELP: For full documentation of the different functions see the numpy documentation.

  • ‘add’ : out = x1 + x2 = numpy.add(x1,x2)
  • ‘arctan2’ : out = numpy.arctan2(x1,x2)
  • ‘bitwise_and’ : out = x1 & x2 = numpy.bitwise_and(x1,x2)
  • ‘bitwise_not’ : out = numpy.bitwise_not(x1,x2)
  • ‘bitwise_or’ : out = x1 | x2 = numpy.bitwise_or(x1,x2)
  • ‘bitwise_xor’ : out = numpy.bitwise_xor(x1,x2)
  • ‘corrcoef’ : out = numpy.corrcoef(x1,x2)
  • ‘cov’ : out = numpy.cov(x1, x2, rowvar=1, bias=0, ddof=None)
  • ‘divide’ : out = x1 / x2 = numpy.divide(x1,x2)
  • ‘equal’ : out = x1 == x2 = numpy.equal(x1,x2)
  • ‘fmax’ : out = numpy.fmax(x1,x2)
  • ‘fmin’ : out = numpy.fmin(x1,x2)
  • ‘fmod’ : out = numpy.fmod(x1,x2)
  • ‘greater’ : out = x1 > x2 = numpy.greater(x1,x2)
  • ‘greater_equal’ : out = x1 >= x2 = numpy.greater_equal(x1,x2)
  • ‘left_shift’ : out = numpy.left_shift(x1,x2)
  • ‘less’ : out = x1 < x2 = numpy.less(x1,x2)
  • ‘less_equal’ : out = x1 <= x2 = numpy.less_equal(x1,x2)
  • ‘logical_and’ : out = numpy.logical_and(x1,x2)
  • ‘logical_not’ : See transformation_allowed_numpy_single_data instead.
  • ‘logical_or’ : out = numpy.logical_or(x1,x2)
  • ‘logical_xor’ : out = numpy.logical_xor(x1,x2)
  • ‘mod’ : out = numpy.mod(x1,x2)
  • ‘multiply’ : out = x1 * x2 = numpy.multiply(x1,x2)
  • ‘not_equal’ : out = x1 != x2 = numpy.not_equal(x1,x2)
  • ‘power’ : out = numpy.power(x1,x2)
  • ‘subtract’ : out = x1 - x2 = numpy.subtract(x1,x2)
  • ‘right_shift’ : out = numpy.right_shift(x1,x2)
omsi.shared.data_selection.transformation_allowed_numpy_single_data = ['abs', 'arccos', 'arccosh', 'arcsin', 'arcsinh', 'arctan', 'arctanh', 'argsort', 'around', 'ceil', 'clip', 'cos', 'cosh', 'deg2rad', 'degrees', 'exp', 'exp2', 'fabs', 'floor', 'hypot', 'invert', 'log', 'log2', 'log10', 'logical_not', 'negative', 'sign', 'round', 'sin', 'sinc', 'sinh', 'sqrt', 'sort', 'tan', 'tanh']

List of allowed single data transformations. Single data transformations are operations that operate on a single input dataset and do not change the shape of the data. Below is a list of the available numpy options. NOTE: Some operations may have additional optional or required keyword arguments. HELP: For full documentation of the different functions see the numpy documentation.

  • ‘abs’ : out = numpy.abs(x1)

  • ‘arccos’ : out = numpy.arccos(x1)

  • ‘arccosh’: out = numpy.arccosh(x1)

  • ‘arcsin’ : out = numpy.arcsin(x1)

  • ‘arcsinh’: out = numpy.arcsinh(x1)

  • ‘arctan’ : out = numpy.arctan(x1)

  • ‘arctanh’: out = numpy.arctanh(x1)

  • ‘argsort’ : out = numpy.argsort(data, axis, kind=’quicksort’, order=None)

  • ‘around’ : out = numpy.around(x1, decimals)

  • ‘ceil’ : out = numpy.ceil(x1)

  • ‘cos’ : out = numpy.cos(x1)

  • ‘cosh’ : out = numpy.cosh(x1)

  • ‘clip’ : out = numpy.clip(x1, a_min, a_max)

  • ‘deg2rad’: out = numpy.deg2rad(x1)

  • ‘degrees’ : out = numpy.degrees(x1)

  • ‘exp’ : out = numpy.exp(x1)

  • ‘exp2’ : out = numpy.exp2(x1)

  • ‘fabs’ : out = numpy.fabs(x1)

  • ‘floor’ : out = numpy.floor(x1)

  • ‘hypot’ : out = numpy.hypot(x1)

  • ‘invert’ : out = numpy.invert(x1)

  • ‘log’ : out[x1>0] = log(x1[x1>0]) ; out[x1<0] = log(x1[x1<0]*-1)*-1 ; out[x1==0] = 0

  • ‘log2’ : out[x1>0] = log2(x1[x1>0]) ; out[x1<0] = log2(x1[x1<0]*-1)*-1 ; out[x1==0] = 0

  • ‘log10’ : out[x1>0] = log10(x1[x1>0]) ; out[x1<0] = log10(x1[x1<0]*-1)*-1 ; out[x1==0] = 0

  • ‘logical_not’ : out = numpy.logical_not(x1)

  • ‘negative’ : out = np.negative(x1)

  • ‘round’ : out = numpy.round(x1, decimals)

  • ‘sqrt’ : out[x1>0] = sqrt(x1[x1>0]) ; out[x1<0] = sqrt(x1[x1<0]*-1)*-1 ; out[x1==0] = 0

  • ‘sign’ : out = numpy.sign(x1)

  • ‘sin’ : out = numpy.sin(x1)

  • ‘sinc’ : out = numpy.sinc(x1)

  • ‘sinh’ : out = numpy.sinh(x1)

  • ‘sort’ : out = numpy.sort(x1, axis=-1, kind=’quicksort’, order=None)

  • ‘swapaxes’ : out = numpy.swapaxes(x1, axis1, axis2)

  • ‘tan’ : out = numpy.tan(x1)

  • ‘tanh’ : out = numpy.tanh(x1)

omsi.shared.data_selection.transformation_type = {'singleDataTransform': 'singleDataTransform', 'scale': 'scale', 'divideMax': 'divideMax', 'astype': 'astype', 'threshold': 'threshold', 'minusMinDivideMax': 'minusMinDivideMax', 'dualDataTransform': 'dualDataTransform', 'arithmetic': 'arithmetic'}

Dictionary of available data transformation options. Available options are:

  • ‘arithmetic’ : Same as ‘dualDataTransform’. See ‘dualDataTransform’ below for details.

  • ‘divideMax’ : Divide the data by the current maximum value.

  • ‘minusMinDivideMax’ : Subtract the minimum value from the data and then divide the data by the maximum of the data (with the minimum already subtracted).

  • ‘dualDataTransform’ : Apply an arbitrary arithmetic operation to the data. Additional parameters required for this option are:

    • operation : String defining the arithmetic operations to be applied. Supported operations are: ‘add’, ‘divide’, ‘greater’, ‘greater_equal’, ‘multiply’, ‘subtract’

    • ‘x1’ : The first data operand of the arithmetic operation.

      The input data will be used by default if x1 is not specified. You may also specify ‘data’ to explicitly indicate that the input data should be assigned to x1. You may specify data0 to indicate that the output of another data operation should be used. Note, data0 here refers to the input to the full data operation pipeline. Data from other parts of the pipeline are then indexed using 1-based indices, e.g., to access the output of the first data operation set x1=’data0’.

    • ‘x2’ : The second data operand of the arithmetic operation.

      The input data will be used by default if x2 is not specified. You may also specify ‘data’ to explicitly indicate that the input data should be assigned to x2. You may specify data0 to indicate that the output of another data operation should be used. Note, data0 here refers to the input to the full data operation pipeline. Data from other parts of the pipeline are then indexed using 1-based indices, e.g., to access the output of the first data operation set x2=’data0’.

    • ... any additional parameters needed for the numpy function.

  • ‘singleDataTransform’ : Apply a scaling transformation to the data. Additional parameters required for this option are listed below. NOTE: For operation==’log’ or operation==’sqrt’: If the minimum value is 0, then the transformation is applied to positive values only and 0 values remain as is. If the minimum value is larger than 0, then the log scale is applied as is, i.e., np.log(data). If the minimum data value is negative, then the log scale is applied independently to the positive values and the negative values, i.e., outdata[posvalues] = np.log(data[posvalues]) and outdata[negvalues] = np.log(data[negvalues]*-1.)*-1.

    • ‘operation’ : String defining the scaling operation to be applied. See the transformation_allowed_numpy_single_data list for a complete list of allowed scaling operations. Some of the more commonly used scaling operations include: ‘abs’, ‘log’, ‘sqrt’, ‘around’, etc.

    • ‘x1’ : The first data operand for the scaling.

      The input data will be used by default if x1 is not specified. You may also specify ‘data’ to explicitly indicate that the input data should be assigned to x1.

    Additional optional keyword arguments depending on the used operation:

    • ‘decimals’ : Number of decimal places to round to when using numpy.around or numpy.round (default: 0). If decimals is negative, it specifies the number of positions to the left of the decimal point.

    • ‘a_min’, ‘a_max’ : Lower and upper bound when using numpy.clip.

    • ‘axis’, ‘kind’, ‘order’ : Additional optional arguments for numpy.argsort and numpy.sort.

    • ...

  • ‘scale’ : Same as ‘singleDataTransform’. See ‘singleDataTransform’ for details.

  • ‘threshold’ : Threshold the data, i.e., set all values that are smaller than the threshold to 0. Additional parameters required for this option are:

    • ‘threshold’ : The threshold value to be used. If threshold is missing, then the threshold will be set to the 5th percentile, so that the bottom 5% of the data will be set to 0.

  • ‘astype’ : Change the type of the data. Additional required parameters are:

    • ‘dtype’ : The numpy data type to be used. Default dtype=’float’.

omsi_web_helper Module

Module with helper functions for interactions with the OpenMSI web infrastructure, e.g. update job status, explicitly add a file to the OpenMSI database, update file permissions so that Apache can access it etc.

class omsi.shared.omsi_web_helper.UserInput

Bases: object

Collection of helper functions used to collect user input

static userinput_with_timeout(timeout, default='')

Read user input. Return default value given after timeout. This function decides which platform-dependent version should be used to retrieve the user input.

Parameters:
  • timeout – Number of seconds till timeout
  • default (String) – Default string to be returned after timeout
Returns:

String
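
A minimal usage sketch:

from omsi.shared.omsi_web_helper import UserInput
# Ask for confirmation, falling back to 'yes' if nothing is entered within 10 seconds
answer = UserInput.userinput_with_timeout(timeout=10, default='yes')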

static userinput_with_timeout_default(timeout, default='')

Read user input. Return default value given after timeout.

Parameters:
  • timeout – Number of seconds till timeout
  • default (String) – Default string to be returned after timeout
Returns:

String

static userinput_with_timeout_windows(timeout, default='')

Read user input. Return default value given after timeout. This function is used when running on Windows-based systems.

Parameters:
  • timeout – Number of seconds till timeout
  • default (String) – Default string to be returned after timeout
Returns:

String

class omsi.shared.omsi_web_helper.WebHelper

Bases: object

Class providing a collection of functions for web-related file conversion tasks, e.g.: i) adding files to the web database, ii) notifying users via email, iii) setting file permissions for web access.

allowed_nersc_locations = ['/project/projectdirs/openmsi/omsi_data_private', '/global/project/projectdirs/openmsi/omsi_data_private', '/data/openmsi/omsi_data']
default_db_server_url = 'https://openmsi.nersc.gov/'
static register_file_with_db(filepath, db_server, file_user_name, jobid=None, check_add_nersc=True)

Function used to register a given file with the database

Parameters:
  • filepath – Path of the file to be added to the database
  • db_server – The database server url
  • file_user_name – The user to be used, or None if the user should be determined based on the file URL.
  • jobid – Optional input parameter defining the jobid to be updated. If the jobid is given then the job will be updated with the database instead of adding the file explicitly, i.e., instead of register_file_with_db the update_job_status call is executed.
Returns:

Boolean indicating whether the operation was successful

static send_email(subject, body, sender='convert@openmsi.nersc.gov', email_type='success', email_success_recipients=None, email_error_recipients=None)

Send email notification to users.

Parameters:
  • subject – Subject line of the email
  • body – Body text of the email.
  • sender – The originating email address
  • email_type – One of ‘success’, ‘error’, ‘warning’. Error messages are sent to ConvertSettings.email_error_recipients, success messages to ConvertSettings.email_success_recipients, and warning messages are sent to both lists.
  • email_success_recipients – List of users that should receive an email if the status is success or warning.
  • email_error_recipients – List of users that should receive an email if the status is error or warning.
static set_apache_acl(filepath)

Helper function used to set ACL permissions to make the given file accessible to Apache at NERSC. This is necessary to make the file readable for adding it to the database.

super_users = ['bpb', 'oruebel']
static update_job_status(filepath, db_server, jobid, status='complete')

Function used to update the status of the job on the server

Parameters:
  • filepath – Path of the file to be added to the database (only needed to update file permissions)
  • db_server – The database server url
  • jobid – The id of the current job.
  • status – One of ‘running’, ‘complete’ or ‘error’

spectrum_layout Module

This module provides capabilities for computing different layouts for spectra

omsi.shared.spectrum_layout.compute_hilbert_spectrum(original_coords, original_intensities, left=0, right=0)

Given a 1D spectrum, interpolate the spectrum onto the closest 2D hilbert curve.

Parameters:
  • original_coords (1D numpy array in increasing order.) – The original coordinate values (m/z). Values must be increasing.
  • original_intensities (1D numpy array of same length as original_coords) – The original intensity values. Same length as original_coords.
  • left (float) – Optional. Value to be used for padding data at the lower bound during interpolation
  • right (float) – Optional. Value to be used for padding data at the upper bound during interpolation

Returns:

2D numpy array with the coordinate (m/z) values for the hilbert spectrum and separate 2D numpy array for the interpolated intensity values.

Raises:

ValueError – If original_coords and original_intensities have different lengths.

omsi.shared.spectrum_layout.hilbert_curve(order=2)

Compute a 2D hilbert curve.

Parameters:order (Integer that defines a power of 2 (>=2)) – The order of the hilbert curve. This is the length of the sides of the square, i.e., the number of points in x and y.
Returns:Returns two numpy arrays of integers x,y, indicating the locations of the vertices of the hilbert curve.
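
A minimal sketch (order=4 is an illustrative choice):

from omsi.shared.spectrum_layout import hilbert_curve
# Vertices of the hilbert curve on a 4x4 grid; order must be a power of 2 (>=2)
x, y = hilbert_curve(order=4)
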
omsi.shared.spectrum_layout.plot_2d_spectrum_as_image(hilbert_intensities, show_plot=False, show_axis=False)

Plot image with pixels colored according to hilbert_intensities.

Parameters:
  • hilbert_intensities (2D numpy array.) – 2D numpy array with the intensity values for the spectrum.
  • show_plot (Boolean) – Show the generated plot in a window.
  • show_axis (Boolean) – Show x,y axis for the plot. Default is False.
Returns:

matplotlib image plot or None in case that the plotting failed.

omsi.shared.spectrum_layout.reinterpolate_spectrum(coords, original_coords, original_intensitities, left=0, right=0)

Given a 1D spectrum, interpolate the spectrum onto a new axis.

Parameters:
  • coords – The coordinate values (m/z) for which intensities should be computed.
  • original_coords – The original coordinate values (m/z). Values must be increasing.
  • original_intensitities – The original intensity values. Same length as original_coords.
  • left – Optional. Value to be used if coords < original_coords
  • right – Optional. Value to be used if coords > original_coords
Returns:

y : {float, ndarray} The interpolated values, same shape as coords.

Raises:

ValueError – If original_coords and original_intensities have different lengths.
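
A minimal sketch using synthetic data (the arrays are illustrative):

import numpy as np
from omsi.shared.spectrum_layout import reinterpolate_spectrum
mz = np.linspace(100., 200., 50)        # original m/z axis, increasing
intensities = np.random.rand(50)        # original intensities, same length as mz
new_mz = np.linspace(100., 200., 200)   # new, denser m/z axis
new_intensities = reinterpolate_spectrum(new_mz, mz, intensities)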

log Module

Module providing functionality for logging based on the python logging module. The module is intended to ease the use of logging while a developer can still access the standard python logging mechanism if needed.

class omsi.shared.log.log_helper

Bases: object

BASTet helper module to ease the use of logging

Class Variables:

Variables:log_levels – Dictionary describing the different available logging levels.
classmethod critical(module_name, message, root=0, comm=None, *args, **kwargs)

Create a critical log entry. This function is typically called as:

log_helper.critical(module_name=__name__, message="your message")

Parameters:
  • module_name – __name__ of the calling module or None in case the ROOT logger should be used.
  • message – The message to be added to the log
  • root – The root process to be used for output when running in parallel. If None, then all calling ranks will perform logging. Default is 0.
  • comm – The MPI communicator to be used to determine the MPI rank. None by default, in which case mpi.comm_world is used.
  • args – Additional positional arguments for the python logger.debug function. See the python docs.
  • kwargs – Additional keyword arguments for the python logger.debug function. See the python docs.
classmethod debug(module_name, message, root=0, comm=None, *args, **kwargs)

Create a debug log entry. This function is typically called as:

log_helper.debug(module_name=__name__, message="your message")

Parameters:
  • module_name – __name__ of the calling module or None in case the ROOT logger should be used.
  • message – The message to be added to the log
  • root – The root process to be used for output when running in parallel. If None, then all calling ranks will perform logging. Default is 0.
  • comm – The MPI communicator to be used to determine the MPI rank. None by default, in which case mpi.comm_world is used.
  • args – Additional positional arguments for the python logger.debug function. See the python docs.
  • kwargs – Additional keyword arguments for the python logger.debug function. See the python docs.
classmethod error(module_name, message, root=0, comm=None, *args, **kwargs)

Create an error log entry. This function is typically called as:

log_helper.error(module_name=__name__, message="your message")

Parameters:
  • module_name – __name__ of the calling module or None in case the ROOT logger should be used.
  • message – The message to be added to the log
  • root – The root process to be used for output when running in parallel. If None, then all calling ranks will perform logging. Default is 0.
  • comm – The MPI communicator to be used to determine the MPI rank. None by default, in which case mpi.comm_world is used.
  • args – Additional positional arguments for the python logger.debug function. See the python docs.
  • kwargs – Additional keyword arguments for the python logger.debug function. See the python docs.
classmethod exception(module_name, message, root=0, comm=None, *args, **kwargs)

Create an exception log entry. This function is typically called as:

log_helper.exception(module_name=__name__, message="your message")

Parameters:
  • module_name – __name__ of the calling module or None in case the ROOT logger should be used.
  • message – The message to be added to the log
  • root – The root process to be used for output when running in parallel. If None, then all calling ranks will perform logging. Default is 0.
  • comm – The MPI communicator to be used to determine the MPI rank. None by default, in which case mpi.comm_world is used.
  • args – Additional positional arguments for the python logger.debug function. See the python docs.
  • kwargs – Additional keyword arguments for the python logger.debug function. See the python docs.
classmethod get_default_format()

Get default formatting string.

classmethod get_logger(module_name)

Get the logger for a particular module. The module_name should always be set to the __name__ variable of the calling module.

Parameters:module_name – __name__ of the calling module or None in case the ROOT logger should be used.
Returns:Python logging.Logger retrieved via logging.getLogger.
global_log_level = 20
classmethod info(module_name, message, root=0, comm=None, *args, **kwargs)

Create an info log entry. This function is typically called as:

log_helper.info(module_name=__name__, message="your message")

Parameters:
  • module_name – __name__ of the calling module or None in case the ROOT logger should be used.
  • message – The message to be added to the log
  • root – The root process to be used for output when running in parallel. If None, then all calling ranks will perform logging. Default is 0.
  • comm – The MPI communicator to be used to determine the MPI rank. None by default, in which case mpi.comm_world is used.
  • args – Additional positional arguments for the python logger.debug function. See the python docs.
  • kwargs – Additional keyword arguments for the python logger.debug function. See the python docs.
initialized = False
classmethod log(module_name, message, root=0, comm=None, level=None, *args, **kwargs)

Convenience function used to select the log message level using an input parameter rather than by selecting the appropriate function.

Parameters:
  • module_name – __name__ of the calling module or None in case the ROOT logger should be used.
  • message – The message to be added to the log
  • root – The root process to be used for output when running in parallel. If None, then all calling ranks will perform logging. Default is 0.
  • comm – The MPI communicator to be used to determine the MPI rank. None by default, in which case mpi.comm_world is used.
  • level – To which logging level should we send the message
  • args – Additional positional arguments for the python logger.debug function. See the python docs.
  • kwargs – Additional keyword arguments for the python logger.debug function. See the python docs.
log_levels = {'INFO': 20, 'WARNING': 30, 'CRITICAL': 50, 'ERROR': 40, 'DEBUG': 10, 'NOTSET': 0}
classmethod log_var(module_name, root=0, comm=None, level=None, **kwargs)

Log one or more variable values

Parameters:
  • module_name – __name__ of the calling module or None in case the ROOT logger should be used.
  • message – The message to be added to the log
  • root – The root process to be used for output when running in parallel. If None, then all calling ranks will perform logging. Default is 0.
  • comm – The MPI communicator to be used to determine the MPI rank. None by default, in which case mpi.comm_world is used.
  • kwargs – Variables+values to be logged
classmethod set_log_level(level)

Set the logging level for all BASTet loggers

Parameters:level – The logging levels to be used, one of the values specified in log_helper.log_levels.
classmethod setup_logging(level=None)

Call this function at the beginning of your code to initiate logging.

Parameters:level – The default log level to be used. One of log_helper.log_levels.
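
A minimal usage sketch for log_helper:

from omsi.shared.log import log_helper
log_helper.setup_logging()                                       # initialize logging once at startup
log_helper.set_log_level(level=log_helper.log_levels['DEBUG'])   # raise verbosity
log_helper.info(module_name=__name__, message="Conversion started")
log_helper.debug(module_name=__name__, message="Detailed state for debugging")
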
classmethod warning(module_name, message, root=0, comm=None, *args, **kwargs)

Create a warning log entry. This function is typically called as:

log_helper.warning(module_name=__name__, message="your message")

Parameters:
  • module_name – __name__ of the calling module or None in case the ROOT logger should be used.
  • message – The message to be added to the log
  • root – The root process to be used for output when running in parallel. If None, then all calling ranks will perform logging. Default is 0.
  • comm – The MPI communicator to be used to determine the MPI rank. None by default, in which case mpi.comm_world is used.
  • args – Additional positional arguments for the python logger.debug function. See the python docs.
  • kwargs – Additional keyword arguments for the python logger.debug function. See the python docs.

mpi_helper Module

Module used to ease the use of MPI and distributed parallel implementations using MPI.

omsi.shared.mpi_helper.barrier(comm=None)

MPI barrier operation or no-op when running without MPI

Parameters:comm – MPI communicator. If None, then MPI.COMM_WORLD will be used.
omsi.shared.mpi_helper.broadcast(data, comm=None, root=0)

MPI broadcast operation to broadcast data from one rank to all other ranks

Parameters:
  • data – The data to be gathered
  • comm – MPI communicator. If None, then MPI.COMM_WORLD will be used.
  • root – The rank where the data is sent from
Returns:

The data object

omsi.shared.mpi_helper.gather(data, comm=None, root=0)

MPI gather operation or return a list with just [data,] if MPI is not available

Parameters:
  • data – The data to be gathered
  • comm – MPI communicator. If None, then MPI.COMM_WORLD will be used.
  • root – The rank where the data should be collected to. Default value is 0
Returns:

List of data objects from all the ranks
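
A minimal sketch combining a few of the helpers; it degrades gracefully to a single-process run when MPI is not available:

import numpy as np
from omsi.shared import mpi_helper
rank = mpi_helper.get_rank()
local_result = np.arange(5) * rank           # some rank-local computation
all_results = mpi_helper.gather(local_result, root=0)
mpi_helper.barrier()
if rank == 0:
    print(all_results)                        # one entry per rank (or [data,] without MPI)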

omsi.shared.mpi_helper.get_comm_world()

Get MPI.COMM_WORLD

Returns:The MPI communicator or None if MPI is not available

omsi.shared.mpi_helper.get_rank(comm=None)

Get the current process rank.

Parameters:comm – MPI communicator. If None, then MPI.COMM_WORLD will be used.
Returns:The integer index of the rank

omsi.shared.mpi_helper.get_size(comm=None)

Get the size of the current communication domain.

Parameters:comm – MPI communicator. If None, then MPI.COMM_WORLD will be used.
Returns:The integer number of ranks in the communicator

omsi.shared.mpi_helper.imports_mpi(python_object)

Check whether the given class imports MPI.

The implementation inspects the source code of the analysis to see if MPI is imported by the code.

omsi.shared.mpi_helper.is_mpi_available()

Check if MPI is available. Same as MPI_AVAILABLE.

Returns:bool indicating whether MPI is available

omsi.shared.mpi_helper.mpi_type_from_dtype(dtype)

Get the corresponding MPI type for the given basic numpy dtype.

Parameters:dtype – Basic numpy dtype to be mapped to the MPI type
Returns:The MPI type or None if not found
class omsi.shared.mpi_helper.parallel_over_axes(task_function, task_function_params, main_data, split_axes, main_data_param_name, schedule='STATIC_1D', root=0, comm=None)

Bases: object

Helper class used to parallelize the execution of a function using MPI by splitting the input data into sub-blocks along a given set of axes.

Variables:
  • task_function – The function we should run.
  • task_function_params – Dict with the input parameters for the function. May be None or {} if no parameters are needed.
  • main_data – Dataset over which we should parallelize
  • split_axes – List of integer axis indicies over which we should parallelize
  • main_data_param_name – The name of data input parameter of the task function
  • root – The master MPI rank (Default=0)
  • schedule – The task scheduling schema to be used (see parallel_over_axes.SCHEDULES)
  • collect_output – Should we collect all the output from the ranks on the master rank?
  • schedule – The parallelization schedule to be used. See also parallel_over_axes.schedule
  • result – The result from the task_function. If self.__data_collected is set and we are the root, then this will be a list with the output of all tasks
  • blocks – List with tuples describing the selected subset of data processed by the given block task. If self.__data_collected is set and we are the root rank then this is a list of all the blocks processed by each rank.
  • block_times – List of times in seconds used to process the data block with the given index. NOTE: The block times also include any required communications and other operations needed to initialize and complete the task, not just the execution of the task function itself.
  • run_time – Float time in seconds for executing the run function.
  • comm – The MPI communicator used for the parallelization. Default value is MPI.COMM_WORLD
Parameters:
  • task_function – The function we should run.
  • task_function_params – Dict with the input parameters for the function. May be None or {} if no parameters are needed.
  • main_data – Dataset over which we should parallelize
  • split_axes – List of integer axis indicies over which we should parallelize
  • main_data_param_name – The name of data input parameter of the task function
  • root – The master MPI rank (Default=0)
  • schedule – The task scheduling schema to be used (see parallel_over_axes.SCHEDULES)
  • comm – The MPI communicator used for the parallelization. Default value is None, in which case MPI.COMM_WORLD is used
MPI_MESSAGE_TAGS = {'BLOCK_MSG': 12, 'COLLECT_MSG': 13, 'RANK_MSG': 11}
SCHEDULES = {'DYNAMIC': 'DYNAMIC', 'STATIC_1D': 'STATIC_1D', 'STATIC': 'STATIC'}
collect_data(force_collect=False)

Collect the results from the parallel execution to the self.root rank.

NOTE: On the root rank the self.result, self.blocks, and self.block_times variables are updated with the collected data as well, and self.__data_collected will be set.
NOTE: If the data has already been collected previously (i.e., collect_data has been called before), then the collection will not be performed again, unless force_collect is set.

Parameters:force_collect – Set this parameter to force that data collection is performed again. By default collect_data is performed only once for each time the run(..) function is called and the results are reused to ensure consistent data structures. We can force that the collection will be re-executed anyway by setting force_collect.
Returns:On worker ranks (i.e., MPI_RANK != self.root) this is simply the self.result and self.blocks containing the result created by the run function. On the root rank (i.e., MPI_RANK == self.root) this is a tuple of two lists containing the combined data of all self.result and self.blocks from all ranks, respectively.
run()

Call this function to run the function in parallel.

Returns:Tuple with the following elements:
  1. List with the results from the local execution of the task_function. Each entry is the result from one return of the task_function. In the case of static execution, this is always a list of length 1.
  2. List of block_indexes. Each block_index is a tuple with the selection used to divide the data into sub-blocks. In the case of static decomposition we have a range slice object along the axes used for decomposition whereas in the case of dynamic scheduling we usually have single integer point selections for each task.
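
A minimal usage sketch; the task function, dataset, and parameter values are illustrative:

import numpy as np
from omsi.shared.mpi_helper import parallel_over_axes

def my_task(msidata):
    # Illustrative per-block task: maximum intensity of the block
    return np.max(msidata)

data = np.random.rand(10, 10, 100)              # small synthetic (x, y, m/z) dataset
scheduler = parallel_over_axes(task_function=my_task,
                               task_function_params={},          # no extra parameters
                               main_data=data,
                               split_axes=[0, 1],                # parallelize over the image axes
                               main_data_param_name='msidata',   # name of the data argument of my_task
                               schedule='DYNAMIC')
result, blocks = scheduler.run()
result, blocks = scheduler.collect_data()       # combine results on the root rank
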
omsi.shared.mpi_helper.test_mpi_available()

This function imports MPI in a separate process to safely check if MPI is available. This precaution is necessary as on Cray systems importing MPI can lead to a crash, e.g., on login nodes where the use of MPI is not permitted. By executing the import in a separate process we avoid crashing the main process and we can safely check whether the process aborted or not.

Returns:False if the import failed, otherwise return True