mspasspy.algorithms

MTPowerSpectrumEngine

class mspasspy.algorithms.MTPowerSpectrumEngine.MTPowerSpectrumEngine(winsize, tbp, number_tapers, nfft=0, iadapt=0)[source]

Bases: object

Wrapper class to use German Prieto’s multitaper package as a plug-in replacement for the MsPASS class of the same name. The MsPASS version is in C++ and there are some collisions in concept. This class should be used only in a python script where it might be useful to use features of Prieto’s library not available in the cruder implementation in MsPASS (e.g. adaptive weights in multitaper power spectrum estimation).

The biggest collision in concept is that the MsPASS version with this class name was designed to be created once and moved around to avoid the overhead of recreating the Slepian tapers on each use. Prieto’s class creates these in the constructor but caches a number of things when first created that allow secondary calls to run faster. This implementation attempts to overcome the collision by caching an instance of Prieto’s MTSpec class on the first call to the apply method. Secondary calls to apply test for consistency with the previous call and only recreate the MTSpec instance if something that invalidates the previous call has changed.

Parameters:

  • winsize – number of samples expected for the data window. Note this argument defines the size of the eigentapers, so it defines the size of the input time series the object expects to process.

  • tbp – multitaper time-bandwidth product.

  • number_tapers – number of tapers to use for the estimators (see Prieto’s documentation for details).

  • nfft – optional size of the fft work arrays to use. nfft should be greater than or equal to winsize. When greater, zero padding is used in the usual way. Default is 0, which sets nfft to 2*winsize. If nfft < winsize a diagnostic message will be printed and nfft will be set equal to winsize.

  • iadapt – integer argument passed to Prieto’s MTSpec that sets the eigentaper weighting scheme. See Prieto’s MTSpec class documentation for options. Default is 0, which enables the adaptive weighting scheme.
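
A minimal usage sketch based on the signature above; the variable ts is assumed to be a TimeSeries created elsewhere (e.g. read from a MsPASS database) with at least winsize samples, and the numeric values are illustrative only:

from mspasspy.algorithms.MTPowerSpectrumEngine import MTPowerSpectrumEngine

# 512-sample window, time-bandwidth product 4, 6 tapers (illustrative values)
engine = MTPowerSpectrumEngine(512, 4.0, 6)
spectrum = engine.apply(ts, dt=ts.dt)   # ts is a TimeSeries; returns a PowerSpectrum
f = engine.frequencies()                # numpy array of frequency bin values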

apply(d, dt=1.0)[source]

Apply the operator to d, which may be a plain vector of samples or a TimeSeries. Returns a PowerSpectrum object.

frequencies()[source]

Return a numpy array of the frequencies

number_tapers() int[source]

Return the number of tapers defined for this operator.

set_df(df)[source]

Explicit setter for the frequency bin interval. Needed when changing the sample interval of the input data. Rarely of use, but included for compatibility with the C++ class of the same name.

This perhaps should just be a no-op or throw an error.

time_bandwidth_product() float[source]

Return time bandwidth product for this operator.

RFdeconProcessor

This file contains a class definition for a wrapper for the suite of scalar deconvolution methods supported by MsPASS. It demonstrates the concept of a processing object created by wrapping C++ code.

Created on Fri Jul 31 06:24:10 2020

@author: Gary Pavlis

mspasspy.algorithms.RFdeconProcessor.RFdecon(d, alg='LeastSquares', pf='RFdeconProcessor.pf', wavelet=None, noisedata=None, wcomp=2, ncomp=2, object_history=False, alg_name='RFdecon', alg_id=None, dryrun=False)[source]

Use this function to compute conventional receiver functions from a single three component seismogram. In this function, an instance of wrapper class RFdeconProcessor will be built and initialized with alg and pf.

Default assumes d contains all data sections required to do the deconvolution with the wavelet in component 2 (3 for matlab and FORTRAN people). By default the data and noise (if required by the algorithm) sections will be extracted from the (assumed larger) data section using time windows defined internally in the processor pf definition. For variations (e.g. adding tapering to one or more of the time series inputs) use the d, wavelet, and (if required) noise arguments to load each component separately. Note d is dogmatically required to be three component data while optional wavelet and noisedata series are passed as plain numpy vectors (i.e. without the decoration of a TimeSeries).

To make use of the extended outputs from RFdeconProcessor algorithms (e.g. actual output of the computed operator) call those methods after this function returns successfully with a three-component seismogram output. That is possible because the processor object caches the most recent wavelet and inverse used for the deconvolution. An exception is that all algorithms call the QCMetrics method of the processor and push the results to the headers of the deconvolved output. QCMetric attributes are algorithm dependent.

The ProcessingHistory feature can optionally be enabled by setting the object_history argument to True. When enabled one should normally set a unique id for the alg_id argument.

Parameters:
  • d – Seismogram input data. See notes above about the time span of these data.

  • alg – the algorithm to be applied, used for initializing a RFdeconProcessor object.

  • pf – the pf file to be parsed, used for initializing a RFdeconProcessor.

  • wavelet – vector of doubles (numpy array or the std::vector container internal to a TimeSeries object) defining the wavelet to use to compute the deconvolution operator. Default is None, which assumes the processor was set up to use a component of d as the wavelet estimate.

  • noisedata – vector of doubles (numpy array or the std::vector container internal to a TimeSeries object) defining noise data to use for computing regularization. Not all RF estimation algorithms use noise estimators so this parameter is optional. It can also be extracted from d depending on parameter file options.

  • wcomp – when defined from Seismogram d the wavelet estimate in conventional RFs is the component that is most P wave dominated. That is always one of three things: Z, L of LQT, or the L component from the output of Kennett’s free surface transformation operator. The default is 2, which for ccore.Seismogram is always one of the above. This parameter would be changed only if the data have undergone some novel transformation not yet invented and the best wavelet estimate is not in component 2 (3 with FORTRAN and matlab numbering).

  • ncomp – component number to use to compute noise. This is used only if the algorithm in the processor requires a noise estimate. Normally it should be the same as wcomp and is by default (2).

  • object_history – boolean to enable or disable saving object level history. Default is False. Note this functionality is implemented via the mspass_func_wrapper decorator.

  • alg_name – when history is enabled this is the algorithm name assigned to the stamp for applying this algorithm. The default (“RFdecon”) should normally be used. Note this functionality is implemented via the mspass_func_wrapper decorator.

  • alg_id – algorithm id to assign to the history record (used only if object_history is set True). Note this functionality is implemented via the mspass_func_wrapper decorator.

  • dryrun – when True only the arguments are checked for validity; nothing is calculated and the original data are returned. Note this functionality is implemented via the mspass_func_wrapper decorator.

Returns:

Seismogram object containing the RF estimates. The orientations are always the same as the input.
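
The sketch below shows a typical call; it assumes seis is a Seismogram already spanning the data (and noise) windows defined in the processor pf file, and uses the default parameter file:

from mspasspy.algorithms.RFdeconProcessor import RFdecon

# serial form; in a parallel workflow this call would normally be the function in a map operator
rf = RFdecon(seis, alg="LeastSquares", pf="RFdeconProcessor.pf")
# rf is a Seismogram of RF estimates; QC attributes (algorithm dependent) are in its Metadata
good = rf.live   # data that could not be deconvolved come back marked dead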

class mspasspy.algorithms.RFdeconProcessor.RFdeconProcessor(alg='LeastSquares', pf='RFdeconProcessor.pf')[source]

Bases: object

This class is a wrapper for the suite of receiver function deconvolution methods we call scalar methods. That is, the operation is reducible to two time series: the wavelet signal and the data (TimeSeries) signal. That is in contrast to three component methods that always treat the data as vector samples. The class should be created as a global processor object to be used in a spark job. The design assumes the processor object will be passed as an argument to the RFdecon function, which should appear as a function in a spark map call.
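
A sketch of the lower level workflow this class supports, assuming seis is a Seismogram windowed around the P arrival; the dtype and component values are illustrative:

from mspasspy.algorithms.RFdeconProcessor import RFdeconProcessor

processor = RFdeconProcessor(alg="LeastSquares", pf="RFdeconProcessor.pf")
processor.loaddata(seis, dtype="Seismogram", component=0, window=True)     # data component
processor.loadwavelet(seis, dtype="Seismogram", component=2, window=True)  # P-dominated wavelet
rf_vector = processor.apply()          # vector holding the RF estimate
qc = processor.QCMetrics()             # algorithm dependent quality metrics (Metadata)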

QCMetrics()[source]

All decon algorithms compute a set of algorithm dependent quality control metrics. This method returns the metrics as a set of fixed name:value pairs in a mspass.Metadata object. The details are algorithm dependent. See related documentation for metrics computed by different algorithms.

actual_output()[source]

The actual output of a decon operator is the inverse filter applied to the wavelet. By design it is an approximation of the shaping wavelet defined for this operator.

Returns:

Actual output of the operator as a ccore.TimeSeries object.

The Metadata of the return is bare bones. The most important factor about this result is that because actual output waveforms are normally a zero phase wavelet of some kind, the result is time shifted to be centered (i.e. t0 is rounded to n/2 where n is the length of the returned vector).

apply()[source]

Compute the RF estimate using the algorithm defined internally.

Returns:

vector of data that are the RF estimate computed from previously loaded data.

change_parameters(md)[source]

Use this method to change the internal parameter setting of the processor. It can be used, for example, to switch from the damped least squares method to the water level method. Note the input must be a complete definition for a parameter set defining a particular algorithm; i.e. this is not an update method but one that reinitializes the processor.

Parameters:

md – is a mspass.Metadata object containing the required parameters for the alternative algorithm.

property dwin
ideal_output()[source]

The ideal output of a decon operator is the same thing we call a shaping wavelet. This method returns the ideal output (= shaping wavelet) as a TimeSeries object. Like the actual_output method, the returned function is circularly shifted so the function peaks at time 0, located n/2 samples from the start sample. Graphic displays will then show the wavelet centered and peaked at time 0. The prediction error can be computed as the difference between the actual_output and ideal_output TimeSeries objects. The norm of the prediction error is a helpful metric to display the stability and accuracy of the inverse.

inverse_filter()[source]

This method returns the actual inverse filter that, if convolved with the original data, will produce the RF estimate. Note the filter is meaningful only if the source wavelet is minimum phase. A standard theorem from time series analysis shows that the inverse of a mixed phase wavelet is usually unstable and the inverse of a maximum phase wavelet is always unstable. Fourier-based methods can still compute a stable solution even with a mixed phase wavelet because of the implied circular convolution.

The result is returned as TimeSeries object.

loaddata(d, dtype='Seismogram', component=0, window=False)[source]

Loads data for processing. When window is set True, the internal pf definition of the data time window is used to window the data. The dtype parameter changes the behavior of this algorithm significantly depending on the setting. It can be one of the following: Seismogram, TimeSeries, or raw_vector. For the first two, the data to process will be extracted in a pf-specified window if window is True. If window is False, TimeSeries data will be passed directly and Seismogram data will have the data defined by the component parameter copied to the internal data vector workspace. If dtype is set to raw_vector, d is assumed to be a raw numpy vector of doubles or the aliased std::vector used in ccore, for example, the s vector in the TimeSeries object. Setting dtype to raw_vector and window to True will result in this method throwing a RuntimeError exception, as the combination is not possible since raw_vector data have no time base.

Parameters:
  • d – input data (contents expected depend upon the value of the dtype parameter).

  • dtype – string defining the form d is expected to be in (see details above).

  • component – component of Seismogram data to load as the data vector. Ignored if dtype is raw_vector or TimeSeries.

  • window – boolean controlling internally defined windowing (see details above).

Returns:

Nothing (not even None) is returned; the data are loaded into the processor’s internal workspace.

loadnoise(n, dtype='Seismogram', component=2, window=False)[source]
loadwavelet(w, dtype='Seismogram', component=2, window=False)[source]
property nwin
property uses_noise

basic

mspasspy.algorithms.basic.ExtractComponent(data, component, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=False, function_return_key=None, **kwargs)[source]

Extract single component from three-component data.

The function creates a scalar TimeSeries object from a three component Seismogram object, or a TimeSeriesEnsemble object from a SeismogramEnsemble object.

Parameters:
  • data (either Seismogram or SeismogramEnsemble) – data object to extract from.

  • component (int) – the index of component that will be extracted, it can only be 0, 1, or 2

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is set False to handle exceptions directly in the function, without passing them to mspass_func_wrapper.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
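
A minimal sketch, assuming seis is a Seismogram with cardinal components so index 2 is vertical:

from mspasspy.algorithms.basic import ExtractComponent

ts_vertical = ExtractComponent(seis, 2)   # returns a TimeSeries holding component 2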

mspasspy.algorithms.basic.ator(data, tshift, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]

Absolute to relative time conversion.

Sometimes we want to convert data from absolute time (epoch times) to a relative time standard. Examples are conversions to travel time using an event origin time or shifting to an arrival time reference frame. This operation simply switches the tref variable and alters t0 by tshift.

Parameters:
  • data (either mspasspy.ccore.seismic.TimeSeries or mspasspy.ccore.seismic.Seismogram) – data object to be converted.

  • tshift (float) – time shift applied to data before switching data to relative time mode.

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
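
A short sketch, assuming ts is a TimeSeries in UTC (absolute) time and p_time is an epoch time (e.g. a measured P arrival) obtained elsewhere:

from mspasspy.algorithms.basic import ator

ts = ator(ts, p_time)   # after this call times in ts are relative to p_time (t=0 at the P arrival)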

mspasspy.algorithms.basic.cosine_taper(data, t0head, t1head, t1tail, t0tail, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]

Taper front and/or end of a data object with a half cosine function.

A cosine taper is a common, simple approach to taper data. When applied at the front it defines a half cycle of a cosine curve +1.0 in the range -pi to 0. On the right it defines the same function for the range 0 to pi. The period of the left and right operators can be different. Turn off the left or right taper by giving illegal start and end points and the operator will silently be only one sided.

Parameters:
  • data (either TimeSeries or Seismogram) – data object to be processed.

  • t0head (float) – t0 of the head taper

  • t1head (float) – t1 of the head taper

  • t1tail (float) – t1 of the tail taper

  • t0tail (float) – t0 of the tail taper

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
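
A sketch using illustrative times, assuming d is a TimeSeries or Seismogram spanning roughly 0 to 60 s in relative time:

from mspasspy.algorithms.basic import cosine_taper

# ramp up over 0-2 s at the front and down over 58-60 s at the end
d = cosine_taper(d, 0.0, 2.0, 58.0, 60.0)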

mspasspy.algorithms.basic.free_surface_transformation(data, uvec=None, vp0=None, vs0=None, ux_key='ux', uy_key='uy', vp0_key='vp0', vs0_key='vs0', *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]

Computes and applies the Kennett [1991] free surface transformation matrix.

Kennett [1991] gives the form for a free surface transformation operator that defines a transformation matrix that provides optimal separation of P, SV, and SH for a receiver at the free surface of an isotropic medium with constant P and S velocity. The transformation is only valid if the incident wavefield is not evanescent (i.e. at or past the SV critical angle). It is important to realize the transformation matrix does not define an orthogonal transformation. It is also, of course, only an approximation since it is strictly correct only for a constant velocity medium. The transformation operator requires: (1) a slowness vector, (2) an estimate of surface P wave velocity, and (3) an estimate of surface shear wave velocity. In all cases the input for the slowness vector uses a local cartesian system with the vector specified as components ux and uy. ux is the component of the slowness of an incident wavefield in the geographic x direction, which means positive at local east. uy is the complementary component for the y direction defined by local north. The components of the slowness vector are assumed to be in units of s/km. (Warning: obspy’s and other implementations of the commonly used tau-p calculators return slowness (ray parameter) in spherical units of s/radian - r sin(theta)/v)

The required parameters (slowness vector and velocities) can be passed to the function one of two ways. Because it is much simpler to implement in a map operator, the default expects those parameters to be set in the data object’s Metadata container. The default keys for fetching each attribute are defined by the four arguments “ux_key”, “uy_key”, “vp0_key”, and “vs0_key”. All four have standard defaults defined in the function signature and below. The Metadata fetching algorithm can be overridden by defining the same data through the three optional arguments with the key names “uvec”, “vp0”, and “vs0” (see below for detailed descriptions).

The switching between Metadata fetching and constant arguments is not all or none. If a slowness vector is defined through uvec it will always override Metadata. Handling of vp0 and vs0 is independent, i.e. you can define uvec and not define vp0 and vs0, or conversely you can (more commonly) define vp0 and vs0 but not define uvec. In all cases not defining an argument is a signal to fetch it from Metadata.

When this function is applied to ensembles and the Metadata fetch approach is used (default) the function assumes all ensemble members have the required metadata keys defined. Any that do not are killed. The same happens for atomic data passed to this function if any of the required keys are missing. In fact, you should realize the ensemble algorithm simply applies this function in a recursion over all the members.

The output components are in the order defined in Kennett’s original paper. The order is 0=SH, 1=SV, 2=L.

Parameters:
  • data (Seismogram or SeismogramEnsemble) – data object to be transformed. For ensembles the transformation is applied to all members. Note the Metadata fetch mechanism is the only recommended way to handle ensembles. An elog message will be posted to the ensemble’s elog container if you try to use a constant slowness vector passed via uvec. It will not warn about constant vp0 and vs0 as that case is common.

  • ux_key (string) – key to use to fetch the EW component of the slowness vector from the Metadata container. Default is “ux”.

  • uy_key (string) – key to use to fetch the NS component of the slowness vector from the Metadata container. Default is “uy”.

  • vp0_key (string) – key to use to fetch the free surface P wave velocity from the Metadata container. Default is “vp0”.

  • vs0_key (string) – key to use to fetch the free surface S wave velocity from the Metadata container. Default is “vs0”.

  • uvec (SlownessVector) – slowness vector of the incident wavefield defined via the custom C++ class SlownessVector. Default is None which is taken as a signal to fetch the slowness vector components from Metadata using ux_key and uy_key.

  • vp0 (float) – surface P wave velocity. Default is None which is taken as a signal to fetch this quantity from Metadata using vp0_key.

  • vs0 (float) – surface S wave velocity. Default is None which is taken as a signal to fetch this quantity from Metadata using vs0_key.

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.
  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
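
Two hedged usage sketches; the first assumes each datum already carries ux, uy, vp0, and vs0 Metadata attributes (the default keys), the second overrides only the velocities with illustrative constants:

from mspasspy.algorithms.basic import free_surface_transformation

# Metadata-driven form (recommended, works for atomic data and ensembles)
seis = free_surface_transformation(seis)

# constant surface velocities (km/s, illustrative); slowness vector still fetched from Metadata
seis = free_surface_transformation(seis, vp0=5.0, vs0=3.5)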

mspasspy.algorithms.basic.linear_taper(data, t0head, t1head, t1tail, t0tail, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]

Taper front and/or end of a data object with a linear taper.

Linear tapers are defined here as a ramp running from 0 to 1 over a time span. Data will be zeroed outside of the 0 mark on each end and a linear weight applied between the 0 points and the 1 points. Positive ramp slope on the left and negative ramp slope on the right. Setting t0 == t1 will disable the taper on the specified end (e.g., t0head == t1head).

Parameters:
  • data (either TimeSeries or Seismogram) – data object to be processed.

  • t0head (float) – t0 of the head taper

  • t1head (float) – t1 of the head taper

  • t1tail (float) – t1 of the tail taper

  • t0tail (float) – t0 of the tail taper

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.

mspasspy.algorithms.basic.rotate(data, rotate_param, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]

Rotate data using a P wave type coordinate definition.

This function can apply three different types of rotation depending on the type of parameter given. If a SphericalCoordinate is given, it will rotate the data into a coordinate system defined by the direction defined by the spherical coordinate. The data are rotated such that x1 becomes the transverse component, x2 becomes radial, and x3 becomes longitudinal.

If a unit vector of three components that defines the direction of x3 (longitudinal) is given, it will turn the vector into a SphericalCoordinate object and call the related rotate with it.

If a float number is given, it will rotate the horizontal components by that angle in radians.

Parameters:
  • data (Seismogram) – data object to be rotated.

  • rotate_param (see above for details.) – the parameter that defines the rotation.

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
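
A minimal sketch of the float-argument form, assuming seis is a Seismogram; the angle is illustrative:

from math import radians
from mspasspy.algorithms.basic import rotate

seis = rotate(seis, radians(30.0))   # a float argument rotates the horizontals by the angle in radians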

mspasspy.algorithms.basic.rotate_to_standard(data, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]

Apply inverse transformation matrix to return data to cardinal direction components.

It is frequently necessary to make certain a set of three component data are oriented to the standard reference frame (EW, NS, Vertical). This function does this. For efficiency it checks the components_are_cardinal variable and does nothing if it is set true. Otherwise, it applies the inverse transformation and then sets this variable true. Note even if the current transformation matrix is not orthogonal it will be put back into cardinal coordinates.

If inversion of the transformation matrix is not possible (e.g. two components are colinear) an error thrown by the C++ function is caught, posted to elog, and the datum return will be marked dead.

Parameters:
  • data (Seismogram) – data object to be rotated.

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.

Exception:

MsPASSError thrown if an inversion of the transformation matrix is required and that matrix is singular. This can happen if the transformation matrix is incorrectly defined or the actual data are coplanar.

mspasspy.algorithms.basic.rtoa(data, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]

Relative to absolute time conversion.

Sometimes we want to convert data from relative time to a UTC time standard. An example would be converting segy shot data to something that could be processed like earthquake data in a css3.0 database. This function returns data previously converted to relative time back to UTC using the internally stored time shift attribute.

Parameters:
  • data (either TimeSeries or Seismogram) – data object to be converted.

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.

mspasspy.algorithms.basic.transform(data, matrix, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]

Applies an arbitrary transformation matrix to the data.

i.e. after calling this function the data will have been multiplied by the matrix and the transformation matrix will be updated. The latter allows cascaded transformations of the data.

Parameters:
  • data (Seismogram) – data object to be transformed.

  • matrix (numpy.array) – a 3x3 matrix that defines the transformation.

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
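
A sketch assuming seis is a Seismogram; the matrix shown simply flips the polarity of component 2:

import numpy as np
from mspasspy.algorithms.basic import transform

m = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, -1.0]])
seis = transform(seis, m)   # data and the stored transformation matrix are both updated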

mspasspy.algorithms.basic.transform_to_LQT(data, seaz_key='seaz', ema_key='ema', phi=None, theta=None, angle_units='degrees', object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=False, function_return_key=None)[source]

Applies coordinate transform to LQT version of ray coordinates.

LQT is an orthogonal coordinate transformation for Seismogram objects that causes the output data to have x1 - longitudinal (L, positive up, normally set to the predicted emergence angle of P particle motion), x2 - Q, a rotated radial direction (in the propagation direction but tilted by theta), and x3 - transverse (T), in the direction needed to define a right handed coordinate system. The function produces this transformation as the product of three transformations:

  1. Uses rotate_to_standard to assure we start from cardinal directions.

  2. Transformation to what might be called TQL using the Seismogram C++ method rotate using a SphericalCoordinate definition.

  3. Transformation to LQT to rearrange the order of the Seismogram data matrix (also requires a sign change on T to keep the output right handed)

The form of the transformation can be specified in one of two completely different ways. By default the function attempts to extract the back azimuth from station to event using the metadata key defined by ‘seaz_key’ (default ‘seaz’) and the P emergence angle with the metadata key defined by the ‘ema_key’ argument (default ‘ema’). If the arguments ‘phi’ and ‘theta’ are defined they are assumed to be the angles to use to define the transformation and no attempt will be made to fetch the values from Metadata. (That is rarely a good idea with ensembles but can be useful for atomic data. Using Metadata is strongly preferred because it also preserves what angles were used; they are not preserved if the argument approach is used.) Metadata values are required to be in degree units. If the argument approach is used, radian values can be used if you also set the argument “angle_units” to “radians”.

Note the operation handles the singular case of theta==0.0 where it simply uses the phi value and rotates the coordinates in a variant of RTZ (variant because the order is different).

Parameters:
  • seaz_key (string) – key to use to fetch the back azimuth value (assumed degrees always). Default is “seaz”.

  • ema_key (string) – key to use to fetch the emergence angle defining the L direction relative to local vertical. Default is “ema”.

  • phi (float) – angle to rotate around vertical to define the transformation (positive anticlockwise convention NOT azimuth convention). Default is None which means ignore this parameter and use the key. Setting this value to something other than None causes the Metadata fetch method to be overridden. WARNING: this is not backazimuth but the angle relative to the E direction, which is not at all what is expected when using a Metadata key.

  • theta (float) – angle relative to local vertical for defining the L coordinate direction. It is the same as theta in spherical coordinates with the emergence angle pointing upward. Default is None which causes the algorithm to automatically assume the Metadata key method should be used.

  • angle_units (string) – should be either ‘degrees’ (default) or ‘radians’. An invalid value will be treated as an attempt to switch to radians but will generate an elog warning message. This argument is ignored unless phi is not null (None type).

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.

Returns:

transformed version of the input. For ensembles the entire ensemble is transformed.
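
Hedged sketches of the two input modes described above; seis is assumed to be a Seismogram and the explicit angles are illustrative:

from mspasspy.algorithms.basic import transform_to_LQT

# default: back azimuth and emergence angle fetched from Metadata keys "seaz" and "ema" (degrees)
seis = transform_to_LQT(seis)

# explicit angles for a single atomic datum (degrees by default)
seis = transform_to_LQT(seis, phi=30.0, theta=15.0, angle_units="degrees")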

mspasspy.algorithms.basic.transform_to_RTZ(data, key='seaz', phi=None, angle_units='degrees', key_is_backazimuth=True, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=False, function_return_key=None)[source]

Applies coordinate transform to RTZ version of ray coordinates.

RTZ is the simplest transformation of three component data to what is commonly called ray coordinates: R - radial (SV), T - transverse (SH), and Z - vertical (a poor measure of longitudinal). The function first forces the data to cardinal directions using rotate_to_standard and then rotates the coordinates around the vertical axis. The rotation can be defined in one of two ways. The default behavior is to attempt to extract the back azimuth from the station to the source with the key defined by the “key” argument (defaults to ‘seaz’). That behavior will be overridden if the “phi” kwarg value is set. phi is assumed to be the angle to rotate the coordinates using the math convention for the phi angle with positive anticlockwise. The units of the value retrieved with the key argument or the value passed via the phi argument are by default assumed to be in degrees. If using the phi argument you can specify the angle in radians if you also set the “angle_units” argument to “radians”.

Parameters:
  • data (Seismogram or SeismogramEnsemble.) – data object to be transformed

  • key (string) – key to use to fetch back azimuth value (assumed degrees always)

  • phi (float) – angle to rotate around vertical to define the transformation (positive anticlockwise convention NOT azimuth convention). Default is None which means ignore this parameter and use the key. Setting this value to something other than None causes the key method to be overridden.

  • angle_units (string) – should be either ‘degrees’ (default) or ‘radians’. An invalid value will be treated as an attempt to switch to radians but will generate an elog warning message. This argument is ignored unless phi is not null (None type).

  • key_is_backazimuth – boolean that when True (default) assumes the angle extracted with the key argument value is a backazimuth in degrees. If set False, the angle will be assumed to be a rotation angle with the anticlockwise positive convention.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.

Returns:

transformed version of the input. For ensembles the entire ensemble is transformed.
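
Hedged sketches of the two forms, assuming seis is a Seismogram; values are illustrative:

from mspasspy.algorithms.basic import transform_to_RTZ

# default: back azimuth fetched from the Metadata key "seaz" (degrees)
seis = transform_to_RTZ(seis)

# explicit rotation angle (math convention, anticlockwise positive, not a back azimuth)
seis = transform_to_RTZ(seis, phi=45.0, angle_units="degrees")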

mspasspy.algorithms.basic.vector_taper(data, taper_array, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]

Apply a general taper defined by a vector to the data object.

This method provides a simple way to build a taper from a set of uniformly spaced points. The apply methods will dogmatically only accept input data of the same length as the taper defined in the operator.

Parameters:
  • data (either TimeSeries or Seismogram) – data object to be processed.

  • taper_array (numpy.array) – the array that defines the taper

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the function to be saved while preserving the history.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of the function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.

  • function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
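
A sketch assuming ts is a TimeSeries; numpy’s hanning window is used only as a convenient way to build a taper vector of the required length:

import numpy as np
from mspasspy.algorithms.basic import vector_taper

taper = np.hanning(ts.npts)    # taper must have the same length as the data
ts = vector_taper(ts, taper)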

bundle

This is a set of thin wrappers for C++ code to do a process we call “bundling”. That is, a Seismogram object can be constructed from 3 TimeSeries objects spanning a common time period and having the same sample rate but with orientations pointing in 3 linearly independent directions. There are a lot of complexities to assembling such data and manipulating the data objects, which is why the process was implemented in C++. The algorithms here are not completely generic and will likely need additions at some future date for nonstandard data. “Standard” in this context means passive array data archived in SEED (Standard for the Exchange of Earthquake Data) format (aka miniseed, which is a standard subset of SEED used for archives). The algorithms here depend upon seismic channels being uniquely defined by four metadata keys: net, sta, chan, and loc. Bundles are formed by sorting data in the following key order: net, sta, loc, chan. The algorithms here are further limited to applications to ensembles that are the generalization of a reflection seismology “shot gather”. That means an implicit assumption is that the ensembles contain data assembled from one and only one event, so there is a one-to-one relationship between each channel and an event tag. If data from multiple events or something other than a shot gather are to be handled, use the BundleSEEDGroup function station by station, sorting the inputs by some other method to create triples of channels that can be merged into one Seismogram object.

Created on Mon Jan 11 05:34:10 2021

@author: pavlis

mspasspy.algorithms.bundle.BundleSEEDGroup(d, i0=0, iend=2)[source]

Combine a grouped set of TimeSeries into one Seismogram.

A Seismogram object is a bundle of TimeSeries objects that define a nonsingular transformation matrix that can be used to reconstruct vector ground motion. That requires three TimeSeries objects that define directions that are linearly independent. This function does not directly test for linear independence but depends upon channel codes to assemble one or more bundles needed to build a Seismogram. The algorithm used here is simple and ONLY works if the inputs have been sorted so the channels define a group of three unique channel codes. For example,

HHE, HHN, HHZ

would form a typical seed channel grouping.

The function will attempt to handle duplicates. By that I mean if the group has two of the same channel code like these sequences:

HHE, HHE, HHN, HHZ or HHE, HHN, HHN, HHZ, HHZ

If the duplicates are pure duplicates there is no complication and the result will be clean. If the time spans of the duplicate channels are different, the decision of which to use keys on a simple idea that is most appropriate for data assembled by event with mistakes in associations. That is, it attempts to scan the group for the earliest start time. When duplicates are found it uses the one with a start time closest to the minimum as the one merged to make the output Seismogram.

The output will be marked as dead with no valid data under one of two conditions: (1) fewer than 3 unique channel names, or (2) more than three inputs with an inconsistent set of SEED names. That “inconsistent” test is obscure and yet another example that SEED is a four letter word. Commentary aside, the rules are:

  1. The net code must be defined and the same in all TimeSeries passed

  2. The station (sta) code must also be the same for all inputs

  3. Similarly the loc code must be the same in all inputs.

  4. Finally, there is a more obscure test on channel names. They must all have the same first two characters. That is, BHE, BHN, BHN, BHZ is ok but BHE, BHN, BHZ, HHE will cause an immediate exit with no attempt to resolve the ambiguity - that is viewed as a usage error in defining the range of the bundle.

In all cases where the bundling is not possible the function does not throw an exception but does four things:

  1. Merges the Metadata of all inputs (uses the += operator so only the last values of duplicate keys will be preserved in the return)

  2. If ProcessingHistory is defined in the input, the history records are posted to the returned Seismogram as if the data were live, but the number of inputs will always be a number different from 3.

  3. The return is marked dead.

  4. The function posts a (hopefully) informative message to elog of the returned Seismogram.

ProcessingHistory is handled internally by this function. If all the components in a group have a nonempty ProcessingHistory the data to link the outputs to the inputs will be posted to ProcessingHistory.

Parameters:

  • d – assumed to be an array-like object of TimeSeries data that are to be used to build the Seismogram objects. They must be sorted as described above or the algorithm will fail. Two typical array-like objects to use are the member attribute of a TimeSeriesEnsemble or a python array constructed from a (sorted) collection of TimeSeries objects.

  • i0 – starting array position for constructing output(s). The default is 0, which would be the normal request for a full ensemble or a single grouping assembled by some other mechanism. A nonzero value is useful to work through a larger container one Seismogram at a time.

  • iend – end array position. The function will attempt to assemble one or more Seismograms from TimeSeries in the range d[i0] to d[iend]. Default is 2 for a single Seismogram without duplicates.
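
A sketch assuming tse is a TimeSeriesEnsemble whose members have already been sorted by net, sta, loc, chan; the index range picks out one three-channel group:

from mspasspy.algorithms.bundle import BundleSEEDGroup

seis = BundleSEEDGroup(tse.member, i0=0, iend=2)   # one Seismogram from the first three members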

mspasspy.algorithms.bundle.bundle_seed_data(ensemble)[source]

This function can be used to take an (unordered) input ensemble of TimeSeries objects generated from miniseed data and produce an output ensemble of Seismograms produced by bundles linked to the seed name codes net, sta, chan, and loc. An implicit assumption of the algorithm used here is that the data are a variant of a shot gather and the input ensemble defines one net:sta:chan:loc:time_interval for each record that is to be bundled. It can only properly handle pure duplicates for a given net:sta:chan:loc combination. (i.e. if the input has the same TimeSeries defined by net:sta:chan:loc AND a common start and end time). Data with gaps broken into multiple net:sta:chan:loc TimeSeries with different start and end times will produce incomplete results. That is, Seismograms in the output associated with such inputs will either be killed with an associated error log entry or in the best case truncated to the overlap range of one of the segments with the gap(s) between.

Irregular start times of any set of TimeSeries forming a single bundle are subject to the same truncation or discard rules described in the related function Bundle3C.

Note there is no guarantee the Seismogram objects returned will be in standard coordinates. In fact, they will never be with standard channel names because of the internal sorting. It is normally highly recommended that the user call the rotate_to_standard method on each Seismogram before any use.

Parameters:

ensemble – is the input ensemble of TimeSeries to be processed.

Returns:

ensemble of Seismogram objects made by bundling input data

Return type:

SeismogramEnsemble

Exception:

Can throw a MsPASSError for a number of conditions. The call should be enclosed in a handler if run on a large data set.
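
A sketch assuming tse is a TimeSeriesEnsemble holding all the channels recorded for a single event; the recommended rotate_to_standard call follows the bundling:

from mspasspy.algorithms.bundle import bundle_seed_data
from mspasspy.algorithms.basic import rotate_to_standard

try:
    seis_ensemble = bundle_seed_data(tse)
    for i in range(len(seis_ensemble.member)):
        seis_ensemble.member[i] = rotate_to_standard(seis_ensemble.member[i])
except Exception as err:    # the docstring warns a MsPASSError can be thrown
    print("bundle_seed_data failed:", err)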

edit

class mspasspy.algorithms.edit.Add(key, value_to_add=1)[source]

Bases: MetadataOperator

Used to implement the += operator on a specified Metadata key. Example: to add 2 to data, d, with key=’icdp’ you could use this:

op = Add('icdp', 2)
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members, while if False (which should be the default) the ensemble’s container is used.

This method must return a (potentially but not guaranteed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set True the safeties will be bypassed for speed. That includes at least the two required methods “check_keys” and “check_operation”. Implementations may add other safeties.

check_keys(d)[source]

Checks that the key defined for the operator is defined for d. Returns True if it is defined and False if not. The method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.Add2(key0, key1, key2)[source]

Bases: MetadataOperator

Used to implement a + operator that adds two Metadata attributes together. The attributes are fetched with two keys set when the operator is constructed. Let a be the value in a datum associated with key1 (arg1 of constructor) and b be the value associated with key2 (arg2 of constructor). The apply method of this class computes a+b and sets the Metadata attribute defined by key0 (arg0 of constructor) to that value (a+b), i.e. d[key0] = d[key1] + d[key2]

Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.

Example: to compute d[‘ix1’] as d[‘sx’] + d[‘chan’] use

op = Add2('ix1', 'sx', 'chan')
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members, while if False (which should be the default) the ensemble’s container is used.

This method must return a (potentially but not guaranteed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set True the safeties will be bypassed for speed. That includes at least the two required methods “check_keys” and “check_operation”. Implementations may add other safeties.

check_keys(d)[source]

Checks that the keys required for the operator are defined for d. If either required key is missing from d return False. Return True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.ChangeKey(oldkey, newkey, erase_old=True)[source]

Bases: MetadataOperator

apply(d, apply_to_members=False, fast_mode=False, verbose=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members, while if False (which should be the default) the ensemble’s container is used.

This method must return a (potentially but not guaranteed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set True the safeties will be bypassed for speed. That includes at least the two required methods “check_keys” and “check_operation”. Implementations may add other safeties.

check_keys(d, apply_to_members)[source]

All implementations should implement this checker even if all it contains is a pass. It should validate that the keys are defined in the data to be handled in the apply method. An early call in apply should always be a call to this method.

check_operation(d)[source]

There is no operation for ChangeKey so this method does nothing. Because the base class declares it abstract we have to have this stub.
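
The class has no docstring above; reading the constructor arguments, it appears to move the value stored under oldkey to newkey (optionally erasing the old entry). A hedged sketch based on that reading, with illustrative key names:

from mspasspy.algorithms.edit import ChangeKey

op = ChangeKey("sta", "station", erase_old=True)   # assumed: move d["sta"] to d["station"] and drop "sta"
d = op.apply(d)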

class mspasspy.algorithms.edit.Divide(key, value_to_divide=1)[source]

Bases: MetadataOperator

Used to implement the /= operator on a specified Metadata key. Example: to divide the metadata in d with key=’Pamp’ by 2.0 you could use this:

op = Divide('Pamp', 2.0)
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members, while if False (which should be the default) the ensemble’s container is used.

This method must return a (potentially but not guaranteed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set True the safeties will be bypassed for speed. That includes at least the two required methods “check_keys” and “check_operation”. Implementations may add other safeties.

check_keys(d)[source]

Checks that the key defined for the operator is defined for d. Returns True if it is defined and False if not. The method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.Divide2(key0, key1, key2)[source]

Bases: MetadataOperator

Used to implement the / operator that divides two Metadata attributes. The attributes are fetched with two keys set when the operator is constructed. Let a be the value in a datum associated with key1 (arg1 of the constructor) and b be the value associated with key2 (arg2 of the constructor). The apply method of this class computes a/b and sets the Metadata attribute defined by key0 (arg0 of the constructor) to that value (a/b), i.e. d[key0] = d[key1] / d[key2]

Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.

Example: to compute ix1 as d['sx'] / d['chan'] use

op = Divide2('ix1', 'sx', 'chan')
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

check_keys(d)[source]

Checks that the keys required for the operator are defined for d. If either required key is missing from d it returns False; it returns True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.Executioner[source]

Bases: ABC

Abstract base class for a family of python classes used for killing mspass data objects that fail to pass a particular metric. It is made abstract to define the required methods a particular instance must create. As in any good OOP design that also means subclasses can add additional methods. This class should be used only as a base class as it has no functionality by itself.

edit_ensemble_members(ensemble)[source]

Subclasses should call this method if the input data are an ensemble. A trick of inheritance allows the algorithm of self to then be applied to the whole ensemble. Putting this in the base class avoids duplicating code in all subclasses.
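
For concreteness, the following is a minimal sketch (not part of the library) of how a subclass might implement kill_if_true and delegate ensemble input to edit_ensemble_members. The class name, metadata key, and threshold are purely illustrative.

from mspasspy.algorithms.edit import Executioner

class KillIfNegative(Executioner):
    # Hypothetical Executioner: kills a datum whose value for key is negative
    def __init__(self, key, verbose=False):
        self.key = key
        self.verbose = verbose

    def kill_if_true(self, d, apply_to_members=False):
        if d is None:
            return d
        if apply_to_members and hasattr(d, "member"):
            # let the base class loop apply this same test to every member
            return self.edit_ensemble_members(d)
        if d.is_defined(self.key) and d[self.key] < 0.0:
            d.kill()
            if self.verbose:
                self.log_kill(d, "KillIfNegative",
                              "value of " + self.key + " is negative")
        return d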

abstract kill_if_true(d)[source]

This method should run a test on d that will call the kill method on MsPASS atomic object d if the test defined by the implementation fails. This is the main working method of this class of function objects.

log_kill(d, testname, message, severity=<ErrorSeverity.Informational: 5>)[source]

This base class method is used to standardize the error logging functionality of all Executioners. It writes a standardized message to simplify writing of subclasses - they need only define the testname (normally the name of the subclass) and format a specific message to be posted.

Note most subclasses may want to include a verbose option (or the reciprocal silent) and only write log messages when verbose is set True.

Parameters:
  • d – MsPASS data object to which the elog message is to be written.

  • testname – the string assigned to the "algorithm" field of the message posted to d.elog.

  • message – specialized message to post; this string is added to an internal generic message.

  • severity – ErrorSeverity to assign to the elog message (see the ErrorLogger docstring). Default is Informational.

class mspasspy.algorithms.edit.FiringSquad(executioner_list)[source]

Bases: Executioner

Used to apply multiple Executioners in a single pass - hence the FiringSquad image: facing more than one thing that could kill you. The implementation of kill_if_true iterates through a list of Executioners. Once the datum is killed it is immediately returned. If the Executioners are running in verbose mode that means some tests can shadow others. It is like a firing squad where the guns are fired in sequence and the body is removed as soon as there is a hit. The victim walks away only if all the guns miss.

Note the class has a += operator to allow appending additional tests to the chain.
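
A short sketch of typical usage; the keys and threshold below are illustrative only.

from mspasspy.algorithms.edit import FiringSquad, MetadataUndefined, MetadataLT

# Kill any datum that lacks 'snr' or whose snr is below 2.0
squad = FiringSquad([MetadataUndefined('snr'), MetadataLT('snr', 2.0)])
squad += MetadataUndefined('magnitude')   # append another test with +=
d = squad.kill_if_true(d)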

kill_if_true(d, apply_to_members=False)[source]

Implementation of the base class method. In this case failure is defined as not passing one of the set of tests loaded when the object was created. As noted earlier the tests are performed in the same order they were passed to the constructor or appended with the += operator.

Parameters:
  • d – a MsPASS data object to be checked

class mspasspy.algorithms.edit.IntegerDivide(key, value=1)[source]

Bases: MetadataOperator

Used to implement the // operator on a specified Metadata key. The // operator is a bit obscure but it implements the common need to truncate a division result to an integer. This will work on floats but the result will always be close to an integer value, as if the operation were done with integers. Note also IntegerDivide is the complement to Mod, which returns the remainder of such a division.

Example: to apply integer division by 5 to the metadata with key='icdp' in d you could use this:

op = IntegerDivide('icdp', 5)
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

check_keys(d)[source]

Checks that the key defined for the operator is defined for d. Returns True if it is defined and False if not. The method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.IntegerDivide2(key0, key1, key2)[source]

Bases: MetadataOperator

Used to implement the // operator between two Metadata attributes. The attributes are fetched with two keys set when the operator is constructed. Let a be the value in a datum associated with key1 (arg1 of the constructor) and b be the value associated with key2 (arg2 of the constructor). The apply method of this class computes a//b and sets the Metadata attribute defined by key0 (arg0 of the constructor) to that value (a//b), i.e. d[key0] = d[key1] // d[key2]

Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.

Example: to compute ix1 as d['sx'] // d['chan'] use

op = IntegerDivide2('ix1', 'sx', 'chan')
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

check_keys(d)[source]

Checks that the keys required for the operator are defined for d. If either required key is missing from d it returns False; it returns True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.MetadataDefined(key, verbose=False)[source]

Bases: Executioner

Implementation of Executioner using an existence test. The constructor loads only a key string. The test for the kill_if_true method is then simply for the existence of a value associated with the loaded key. Data will be killed if the defined key exists in the Metadata (header).

kill_if_true(d, apply_to_members=False)[source]

Implementation of this abstract method for this tester. Kills d if self.key is defined for this datum

Returns a (potentially edited) copy of the input to allow use in a parallel map operation.

class mspasspy.algorithms.edit.MetadataEQ(key, value, verbose=False)[source]

Bases: Executioner

Implementation of Executioner using an equality test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the == operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator == is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes. It can be used for booleans even though few would write an if statement testing if two booleans were equal.

kill_if_true(d, apply_to_members=False)[source]

Implementation of this abstract method for this tester. Kills d if the d[self.key] == value stored with the class.

Returns a (potentially edited) copy of the input to allow use in a parallel map operation.

class mspasspy.algorithms.edit.MetadataGE(key, value, verbose=False)[source]

Bases: Executioner

Implementation of Executioner using a greater than or equal test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the >= operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator >= is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes.

kill_if_true(d, apply_to_members=False)[source]

Implementation of this abstract method for this tester. Kills d if the d[self.key] >= value stored with the class.

Returns a (potentially edited) copy of the input to allow use in a parallel map operation.

class mspasspy.algorithms.edit.MetadataGT(key, value, verbose=False)[source]

Bases: Executioner

Implementation of Executioner using a greater than test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the > operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator > is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes.

kill_if_true(d, apply_to_members=False)[source]

Implementation of this abstract method for this tester. Kills d if the d[self.key] > value stored with the class.

Returns a (potentially edited) copy of the input to allow use in a parallel map operation.

class mspasspy.algorithms.edit.MetadataInterval(key, lower_endpoint, upper_endpoint, use_lower_edge=True, use_upper_edge=True, kill_if_outside=True, verbose=False)[source]

Bases: Executioner

Implementation of Executioner based on a numeric range of values. i.e. it tests if a metadata value is in a range defined by an upper and a lower value. Ranges have a minor complication in handling the edge condition: should the test include the edge or not? Rather than write different functions for the four possible combinations of <=, <, >=, and > that define a range we use constructor arguments (use_lower_edge and use_upper_edge) to turn inclusion of the boundary on and off.

In this context an interval test also has two logical alternatives. One may want to keep data inside an interval (most common and the default) or delete data within a specified interval. That logic is controlled by the kill_if_outside boolean.

Intervals mostly make sense only for numeric types (int and float), but can be used with strings. In reality this function should work with any object for which the operators >, <, >=, and <= are defined but that is adventure land if you try.
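
A short sketch of typical usage; the key and endpoints are illustrative only.

from mspasspy.algorithms.edit import MetadataInterval

# Keep data whose 'delta' value lies in the half-open interval [30.0, 95.0)
# and kill everything outside it
op = MetadataInterval('delta', 30.0, 95.0,
                      use_lower_edge=True,
                      use_upper_edge=False,
                      kill_if_outside=True)
d = op.kill_if_true(d)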

kill_if_true(d, apply_to_members=False)[source]

Implementation of this abstract method for this tester. Kills d according to the interval test defined by the operator: with the default kill_if_outside=True the datum is killed if d[self.key] falls outside the interval defined by the endpoints, and the logic is reversed when kill_if_outside is False.

Returns a (potentially edited) copy of the input to allow use in a parallel map operation.

class mspasspy.algorithms.edit.MetadataLE(key, value, verbose=False)[source]

Bases: Executioner

Implementation of Executioner using a less than or equal test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the <= operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator <= is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes.

kill_if_true(d, apply_to_members=False)[source]

Implementation of this abstract method for this tester. Kills d if the d[self.key] <= value stored with the class.

Returns a (potentially edited) copy of the input to allow use in a parallel map operation.

class mspasspy.algorithms.edit.MetadataLT(key, value, verbose=False)[source]

Bases: Executioner

Implementation of Executioner using a less than test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the < operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator < is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes.

kill_if_true(d, apply_to_members=False)[source]

Implementation of this abstract method for this tester. Kills d if the d[self.key] < value stored with the class.

Returns a (potentially edited) copy of the input to allow use in a parallel map operation.

class mspasspy.algorithms.edit.MetadataNE(key, value, verbose=False)[source]

Bases: Executioner

Implementation of Executioner using a not equal (NE) test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the != operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator != is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes. It can be used for booleans even though few would write an if statement testing if two booleans were not equal.

kill_if_true(d, apply_to_members=False)[source]

Implementation of this abstract method for this tester. Kills d if the d[self.key] != value stored with the class.

Returns a (potentially edited) copy of the input to allow use in a parallel map operation.

class mspasspy.algorithms.edit.MetadataOperator[source]

Bases: ABC

Base class for a set of inline Metadata editors. That is, there are many instances where Metadata attributes need to be altered during a workflow where it is either unnecessary or inappropriate to access the database. e.g. we have found updates in MongoDB can be very slow if done one transaction at a time so it can streamline processing to edit metadata on the fly. This base class defines the API for a suite of tools for editing metadata on the fly.

abstract apply(d, apply_to_members=False, fast_mode=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

abstract check_keys(d)[source]

All implementations should implement this checker even if all it contains is a pass. It should validate that the keys to be handled in the apply method are defined in the data. An early step in apply should always be a call to this method.

abstract check_operation(d)[source]

All implementations should implement this method even if they choose to ignore it. It should be used to guarantee the operator the class defines will succeed on the datum sent to the apply method. e.g. any standard arithmetic operations will throw a TypeError if one of the operands is a string. This method should be used to make the operator as bombproof as possible logging a message to the datum rather than aborting if there is an issue. Some classes may want to implement this as pass because it makes no sense - e.g. setting a constant value. Those are the exception, however, so the api dogmatically demands these be implemented even if they do nothing.
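
To make the required methods concrete, the following is a minimal sketch (not part of the library) of a custom operator implementing apply, check_keys, and check_operation. The class name, key, and behavior are purely illustrative.

from mspasspy.algorithms.edit import MetadataOperator

class Negate(MetadataOperator):
    # Hypothetical operator: sets d[key] = -d[key]
    def __init__(self, key):
        self.key = key

    def check_keys(self, d):
        return d.is_defined(self.key)

    def check_operation(self, d):
        # negation only makes sense for numeric values
        return isinstance(d[self.key], (int, float))

    def apply(self, d, apply_to_members=False, fast_mode=False):
        if d is None or d.dead():
            return d
        if apply_to_members and hasattr(d, "member"):
            return self.edit_ensemble_members(d)
        if fast_mode or (self.check_keys(d) and self.check_operation(d)):
            d[self.key] = -d[self.key]
        return d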

edit_ensemble_members(ensemble)[source]

Subclasses should call this method if the input data are an ensemble. A trick of inheritance allows the algorithm of self to then be applied to the whole ensemble. Putting this in the base class avoids duplicating code in all subclasses.

log_edit(d, testname, message, severity=<ErrorSeverity.Informational: 5>)[source]

This base class method is used to standardize the error logging functionality of all editors. It writes a standardized message to simplify writing of subclasses - they need only define the testname (normally the name of the subclass) and format a specific message to be posted.

Note most subclasses may want to include a verbose option (or the reciprocal silent) and only write log messages when verbose is set True.

Parameters:
  • d – MsPASS data object to which the elog message is to be written.

  • testname – the string assigned to the "algorithm" field of the message posted to d.elog.

  • message – specialized message to post; this string is added to an internal generic message.

  • severity – ErrorSeverity to assign to the elog message (see the ErrorLogger docstring). Default is Informational.

class mspasspy.algorithms.edit.MetadataOperatorChain(operator_list)[source]

Bases: MetadataOperator

Used to apply a chain of arithmetic operators to derive computed metadata attributes. Very elaborate calculations can be done through this class by chaining appropriate atomic operators defined elsewhere in the module (i.e. Add, Subtract, etc.). The operation chain is defined by a python list of the atomic operators. When the apply method of this class is called the operators in the list are applied sequentially in list order.

Note the class has a += operator to allow appending additional operators to the chain.
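
A short sketch of typical usage, assuming the Multiply, Subtract, and SetValue operators documented in this module; the keys and constants are illustrative only.

from mspasspy.algorithms.edit import MetadataOperatorChain, Multiply, Subtract, SetValue

# Compute d['Pamp'] = d['Pamp'] * 2.5 - 1.0 and then set a flag
chain = MetadataOperatorChain([Multiply('Pamp', 2.5), Subtract('Pamp', 1.0)])
chain += SetValue('Pamp_edited', 1)   # append another operator with +=
d = chain.apply(d)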

apply(d, apply_to_members=False)[source]

Implementation of the base class method. The operators in the chain are applied in the same order they were passed to the constructor or appended with the += operator.

Parameters:
  • d – a MsPASS data object to be processed

check_keys(d)[source]

All implementations should implement this checker even if all it contains is a pass. It should validate that the keys to be handled in the apply method are defined in the data. An early step in apply should always be a call to this method.

check_operation(d)[source]

All implementations should implement this method even if they choose to ignore it. It should be used to guarantee the operator the class defines will succeed on the datum sent to the apply method. e.g. any standard arithmetic operations will throw a TypeError if one of the operands is a string. This method should be used to make the operator as bombproof as possible logging a message to the datum rather than aborting if there is an issue. Some classes may want to implement this as pass because it makes no sense - e.g. setting a constant value. Those are the exception, however, so the api dogmatically demands these be implemented even if they do nothing.

class mspasspy.algorithms.edit.MetadataUndefined(key, verbose=False)[source]

Bases: Executioner

Implementation of Executioner using a nonexistence test. The constructor loads only a key string. The test for the kill_if_true method is then simply for the existence of a value associated with the loaded key. Data will be killed if the defined key does not exist (is Undefined) in the Metadata (header).

This class is a useful prefilter to apply to any algorithm that requires a particular metadata attribute. Use FiringSquad to define a chain of required metadata to prefilter data input to such an algorithm.
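
A short sketch of that prefilter idea; the key names are illustrative only.

from mspasspy.algorithms.edit import FiringSquad, MetadataUndefined

# Kill any member missing an attribute a later algorithm requires
prefilter = FiringSquad([MetadataUndefined('site_lat'),
                         MetadataUndefined('site_lon')])
ensemble = prefilter.kill_if_true(ensemble, apply_to_members=True)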

kill_if_true(d, apply_to_members=False)[source]

Implementation of this abstract method for this tester. Kills d if self.key is not defined for this datum

Returns a (potentially edited) copy of the input to allow use in a parallel map operation.

class mspasspy.algorithms.edit.Mod(key, value=1)[source]

Bases: MetadataOperator

Used to implement the % operator on a specified Metadata key. The % operator is a bit obscure but it implements the common need to return the remainder of a divide operation. It is commonly used, for example, in cmp processing where survey flag numbers can often be converted to channel numbers for simple multichannel cable geometries.

This operator will work with any numeric type but it is most commonly used for integer attributes.

Example: to convert the metadata associated with the key 'ichan' that currently count by 1 to numbers that cycle from 0 to 23 use this:

op = Mod('ichan', 24)
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

check_keys(d)[source]

Checks that the key defined for the operator is defined for d. Returns True if it is defined and False if not. The method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.Mod2(key0, key1, key2)[source]

Bases: MetadataOperator

Used to implement the % operator between two Metadata attributes. The attributes are fetched with two keys set when the operator is constructed. Let a be the value in a datum associated with key1 (arg1 of the constructor) and b be the value associated with key2 (arg2 of the constructor). The apply method of this class computes a%b and sets the Metadata attribute defined by key0 (arg0 of the constructor) to that value (a%b), i.e. d[key0] = d[key1] % d[key2]

Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.

Example: to compute ix1 as d['sx'] % d['chan'] use

op = Mod2('ix1', 'sx', 'chan')
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

check_keys(d)[source]

Checks that the keys required for the operator are defined for d. If either required key is missing from d it returns False; it returns True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.Multiply(key, value_to_multiply=1)[source]

Bases: MetadataOperator

Used to implement the *= operator on a specified Metadata key. Example: to multiply the metadata value with key='Pamp' in a datum d by 2.5 you could use this:

op = Multiply('Pamp', 2.5)
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

check_keys(d)[source]

Checks that the key defined for the operator is defined for d. Returns True if it is defined and False if not. The method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.Multiply2(key0, key1, key2)[source]

Bases: MetadataOperator

Used to implement the * operator that multiplies two Metadata attributes together. The attributes are fetched with two keys set when the operator is constructed. Let a be the value in a datum associated with key1 (arg1 of the constructor) and b be the value associated with key2 (arg2 of the constructor). The apply method of this class computes a*b and sets the Metadata attribute defined by key0 (arg0 of the constructor) to that value (a*b), i.e. d[key0] = d[key1] * d[key2]

Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.

Example: to compute ix1 as d['sx'] * d['chan'] use

op = Multiply2('ix1', 'sx', 'chan')
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

check_keys(d)[source]

Checks that the keys required for the operator are defined for d. If either required key is missing from d it returns False; it returns True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.SetValue(key, constant_value=0)[source]

Bases: MetadataOperator

Used to set a specified metadata key to a constant value. Note any existing value of Metadata associated with the key defined in the operator will always be overwritten.

Example: to set the value of key = 'a' to constant 2.0

op = SetValue('a', 2.0)
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Used to apply this operator to the Metadata of a MsPASS data object. Use of a decorator adds common MsPASS arguments as call options.

Parameters:

  • d – datum to which the operator is to be applied. d must be a valid MsPASS data object or this method will throw a fatal MsPASSError exception. If d is marked dead it will be silently ignored.

  • apply_to_members – when True and d is an ensemble object the operator is applied to the members. When False the metadata for the ensemble itself is altered. This parameter is ignored for atomic data types. Default is False.

Returns:

always returns a (usually) edited copy of the input d.

When the input d is dead the copy will always be unaltered. Note the copy is a shallow copy, which in python means we just return the equivalent of a pointer to the caller. That is important for efficiency as d can be very large for some ensembles.

check_keys(d)[source]

Useless implementation of required abstract base method. In this case all it does is test if the stored value of key is a string. Returns true if the key is a string and false otherwise.

check_operation(d)[source]

All implementations should implement this method even if they choose to ignore it. It should be used to guarantee the operator the class defines will succeed on the datum sent to the apply method. e.g. any standard arithmetic operations will throw a TypeError if one of the operands is a string. This method should be used to make the operator as bombproof as possible logging a message to the datum rather than aborting if there is an issue. Some classes may want to implement this as pass because it makes no sense - e.g. setting a constant value. Those are the exception, however, so the api dogmatically demands these be implemented even if they do nothing.

class mspasspy.algorithms.edit.Subtract(key, value_to_subtract)[source]

Bases: MetadataOperator

Used to implement the -= operator on a specified Metadata key. Example: to subtract 2 from the metadata value with key='icdp' in a datum d you could use this:

op = Subtract('icdp', 2)
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

check_keys(d)[source]

Checks that the key defined for the operator is defined for d. Returns True if it is defined and False if not. The method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

class mspasspy.algorithms.edit.Subtract2(key0, key1, key2)[source]

Bases: MetadataOperator

Used to implement the - operator that computes the difference of two Metadata attributes. The attributes are fetched with two keys set when the operator is constructed. Let a be the value in a datum associated with key1 (arg1 of the constructor) and b be the value associated with key2 (arg2 of the constructor). The apply method of this class computes a-b and sets the Metadata attribute defined by key0 (arg0 of the constructor) to that value (a-b), i.e. d[key0] = d[key1] - d[key2]

Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.

Example: to compute ix1 as d['sx'] - d['chan'] use

op = Subtract2('ix1', 'sx', 'chan')
d = op.apply(d)

apply(d, apply_to_members=False)[source]

Implementations are required to implement this method. It must accept a MsPASS data object, d (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble), and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which Metadata the operation refers to: the overall ensemble container and the individual members have (normally independent) Metadata containers. Implementations of this method should have an apply_to_members argument as in the signature of this base class. When True the editing is done in a loop over all members; when False (which should be the default) the ensemble's own container is used.

This method must return a (potentially, but not guaranteed, edited) version of d. That is essential to allow this method to be used as the function in a parallel map operator.

If a required metadata key is missing this method should do nothing unless a verbose flag is set, in which case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator (MetadataUndefined) to all data before running a calculator. That way you can guarantee the metadata needed for all operations are valid before trying to do calculations on required metadata. There is some overhead in validating metadata, so all implementations should include use of "fast_mode". When set True the safeties will be bypassed for speed. That includes at least the two required methods "check_keys" and "check_operation". Implementations may add other safeties.

check_keys(d)[source]

Checks that the keys required for the operator are defined for d. If either required key is missing from d it returns False; it returns True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. That means this method should ALWAYS be called after a test with _input_is_valid.

check_operation(d)[source]

This method checks to make sure the value works with the operation required. It provides a (required by the base class) standardization of testing for validity of the operation.

mspasspy.algorithms.edit.erase_metadata(d, keylist, apply_to_members=False)[source]

This editor clears the contents of any data associated with a list of Metadata keys. If there is no data for a key it silently does nothing. If the input is an ensemble and apply_to_members is True the algorithm will run on the metadata of each member in a loop. If False only the ensemble (global) metadata are handled.

Parameters:
  • d – must be a valid MsPASS data object. If not the function will throw an exception. If the datum is marked dead it will silently just return the contents.

  • keylist – python list of strings to use as keywords. Any matching keys in the metadata of d will be cleared.

  • apply_to_members – a boolean controlling how the function should handle ensembles. When set True the erase algorithm will be applied in a loop over the member vector of an input ensemble. When False (default) only the ensemble's metadata is checked. This parameter is ignored if the input is an atomic data type.

Returns:

edited copy of d
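
A short usage sketch; the key names are illustrative only.

from mspasspy.algorithms.edit import erase_metadata

# Clear stale pick attributes before reprocessing an atomic datum
d = erase_metadata(d, ['Ptime', 'Stime'])
# For an ensemble, clear the same keys in every member
ensemble = erase_metadata(ensemble, ['Ptime', 'Stime'], apply_to_members=True)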

resample

class mspasspy.algorithms.resample.BasicResampler(dt=None, sampling_rate=None)[source]

Bases: ABC

Base class for a family of resampling operators. All this class really does is define the interface and standardize the target of the operator. A key concept of this family of operators is that they are intended to be used in a map operator to regularize the sample rate to a constant. Hence, the base class defines that constant output sample rate (or, alternatively, the sample interval dt).

ALL implementations must recognize a couple fundamental concepts:
  1. This is intended to ONLY be used on waveform segments. The problem of resampling continuous data requires different algorithms. The reason is boundary effects. All implementations are subject to edge transients. How the implementation does or does not handle that issue is viewed as an implementation detail.

  2. Downsampling ALWAYS requires some method to avoid aliasing of the output. How that is done is considered an implementation detail.

This is a sketch, in pseudo-python code, showing how a typical instance of this class (ScipyResampler in the example) would be used in a parallel workflow:

Example

resamp_op = ScipyResampler(10.0)   # target sample rate of 10 sps
cursor = db.wf_TimeSeries.find({})
bag = read_distributed_data(cursor, collection="wf_TimeSeries")
bag = bag.map(resamp_op.resample)
bag = bag.map(db.save_data)
bag.compute()

A point to emphasize is that the model is to generate the operator through a constructor (ScipyResampler in the example) that defines the target sample rate. All data passed through that operator via the map call will be returned to create a bag/rdd with a uniform sample rate. All implementations should also follow the MsPASS rule for parallel algorithms: kill data that cannot be handled and do not throw exceptions unless the whole usage is wrong.

dec_factor(d)[source]

All implementations of decimation algorithms should use this method to test if resampling by decimation is feasible. A decimation operator only works for downsampling, with the restriction that the output sample interval is an integer multiple of the input sample interval.

The function returns a decimation factor to use for the atomic datum d being tested. The algorithm uses the numpy "isclose" function to establish whether the ratio of sample intervals is close enough to an integer to be feasible. If so it returns the decimation factor as an integer that should be used on d. Note implementations should handle a return of 1 specially for efficiency; a return of 1 means no resampling is needed. A return of 0 or -1 is used for two slightly different cases that indicate a decimation operation is not feasible. A return of 0 should be taken as a signal that the data require upsampling to match the target sampling rate (interval). A return of -1 means the data require downsampling but a decimation operator is not feasible (e.g. needing to create 10 sps data from 25 sps input).

Parameters:

  • d – input mspass data object to be tested. All that is actually required is that d have a "dt" attribute (i.e. d.dt is defined) giving the sample interval for that datum. d is assumed to be a MsPASS atomic data object; this method has no safeties to test the input type and will throw an exception if d.dt does not resolve.
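
The following sketch shows how the return value might be interpreted; it is illustrative only and assumes d is an atomic datum loaded earlier.

from mspasspy.algorithms.resample import ScipyDecimator

dec = ScipyDecimator(20.0)   # target sample rate of 20 sps
m = dec.dec_factor(d)        # d is an atomic datum with a dt attribute
if m == 1:
    pass                     # already at the target rate; nothing to do
elif m > 1:
    d = dec.resample(d)      # downsampling by the integer factor m is feasible
else:
    # m == 0: d would need upsampling; m == -1: downsampling is needed but the
    # ratio is not an integer, so a general resampler must be used instead
    ...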

abstract resample(mspass_object)[source]

Main operator a concrete class must implement. It should accept any mspass data object and return a clone that has been resampled to the sample interval defined by this base class.

target_dt()[source]
target_samprate()[source]
class mspasspy.algorithms.resample.ScipyDecimator(sampling_rate, n=None, ftype='iir', zero_phase=True)[source]

Bases: BasicResampler

This class defines a generic operator in which a decimator is used to downsample any input data to a common sample rate. A decimator requires the ratio of the input to output sample rate to be an integer (equivalently, the ratio of the output sample interval to the input sample interval). The operator will fail on any data it receives that are irregular in that sense. For example, 10 sps data can be created by downsampling 40 sps data by a factor of 4. In contrast, 10 sps cannot be created by decimation of 25 sps data because the ratio is 2.5 (not an integer).

This operator satisfies the concept defined in BasicResampler. That is, a particular concrete instance once constructed will define an operator that will resample any input data to a common sample rate/interval. Because of the requirement of this algorithm that the sample intervals/rates are related by integers the operator has to be able to handle irregular sample rate data. The algorithm is brutal and will kill any datum for which the integer test fails and post an elog message.

This operator is really little more than a wrapper around a scipy function with a similar name (scipy.signal.decimate). The resample method handles any supported MsPASS data object type but will fail with a TypeError if it receives any other data type.

Be aware decimators all have unavoidable edge effects. The antialias filter that has to be applied (you can get garbage otherwise) will always produce an edge transient. A key to success with any downsampling operator is to always have a pad zone if possible. That is, you start with a longer time window than you will need for final processing and discard the pad zone when you enter the final stage. Note that is actually true of ALL filtering.

The constructor has almost the same arguments defined in the documentation page for scipy.signal.decimate. The only exception is the axis argument, which needs to be defined internally. For scalar data we pass 0 while for three component data we send 1, which means the decimator is applied per channel.
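
A short usage sketch, assuming an ensemble loaded earlier; the target rate is illustrative only.

from mspasspy.algorithms.resample import ScipyDecimator

# Downsample an ensemble to 20 sps using the default IIR antialiasing filter;
# members whose rate is not an integer multiple of 20 sps will be killed
decimator = ScipyDecimator(20.0, ftype="iir", zero_phase=True)
ensemble = decimator.resample(ensemble)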

resample(mspass_object)[source]

Implementation of the required abstract method for this operator. The only argument is mspass_object. The operator will downsample the contents of the sample data container for any valid input. If the input is not a mspass data object (i.e. an atomic TimeSeries or Seismogram) or one of the ensemble objects it will throw a TypeError exception.

Note for ensembles the algorithm simply applies this method in a loop over all the members of the ensemble. Be aware that any members of the ensemble that cannot be resampled to the target sampling frequency (interval) will be killed. For example, if you are downsampling to 20 sps and you have 25 sps data in the ensemble, the 25 sps data will be killed on output with an elog message posted.

Returns an edited clone of the input with revised sample data but no changes to any Metadata.

class mspasspy.algorithms.resample.ScipyResampler(sampling_rate, window='hann')[source]

Bases: BasicResampler

This class is a wrapper for the scipy resample algorithm. Obspy users should note that the Trace and Stream method called “resample” is only a light wrapper to apply the scipy resample function.

The algorithm and its limitations are described in the scipy documentation you can easily find with a web search. A key point about this algorithm is that, unlike decimate, it allows resampling to a rate that is not an integer multiple or division of the input, OR upsampling data to match the rest of the data set. (Note upsampling is not usually a good idea unless it is followed by a decimator to get all data to a lower, uniform sample rate.)

A typical example where that is essential is some old OBS data from Scripps instruments that had a sample rate that was not a simple multiple of one of the more standard rates like 20 or 100. Such data can be downsampled immediately to something like 10 sps with this operator, or upsampled to something like 50 and then downsampled to something like 10 with a factor of 5 decimator.

We emphasize a nice feature of the scipy implementation: it automatically applies a rational antialiasing filter when downsampling. If, however, you need to do something like regularize a data set with irregular sample rates but preserve a common upper frequency response controlled at the high frequency end by digitizer antialias filters (e.g. LH channels from Q330 data) you will need to crack the scipy documentation on setting up a custom antialias filter using FIR filters defined through the window argument. All common digitizer FIR filter coefficients can be found in appropriate response files. That should, in principle, be feasible but mspass developers have not tested that hypothesis.

The concept of the window argument in this constructor is identical to that described in the documentation for scipy.signal.resample. The value passed, in fact, is used as the window argument whenever the scipy function is called. Note the other optional arguments to scipy resample are always defaulted because the current defaults apply to all cases we handle. Be careful if resample changes.

The primary method of this class is a concrete implementation of the resample method.

resample(mspass_object)[source]

Applies the scipy.signal.resample function to all data held in a mspass container passed through arg0 (mspass_object). This method will accept all supported MsPASS data objects: TimeSeries, Seismogram, TimeSeriesEnsemble, and SeismogramEnsemble. For ensembles the method is called recursively on each of the members.

The method returns mspass_object with the sample data altered by the operator defined by a particular instance, which is defined exclusively by the target sample rate for the output. All metadata will be cloned without checking. If the metadata have attributes linked to the sample interval the metadata of the result may not match the data.

If the input is marked dead it will be returned immediately with no change.
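
As a hedged illustration (assuming d25 is a TimeSeries at 25 sps already in memory), resampling such data to 10 sps, which decimation cannot do, might look like this:

    from mspasspy.algorithms.resample import ScipyResampler

    # Target all output at 10 sps; the default "hann" window controls the
    # antialiasing applied internally by scipy.signal.resample.
    resampler = ScipyResampler(10.0)

    # 25 sps to 10 sps is a noninteger ratio (2.5), so a decimator would
    # kill this datum, but the FFT-based resampler handles it.
    d10 = resampler.resample(d25)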

mspasspy.algorithms.resample.resample(mspass_object, decimator, resampler, verify_operators=True, object_history=False, alg_name='resample', alg_id=None, dryrun=False, inplace_return=False, function_return_key=None)[source]

Resample any valid data object to a common sample rate (sample interval).

This function is a wrapper that automates handling of resampling. Its main use is in a dask/spark map operator where the input can be a set of irregularly sampled data and the output is required to be at a common sample rate (interval). The problem has some complexity because decimation is normally the preferred method of resampling when possible due to speed and more predictable behavior. The problem is that downsampling by decimation is only possible if the output sampling interval is an integer multiple of the input sample interval. With modern seismology data that is usually possible, but there are some common exceptions. For example, 10 sps cannot be created from 25 sps by decimation. The algorithm tests the data sample rate and if decimation is possible it applies a decimation operator passed as the argument “decimator”. If not, it calls the operator “resampler” that is assumed to be capable of handling any sample rate change. The two operators must have been constructed with the same output target sampling frequency (interval). Both must also be a subclass of BasicResampler to match the api requirements.

The parameters object_history, alg_name, alg_id, dryrun, inplace_return, and function_return_key are handled by the decorator called mspass_func_wrapper used by this function. See the docstring for mspass_func_wrapper for the generic use of those parameters.

Parameters:
  • mspass_object (Must be a TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble object.) – mspass datum to be resampled

  • decimator (Must be a subclass of BasicResampler) – decimation operator.

  • resampler (Must be a subclass of BasicResampler) – resampling operator

  • verify_operators – boolean controlling whether safety checks are applied to the inputs. When True (default) the contents of decimator and resampler are verified to be subclasses of BasicResampler and the function tests if the target output sampling frequency (interval) of both operators is the same. The function will throw an exception if any of the verify tests fail. Standard practice should be to verify the operators are valid before running a large workflow and then run production with this arg set False for efficiency. That should be acceptable in any case I can conceive of, as once the operators are defined in a parallel workflow they should be invariant for the entire application in a map operator.
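
A minimal sketch of the intended parallel usage, assuming mybag is a dask bag of atomic MsPASS data objects with mixed sample rates (the bag name is an assumption for illustration):

    from mspasspy.algorithms.resample import ScipyDecimator, ScipyResampler, resample

    # Both operators must target the same output rate (20 sps here).
    decimator = ScipyDecimator(20.0)
    resampler = ScipyResampler(20.0)

    # Each datum is decimated when the rate ratio is an integer and
    # handled by the scipy FFT resampler otherwise.
    mybag = mybag.map(resample, decimator, resampler)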

signals

mspasspy.algorithms.signals.correlate(a, b, shift, object_history=False, alg_name='correlate', alg_id=None, dryrun=False, demean=True, normalize='naive', method='auto')[source]

Cross-correlation of two signals up to a specified maximal shift.

Parameters:
  • a – first signal

  • b – second signal

  • shift – Number of samples to shift for cross correlation. The cross-correlation will consist of 2*shift+1 or 2*shift samples. The sample with zero shift will be in the middle.

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper_multi.

  • alg_name (str) – alg_name is the name of the algorithm recorded in the processing history when it is preserved.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of this function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • demean (bool) – Demean data beforehand.

  • normalize – Method for normalization of cross-correlation. One of ‘naive’ or None (True and False are supported for backwards compatibility). ‘naive’ normalizes by the overall standard deviation. None does not normalize.

  • method (str) – Method to use to calculate the correlation. ‘direct’: the correlation is determined directly from sums, the definition of correlation. ‘fft’: the Fast Fourier Transform is used to perform the correlation more quickly. ‘auto’: automatically chooses the direct or Fourier method based on an estimate of which is faster. (Only available for SciPy versions >= 0.19. For older SciPy versions method defaults to ‘fft’.)

Returns:

cross-correlation function.
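
A small sketch of a call, assuming ts1 and ts2 are TimeSeries objects with a common sample rate and that the return is the numpy array produced by the underlying ObsPy correlate function (both assumptions for illustration):

    import numpy as np
    from mspasspy.algorithms.signals import correlate

    # Search lags up to 200 samples either side of zero lag.
    cc = correlate(ts1, ts2, 200, demean=True, normalize="naive")

    # With 2*shift+1 output samples the zero-lag value sits in the middle.
    best_lag = int(np.argmax(cc)) - 200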

mspasspy.algorithms.signals.correlate_stream_template(stream, template, object_history=False, alg_name='correlate_stream_template', alg_id=None, dryrun=False, template_time=None, return_type='seismogram', **kwargs)[source]
mspasspy.algorithms.signals.correlate_template(data, template, object_history=False, alg_name='correlate_template', alg_id=None, dryrun=False, mode='valid', normalize='full', demean=True, method='auto')[source]
mspasspy.algorithms.signals.correlation_detector(stream, templates, heights, distance, object_history=False, alg_name='correlation_detector', alg_id=None, dryrun=False, template_times=None, template_magnitudes=None, template_names=None, similarity_func=None, details=None, plot=None, return_type='seismogram', **kwargs)[source]
mspasspy.algorithms.signals.detrend(data, *args, object_history=False, alg_name='dtrend', alg_id=None, dryrun=False, inplace_return=True, type='simple', **options)[source]

This function removes a trend from the data, which is a mspasspy object. Note it is wrapped by mspass_func_wrapper, so the processing history and error logs can be preserved.

Parameters:
  • data – input data, only mspasspy data objects are accepted, i.e. TimeSeries, Seismogram, Ensemble.

  • type (str) – type of filter, ‘simple’, ‘linear’, ‘constant’, ‘polynomial’, ‘spline’. You can refer to Obspy <https://docs.obspy.org/packages/autogen/obspy.core.trace.Trace.detrend.html> for details.

  • args – extra arguments

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the algorithm recorded in the processing history when it is preserved.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of this function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return the modified data object from mspass_func_wrapper.

  • options – extra kv options

Returns:

None
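
For example, removing a best-fit line from a datum d (assumed here to be a TimeSeries already in memory) could look like:

    from mspasspy.algorithms.signals import detrend

    # d is modified in place; type selects the ObsPy detrend variant.
    detrend(d, type="linear")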

mspasspy.algorithms.signals.filter(data, type, *args, object_history=False, alg_name='filter', alg_id=None, dryrun=False, inplace_return=True, **options)[source]

This function filters the data of mspasspy objects. Note it is wrapped by mspass_func_wrapper, so the processing history and error logs can be preserved.

Parameters:
  • data – input data, only mspasspy data objects are accepted, i.e. TimeSeries, Seismogram, Ensemble.

  • type (str) – type of filter, ‘bandpass’, ‘bandstop’, ‘lowpass’, ‘highpass’, ‘lowpass_cheby_2’, ‘lowpass_fir’, ‘remez_fir’. You can refer to Obspy <https://docs.obspy.org/packages/autogen/obspy.core.trace.Trace.filter.html> for details.

  • args – extra arguments

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the algorithm recorded in the processing history when it is preserved.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of this function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return the modified data object from mspass_func_wrapper.

  • options – extra kv options

Returns:

None
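
As an illustration, a zero-phase Butterworth bandpass on a datum d (assumed to be a TimeSeries or Seismogram already in memory) might be written as:

    from mspasspy.algorithms.signals import filter

    # Keyword options are passed through to the ObsPy bandpass filter.
    filter(d, "bandpass", freqmin=0.05, freqmax=2.0, corners=4, zerophase=True)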

mspasspy.algorithms.signals.interpolate(data, sampling_rate, *args, object_history=False, alg_name='interpolate', alg_id=None, dryrun=False, inplace_return=True, method='weighted_average_slopes', starttime=None, npts=None, time_shift=0.0, **kwargs)[source]

This function interpolates data, which is a mspasspy object. Note it is wrapped by mspass_func_wrapper, so the processing history and error logs can be preserved.

Parameters:
  • data – input data, only mspasspy data objects are accepted, i.e. TimeSeries, Seismogram, Ensemble.

  • sampling_rate – The new sampling rate in Hz.

  • args – extra arguments.

  • object_history – True to preserve the processing history. For details, refer to mspass_func_wrapper.

  • alg_name (str) – alg_name is the name of the algorithm recorded in the processing history when it is preserved.

  • alg_id (bson.objectid.ObjectId) – alg_id is a unique id to record the usage of this function while preserving the history.

  • dryrun – True for dry-run, which returns “OK”. Used in the mspass_func_wrapper.

  • inplace_return – True to return the modified data object from mspass_func_wrapper.

  • method (str) – One of “linear”, “nearest”, “zero”, “slinear”, “quadratic”, “cubic”, “lanczos”, or “weighted_average_slopes”. You can refer to Obspy <https://docs.obspy.org/packages/autogen/obspy.core.trace.Trace.interpolate.html> for details.

  • starttime (UTCDateTime or int) – The start time (or timestamp) for the new interpolated stream. Will be set to current start time of the data if not given.

  • npts (int) – The new number of samples. Will be set to the best fitting number to retain the current end time of the trace if not given.

  • time_shift – Shift the trace by adding time_shift to the starttime. The time shift is always given in seconds. A positive shift means the data are shifted towards the future, e.g. a positive time delta. Note that this parameter solely affects the metadata. The actual interpolation of the underlying data is governed by the parameters sampling_rate, starttime and npts.

  • kwargs – extra kv arguments

Returns:

None.
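
A minimal sketch, assuming d is a TimeSeries to be moved to a 20 sps grid while preserving its time span:

    from mspasspy.algorithms.signals import interpolate

    # Uses ObsPy's weighted_average_slopes interpolator; starttime and
    # npts default to values that preserve the original time span.
    interpolate(d, 20.0, method="weighted_average_slopes")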

mspasspy.algorithms.signals.templates_max_similarity(st, time, streams_templates, object_history=False, alg_name='templates_max_similarity', alg_id=None, dryrun=False)[source]
mspasspy.algorithms.signals.xcorr_3c(st1, st2, shift_len, object_history=False, alg_name='xcor_3c', alg_id=None, dryrun=False, components=None, full_xcorr=False, abs_max=True)[source]
mspasspy.algorithms.signals.xcorr_max(data, object_history=False, alg_name='xcor_max', alg_id=None, dryrun=False, abs_max=True)[source]
mspasspy.algorithms.signals.xcorr_pick_correction(trace1, trace2, pick1, pick2, t_before, t_after, cc_maxlag, object_history=False, alg_name='xcorr_pick_correction', alg_id=None, dryrun=False, filter=None, filter_options={}, plot=False, filename=None)[source]

snr

mspasspy.algorithms.snr.FD_snr_estimator(data_object, noise_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, noise_spectrum_engine=None, signal_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, signal_spectrum_engine=None, band_cutoff_snr=2.0, signal_detection_minimum_bandwidth=6.0, tbp=4, ntapers=6, high_frequency_search_start=2.0, fix_high_edge=True, poles=3, perc=95.0, optional_metrics=None, save_spectra=False)[source]

Estimates one or more amplitude metrics of signal-to-noise from a TimeSeries object. An implicit assumption is that the analysis is centered on a timeable “phase” like P, PP, etc.

This is a python function that can be used to compute one or several signal-to-noise ratio estimates based on an estimated bandwidth using the C++ function EstimateBandwidth. The function has a fair number of options, but the core metrics computed are the bandwidth estimates computed by that function. It uses a fairly simple search algorithm that functions well for most earthquake sources. For the low end the algorithm searches upward from the first frequency indistinguishable from DC to find the lowest frequency for which the snr exceeds a threshold specified by the input parameter ‘band_cutoff_snr’. It does a similar search downward from the high end starting at a point 80% of Nyquist - a good choice for all modern digital data that use FIR antialias filters. Neither search is defined by just the first frequency that satisfies the snr threshold criterion. Only when a group of frequencies spanning more than 2 times the time-bandwidth product exceeds the threshold is the band edge defined. The actual band edge is then defined as the first frequency exceeding the threshold. That more elaborate algorithm was used to prevent pure lines in either the signal or noise spectrum from corrupting the estimates.

A set of optional metrics can be computed. All optional metrics use the bandwidth estimates in one way or another. Optional metrics are defined by the following keywords passed through a list (actually any iterable container will work) of strings defining one or more of the keywords. The metrics and a brief description of each follow:

snr_stats computes what are commonly plotted in box plots for the snr estimates within the estimated bandwidth: minimum, maximum, 0.25 (1/4) point, 0.75 (3/4) point, and the median. These are set with the following dict keys: ‘snr_band_maximum’, ‘snr_band_minimum’, ‘snr_band_1/4’, ‘srn_band_3/4’, and ‘snr_band_median’ respectively.

filtered_envelope, filtered_L2, filtered_Linf, filtered_perc, and filtered_MAD: All of these optional metrics first copy the data_object and then filter the copy with a Butterworth bandpass filter with the number of poles specified by the npoles argument and corners at the band edges estimated by the EstimateBandwidth function. The metrics computed are time domain snr estimates computed with the filtered data. They are actually computed from functions in this same module that can be used independently and have their own docstring descriptions. The functions called have the following names in order of the keyword list above: snr_envelope, snr_filtered_rms, snr_Linv, and snr_filtered_mad. When computed they are set in the output dictionary with the following (again in order) keys: ‘snr_envelope’, ‘snr_filtered_rms’, ‘srn_Linf’, and ‘snr_filtered_mad’.

It is important to note that all the metrics this function returns are measures of amplitude NOT power. You need to be particularly aware of this if you unpickle the spectra created if you set save_spectra true as those are power spectra.

Parameters:
  • data_object – TimeSeries object to be processed. For Seismogram objects the assumption is the algorithm would be used for a single component (e.g. longitudinal or vertical for a P phase).

  • noise_window – defines the time window to use for computing the spectrum considered noise. The time span can be either relative or UTC (absolute) time but we do not check for consistency. This low level function assumes they are consistent. If not, the calculations are nearly guaranteed to fail. Type must be mspasspy.ccore.TimeWindow.

  • signal_window – defines the time window that defines what you consider “the signal”. The time span can be either relative or UTC (absolute) time but we do not check for consistency. This low level function assumes they are consistent. If not, the calculations are nearly guaranteed to fail. Type must be mspasspy.ccore.TimeWindow.

  • noise_spectrum_engine – is expected to be either None or an instance of the ccore object called MTPowerSpectralEngine. When None an instance of MTPowerSpectralEngine is created for each call to this function. That is a convenience for small jobs or when called with data of mixed sample rates and/or variable length time windows. It is very inefficient to use the default approach for processing large data sets and really for any use in a map operation with dask or spark. Normal use should be for the user to predefine an MTPowerSpectralEngine from the expected window size for a given data sample rate and include it in the function call.

  • signal_spectrum_engine – is the comparable MTPowerSpectralEngine to use to compute the signal power spectrum. Default is None with the same caveat as above for the noise_spectrum_engine.

  • band_cutoff_snr – defines the signal-to-noise ratio floor used in the search for band edges. See the description of the algorithm above and in the user’s manual. Default is 2.0.

  • signal_detection_minimum_bandwidth – As noted above this algorithm first tries to estimate the bandwidth of data where the signal level exceeds the noise level defined by the parameter band_cutoff_snr. It then computes the bandwidth of the data in dB computed as log10(f_high/f_low). For almost any application, if the working bandwidth falls below some threshold the data are junk to all intents and purposes. A factor more relevant to this algorithm is that the “optional metrics” will all be meaningless and a waste of computational effort if the bandwidth is too small. A particular extreme example is zero bandwidth, which happens all the time if no frequency band exceeds the band_cutoff_snr over a range larger than the minimum defined by the time-bandwidth product. The default is 6.0 (one octave, which is roughly the width of the traditional short-period band), which allows optional metrics to be computed but may be too small for some applications. If your application requires higher snr and wider bandwidth adjust this parameter and/or band_cutoff_snr.

  • tbp – time-bandwidth product to use for computing the set of Slepian functions used for the multitaper estimator. This parameter is used only if the noise_spectrum_engine or signal_spectrum_engine arguments are set as None. The default is 4.0.

  • ntapers – is the number of Slepian functions (tapers) to compute for the multitaper estimators. Like tbp it is referenced only if noise_spectrum_engine or signal_spectrum_engine are set to None. Note the function will throw an exception if the ntapers parameter is not consistent with the time-bandwidth product. That is, the maximum number of tapers is round(2*tbp-1). Default is 6, which is consistent with the default tbp=4.0 where the maximum recommended is 8.

  • high_frequency_search_start – Used to specify the upper frequency used to start the search for the upper end of the bandwidth by the function EstimateBandwidth. Default is 2.0, which is reasonable for teleseismic P wave data. It should be changed for usage other than analysis of teleseismic P phases or the bandwidth may be grossly underestimated.

  • fix_high_edge – boolean controlling upper search behavior. When set True the search from the upper frequency limit is disabled and the upper band limit edge is set as the value passed as high_frequency_search_start. False enables the search. True is most useful for teleseismic body waves as many stations have a series of closely enough spaced lines (presumably from electronic sources) that set the high edge incorrectly. False would be more appropriate for most local and regional earthquake data. The default is True.

  • npoles – defines the number of poles to use for the Butterworth bandpass applied for the “filtered” metrics (see above). Default is 3.

  • perc – used only if ‘filtered_perc’ is in the optional metrics list. Specifies the perc parameter as used in seismic unix. Uses the percentage point specified of the sorted abs of all amplitudes. (Note perc=50.0 is identical to MAD.) Default is 95.0 which is 2 sigma for Gaussian noise.

  • optional_metrics – is an iterable container containing one or more of the optional snr metrics discussed above.

  • store_as_subdocument – This parameter is included for flexibility but should not normally be changed by the user. As noted earlier the outputs of this function are best abstracted as Metadata. When this parameter is False the Metadata members are all posted directly to data_object’s Metadata container. If set True the internally generated python dict is copied and stored with a key defined through the subdocument_key argument. See use below in function arrival_snr.

  • subdocument_key – key for storing results as a subdocument. This parameter is ignored unless store_as_subdocument is True. Default is “snr_data”.

  • save_spectra – If set True (default is False) the function will pickle the computed noise and signal spectra and save the strings created along with a set of related metadata defining the time range to the output python dict (these will be saved in MongoDB when db is defined - see below). This option should ONLY be used for spot checking, discovery of why an snr metric has unexpected results using graphics, or a research topic where the spectra would be of interest. It is a very bad idea to turn this option on if you are processing a large quantity of data and saving the results to MongoDB as it will bloat the arrival collection. Consider a different strategy if that is essential for your work.

Returns:

python tuple with two components. 0 is a python dict with the computed metrics associated with keys defined above. 1 is a mspass.ccore.ErrorLogger object. Any errors in computing any of the metrics will be posted to this logger. Users should then test this object using its size() method and if the log is not empty (size > 0) the caller should handle that condition. For normal use that means pushing any messages the log contains to the original data object’s error log. Component 0 will also be empty with no log entry if the estimated bandwidth falls below the threshold defined by the parameter signal_detection_minimum_bandwidth.
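
The sketch below shows one plausible call pattern, assuming d is a TimeSeries already shifted to time relative to a P arrival at time 0; the window values are illustrative choices, not documented defaults:

    from mspasspy.ccore.algorithms.basic import TimeWindow
    from mspasspy.algorithms.snr import FD_snr_estimator

    nwin = TimeWindow(-120.0, -5.0)   # pre-arrival noise
    swin = TimeWindow(-5.0, 120.0)    # window containing the signal

    metrics, elog = FD_snr_estimator(
        d,
        noise_window=nwin,
        signal_window=swin,
        band_cutoff_snr=2.0,
        optional_metrics=["snr_stats", "filtered_envelope"],
    )
    if elog.size() > 0:
        # Normal use would push these messages to d's error log.
        print("FD_snr_estimator posted", elog.size(), "log messages")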

mspasspy.algorithms.snr.arrival_snr(data_object, noise_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, noise_spectrum_engine=None, signal_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, signal_spectrum_engine=None, band_cutoff_snr=2.0, signal_detection_minimum_bandwidth=6.0, tbp=4.0, ntapers=6, high_frequency_search_start=2.0, fix_high_edge=True, poles=3, perc=95.0, save_spectra=False, phase_name='P', arrival_time_key='Ptime', metadata_output_key='Parrival', kill_null_signals=True, optional_metrics=['snr_stats', 'filtered_envelope', 'filtered_L2', 'filtered_Linf', 'filtered_MAD', 'filtered_perc'])[source]

Specialization of FD_snr_estimator. A common situation where snr data are a critical thing to estimate is data windowed around a given seismic phase. FD_snr_estimator is a bit more generic. This function removes some of the options from the more generic function and has a frozen structure appropriate for measuring snr of a particular phase. In particular it always stores the results as a subdocument (python dict) keyed by the name defined in the metadata_output_key argument. This function has a close sibling called “broadband_snr_QC” that has similar behavior but adds some additional functionality. The most significant limitation of this function relative to broadband_snr_QC is that this function ONLY accepts TimeSeries data as input.

This function is most appropriate for QC done within a workflow where the model is to process a large data set and winnow it down to separate the wheat from the chaff, to use a cliche consistent with “winnow”. In that situation the normal use would be to run this function with a map operator on atomic data and follow it with a call to filter to remove dead data and/or filter with tests on the computed metrics. See User’s Manual for guidance on this topic. Because that is the expected normal use of this function the kill_null_signals boolean defaults to True.

To be more robust the function tries to handle a common error. That is, if the input data has a UTC time standard then the noise and signal windows would need to be shifted to some reference time to make any sense. Consequently, this algorithm silently handles that situation automatically with a simple test. If the data are relative time no test of the time window range is made. If the data are UTC, however, it tests if the signal time window is inside the data range. If not, it shifts the time windows by a time it tries to pull from the input with the key defined by “arrival_time_key”. If that attribute is not defined a message is posted to elog of the input datum and it is returned with no other change. (i.e. the attribute normally output with the tag defined by metadata_output_key will not exist in the output). Large data workflows need to handle this condition.

Dead inputs are handled the standard way - returned immediately with no change.

Most parameters for this function are described in detail in the docstring for FD_snr_estimator. The user is referred there to see the usage. The following are added for this specialization:

Parameters:
  • phase_name – Name tag for the seismic phase being analyzed. This string is saved to the output subdocument with the key “phase”. The default is “P”.

  • arrival_time_key – key (string) used to fetch an arrival time if the data are in UTC and the time window received does not overlap the data range (see above).

  • kill_null_signals – boolean controlling how null snr estimator returns are handled. When True (default), if FD_snr_estimator returns a null result (no apparent signal) that input datum is killed before being returned. In that situation no snr metrics will be in the output because null means FD_snr_estimator couldn’t detect a signal and the algorithm failed. When False the datum is returned silently but will have no snr data defined in a dict stored with the key metadata_output_key (i.e. that attribute will be undefined in the output).

  • metadata_output_key – is a string used as a key under which the subdocument (python dict) created internally is stored. Default is “Parrival”. The idea is that if multiple phases are being analyzed each phase should have a different key set by this argument (e.g. if PP were also being analyzed in the same workflow you might use a key like “PParrival”).

Returns:

a copy of data_object with the results stored under the key defined by the metadata_output_key argument.
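
A sketch of typical use inside a map operation, assuming each datum d is a TimeSeries in UTC time that carries a measured P time under the (hypothetical) key "Ptime":

    from mspasspy.algorithms.snr import arrival_snr

    d = arrival_snr(
        d,
        phase_name="P",
        arrival_time_key="Ptime",        # used to shift the windows to UTC time
        metadata_output_key="Parrival",  # results stored as a subdocument
    )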

mspasspy.algorithms.snr.broadband_snr_QC(data_object, component=2, noise_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, noise_spectrum_engine=None, signal_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, signal_spectrum_engine=None, band_cutoff_snr=2.0, signal_detection_minimum_bandwidth=6.0, tbp=4.0, ntapers=6, high_frequency_search_start=2.0, fix_high_edge=True, kill_null_signals=False, poles=3, perc=95.0, phase_name='P', metadata_output_key='Parrival', optional_metrics=['snr_stats', 'filtered_envelope', 'filtered_L2', 'filtered_Linf', 'filtered_MAD', 'filtered_perc'], save_spectra=False, use_measured_arrival_time=False, measured_arrival_time_key='Ptime', taup_model=None, source_collection='source', receiver_collection=None)[source]

Compute a series of metrics that can be used for quality control filtering of seismic phase data.

This function is intended as a workhorse to be used for low-level, automated QC of broadband data when the data set is defined by signals linked to a timeable seismic phase. It can be thought of as a version of a related function called “arrival_snr” with some additional features. See the docstring for that function for what those base features are. Features this function adds that are not found in arrival_snr are:

  1. This function allows Seismogram inputs. Only TimeSeries data are handled by arrival_snr.

  2. This function provides an option to compute arrival times from source coordinates, receiver coordinates, and a handle to an obspy tau-p calculator.

Otherwise it behaves the same. Note both functions may or may not choose to interact with the function save_snr_arrival. If you want to save the computed metrics into a form more easily fetched, your workflow should extract the contents of the python dictionary stored under the metadata_output_key tag and save the result to MongoDB with the save_snr_arrival function. That option is most useful for test runs on a more limited data set to sort out values of the computed metrics that are appropriate for a secondary winnowing of your data. See the User’s Manual for more on this concept.

The input of arg 0 (data_object) can be either a TimeSeries or a Seismogram object. If a Seismogram object is passed the “component” argument is used to extract the specified single channel from the Seismogram object and that component is used for processing. That is necessary because all the algorithms used are single channel algorithms. To use this function on all components use a loop over components BUT make sure you use a unique value for the argument “metadata_output_key” for each component. Note this will also produce multiple documents per input datum.

The type of the data_object also has a more subtle implication the user must be aware of. That is, in the MsPASS schema we store receiver coordinates in one of two different collections: “channel” for TimeSeries data and “site” for Seismogram data. When such data are loaded the generic keys like lat are always converted to names like channel_lat or site_lat for TimeSeries and Seismogram data respectively. This function uses the data type to set that naming. i.e. if the input is a TimeSeries it tries to fetch the latitude data as channel_lat, while if the input is a Seismogram it tries to fetch site_lat. That is true of all coordinate data loaded by normalization from a source and receiver collection.

The following args are passed directly to the function FD_snr_estimator: noise_window, noise_spectrum_engine, signal_window, signal_spectrum_engine, band_cutoff_snr, signal_detection_minimum_bandwidth, tbp, ntapers, high_frequency_search_start, fix_high_edge, npoles, perc, optional_metrics, and save_spectra. Below we describe only the arguments added by this function:

Parameters:
  • data_object – An atomic MsPASS data object to which the algorithms requested should be applied. Currently that means a TimeSeries or Seismogram object. Any other input will result in a TypeError exception. As noted above, for Seismogram input the component argument defines which data component is to be used for the snr computations.

  • component – integer (0, 1, or 2) defining which component of a Seismogram object to use to compute the requested snr metrics. This parameter is ignored if the input is a TimeSeries.

  • metadata_output_key – string defining the key where the results are to be posted to the returned data_object. The results are always posted to a python dictionary and then posted to the returned data_object with this key. Default is “Parrival”.

  • use_measured_arrival_time – boolean defining the method used to define the time reference for windowing used for snr calculations. When True the function will attempt to fetch a phase arrival time with the key defined by the “measured_arrival_time_key” argument. In that mode if the fetch fails the data_object will be killed and an error posted to elog. That somewhat brutal choice was intentional as the expectation is if you want to use measured arrival times you don’t want data where there are no picks. The default is True to make the defaults consistent. The reason is that the tau-p calculator handle is passed to the function when using model-based travel times. There is no way to default that so it defaults to None.

  • measured_arrival_time_key – is the key used to fetch a measured arrival time. This parameter is ignored if use_measured_arrival_time is False.

  • taup_model – when use_measured_arrival_time is False this argument is required. It defaults as None because there is no way the author knows to initialize it to anything valid. If set it MUST be an instance of the obspy class TauPyModel (https://docs.obspy.org/packages/autogen/obspy.taup.tau.TauPyModel.html#obspy.taup.tau.TauPyModel). Mistakes in use of this argument can cause a MsPASSError exception to be thrown (not logged - thrown as a fatal error) in one of two ways: (1) if use_measured_arrival_time is False this argument must be defined, and (2) if it is defined it MUST be an instance of TauPyModel.

  • source_collection – normalization collection for source data. The default is the MsPASS name “source” which means the function will try to load the source hypocenter coordinates (when required) as source_lat, source_lon, source_depth, and source_time from the input data_object. The id of that document is posted to the output dictionary stored under metadata_output_key.

  • receiver_collection – when set this name will override the automatic setting of the expected normalization collection naming for receiver coordinate data (see above). The default is None which causes the automatic switching to be invoked. If it is any other string the automatic naming will be overridden.

Returns:

the data_object modified by insertion of the snr QC data in the object’s Metadata under the key defined by metadata_output_key.
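
A hedged sketch of a call using model-based arrival times, assuming d is a Seismogram with source and site coordinates already loaded by normalization:

    from obspy.taup import TauPyModel
    from mspasspy.algorithms.snr import broadband_snr_QC

    model = TauPyModel(model="iasp91")

    d = broadband_snr_QC(
        d,
        component=2,                      # use the vertical for the metrics
        use_measured_arrival_time=False,  # compute the P time from the model
        taup_model=model,
        phase_name="P",
        metadata_output_key="Parrival",
    )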

mspasspy.algorithms.snr.save_snr_arrival(db, doc_to_save, wfid, wf_collection='wf_Seismogram', save_collection='arrival', subdocument_key=None, use_update=False, update_id=None, validate_wfid=False) ObjectId[source]

This function is a companion to broadband_snr_QC. It handles the situation where the workflow aims to post calculated snr metrics to an output database (normally in the “arrival” collection but optionally to a parent waveform collection). The alternative model, as noted in the User’s Manual, is to use the kill option of broadband_snr_QC followed by a call to the filter method of the bag/rdd to remove the deadwood and reduce the size of data passed downstream in a large parallel workflow. That case is better handled by using broadband_snr_QC directly.

How the data are saved is controlled by four parameters: save_collection, use_update, update_id and subdocument_key. They interact in a way that is best summarized as a set of cases that produce behavior you may want:

  1. If save_collection is not the parent waveform collection, behavior is driven by use_update combined with subdocument_key. When use_update is False (default) the contents of doc_to_save will be used to define a new document in save_collection with the MongoDB insert_one method.

  2. When use_update is True the update_id is assumed to be defined and to point to the ObjectId of an existing document in save_collection. Specifically, that id will be used as the query clause for the update call. This combination is useful if a workflow is being driven by arrival data stored in save_collection created, for example, from a css3.0 event->origin->assoc->arrival catalog of arrival picks. A variant of this mode will occur if the argument subdocument_key is defined (default is None). If you define subdocument_key the contents of doc_to_save will be stored as a subdocument in save_collection accessible with the key defined by subdocument_key.

  3. If save_collection is the same as the parent waveform collection (defined via the input parameter wf_collection) the value of use_update will be ignored and only an update will be attempted. The reason is that if one tried to save the contents of doc_to_save to a waveform collection it would corrupt the database by creating a set of documents that could not be used to construct a valid data object (the normal use for one of the wf collections).

Parameters:
  • db – MongoDB database handle to use for the transactions that are the focus of this algorithm.

  • doc_to_save – python dictionary containing data to be saved. Where and how this is saved is controlled by save_collection, use_update, and subdocument_key as described above.

  • wfid – waveform document id of the parent datum. It is assumed to be an ObjectId linking the data in doc_to_save to the parent. It is ALWAYS saved in the output with the key “wfid”.

  • wf_collection – string defining the waveform collection with which the data stored in doc_to_save are associated. wfid is assumed to define a valid document in wf_collection. Default is “wf_Seismogram”.

  • save_collection – string defining the collection name to which doc_to_save should be pushed. See above for how this name interacts with other parameters.

  • subdocument_key – Optional key for saving doc_to_save as a subdocument in the save_collection. Default is None which means the contents of doc_to_save will be saved (or used in an update) as is. For saves to the (default) arrival collection this parameter should normally be left None, but setting it is allowed. If save_collection is the parent waveform collection, setting this to some sensible key is recommended to avoid possible name collisions with waveform Metadata key-value pairs.

  • use_update – boolean controlling whether or not to use updates or inserts for the contents of doc_to_save. See above for a description of how this interacts with other arguments to this function. Default is False.

  • update_id – ObjectId of the target document when running in update mode. When save_collection is the same as wf_collection this parameter is ignored and the required id passed as wfid will be used for the update key matching. Also ignored with the default behavior of inserting doc_to_save as a new document. Required only if running with a different collection and updating is desired. The type example noted above would be updates to existing arrival information created from a css3.0 database.

  • validate_wfid – When set True the id defined by the required argument wfid will be validated by querying wf_collection. In this mode if wfid is not found the function will silently return None. Callers using this mode should handle that condition.

Returns:

ObjectId of the saved record. None if something went wrong and nothing was saved.
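
A sketch of the save step, assuming db is a MsPASS Database handle, d was processed by broadband_snr_QC with the default output key, and the parent wf_Seismogram document id is available in d's Metadata under "_id" (an assumption about how the datum was read):

    from mspasspy.algorithms.snr import save_snr_arrival

    doc = d["Parrival"]          # dict created by broadband_snr_QC
    arrival_id = save_snr_arrival(
        db,
        doc,
        d["_id"],                # ObjectId of the parent waveform document
        wf_collection="wf_Seismogram",
        save_collection="arrival",
    )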

mspasspy.algorithms.snr.snr(data_object, noise_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, signal_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, noise_metric='mad', signal_metric='mad', perc=95.0)[source]

Compute time-domain based signal-to-noise ratio with a specified metric.

Signal-to-noise ratio is a fundamental measurement in all forms of seismic data processing. There is, however, no single unified metric that is ideal for all types of signals one may want to analyze. One class of metrics uses time-domain measures of amplitude in a signal window and a noise window cut from a single waveform segment. A typical example is the snr of some particular “seismic phase” (P, S, PP, ScS, etc.) relative to some measure of background noise. e.g. for P phases it is nearly universal to try to estimate snr from some window defined by the arrival time of P and a noise window before the time P arrives (pre-event noise).

This function provides a generic api to measure a large range of metrics using one of four choices for measuring the norm of the data in the signal and noise windows:

  1. rms - L2 norm

  2. mad - median absolute difference, which is essentially the median amplitude in this context

  3. perc - percentage norm ala seismic unix. perc is defined as the amplitude level where perc percent of the data have an amplitude smaller than this value. It is computed by ranking (sorting) the data, computing the count of that percentage relative to the number of amplitude samples, and returning the amplitude of the nearest value to that position in the ranked data.

  4. peak - is the peak value which in linear algebra is the L infinity norm

Note the user can specify a different norm for the signal and noise windows. The perc metric requires specifying what percentage level to use. It is important to recognize that ALL of these metrics are scaled to amplitude not power (amplitude squared).

This function will throw a MsPASSError exception if the window parameters do not define a time period inside the range of the data_object. You will need a custom function if the model of windows inside a larger waveform segment does not match your data.

There is one final detail about an snr calculation that we handle carefully. With simulation data it is very common to have error free simulations where the “noise” window one would use with real data is all zeros. An snr calculated with this function in that situation would either return inf or NaN depending on some picky details. Neither is good as either can cause downstream problems. For that reason we trap any condition where the noise amplitude measure is computed as zero. If the signal amplitude is also zero we return a -1.0. Otherwise we return a large, constant, positive number. Neither condition will cause an exception to be thrown as that condition is considered somewhat to be anticipated.

Parameters:
  • data_object – MsPASS atomic data object (TimeSeries or Seismogram) to use for computing the snr. Note that for Seismogram objects the metrics always use L2 measures of amplitude of each sample (i.e. vector amplitudes). If snr for components of a Seismogram is desired use ExtractComponent and apply this function to each component separately.

  • noise_window – TimeWindow object defining the time range to extract from data_object to define the part of the signal considered noise. Times can be absolute or relative. Default is the range -5 to 120, which makes sense only as time relative to some phase arrival time.

  • signal_window – TimeWindow object defining the time range to extract from data_object to define the part of the signal to use for the required amplitude measure. Default of -130 to -5 is consistent with the default noise window (in terms of length) and assumes a time relative to a phase arrival time. For absolute times each call to this function may need its own time window.

  • noise_metric – string defining one of the four metrics defined above (‘mad’,’peak’,’perc’ or ‘rms’) to use for noise window measurement.

  • signal_metric – string defining one of the four metrics defined above (‘mad’,’peak’,’perc’ or ‘rms’) to use for signal window measurement.

Returns:

estimated signal-to-noise ratio as a single float. Note the special returns noted above for any situation where the noise window amplitude is 0
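
For example, a time-domain snr with an rms noise measure and a peak signal measure, assuming d is a TimeSeries in time relative to a phase arrival at 0 (window values chosen for illustration):

    from mspasspy.ccore.algorithms.basic import TimeWindow
    from mspasspy.algorithms.snr import snr

    value = snr(
        d,
        noise_window=TimeWindow(-130.0, -5.0),
        signal_window=TimeWindow(-5.0, 120.0),
        noise_metric="rms",
        signal_metric="peak",
    )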

window

class mspasspy.algorithms.window.TopMute(t0=0.0, t1=0.5, type='cosine')[source]

Bases: object

A top mute is a form of taper applied to the “front” of a signal. Front in standard jargon means that with the time axis running from left to right (normal unless your native language is Arabic) the time period on the left side of a plot. Data tagged at times less than what we call the zero time here are always zeroed. The mute defines a ramp function running from the zero time to the “one” time. In the ramp zone the ramp function multiplies the sample data so it is a form of “taper” as used in Fourier analysis. This class should normally only be used on data with the time reference type set Relative. It can be applied to UTC time standard data but with such data one of these objects would often need to be created for each atomic data object, which would be horribly inefficient. In most cases conversion to relative time is an essential step before using this algorithm.

This implementation uses an “apply” method for processing data. That means for a parallel construct, instead of the symbol for a function you use the apply method as a function call. (e.g. if tm is an instance of this class and d is a TimeSeries or Seismogram object to be muted, the function call that applies the mute would be tm.apply(d))

apply(d, object_history=False, instance=None)[source]

Use this method to apply the defined top mute to one of the MsPASS atomic data objects. The method does a sanity check on the input data range. If the starttime of the data is greater than t0 for the mute the datum is killed and an error posted to elog. The reason is that in that situation the data would be completely zeroed anyway and it is better to define it dead and leave an error message than to return completely nulled data.

Parameters:
  • d – input atomic MsPASS data object (TimeSeries or Seismogram)

  • object_history – If set True the function will define this step as a map operation to preserve object level history. (default is False)

  • instance – string defining the “instance” of this algorithm. This parameter is needed only if object_history is set True. It is used to define which instance of this algorithm is being applied. (In the C++ api this is what is called the algorithm id.) It can come from the global history manager or be set manually.
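
A short sketch, assuming d is a TimeSeries or Seismogram already shifted to relative time with its start at or before time 0:

    from mspasspy.algorithms.window import TopMute

    # Cosine ramp from 0 at t=0.0 s to 1 at t=2.0 s; samples before 0 are zeroed.
    tm = TopMute(t0=0.0, t1=2.0, type="cosine")
    tm.apply(d)   # d is tapered in place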

mspasspy.algorithms.window.WindowData(d, win_start, win_end, t0shift=None, object_history=False, alg_name='scale', alg_id=None, dryrun=False)[source]

Cut data defined by a TimeWindow object.

Cutting a smaller waveform segment from a larger waveform segment is a very common seismic data processing task. The function is a python wrapper to adapt C++ code that accomplishes that task to the MsPASS framework.

Note this function uses one model for being bombproof with a map operation. Any exception that makes the result invalid will cause an error message to be posted to the elog attribute of the input datum AND the data will be marked dead (killed).

Parameters:
  • d – is the input data. d must be either a mspasspy.ccore.seismic.TimeSeries or mspasspy.ccore.seismic.Seismogram object or the function will log an error to d and return a None.

  • win_start (float) – defines the start of the time window to be cut

  • win_end (float) – defines the end of the time window to be cut

  • t0shift – is an optional time shift to apply to the time window. This parameter is convenient to avoid conversions to relative time. A typical example would be to set t0shift to an arrival time and let the window define time relative to that arrival time. Default is None which causes the function to assume the window is to be used directly.

  • object_history – boolean to enable or disable saving object level history. Default is False. Note this functionality is implemented via the mspass_func_wrapper decorator.

  • alg_name – When history is enabled this is the algorithm name assigned to the stamp for applying this algorithm. Default (“WindowData”) should normally be just used. Note this functionality is implemented via the mspass_func_wrapper decorator.

  • alg_id – algorithm id to assign to history record (used only if object_history is set True.) Note this functionality is implemented via the mspass_func_wrapper decorator.

  • dryrun – Note this functionality is implemented via the mspass_func_wrapper decorator.

Returns:

copy of d with sample range reduced to twin range. Returns an empty version of the parent data type (default constructor) if the input is marked dead
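
A sketch of a typical call, assuming d is an atomic datum in UTC time whose Metadata carries a (hypothetical) arrival time under the key "Ptime":

    from mspasspy.algorithms.window import WindowData

    # Cut from 5 s before to 30 s after the arrival; t0shift avoids an
    # explicit conversion of d to relative time.
    d_cut = WindowData(d, -5.0, 30.0, t0shift=d["Ptime"])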

mspasspy.algorithms.window.WindowData_with_duration(d, duration, t0shift=None, object_history=False, alg_name='scale', alg_id=None, dryrun=False)[source]
mspasspy.algorithms.window.ensemble_error_post(d, alg, message, severity)[source]

This is a small helper function useful for error handlers in except blocks for ensemble objects. If a function called on an ensemble object throws an exception, this function will post the message passed to it to all ensemble members. It silently does nothing if the ensemble is empty.

Parameters:
  • d – is the ensemble data to be handled. The function prints an error message and returns doing nothing if d is not one of the known ensemble objects.

  • alg – is the algorithm name posted to elog on each member

  • message – is the string posted to all members

  • severity – is the error severity level

mspasspy.algorithms.window.merge(tsvector, starttime=None, endtime=None, fix_overlaps=False, zero_gaps=False, object_history=False, alg_name='merge', alg_id=None, dryrun=False) TimeSeries[source]

Splices a vector of TimeSeries objects together and optionally carves out a specified time window. It acts a bit like an obspy function with the same name, but has completely different options and works with native MsPASS TimeSeries objects.

This function is a workhorse for handling continuous data that are universally stored today as a series of files. The input to this function is an array of TimeSeries objects that are assumed to be created from a set of data stored in such files. The data are assumed to be from a common stream of a single channel and sorted so the array index defines a time order. The algorithm attempts to glue the segments together into a single time series that is returned. The algorithm by default assumes the input is “clean” which means the endtime of each input TimeSeries is 1 sample ahead of the start time of the next segment (i.e. (segment[i+1].t0()-segment[i].endtime()) == dt). The actual test is that the time difference is less than dt/2.

This algorithm treats the following conditions as fatal errors and will throw a MsPASSError when one occurs:

  1. It checks that the input array of TimeSeries data are in time order.

  2. It checks that the inputs all have the same sample rate.

  3. If fix_overlaps is False, any overlap that is found is treated as an error.

Any of these conditions will cause the function to throw an exception. The assumption is that each represents a user error created by failing to read the directions that emphasize these requirements for the input.

Other conditions can cause the output to be marked dead with an error message posted to the output’s elog attribute. These are:

  1. This algorithm aims to produce an output with data stored in a continuous vector with a length defined by the total time span of the input. Naive use can create enormously long vectors that would mostly be empty. (e.g. two random day files with a 5 year difference in start time) The algorithm refuses to try to merge data when the span exceeds an internal threshold of 10^8 samples.

  2. If fix_overlaps is false any overlap of successive endtime to the next starttime of more than 0.5 samples will cause the output to be killed. The assumption is such data have a serious timing problem retained even after any cleaning.

The above illustrates that this function behaves differently and makes different assumptions depending on whether the argument fix_overlaps is True or False. False is faster because it bypasses the algorithm used to fix overlaps, but True is safer if your data are not perfectly clean in the sense of being free of timing issues or duplicates. If the fix_overlaps boolean is set True, the mspass overlap handler is invoked; that is a C++ function with the name “repair_overlaps”. The function was designed only to handle the following common situations. How they are handled is different for each situation.

  1. Duplicate waveform segments spanning a common time interval can exist in raw data and can be created accidentally by indexing two copies of the same data. If the samples in the overlapping section match, the algorithm will attempt to remove the overlapping section. The algorithm is known to work only for a pair of pure duplicates and for a smaller segment with a start time after a more complete segment. It may fail if there are three or more copies of the same waveform in the input or if one of the waveforms spans a smaller time range than the other but has the same start time. The first is a gross user error. The second should be rare and is conceivable only with raw data where a packet or two was randomly saved twice - something that shouldn’t happen but could with flakey hardware.

  2. If an overlap is detected AND the sample data in the overlap are different the algorithm assumes the data have a timing problem that created this situation. Our experience is this situation only happens when an instrument has a timing problem. All continuous data generating digitizers we know of use a timing system slaved to an external reference (usually GPS time). If the external signal is lost the clock drifts. When the signal is restored if the digitizer detects a large time jump it may reset (time jerk) to tag the next packet of data with the updated time based on the standard. If that time jump is forward it will leave an apparent gap in the data, which we discuss below, but if the jump is backward it will leave an apparent overlap with inconsistent samples. The repair_overlaps function assumes that is the cause of all overlaps it detects.

The final common situation this function needs to handle is gaps. A gap is defined in this algorithm as any section where the endtime of one segment is followed by a start time of the next segment that is more than 1 sample in duration. Specifically when

(segment[i+1].t0()-segment[i].endtime()) > 1.5*dt

The result depends on the values of the (optional) windowing arguments starttime and endtime and the boolean “zero_gaps” argument. If windowing is enabled (done by changing the default None values of starttime and endtime) any gap will be harmless unless it is present inside the specified time range. In all cases what happens to the output depends upon the boolean zero_gaps. When zero_gaps is False any gaps detected within the output range of the result (the windowed range if specified, the full time span of the input otherwise) will cause the output to be marked dead. When set True gap sections will be zeroed in the output data vector. All outputs with gaps that have been zeroed will have the boolean Metadata attribute “has_gap” set True; the attribute is undefined otherwise. When the has_gap attribute is set True the gap windows will be stored as a list of “TimeWindow” objects with the Metadata key “gaps”.

Parameters:
  • tsvector – array of TimeSeries data that are to be spliced together to create a single output. The contents must all have the same sample rate and be sorted by starttime.

  • starttime (double - assumed to be a UTC time expressed as a unix epoch time.) – (optional) start time to apply for windowing the output. Default is None which means the output will be created as the merge of all the inputs. When set WindowData is applied with this start time. Note if endtime is defined but starttime is None windowing is enabled with starttime = earliest segment start time.

  • endtime (double - assumed to be a UTC time expressed as a unix epoch time.) – (optional) end time to apply for windowing the output. Default is None which means the output will be created as the merge of all the inputs. When set WindowData is applied with this end time. Note if starttime is defined but endtime is None windowing is enabled with endtime = latest end time of all segments.

  • fix_overlaps (boolean) – when set True (default is False), if an overlap is detected the algorithm will attempt to repair it to yield a continuous time series vector, provided the overlapping samples are determined to match. See the description above for more details.

  • zero_gaps (boolean) – When set False, which is the default, any gaps detected in the output window will cause the return to be marked dead. When set True, gaps will be zeroed and a record of the gap positions posted to the Metadata of the output. See above for details.

  • object_history – boolean to enable or disable saving object level history. Default is False. Note this functionality is implemented via the mspass_func_wrapper decorator.

  • alg_name – When history is enabled this is the algorithm name assigned to the stamp for applying this algorithm. The default (“WindowData”) should normally be used. Note this functionality is implemented via the mspass_func_wrapper decorator.

  • alg_id – algorithm id to assign to the history record (used only if object_history is set True). Note this functionality is implemented via the mspass_func_wrapper decorator.

  • dryrun – Note this functionality is implemented via the mspass_func_wrapper decorator.

Returns:

TimeSeries spanning the time range of the input vector of segments or, if starttime or endtime are specified, a reduced time range. The result may be marked dead for a variety of reasons, with error messages explaining why posted to the elog attribute of the return.
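The call pattern below illustrates typical use of this function with both overlap repair and gap zeroing enabled. The import path and function name (mspasspy.algorithms.window.merge) are assumed here; if the function documented above is exposed under a different name in your MsPASS version, substitute the name from the signature shown above. tsvector, t_start, and t_end are hypothetical placeholders.

    from mspasspy.algorithms.window import merge   # assumed name and import path

    # tsvector: list of TimeSeries with a common sample rate, sorted by starttime
    spliced = merge(
        tsvector,
        starttime=t_start,   # optional UTC epoch time bounding the output
        endtime=t_end,       # optional UTC epoch time bounding the output
        fix_overlaps=True,   # try to repair matching-sample overlaps
        zero_gaps=True,      # zero gap sections instead of killing the output
    )
    if spliced.live and spliced.is_defined("has_gap") and spliced["has_gap"]:
        gap_windows = spliced["gaps"]   # list of TimeWindow objects that were zeroed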

mspasspy.algorithms.window.scale(d, compute_from_window=False, window=None, method='peak', level=1.0, scale_by_section=False, use_mean=False, object_history=False, alg_name='scale', alg_id=None, dryrun=False, function_return_key=None)[source]

Top level function interface to data scaling methods.

This function can be used to scale seismic data contained in any of the four seismic data objects defined in mspass: TimeSeries, Seismogram, TimeSeriesEnsemble, and SeismogramEnsemble. An extensive set of amplitude estimation metrics are available by selecting one of the allowed values for the method parameter. Ensembles can be scaled at the individual seismogram level or as a whole (scale_by_section=True).

Note all methods preserve the original amplitude by updating the Metadata parameter calib to include the scaling. That is, as always, the amplitude of the data in physical units can be restored by multiplying the data samples by calib.
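The bookkeeping works out as in the small numeric sketch below (the numbers are hypothetical; only the relationship between the gain factor and calib matters).

    calib_old = 0.01          # counts -> physical units before scaling
    peak_raw = 2.5e4          # hypothetical raw peak amplitude (counts)
    gain = 1.0 / peak_raw     # factor applied by method='peak', level=1.0
    peak_scaled = gain * peak_raw      # peak amplitude after scaling (1.0)
    calib_new = calib_old / gain       # calib value posted to the output Metadata
    # physical amplitude is unchanged: calib_new * peak_scaled == calib_old * peak_raw
    assert abs(calib_new * peak_scaled - calib_old * peak_raw) < 1e-9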

Parameters:
  • d – is input data object. If not one of the four mspass seismic data types noted above the function will throw a RuntimeError exception.

  • compute_from_window – boolean used to compute the amplitude for scaling from a windowed section of the input waveform. By default (this boolean is False) the amplitude for scaling is computed from the full waveform. When True the window argument must contain a valid TimeWindow that spans a time range smaller than the data range. In that situation, if the window is inconsistent with the data range the return will be marked dead and messages will be found posted in elog. For ensembles, all or only a portion of the group will be killed if this happens. Note this parameter is ignored when scale_by_section is true.

  • window – mspass TimeWindow object defining the time range over which the amplitude for scaling is to be computed. (see the compute_from_window parameter description)

  • method – string defining the gain method to use. Currently supported method values are: peak, RMS (rms accepted), perc, and MAD (also accepts mad or Mad). peak uses the largest amplitude for scaling. For 3C data that means the vector amplitude, while for scalar data it is the largest absolute value. rms is the standard rms measure, although for 3C data it is the rms of vector amplitudes, so the normalization is by the number of vectors, not the total number of samples (3 times the number of vectors). perc uses a sorted percentage clip level metric as used in Seismic Unix. mad is a variant where the value returned is the median absolute deviation (mad), which is actually the same as perc=0.5. Default is peak. WARNING: if an invalid value for method is passed the data will be returned unaltered with a complaint message posted for every datum (individually or as ensemble members) processed that way.

  • level – For all methods but perc this defines the scale to which the data are scaled. For perc it is used to set the percent clip level. If the value passed is illegal (zero or negative for most methods; for perc the value must be positive and less than or equal to 1) a complaint message will be posted to elog and the level adjusted to 1.0.

  • scale_by_section – is a boolean that controls the scaling behavior on ensembles only (It is silently ignored for atomic TimeSeries and Seismogram data). When true a single gain factor is applied to all members of an ensemble. When false each member is gained individually as if this function were applied in a loop to each member.

  • use_mean – boolean used only for ensembles and when scale_by_section is True. The algorithm used in that case has an option to use the mean log amplitude for scaling the section instead of the default median amplitude.

Returns:

Data scaled to specified level. Note the scaling always preserves absolute amplitude by adjusting the value of the calib attribute of the return so calib*data is the same value before and after the scaling.

Return type:

same as input
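The examples below illustrate typical calls. The import path for TimeWindow (mspasspy.ccore.algorithms.basic) follows standard MsPASS conventions but should be verified against your installation; d and ens are hypothetical atomic and ensemble data objects.

    from mspasspy.algorithms.window import scale
    from mspasspy.ccore.algorithms.basic import TimeWindow   # assumed import path

    # Scale a single datum to unit peak amplitude measured in a signal window
    win = TimeWindow(5.0, 30.0)
    d_scaled = scale(d, compute_from_window=True, window=win,
                     method="peak", level=1.0)

    # Scale every member of an ensemble by one section-wide rms-based factor
    ens_scaled = scale(ens, method="rms", scale_by_section=True, use_mean=False)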