mspasspy.reduce
- mspasspy.reduce.mspass_dask_foldby(self, key='site_id')[source]
This function implements a convenient foldby method for a dask bag to generate ensembles from atomic data. The concept is to assemble ensembles of mspasspy.ccore.seismic.TimeSeries or mspasspy.ccore.seismic.Seismogram objects that share a common Metadata attribute defined by a single key. The output is the corresponding ensemble object: mspasspy.ccore.seismic.TimeSeriesEnsemble or mspasspy.ccore.seismic.SeismogramEnsemble, respectively. Note that “foldby” in this context acts a bit like a reduce operator BUT the data volume is not reduced; we just bundle groups of related data into ensembles. The output is always larger than the input due to the overhead of the ensemble container. For that reason, be careful with this operator, as it can easily create huge ensembles that could cause a memory fault in your workflow. Note also that because this is implemented as a method of the bag class, the usage differs from the map and reduce methods: arg0 is “self”, which means it must be defined by the input bag.
Example:
    # preceded by a set of map-reduce lines to create a bag called mydata
    mydata = mspass_dask_foldby(mydata, key="source_id")
- Parameters:
key – The key that defines the gather. By default, it will use “site_id” to produce a common station gather.
- Returns:
dask.bag.Bag of mspasspy.ccore.seismic.TimeSeriesEnsemble or mspasspy.ccore.seismic.SeismogramEnsemble.
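For illustration, here is a minimal, end-to-end sketch. It assumes atomic_data is a hypothetical Python list of TimeSeries objects whose Metadata carry a valid "source_id" attribute (e.g., produced by earlier map stages); only mspass_dask_foldby itself is taken from this module.

    import dask.bag as db
    from mspasspy.reduce import mspass_dask_foldby

    # atomic_data is assumed: a list of TimeSeries with "source_id" set
    mydata = db.from_sequence(atomic_data)
    # bundle the atomic data into common-source gathers
    ensembles = mspass_dask_foldby(mydata, key="source_id")
    # each element of the computed result is a TimeSeriesEnsemble
    results = ensembles.compute()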
- mspasspy.reduce.mspass_spark_foldby(self, key='site_id')[source]
This function implements a convenient foldby method for a spark RDD to generate ensembles from atomic data. The concept is to assemble ensembles of mspasspy.ccore.seismic.TimeSeries or mspasspy.ccore.seismic.Seismogram objects that share a common Metadata attribute defined by a single key. The output is the corresponding ensemble object: mspasspy.ccore.seismic.TimeSeriesEnsemble or mspasspy.ccore.seismic.SeismogramEnsemble, respectively. Note that “foldby” in this context acts a bit like a reduce operator BUT the data volume is not reduced; we just bundle groups of related data into ensembles. The output is always larger than the input due to the overhead of the ensemble container. For that reason, be careful with this operator, as it can easily create huge ensembles that could cause a memory fault in your workflow. Note also that because this is implemented as a method of the RDD class, the usage differs from the map and reduce methods: arg0 is “self”, which means it must be defined by the input RDD.
Example:
    # preceded by a set of map-reduce lines to create an RDD called mydata
    mydata = mspass_spark_foldby(mydata, key="source_id")
- Parameters:
key – The key that defines the gather. By default, it will use “site_id” to produce a common station gather.
- Returns:
pyspark.RDD of mspasspy.ccore.seismic.TimeSeriesEnsemble or mspasspy.ccore.seismic.SeismogramEnsemble.
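A parallel sketch for Spark, under similar assumptions: sc is an existing SparkContext and atomic_data is a hypothetical list of TimeSeries objects with "source_id" set.

    from mspasspy.reduce import mspass_spark_foldby

    # sc and atomic_data are assumed to exist from earlier workflow setup
    mydata = sc.parallelize(atomic_data)
    ensembles = mspass_spark_foldby(mydata, key="source_id")
    # collect() returns a list of TimeSeriesEnsemble objects
    results = ensembles.collect()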
- mspasspy.reduce.stack(data1, data2, object_history=False, alg_id=None, alg_name=None, dryrun=False)[source]
This function sums the data fields of two mspasspy objects; the result is stored in data1. Note it is wrapped by mspass_reduce_func_wrapper, so the history and error logs can be preserved.
- Parameters:
data1 – input data, only mspasspy data objects are accepted, i.e. TimeSeries, Seismogram, Ensemble.
data2 – input data, only mspasspy data objects are accepted, i.e. TimeSeries, Seismogram, Ensemble.
object_history – True to preserve the history. For details, refer to mspass_reduce_func_wrapper.
alg_id – alg_id is a unique id to record the usage of this function while preserving the history. Used in the mspass_reduce_func_wrapper.
alg_name (str) – alg_name is the name of the function recorded while preserving the history.
dryrun – True for a dry run, which returns “OK”. Used in the mspass_reduce_func_wrapper.
- Returns:
data1 (modified).
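As a hedged illustration, stack is naturally used as the binary operator of a reduce/fold. The sketch below assumes mydata is a hypothetical dask bag of TimeSeries objects that span the same time window and sample interval; verifying that precondition is left to your workflow.

    from mspasspy.reduce import stack

    # fold pairwise-sums the data fields of all TimeSeries in the bag;
    # history and error logs are handled by mspass_reduce_func_wrapper
    stacked = mydata.fold(stack).compute()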