mr#

swordfish.function.mr()#

The Map-Reduce function is the core function of DolphinDB’s generic distributed computing framework.

Parameters:
  • ds (Constant) – The list of data sources. This required parameter must be a tuple and each element of the tuple is a data source object. Even if there is only one data source, we still need a tuple to wrap the data source.

  • mapFunc (Constant) – The map function. It accepts one and only one argument, which is the materialized data entity from a data source. If we would like the map function to accept more parameters in addition to the materialized data source, we can use a PartialApplication to convert a multiple-parameter function to a unary function. The number of map function calls is the same as the number of data sources. The map function returns a regular object (scalar, pair, array, matrix, table, set, or dictionary) or a tuple (containing multiple regular objects).

  • reduceFunc (Constant, optional) – The binary reduce function that combines two map function call results. The reduce function in most cases is trivial. An example is the addition function. The reduce function is optional. If the reduce function is not specified, the system returns all individual map call results to the final function.

  • finalFunc (Constant, optional) – The final function which accepts one and only one parameter. The output of the last reduce function call is the input of the final function. If it is not specified, the system returns the individual map function call results.

  • parallel (Constant, optional) – A boolean flag indicating whether to execute the map function in parallel locally. The default value is true, i.e., enabling parallel computing. When there is very limited available memory and each map call needs a large amount of memory, we can disable parallel computing to prevent the out-of-memory problem. We may also want to disable the parallel option in other scenarios. For example, we may need to disable the parallel option to prevent multiple threads from writing to the same partition simultaneously.