piecewiseLinFit

Syntax

piecewiseLinFit(X, Y, numSegments, [XC], [YC], [bounds], [lapackDriver='gelsd'], [degree=1], [weights], [method='de'], [maxIter], [initialGuess], [seed])

Arguments

X is a numeric vector containing the x-coordinates of the data points. NULL values are not allowed.

Y is a numeric vector containing the y-coordinates of the data points. NULL values are not allowed.

numSegments is a positive integer indicating the desired number of line segments.

XC (optional) is a numeric vector indicating the x locations of the data points that the piecewise linear function will be forced to go through. It only takes effect when method='de'.

YC (optional) is a numeric vector indicating the y locations of the data points that the piecewise linear function will be forced to go through. It only takes effect when method='de'.

bounds (optional) is a numeric matrix of shape (numSegments-1, 2), indicating the bounds for each breakpoint location within the optimization.

lapackDriver (optional) is a string indicating which LAPACK driver is used to solve the least-squares problem. It can be 'gelsd' (default), 'gelsy', or 'gelss'.

degree (optional) is a non-negative integer indicating the degree of polynomial to use. The default is 1 for linear models. Use 0 for constant models.

weights (optional) is a numeric vector indicating the weights used in the least-squares algorithm. Typically, weights[i] is the reciprocal of the standard deviation of the ith data point. NULL values are not allowed.

method (optional) is a string indicating the optimization algorithm used to locate the breakpoints. It can be:

  • 'nm': Nelder-Mead simplex algorithm.
  • 'bfgs': BFGS algorithm.
  • 'lbfgs': LBFGS algorithm.
  • 'slsqp': Sequential Least Squares Programming algorithm.
  • 'de' (default): Differential Evolution algorithm.

maxIter (optional) is an integral scalar or vector indicating the maximum number of iterations for the optimization algorithm during the fitting process.

initialGuess (optional) is a numeric vector indicating the initial guess for the parameters that optimize the function. Its length is numSegments-1.

seed (optional) is an integer indicating the random number seed used in the differential evolution algorithm to ensure the reproducibility of results. It only takes effect when method='de' or initialGuess is NULL. If not specified, a non-deterministic random number generator is used.

Details

Fits a continuous piecewise linear function with a specified number of line segments. By default, differential evolution is used to find the optimal breakpoint locations by minimizing the sum of squared errors. Note: Because differential evolution is a stochastic algorithm, the results of this function may vary slightly between runs unless seed is specified.
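The run-to-run variation noted above can be checked with the seed argument. The following is a minimal sketch that reuses the data from the Examples section. It assumes that two fits with the same seed and the default method='de' produce the same breakpoints, as the description of seed suggests. The seed value 42 is an arbitrary choice for illustration.

```dolphindb
// Same data as in the Examples section: 10 evenly spaced x values in [0, 1]
X = (0..9) \ 9.0
Y = [0.41703981, 0.80028691, 0.12593987, 0.58373723, 0.77572962, 0.41156172, 0.72300284, 0.32559528, 0.21812564, 0.41776427]

// Two fits with the same (arbitrarily chosen) seed
m1 = piecewiseLinFit(X, Y, 3, seed=42)
m2 = piecewiseLinFit(X, Y, 3, seed=42)

// Compare the two breakpoint vectors to 6 decimal places
sameBreaks = eqObj(m1["breaks"], m2["breaks"], 6)
sameBreaks
```

If the comparison returns false, the seed is not being applied, for example because a different method was selected together with an explicit initialGuess.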

The fitted model can be passed to the pwlfPredict function.

Return value: A dictionary with the following keys:

  • breaks: A floating-point vector indicating the breakpoint locations.

  • beta: A floating-point vector indicating the beta parameter for the linear fit.

  • xData: A floating-point vector indicating the input data point locations of x.

  • yData: A floating-point vector indicating the input data point locations of y.

  • xC: A floating-point vector indicating the x locations of the data points that the piecewise linear function will be forced to go through.

  • yC: A floating-point vector indicating the y locations of the data points that the piecewise linear function will be forced to go through.

  • weights: A floating-point vector indicating the weights used in least-squares algorithms.

  • degree: A non-negative integer indicating the degree of polynomial.

  • lapackDriver: A string indicating the LAPACK driver used to solve the least-squares problem.

  • numParameters: An integer indicating the number of parameters.

  • predict: The function used for prediction. The method is called by model.predict(X, [beta], [breaks]). See pwlfPredict.

  • modelName: A string "Piecewise Linear Regression" indicating the model name.
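As an illustration of the keys listed above, the sketch below fits a model and reads individual entries from the returned dictionary. It reuses the data from the Examples section. The three x values passed to predict are hypothetical and chosen only for illustration.

```dolphindb
// Same data as in the Examples section
X = (0..9) \ 9.0
Y = [0.41703981, 0.80028691, 0.12593987, 0.58373723, 0.77572962, 0.41156172, 0.72300284, 0.32559528, 0.21812564, 0.41776427]

model = piecewiseLinFit(X, Y, 3)

// Read individual entries by key
breaks = model["breaks"]    // breakpoint locations
beta = model["beta"]        // fitted beta parameters
name = model["modelName"]   // model name string

// Predict at a few hypothetical x locations with the fitted model
yHat = model.predict([0.25, 0.5, 0.75])
yHat
```

In the example output shown in the Examples section, a fit with 3 segments yields 4 breakpoints, the first and last of which coincide with the ends of the x range.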

Examples

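// Returns (step, values) when endpoint=true; otherwise returns only the values
def linspace(start, end, num, endpoint=true){
	// The step is (end-start)/(num-1) when the endpoint is included, and (end-start)/num otherwise
	if(endpoint) return (end-start)$DOUBLE\(num-1), start + (end-start)$DOUBLE\(num-1)*0..(num-1)
	else return start + (end-start)$DOUBLE\num*0..(num-1)
}
// [1] selects the vector of values from the returned (step, values) tuple
X = linspace(0.0, 1.0, 10)[1]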
Y = [0.41703981, 0.80028691, 0.12593987, 0.58373723, 0.77572962, 0.41156172, 0.72300284, 0.32559528, 0.21812564, 0.41776427]
model = piecewiseLinFit(X, Y, 3)
model;

Output:

breaks->[0.0,0.258454644769,0.366954310101,1.000000000000]
numParameters->4
degree->1
xData->[0.0,0.111111111111,0.222222222222,0.333333333333,0.444444444444,0.555555555555,0.666666666666,0.777777777777,0.888888888888,1.000000000000]
predict->pwlfPredict
yData->[0.417039810000,0.800286910000,0.125939870000,0.583737230000,0.775729620000,0.411561720000,0.723002840000,0.325595280000,0.218125640000,0.417764270000]
yC->
xC->
weights->
beta->[0.593305500750,-1.309949743583,5.703647584013,-5.105351630664]
lapackDriver->gelsd

piecewiseLinFit can be used with pwlfPredict for prediction based on the model:

xHat = linspace(0.0, 1.0, 20)[1]
model.predict(xHat)

// Output: [0.593305499919518 0.524360777381737 0.455416054843957 0.386471332306177 0.317526609768396 0.368043438179296 0.529813781212159 0.691584124245021 0.69295837868457 0.655502915538459 0.618047452392347 0.580591989246236 0.543136526100125 0.505681062954014 0.468225599807903 0.430770136661792 0.393314673515681 0.35585921036957 0.318403747223459 0.280948284077348]
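The optional arguments can be passed by name. The sketch below shows three variations on the basic call, assuming the arguments behave as described in the Arguments section. The initial guess [0.3, 0.6], the uniform standard deviation of 0.1, and the constraint point are all hypothetical values chosen for illustration.

```dolphindb
// Same data as in the example above
X = (0..9) \ 9.0
Y = [0.41703981, 0.80028691, 0.12593987, 0.58373723, 0.77572962, 0.41156172, 0.72300284, 0.32559528, 0.21812564, 0.41776427]

// 1. Local optimizer started from an initial guess.
//    initialGuess has numSegments-1 = 2 entries (hypothetical values).
mGuess = piecewiseLinFit(X, Y, 3, method='lbfgs', initialGuess=[0.3, 0.6])

// 2. Weighted fit: each weight is the reciprocal of an assumed
//    standard deviation (0.1 for every point here).
sigma = take(0.1, size(Y))
w = 1.0 \ sigma
mWeighted = piecewiseLinFit(X, Y, 3, weights=w)

// 3. Constrained fit with the default method='de':
//    force the function through the first data point.
mForced = piecewiseLinFit(X, Y, 3, XC=[0.0], YC=[0.41703981])

mGuess["breaks"]
mWeighted["breaks"]
mForced["breaks"]
```

Because XC and YC only take effect when method='de', the third call leaves method at its default.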

Related function: pwlfPredict