How can we fix a PipeOp's $state so that its parameters or configuration are set from the beginning and stay the same during both training and prediction?
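library(mlr3)
library(mlr3pipelines)
library(mlr3learners)  # provides classif.xgboost

task <- tsk("iris")
pos1 <- po("scale", param_vals = list(
  center = TRUE,
  scale = TRUE,
  affect_columns = selector_name("Sepal.Width")))
pos1$state
# attempt to fix the centering and scaling values by hand
pos1$state$center <- c(Sepal.Width = 0)
pos1$state$scale <- c(Sepal.Width = 2)
graph <- pos1 %>>% lrn("classif.xgboost", eval_metric = "mlogloss")
gl <- GraphLearner$new(graph)
gl$train(task)
gl$state  # center and scale have been recomputed from the data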
In the code above, the center and scale values of po("scale") are recalculated from the data even though I tried to fix them at zero and two, respectively (I am not sure whether I did this correctly).
A PipeOp's $state should never be changed manually. It is more like a logging slot: you can inspect it, and it is where the PipeOp finds all the information it needs to carry out its prediction step after being trained. Anything you write into it before training is overwritten when the PipeOp is trained.

PipeOpScale will always center the training data to mean 0 and scale it by its root-mean-square (see ?scale; for centered data this equals the standard deviation). It stores the "learned" parameters (i.e., the mean and root-mean-square of the training data, which are the attributes returned by the scale function) as the $state. During prediction, the new data is transformed with these stored parameters, so the transformed prediction data will probably have a somewhat different mean and root-mean-square than the transformed training data.

Assuming you want to scale "Sepal.Width" to mean 0 and root-mean-square 2 both during training and prediction (as suggested by your code above; but this may be a bad idea), you can use PipeOpColApply: