Doesn't splitting the whole dataset into only a training set and a test set result in the validation set also undergoing whatever preprocessing steps the training set went through? My understanding is that, ideally, continuous features should be scaled like this:
# standardization of continuous features
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

num_ct = ColumnTransformer([('standardize', StandardScaler(), numerical)])
X_train = num_ct.fit_transform(X_train)  # fit the scaler on the training set only
X_val = num_ct.transform(X_val)          # apply the training-set statistics to validation...
X_test = num_ct.transform(X_test)        # ...and test data
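(For completeness: I'm assuming X_val in the snippet above comes from a second split, something like the following, with X, y, and random_state as in the code further down.)

from sklearn.model_selection import train_test_split

# hold out the test set first, then carve a validation set out of the remainder
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, random_state=random_state)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.2, random_state=random_state)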
But suppose I did:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=random_state)

# standardization of continuous features
num_ct = ColumnTransformer([('standardize', StandardScaler(), numerical)])
X_train = num_ct.fit_transform(X_train)  # fit on the entire 85% training portion
X_test = num_ct.transform(X_test)
to prepare the data for a neural network, and then used Skorch like so:
from skorch import NeuralNet
from skorch.dataset import ValidSplit

net = NeuralNet(
    module=MyDNN,
    ...,
    train_split=ValidSplit(0.2),  # Skorch holds out 20% of whatever is passed to fit() for validation
)
Doesn't this mean that the 20% Skorch holds out for validation was already included in the fit_transform I did earlier, i.e. the scaler's statistics leak information from the validation set?
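If so, I assume the fix is to carve out the validation set myself before fitting the scaler and hand it to Skorch explicitly. Here is a minimal sketch of what I have in mind, assuming Skorch's Dataset wrapper and predefined_split helper (MyDNN and the MSELoss criterion are just placeholders for my own module and loss):

import numpy as np
import torch
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from skorch import NeuralNet
from skorch.dataset import Dataset
from skorch.helper import predefined_split

# split everything before any fitting, so the scaler never sees validation or test rows
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, random_state=random_state)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.2, random_state=random_state)

num_ct = ColumnTransformer([('standardize', StandardScaler(), numerical)])
X_train = num_ct.fit_transform(X_train)  # scaling statistics come from training rows only
X_val = num_ct.transform(X_val)
X_test = num_ct.transform(X_test)

# pass the pre-scaled validation set to Skorch instead of letting it re-split X_train
valid_ds = Dataset(X_val.astype(np.float32), y_val)
net = NeuralNet(
    module=MyDNN,
    criterion=torch.nn.MSELoss,
    train_split=predefined_split(valid_ds),
)
net.fit(X_train.astype(np.float32), y_train)

Is that the intended pattern?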