Sequential Feature Selection "Feature names seen at fit time, yet now missing"

I get a

Feature names seen at fit time, yet now missing

error when predicting from X_test with the subset of features selected by the sklearn SFS:

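from sklearn.feature_selection import SequentialFeatureSelector as SFS
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline

model_for_sfs = LogisticRegression(solver="saga")
model = LogisticRegression(solver="saga")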

pipeline_for_fs = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy="median")),
        ("model",model_for_sfs)])

n_splits = 2 
cv_fs = StratifiedKFold(n_splits, shuffle=True, random_state=0)
cv_perf = StratifiedKFold(n_splits, shuffle=True, random_state=0)

# Feature selection
fs = SFS(
  estimator=pipeline_for_fs,
  n_features_to_select=2,
  cv=cv_fs,
  scoring='accuracy', 
  n_jobs=-1
)
pipeline = Pipeline(steps=[
  ('imputer', SimpleImputer(strategy="median")),
  ('selector', fs),
  ("model", model)])

pipeline.fit(X_train, y_train)

sfs = pipeline.named_steps["selector"]
features = sfs.get_support(indices=True)

y_pred = pipeline.predict_proba(X_test.iloc[:, list(features)])[:, 1]

I thought sklearn's SFS would transform the dataset to keep only the features it has chosen. Is that not the case? Is there a way to make it do that?
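For reference, here is a minimal sketch of what appears to be going on, under some assumptions: the data is a synthetic stand-in built with make_classification, the column names f0..f4 are made up, and the default LogisticRegression solver is used instead of saga only to keep the example quick. The selector does transform the data, but it does so inside the pipeline. The imputer in front of it was fitted on all columns, so it expects all of them again at predict time. Passing the full X_test and letting the pipeline apply the selection should avoid the error.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real data (assumption: numeric features in a DataFrame).
X_arr, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

cv_fs = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
selector = SequentialFeatureSelector(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    cv=cv_fs,
    scoring="accuracy",
)
pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("selector", selector),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

# Pass ALL columns: the pipeline imputes, then the selector drops the
# unselected features, then the final model predicts on the remaining two.
y_pred = pipeline.predict_proba(X_test)[:, 1]

# Inspect which columns the selector kept, and how many the model received.
support = pipeline.named_steps["selector"].get_support()
selected_names = list(X_train.columns[support])
n_model_inputs = pipeline.named_steps["model"].n_features_in_
print(selected_names, n_model_inputs)
```

If the reduced feature matrix itself is needed, slicing the fitted pipeline, for example pipeline[:-1].transform(X_test), should return the imputed data restricted to the selected columns.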
