Error in `step_log()` when trying to make predictions with my model


I'm trying to make predictions with my testing data using my finalized workflow. But whenever I try using the predict function, it gives me this error:

Error in `step_log()`:
! The following required column is missing from `new_data` in step 'log_79Q8u': shares.

The shares variable is present in my testing dataset.

Do I need to change my recipe and retune my model? This is for my final and I really need to resolve this error, so I would appreciate any advice!

My code for the recipe and the prediction is below:

# recipe 
recipe_kc <- recipe(shares ~ ., data = articles_train) %>% 
  step_log(shares) %>% 
  step_normalize(all_numeric_predictors()) %>%
  step_zv(all_predictors()) 

# selecting best model
best_workflow <- bt_tuned %>% 
  extract_workflow_set_result("recipe3_bt") %>% 
  select_best(metric = "rmse", "rsq")

best_workflow

final_workflow <- bt_tuned %>% 
  extract_workflow("recipe3_bt") %>% 
  finalize_workflow(best_workflow)


final_fit <- fit(final_workflow, articles_train)


# using testing data
final_pred <- articles_test %>% 
  select(shares) %>% 
  bind_cols(predict(final_fit, new_data = articles_test)) %>% 
  mutate(
    .pred_log = .pred,
    .pred = 10^.pred_log
  ) %>% 
  summarize(.pred, shares, shares_log,.pred_log) 

Answer by EmilHvitfeldt:

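You are getting this error because you are transforming the outcome inside the recipe. When `predict()` runs, the recipe is applied to `new_data` without relying on the outcome being available, so `step_log(shares)` cannot find the column even though it exists in your test set. It is generally advised not to perform simple transformations on the outcome inside the recipe, because it can cause exactly this kind of problem.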

Instead, I recommend that you do the transformation before you split your data. This way you won't run into problems when the outcome isn't available to transform:

set.seed(3467) 
articles_split <- article %>%
  # Log the outcome
  mutate(shares = log(shares)) %>%
  initial_split()

articles_train <- training(articles_split)
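With the outcome logged up front, the recipe no longer needs `step_log()`, and predictions come back on the log scale. The following is only a minimal sketch of how the rest of the pipeline might then look. The simulated `article` data, its column names `n_tokens` and `n_images`, and the plain `linear_reg()` model are stand-ins assumed for illustration, not the tuned boosted-tree workflow from the question. Note that `log()` is the natural log, so the back-transformation is `exp()` rather than `10^`.

```r
library(tidymodels)

set.seed(3467)

# Hypothetical stand-in for the real articles data (assumed columns)
article <- tibble(
  n_tokens = rnorm(200, mean = 500, sd = 100),
  n_images = rpois(200, lambda = 3),
  shares   = exp(rnorm(200, mean = 7, sd = 1))
)

# Log the outcome once, before splitting
articles_split <- article %>%
  mutate(shares = log(shares)) %>%
  initial_split()

articles_train <- training(articles_split)
articles_test  <- testing(articles_split)

# Same recipe as in the question, minus step_log(shares)
recipe_kc <- recipe(shares ~ ., data = articles_train) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_zv(all_predictors())

# Placeholder model; the question's finalized workflow would go here instead
final_fit <- workflow() %>%
  add_recipe(recipe_kc) %>%
  add_model(linear_reg()) %>%
  fit(articles_train)

# Predictions are on the log scale; exp() undoes the natural log
final_pred <- articles_test %>%
  select(shares_log = shares) %>%
  bind_cols(predict(final_fit, new_data = articles_test)) %>%
  rename(.pred_log = .pred) %>%
  mutate(
    shares = exp(shares_log),
    .pred  = exp(.pred_log)
  )
```

Because the recipe has changed, a tuned model would need to be retuned and refit with the new recipe before predicting.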