I have the following data:
> paste(data_s)
[1] "c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)"
[2] "c(34, 34, 35, 35, 35, 34, 6, 34, 34, 6, 34, 34, 34, 6, 6, 6, 34, 34, 35, 6, 34, 34, 34, 34, 34, 34, 34, 34, 6, 34, 35, 35, 34, 34, 6, 34, 34, 34, 34, 6, 6, 35, 34, 34, 34, 35, 6, 35, 34, 34, 34, 34, 34, 34, 6, 34, 34, 6, 34, 34, 34, 6, 34, 34, 34, 34, 6, 34, 34, 34, 35, 6, 35, 34, 34, 35, 34, 6, 6, 35, 34, 34, 6, 34, 6, 6, 34, 34, 6, 34, 6, 35, 34, 6, 34, 35, 34, 6, 34, 34)"
[3] "c(1, 1, 4, 0, 3, 4, 5, 2, 4, 1, 2, 1, 4, 9, 9, 1, 1, 5, 1, 4, 4, 2, 3, 2, 3, 2, 1, 2, 5, 6, 5, 5, 5, 1, 5, 5, 2, 1, 1, 3, 4, 2, 9, 1, 4, 3, 2, 5, 2, 2, 3, 4, 4, 5, 5, 4, 1, 2, 0, 3, 4, 2, 2, 5, 0, 2, 5, 3, 3, 1, 0, 1, 4, 2, 5, 1, 1, 4, 2, 3, 5, 1, 5, 0, 2, 4, 1, 5, 4, 2, 2, 4, 5, 1, 2, 2, 0, 3, 7, 3)"
> str(data_s)
tibble [100 × 3] (S3: tbl_df/tbl/data.frame)
$ y : num [1:100] 0 0 0 0 0 0 0 0 1 0 ...
$ x1: num [1:100] 34 34 35 35 35 34 6 34 34 6 ...
$ x2: num [1:100] 1 1 4 0 3 4 5 2 4 1 ...
- attr(*, "na.action")= 'omit' Named int [1:197659] 4 5 6 7 9 14 19 20 24 27 ...
..- attr(*, "names")= chr [1:197659] "4" "5" "6" "7" ...
I am using the vivi function from the vivid package to explore the feature importance of my variables.
I wrote the code below:
library("vivid")
library("dplyr")
library("xgboost")
y = data_s["y"]
x = data_s[, c("x1", "x2")]
gbst <- xgboost(data = as.matrix(x),
                label = as.matrix(y),
                nrounds = 600)
pFun <- function(fit, data, ...) predict(fit, as.matrix(x))
viviGBst <- vivi(fit = gbst,
                 data = data_s,
                 response = "y",
                 reorder = FALSE,
                 normalized = FALSE,
                 predictFun = pFun)
But I get the below error:
Error:
! Assigned data `predict(x, data = X[, cols, drop = FALSE])` must be compatible with existing data.
✖ Existing data has 5000 rows.
✖ Assigned data has 100 rows.
ℹ Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.
Why do I get this error and how can I fix it?
I will be very glad for any help.
Thanks.
A bit late, but hopefully this can help other users.
To work with `xgboost` in `vivid`, you need to use the term `data` instead of the actual name of the data in the predict function. It also looks like you're not providing the full data set to the `data` argument in `xgboost`. You are only providing the explanatory variables and omitting the response. Below is some code that should hopefully solve this issue:
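A minimal sketch of that fix, under a few assumptions: `data_s` here is simulated stand-in data with the same column names as in the question, `predictors` is a helper name introduced only for this example, and the `xgboost(data = , label = )` call follows the interface used in the question (newer xgboost releases may expect different argument names). The 5000-versus-100 mismatch in the error is consistent with `vivi` calling the predict function on a larger grid of rows while the original `pFun` always predicted on the 100-row `x`, so the key change is that `pFun` now predicts on the `data` it receives.

```r
library("vivid")
library("xgboost")

# Stand-in data with the same column names as the question (an assumption
# for illustration only; replace with your own data_s).
set.seed(1)
data_s <- data.frame(
  y  = rbinom(100, 1, 0.1),
  x1 = sample(c(6, 34, 35), 100, replace = TRUE),
  x2 = rpois(100, 3)
)

# Hypothetical helper: names of the explanatory variables.
predictors <- c("x1", "x2")

# Fit on the predictor matrix, passing the response as a plain numeric vector.
gbst <- xgboost(
  data    = as.matrix(data_s[, predictors]),
  label   = data_s$y,
  nrounds = 600,
  verbose = 0
)

# Predict on whatever `data` is passed in, not on a fixed global object,
# so the number of predictions always matches the number of rows supplied.
pFun <- function(fit, data, ...) {
  predict(fit, as.matrix(data[, predictors]))
}

viviGBst <- vivi(
  fit        = gbst,
  data       = data_s,
  response   = "y",
  reorder    = FALSE,
  normalized = FALSE,
  predictFun = pFun
)
```

If this runs without error, the resulting matrix can be passed to `viviHeatmap(viviGBst)` to plot the importance and interaction values.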