(Seems there is no tag for clustMixType, tag suggestions welcome)
I am attempting to use library clustMixType to make some clusters.
library(tidyverse)
library(clustMixType)
# no scaling or real data prep here, just reproducing an issue with minimal code
my_diamonds <- diamonds %>%
  mutate(is_color_g = factor(ifelse(color == 'G', 1, 0))) %>%
  select(cut, carat, is_color_g, depth, table, price) %>%
  group_by(cut) %>%
  nest() %>%
  mutate(k = 3)

my_diamonds <- my_diamonds %>%
  mutate(mod.kproto = map2(data, k, ~kproto(.x, k = .y, lambda = NULL, iter.max = 100, nstart = 1, na.rm = 'no')))
This results in a list column with one cluster model for each level of cut:
my_diamonds
# A tibble: 5 × 4
# Groups:   cut [5]
  cut       data                      k mod.kproto
  <ord>     <list>                <dbl> <list>
1 Ideal     <tibble [21,551 × 5]>     3 <kproto>
2 Premium   <tibble [13,791 × 5]>     3 <kproto>
3 Good      <tibble [4,906 × 5]>      3 <kproto>
4 Very Good <tibble [12,082 × 5]>     3 <kproto>
5 Fair      <tibble [1,610 × 5]>      3 <kproto>
According to the library docs (pdf), we can use predict to assign new data to the nearest cluster.
Under predict.kproto there is an example: predicted.clusters <- predict(kpres, x) where x is new data. I gave it a try:
my_diamonds <- my_diamonds %>%
  mutate(preds = map2(data, mod.kproto, ~predict(.y, .x)))
Error in `mutate()`:
! Problem while computing `preds = map2(data, mod.kproto, ~predict(.y, .x))`.
ℹ The error occurred in group 1: cut = Fair.
Caused by error in `x[, j] != rep(protos[i, j], nrows)`:
! comparison of these types is not implemented
Run `rlang::last_error()` to see where the error occurred.
Warning message:
Problem while computing `preds = map2(data, mod.kproto, ~predict(.y, .x))`.
ℹ Incompatible methods ("Ops.data.frame", "Ops.factor") for "!="
ℹ The warning occurred in group 1: cut = Fair.
Why am I getting this error and how can I overcome it to use clustMixType's predict function to assign clusters to newdata?
It seems that passing x as a standard data.frame does the trick.
Created on 2023-09-19 by the reprex package (v2.0.1)
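A minimal sketch of that workaround, assuming the nested tibbles are converted with as.data.frame() before fitting so that both kproto() and predict() receive plain data frames. This is a reconstruction based on the question's code, not necessarily the exact code that was originally posted, and set.seed() is only added for repeatability:

```r
library(tidyverse)
library(clustMixType)

set.seed(42)  # assumption: only here so repeated runs give the same clusters

my_diamonds <- diamonds %>%
  mutate(is_color_g = factor(ifelse(color == "G", 1, 0))) %>%
  select(cut, carat, is_color_g, depth, table, price) %>%
  group_by(cut) %>%
  nest() %>%
  mutate(
    k = 3,
    # convert each nested tibble to a base data.frame
    data = map(data, as.data.frame),
    # fit one k-prototypes model per cut (same arguments as in the question)
    mod.kproto = map2(data, k, ~ kproto(.x, k = .y, lambda = NULL,
                                        iter.max = 100, nstart = 1, na.rm = "no")),
    # predict on data.frame input, so x[, j] drops to a vector inside predict()
    preds = map2(data, mod.kproto, ~ predict(.y, .x))
  )
```

Converting only inside the predict() call, e.g. predict(.y, as.data.frame(.x)), may be enough as well, but converting up front keeps the fitted model and the new data in the same format.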
Update / Deep Dive
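I did a bit more debugging and noticed that the error arises from the comparison quoted in the error message, inside the predict method:
x[, j] != rep(protos[i, j], nrows)
What happens here is that x is subset with [, j], where j is the result of which(catvars). In your case that is the index of the only factor column, is_color_g.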
The error arises because of the different ways that base::data.frame() and tibble::tibble() handle one-dimensional results of subsetting operations. As taken from this answer:

"[.data.frame will drop the dimensions if the result has only 1 column, similar to how matrix subsetting works. So the result is a vector.

[.tbl_df will never drop dimensions like this; it always returns a tbl."
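The difference is easy to check on a small example. The sketch below uses a made-up two-column data set (df, tb and j are illustrative names, not objects from clustMixType) and mimics the which(catvars) step by looking up the factor columns:

```r
library(tibble)

# toy data: one numeric column, one factor column
df <- data.frame(a = 1:3, b = factor(c("x", "y", "x")))
tb <- as_tibble(df)

# index of the categorical (factor) columns, similar in spirit to which(catvars)
j <- which(sapply(df, is.factor))

class(df[, j])   # base data.frame drops to a plain factor vector
class(tb[, j])   # tibble keeps a one-column tibble

# [[ extracts the column as a vector for both classes
class(tb[[j]])
```

With the tibble, single-bracket subsetting never simplifies to a vector, so code written with base data.frame semantics in mind can receive a different type than it expects.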
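This means that with x being a tibble, x[, j] is a one-column tibble instead of a vector. Comparing that tibble with the factor values from protos makes R choose between Ops.data.frame and Ops.factor, which are incompatible, and this produces the warning and the "comparison of these types is not implemented" error shown in the question. It is a type problem at run time, not a syntax error.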
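The failing comparison can be imitated outside of clustMixType. The following is a sketch with made-up objects (tb, proto, res and ok are illustrative names) that performs the same kind of != comparison between a one-column tibble and a factor, and then repeats it with the column extracted as a vector:

```r
library(tibble)

tb    <- tibble(f = factor(c("a", "b", "a")))
proto <- factor("a", levels = c("a", "b"))   # stands in for protos[i, j]

# one-column tibble vs. factor: the comparison fails,
# try() captures the error and the warning is suppressed
res <- suppressWarnings(try(tb[, 1] != rep(proto, nrow(tb)), silent = TRUE))
inherits(res, "try-error")

# extracting the column as a vector makes the comparison well defined
ok <- tb[[1]] != rep(proto, nrow(tb))
ok
```

This is why handing predict() a base data.frame, where [, j] already returns the factor vector, avoids the problem.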