I have the data `li` and I want to run the FPGrowth algorithm on it, but I don't know how to get it into Spark:
set.seed(123)
# make fake data
li <- list()
for(i in 1:10) li[[i]] <- make.unique(letters[sample(1:26,sample(5:20,1),rep = T)])
require(sparklyr)
sc <- spark_connect(master = "local",version = "3.0.1")
df <- copy_to(sc, **....??????what should be here??????...** )
fp_growth_model <- ml_fpgrowth(df)
There is a similar answer here, but it doesn't work for me. I get this error:
sc <- spark_connect(master = "local", version = "2.3")
tb <- tibble::tibble(items=c("a b c", "a b", "c f g", "b c"))
df <- copy_to(sc, tb) %>%
mutate(items = split(items, "\\\\s+"))
Error in mutate(., items = split(items, "\\\\s+")) :
could not find function "mutate"
/// plyr::mutate
df <- copy_to(sc, tb) %>%
plyr::mutate(items = split(items, "\\\\s+"))
Error in sdf_import.default(x, sc, name, memory, repartition, overwrite, :
table tb already exists (pass overwrite = TRUE to overwrite)
/// SparkR::mutate
df <- copy_to(sc, tb) %>%
SparkR::mutate(items = split(items, "\\\\s+"))
Error in sdf_import.default(x, sc, name, memory, repartition, overwrite, :
table tb already exists (pass overwrite = TRUE to overwrite)
The code example from the mentioned answer works. You get two different errors: the first because `mutate` was not loaded (it comes from dplyr, which was never attached), and the second because the object `tb` had already been copied into Spark by your first attempt.
Try running the following code from a new session:
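A minimal version of that example, assuming a local Spark installation that matches the `version` argument (I use the 3.0.1 from your own session here). `overwrite = TRUE` is an extra safeguard in case a table named `tb` is still registered, and the separator is simplified to a single space:

```r
library(sparklyr)
library(dplyr)   # provides mutate(); without it the pipeline fails

# Assumption: this Spark version is installed locally (see spark_install())
sc <- spark_connect(master = "local", version = "3.0.1")

tb <- tibble::tibble(items = c("a b c", "a b", "c f g", "b c"))

# Copy the strings to Spark, then split them into arrays on the Spark side.
# split() is not evaluated in R here; it is passed through to Spark SQL.
df <- copy_to(sc, tb, name = "tb", overwrite = TRUE) %>%
  mutate(items = split(items, " "))

fp_growth_model <- ml_fpgrowth(df)

# Inspect the frequent itemsets found by the fitted model
ml_freq_itemsets(fp_growth_model)
```

Note that `plyr::mutate` and `SparkR::mutate` are not substitutes here: the pipeline needs the dplyr verb, which sparklyr translates to Spark SQL.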
To execute FP-growth with your dataset `li`, you need to change the format. The function `ml_fpgrowth` requires a Spark DataFrame with a column of lists containing the sequences. You cannot transfer an R data frame with list columns directly to Spark. First, create a Spark DataFrame with the sequences as strings, and then generate the lists with the `mutate` and `split` functions.
Here is the code applied to your data.
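For the R-side step, one way to sketch it is to collapse each element of `li` into a single space-separated string. This assumes that no item contains a space, which holds here because `make.unique` only appends suffixes such as `.1`. The column name `items` is my choice, picked to match the column used in the example from the other answer:

```r
set.seed(123)
# Fake data from the question: 10 transactions, items unique within each one
li <- list()
for (i in 1:10) li[[i]] <- make.unique(letters[sample(1:26, sample(5:20, 1), rep = TRUE)])

# One row per transaction, items joined by a single space
tb <- data.frame(
  items = vapply(li, paste, character(1), collapse = " "),
  stringsAsFactors = FALSE
)

str(tb)
```

A plain character column like this can be copied to Spark without trouble, unlike the original list.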
Transfer data to Spark and generate the lists:
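A sketch of this step, assuming `li` as generated in the question and a local Spark 3.0.1 installation. The table name `li_tb` is arbitrary, and `overwrite = TRUE` is there to avoid the "already exists" error when the code is rerun in the same session:

```r
library(sparklyr)
library(dplyr)

set.seed(123)
li <- list()
for (i in 1:10) li[[i]] <- make.unique(letters[sample(1:26, sample(5:20, 1), rep = TRUE)])

# R side: one space-separated string per transaction
tb <- data.frame(items = vapply(li, paste, character(1), collapse = " "),
                 stringsAsFactors = FALSE)

# Assumption: Spark 3.0.1 is installed locally (see spark_install())
sc <- spark_connect(master = "local", version = "3.0.1")

# Spark side: split each string back into an array of items
df <- copy_to(sc, tb, name = "li_tb", overwrite = TRUE) %>%
  mutate(items = split(items, " "))

df
```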
The data is now ready to be passed to `ml_fpgrowth`, as in the example above.