test (a table with columns: user_id, item_id, rating, with 6.2M rows)
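from pyspark.ml.recommendation import ALS

# Configure ALS on the three columns; "drop" removes rows with NaN predictions.
als = ALS(userCol="user_id",
          itemCol="item_id",
          ratingCol="rating",
          coldStartStrategy="drop",
          implicitPrefs=True)
model = als.fit(train)               # learn user and item factors from train
predictions = model.transform(test)  # score each (user_id, item_id) row in test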
predictions (a table with columns: user_id, item_id, rating, prediction, but with only 1.7M rows)
Why did model.transform(test) drop the rest of the rows? It should have been able to calculate a prediction score for all user_id, item_id combinations, right?
Is it because I have used coldStartStrategy="drop"?
- But if there is a rating calculated for all user_id, item_id combinations in test, no row should be dropped, yes?
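Yes, it is only because of the coldStartStrategy="drop" option. With that setting, transform drops every row whose prediction would be NaN, which happens when the user or the item in that row had no interactions in the training data and so has no learned factors. With the default coldStartStrategy="nan", those rows would be kept with a NaN prediction instead.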
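
To see why the row count shrinks, here is a minimal plain-Python sketch of the rule described above. It is not Spark code and not how ALS is implemented; the toy train and test lists and the helper name split_cold_start are made up for illustration. It only mirrors the condition "user or item never seen in training".

```python
# Toy stand-ins for the train and test tables (hypothetical data).
# Each row is (user_id, item_id, rating).
train = [(1, 10, 5.0), (1, 11, 3.0), (2, 10, 4.0)]
test = [(1, 10, 4.0), (2, 11, 2.0), (3, 10, 5.0), (1, 12, 1.0), (3, 12, 2.0)]


def split_cold_start(train_rows, test_rows):
    """Split test rows into those a trained model could score and cold-start rows.

    A row is treated as cold-start if its user or its item never appears
    in the training rows (hypothetical helper, for illustration only).
    """
    known_users = {user for user, _, _ in train_rows}
    known_items = {item for _, item, _ in train_rows}
    scorable, cold = [], []
    for row in test_rows:
        user, item, _ = row
        if user in known_users and item in known_items:
            scorable.append(row)
        else:
            cold.append(row)
    return scorable, cold


scorable, cold = split_cold_start(train, test)
print(len(test), len(scorable), len(cold))  # prints: 5 2 3
```

On the real DataFrames, a similar check could probably be done by keeping only the test rows whose user_id and item_id both appear in train (for example with two left-semi joins) and comparing that count with the 1.7M rows in predictions.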