What is the right way to set up the data fed to a LightFM model when I have additional implicit data on extra items/products? For example, I have 100k users x 200 items of interaction data, but in the real application I want the model to recommend only 50 of the 200 items. How should I set up the data? I am considering two cases and am not sure which approach is right:
Case 1: Feed the whole matrix (100k users x 200 items) directly as the interactions argument to LightFM. This leans more towards collaborative filtering.
Case 2: Feed only the (100k users x 50 items) matrix as interactions and use the remaining (100k users x 150 items) matrix as user_features. This leans more towards content-based learning.
Which one is correct? Also, for Case 1, is there a way to make the evaluation utility functions (precision, recall, etc.) recommend from selected items only? That is, the top-k recommendations should be drawn only from the 50 items, never from the other 150, and precision, recall, etc. should be computed on that restricted set.
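To make the two cases concrete, this is roughly what I mean (a shapes-only sketch with scipy sparse matrices; the variable names are my own placeholders):

    import numpy as np
    from scipy.sparse import coo_matrix, hstack, identity

    n_users = 100_000
    recommendable_items = 50   # the items I actually want to recommend from
    other_items = 150          # items I only have implicit signals for

    # Case 1: a single interactions matrix covering all 200 items.
    interactions_case1 = coo_matrix((n_users, recommendable_items + other_items))  # 100k x 200

    # Case 2: interactions only over the 50 recommendable items; the signals on
    # the other 150 items become extra user features. As I understand it, LightFM
    # only keeps per-user latent factors if identity columns are included as well.
    interactions_case2 = coo_matrix((n_users, recommendable_items))                # 100k x 50
    extra_user_signals = coo_matrix((n_users, other_items))                        # 100k x 150
    user_features_case2 = hstack([identity(n_users), extra_user_signals]).tocsr()  # 100k x (100k + 150)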
You should follow Case 1: train the model on the entire interactions matrix. When making predictions, pass the ids of the required 50 items as a parameter to model.predict.
From the LightFM documentation you can see that model.predict takes item ids as a parameter (the ids of your 50 items in this case):
https://making.lyst.com/lightfm/docs/_modules/lightfm/lightfm.html#LightFM.predict
    def predict(self, user_ids, item_ids, item_features=None, user_features=None, num_threads=1):
        """
        Compute the recommendation score for user-item pairs.
        ...
        """
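For example, a minimal sketch of that workflow (variable names such as item_subset are placeholders and the hyperparameters are arbitrary, so adapt them to your data):

    import numpy as np
    from scipy.sparse import coo_matrix
    from lightfm import LightFM

    n_users, n_items = 100_000, 200

    # Toy interactions standing in for your full 100k x 200 implicit-feedback matrix.
    rng = np.random.default_rng(42)
    rows = rng.integers(0, n_users, size=50_000)
    cols = rng.integers(0, n_items, size=50_000)
    interactions = coo_matrix((np.ones(50_000), (rows, cols)), shape=(n_users, n_items))

    # Case 1: fit on the whole matrix so the model learns from all 200 items.
    model = LightFM(loss="warp")
    model.fit(interactions, epochs=10, num_threads=4)

    # At prediction time, score a user against only the 50 recommendable items.
    item_subset = np.arange(50)                    # ids of the 50 items you can recommend
    user_id = 123
    scores = model.predict(user_id, item_subset)   # the integer user id is broadcast over item_ids
    top_k = item_subset[np.argsort(-scores)[:10]]  # top-10, drawn only from the subset

Because predict only sees the 50 item ids, the top-k list can never include the other 150 items. As far as I know, the built-in lightfm.evaluation metrics rank over all items, so a subset-restricted precision or recall would have to be computed manually from scores like these against your held-out interactions on those 50 columns.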