Steps to Reduce RMSE Score in Surprise KNN Predictions


I am building a recommendation system using the Surprise library and the k-Nearest Neighbors (KNN) algorithm. The main problem I've run into is a very high RMSE (Root Mean Square Error), which currently stands at RMSE: 3179.9423.

The data I'm working with is an imputed user-item matrix where the ratings are derived from customer interactions using the formula:

IR_iu = 100 * Buy + 50 * Added to Favorite + 15 * Interacted with the Item

Here, IR_iu is the imputed rating for user u on item i. The interactions are weighted: purchasing (Buy) gets the highest score, adding to favorites a medium score, and general interaction with the item the lowest score.
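For context, this is a minimal sketch of how such an imputed rating can be computed from per-(user, item) interaction counts. The column names and values here are illustrative assumptions, not my actual schema:

```python
import pandas as pd

# Hypothetical per-(user, item) interaction counts; names and values are made up
interactions = pd.DataFrame({
    "UserID": [1, 1, 2],
    "item": ["A", "B", "A"],
    "buy": [1, 0, 0],
    "favorite": [0, 1, 1],
    "interact": [3, 1, 0],
})

# IR_iu = 100 * Buy + 50 * Added to Favorite + 15 * Interacted with the Item
interactions["rating"] = (
    100 * interactions["buy"]
    + 50 * interactions["favorite"]
    + 15 * interactions["interact"]
)
```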

I'm looking for a more effective way to reduce the RMSE and improve prediction accuracy, given the particular characteristics of an imputed user-item matrix built from customer interactions. I'm also open to alternative algorithms that might suit this problem better. For context, my experience in this field is limited: this is my first attempt at building a recommendation system, I have no experienced mentors to consult, and I've been working largely by trial and error.

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from surprise import Dataset, Reader

# Raw string so "\S" and "\C" in the path aren't treated as escape sequences
df = pd.read_excel(r"D:\SELECT\CustomerRatings.xlsx")
df.replace(np.nan, 0, inplace=True)



# Separate the first column (user IDs) from the rest of the data
user_ids = df.iloc[:, 0]
data_without_user_ids = df.iloc[:, 1:]

# Initialize the MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 10))

# Normalize each row of the data without user IDs
normalized_data = pd.DataFrame(scaler.fit_transform(data_without_user_ids.T).T, columns=data_without_user_ids.columns)

# Combine the user IDs and the normalized data into a new DataFrame
normalized_df = pd.concat([user_ids, normalized_data], axis=1)

# 'normalized_df' now contains the user IDs and the scaled data
# Melt the DataFrame to long format: one (UserID, item, rating) row per cell.
# Note: no reset_index() here, since that would add an 'index' column that
# melt would treat as just another item, with row numbers as its "ratings".
melted_data = pd.melt(normalized_df, id_vars=['UserID'], var_name='item', value_name='rating')
reader = Reader(rating_scale=(0, 10))

data = Dataset.load_from_df(melted_data, reader)
from surprise import KNNBasic, accuracy
from surprise.model_selection import train_test_split

# Split the dataset into training and testing sets
trainset, testset = train_test_split(data, test_size=0.3)


sim_options = {
    "name": "cosine",
    "user_based": False,  # compute similarities between items, not users
}
algo = KNNBasic(sim_options=sim_options)
algo.fit(trainset)
predictions = algo.test(testset)
accuracy.rmse(predictions)
