Scala Spark Collaborative Filter

Asked by NotNow on 23 February 2024 at 18:37

I'm trying to implement collaborative filtering in Scala and Spark for a personal project. I'm using this dataset: https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam/data, a large dataset of games, users, reviews, etc.

What I would like to build is a simple filter that takes a user ID as input and returns the N most similar users, based on three columns of the dataset (user_id, app_id, hours). I've tried an approach using the ALS model from Spark MLlib, but I can only get game recommendations for a user, not users similar to a given user.
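To make "N similar users" concrete, here is a minimal, dependency-free Scala sketch. It assumes cosine similarity between users' hours-played vectors (the same measure the Python snippet in this question uses via `metric='cosine'`) and a brute-force scan over all users. `CosineNeighbours`, `similarUsers` and the in-memory `Map` representation are illustrative names, not part of Spark or any other library.

```scala
object CosineNeighbours {
  // A user's sparse vector: appId -> hours played (illustrative representation)
  type UserVec = Map[Int, Double]

  // Cosine similarity: dot product over the apps both users played, divided by the product of norms
  def cosine(a: UserVec, b: UserVec): Double = {
    val dot = a.iterator.collect { case (app, h) if b.contains(app) => h * b(app) }.sum
    val normA = math.sqrt(a.values.map(h => h * h).sum)
    val normB = math.sqrt(b.values.map(h => h * h).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Brute force: score every other user against `target`, keep the n highest (ties: lower id first)
  def similarUsers(users: Map[Long, UserVec], target: Long, n: Int): Seq[(Long, Double)] = {
    val t = users(target)
    users.iterator
      .filter { case (id, _) => id != target }
      .map { case (id, v) => (id, cosine(t, v)) }
      .toVector
      .sortWith((x, y) => if (x._2 != y._2) x._2 > y._2 else x._1 < y._1)
      .take(n)
  }
}
```

Cosine similarity ignores scale: if user 1 played apps 10 and 20 for 2 and 1 hours and user 2 played the same apps for 4 and 2 hours, their vectors point the same way, so their similarity is 1 even though user 2 played twice as long.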

This is the code I've tried so far. Can anyone help me out?

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.recommendation.ALS

/* Load data (`spark` is the active SparkSession; `csvFilePath` points to the CSV with user_id, app_id and hours) */
val rawData = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(csvFilePath)

val limitedData = rawData.limit(200000)

/* Pre-processing: map user_id and app_id to contiguous numeric indices for ALS */
val userIndexer = new StringIndexer()
  .setInputCol("user_id")
  .setOutputCol("user_id_indexed")
val userIndexedData = userIndexer.fit(limitedData).transform(limitedData)

val appIndexer = new StringIndexer()
  .setInputCol("app_id")
  .setOutputCol("app_id_indexed")
val data = appIndexer.fit(userIndexedData).transform(userIndexedData)

/* Train ALS on implicit feedback (hours played) */
val als = new ALS()
  .setUserCol("user_id_indexed")
  .setItemCol("app_id_indexed")
  .setRatingCol("hours")
  .setRank(10)
  .setMaxIter(10)
  .setRegParam(0.1)
  .setImplicitPrefs(true)

val model = als.fit(data)

/* Generate the top 5 game recommendations for one user */
import spark.implicits._
val userId = 0 // a StringIndexer index (0 = most frequent user_id), not a raw user_id
val userSubset = Seq(userId).toDF("user_id_indexed") // Create the DataFrame
val recommendations = model.recommendForUserSubset(userSubset, 5)
recommendations.show()
spark.stop()
```
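`ALSModel` only recommends items for users (and users for items), but the fitted model also exposes its learned user vectors through `model.userFactors`, a DataFrame with an `id` column and a `features` array column. A minimal sketch, assuming that cosine similarity between those latent vectors is an acceptable notion of "similar" (a swapped-in technique, not the approach of the Python snippet) and that `targetIndex` is a `user_id_indexed` value such as `userId` above. `AlsSimilarUsers.similarUsersByFactors` is a hypothetical helper, not part of Spark.

```scala
import org.apache.spark.ml.recommendation.ALSModel
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object AlsSimilarUsers {
  // Hypothetical helper: the n users whose ALS latent factors have the highest cosine
  // similarity to the factors of `targetIndex`. Returns (id, similarity) rows.
  def similarUsersByFactors(model: ALSModel, targetIndex: Int, n: Int): DataFrame = {
    val factors = model.userFactors // columns: id (int), features (array<float>)

    // Pull the target's factor vector to the driver (throws if the index is unknown)
    val target: Array[Float] = factors
      .filter(col("id") === targetIndex)
      .select("features")
      .head()
      .getSeq[Float](0)
      .toArray
    val targetNorm = math.sqrt(target.map(x => x.toDouble * x).sum)

    // Cosine similarity between one user's factors and the target's factors
    val cosine = udf { (v: Seq[Float]) =>
      val dot = v.iterator.zip(target.iterator).map { case (a, b) => a.toDouble * b }.sum
      val norm = math.sqrt(v.map(x => x.toDouble * x).sum)
      if (norm == 0.0 || targetNorm == 0.0) 0.0 else dot / (norm * targetNorm)
    }

    factors
      .filter(col("id") =!= targetIndex)
      .withColumn("similarity", cosine(col("features")))
      .orderBy(col("similarity").desc)
      .limit(n)
      .select("id", "similarity")
  }
}
```

With the question's code this could be called as `AlsSimilarUsers.similarUsersByFactors(model, userId, 5).show()` before `spark.stop()`. The returned `id` values are StringIndexer indices rather than raw user_id values, and because similarity is measured in the rank-10 latent space, the neighbours will generally differ from those found by cosine over raw hours.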

Finally, to give a clearer idea of what I want to achieve, here is a small Python snippet that implements exactly this idea on this dataset: nearest-neighbour search with cosine similarity over the user-game matrix of hours played.

```python
from scipy.sparse import coo_matrix
from sklearn.neighbors import NearestNeighbors

# recommendations_df: pandas DataFrame with the dataset's user_id, app_id and hours columns (loading code not shown)
user_ids = recommendations_df['user_id'].astype('category').cat.codes
item_ids = recommendations_df['app_id'].astype('category').cat.codes

# Get the unique user and game ids
unique_user_ids = recommendations_df['user_id'].astype('category').cat.categories
unique_item_ids = recommendations_df['app_id'].astype('category').cat.categories

# Create a sparse user-game matrix of hours played
user_game_matrix = coo_matrix((recommendations_df['hours'], (user_ids, item_ids)))

# Fit the model
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(user_game_matrix)

# Get the 5 nearest users to the first user (n_neighbors=6 because the closest match is the user itself)
distances, indices = model_knn.kneighbors(user_game_matrix.getrow(0), n_neighbors=6)
recommended_users = [unique_user_ids[i] for i in indices.flatten()[1:]]
print(f'Recommended users for the first user are: {recommended_users}')
```


Output: 
Recommended users for the first user are: [3123620, 5031804, 1543163, 2829043, 1943227]
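A closer Spark translation of that Python snippet skips ALS and computes cosine similarity directly on the raw hours, joining the target user's rows with everyone else's on `app_id`. This is a minimal sketch assuming the input DataFrame has numeric `user_id`, `app_id` and `hours` columns (for example `rawData` from the Scala code, before indexing) and that neighbours are needed for one query user at a time. `RawCosineNeighbours.similarUsers` is a hypothetical helper.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sqrt, sum}

object RawCosineNeighbours {
  // Hypothetical helper. Assumes `ratings` has numeric user_id, app_id and hours columns.
  // Returns up to n (user_id, similarity) rows, most similar first, excluding the target.
  def similarUsers(ratings: DataFrame, targetUser: Long, n: Int): DataFrame = {
    // Collapse duplicate (user, app) rows by summing their hours
    val r = ratings
      .groupBy("user_id", "app_id")
      .agg(sum(col("hours").cast("double")).as("hours"))

    // Euclidean norm of each user's hours vector; all-zero users are dropped to avoid 0/0
    val norms = r
      .groupBy("user_id")
      .agg(sqrt(sum(col("hours") * col("hours"))).as("norm"))
      .filter(col("norm") > 0)

    // The target user's (app_id, hours) entries, renamed to avoid ambiguous columns in the join
    val target = r
      .filter(col("user_id") === targetUser)
      .select(col("app_id"), col("hours").as("t_hours"))

    // Dot product with the target, over the apps both users played
    val dots = r
      .filter(col("user_id") =!= targetUser)
      .join(target, "app_id")
      .groupBy("user_id")
      .agg(sum(col("hours") * col("t_hours")).as("dot"))

    // Single-row DataFrame holding the target's norm (empty if the target is unknown)
    val targetNorm = norms
      .filter(col("user_id") === targetUser)
      .select(col("norm").as("t_norm"))

    dots
      .join(norms, "user_id")
      .crossJoin(targetNorm)
      .withColumn("similarity", col("dot") / (col("norm") * col("t_norm")))
      .orderBy(col("similarity").desc, col("user_id").asc)
      .limit(n)
      .select("user_id", "similarity")
  }
}
```

Users who share no game with the target never survive the join, so they are omitted (their cosine similarity would be 0), whereas scikit-learn's brute-force search would still return them if fewer than N users overlap. Computing neighbours for every user at once would need an all-pairs self-join; at this dataset's size, an approximate method such as Spark ML's LSH transformers may be more practical.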
