I am challenged by some implementation and I would love to hear from some MongoDB experts on my problem.
I have an app on which profiles can follow each others. Now I would like my users to get some insights about the evolution of the followers count for a profile over the time and display a chart on the profile's account page for let's say the last 30 days.
I am using Ruby on Rails, and so for the followers part I am using an ActiveRecord model for that with a Postgresql to store the records.
class Follow < ApplicationRecord
# The user giving the follow
belongs_to :follower, foreign_key: :follower_id, class_name: "Profile"
# The user being followed
belongs_to :followed, foreign_key: :followed_id, class_name: "Profile"
# ...
end
For the analytics part, I thought about using MongoDB and the Time Series feature to track daily the count of followers for a profile.
Initially, my approach is this:
I have this Document called FollowersActivity containing a timestamp, a count and metadata containing the profile_id related to the profile being followed. When a profile is following another profile, I create a Follow record in the postgres DB, then in the Follow after_create callback, I enqueue a job to create a FollowersActivity document.
I choose to use a job to that the follow request is not blocked for some more time by this.
I pass as arguments to the job the timestamp, the follow.id and a factor number (either -1 for follower removed or 1 for follower added).
class FollowersActivity
include Mongoid::Document
field :timestamp, type: DateTime
field :metadata, type: :hash
field :count, type: Integer
# ...
end
after_create :record_add_follower
before_destroy :record_remove_follower
def record_add_follower
# ... notification job etc
ActivitiesJob
.set(wait: 10.seconds)
.perform_later(
ActivitiesJob::Types::FOLLOWERS,
{
timestamp: self.created_at,
factor: 1,
follow_id: self.id,
}
)
end
def record_remove_follower
ActivitiesJob
.set(wait: 10.seconds)
.perform_later(
ActivitiesJob::Types::FOLLOWERS,
{
timestamp: self.created_at,
factor: -1,
follow_id: self.id,
}
)
end
So what I initially thought to do when the job will run is something as follow: -> retrieve the Follow record from its id.
-> retrieve the followed_id from the Follow record
-> Look for the last FollowersActivity document for that Profile id
I would get the last count
-> use the factor to either increment or decrement the count.
-> add new document with the new count for the timestamp.
profile_id = params[:profile_id]
profile = Profile.find_by(id: profile_id)
if profile.nil?
return
end
factor = params[:factor]
last_activity = FollowersActivity
.where(
{
"metadata.profile_id" => profile.id
}
)
.order_by("timestamp desc")
.first
count = 0
if last_activity.nil?
count = factor
else
count = last_activity.count + factor
end
timestamp = params[:timestamp]
FollowersActivity.create(
{
timestamp: timestamp,
count: count,
metadata: {
profile_id: profile.id
}
}
)
Finally, to pull up the data from followers activity, I would use the aggregate pipeline to format the data as following:
collection.aggregate([
{
$match: {
'metadata.profile_id': profileId,
timestamp: {
$gte: monthAgo,
},
},
},
{
$densify: {
field: 'timestamp',
range: {
step: 1,
unit: 'day',
bounds: [monthAgo, new Date()]
},
},
},
{
$fill: {
output: {
count: { method: 'locf' },
},
},
},
{
$sort: {
timestamp: -1,
},
},
{
$project: {
date: {
$dateToParts: {
date: '$timestamp',
},
},
count: 1,
},
},
{
$group: {
_id: {
date: {
year: '$date.year',
month: '$date.month',
day: '$date.day',
},
},
count: {
$last: '$count',
},
},
},
{
$sort: {
'_id.date.year': 1,
'_id.date.month': 1,
'_id.date.day': 1,
},
},
]);
First, I would like your opinion on the approach and it this makes sense for my purpose. I just don't want to perform too much Follow.count to lighten the DB activity.
Second, while testing this setup, using the rails db:seed, it looks like there is some issue with write ack as even though I made sure that for this queue the jobs are executed one after each other, it looks like when they run and that I read the last created FollowersActivity document to get the previous count on which I will apply the factor -1 or +1, it seems the document is not yet saved...
Looks like a race condition, which result in wrong count at the end.
Any idea if the FollowersActivity.create is asynchronous, which would explain which when the next job in queue is trying to read the last created document, its not there yet so the last count is not the right one.