Calculate max ‘win streak' over sequential documents on Mongo DB without using $accumulator and custom JS functions

82 Views Asked by At

Lets say I have a collection of documents like:

[
   { "createdAt": 1688383980100, "win": true },
   { "createdAt": 1688383980200, "win": false },
   { "createdAt": 1688383980300, "win": true },
   { "createdAt": 1688383980400, "win": true },
   { "createdAt": 1688383980500, "win": false }
]

How can I get the longest/max win streak? I managed to get it via $group and $accumulator with custom JS functions. But Digital Ocean does not allow JS on their Mongo DB instances.

So I spent almost a day trying many pipeline alternatives, including many combinations of aggregators like $group, $project, $setWindowFields, with $sum, $max, $cond, $multiply, etc.

Read a lot of articles, SO questions, searched the docs, talked a lot to chatGPT, no success. Any tips or pointers are super welcome! Thanks in advance!

1

There are 1 best solutions below

2
nimrod serok On

If you don't want to group your entire collection (which is not recommended), you can use setWindowFields:

  1. Use setWindowFields to mark documents which ends a win streak (so we can filter out all win: false documents)
  2. $match to keep only "winning" documents, now that we have our mark
  3. Format the documents
  4. Add index to all "winning" documents
  5. $match to keep only documents which ends a win streak, now that we have our index
  6. Use setWindowFields to insert the index for the previous streak end
  7. Format the document to calculate the win-streak size (since previous streak end index)
  8. $sort and $limit the find the maximal value
db.collection.aggregate([
  {$setWindowFields: {
      sortBy: {createdAt: 1},
      output: {isStreakEnd: {
          $push: "$win",
          window: {documents: [0, 1]}
      }}
  }},
  {$match: {win: true}},
  {$project: {
      win: 1,
      isStreakEnd: {$cond: [
          {$eq: [{$size: "$isStreakEnd"}, 2]},
          {$not: {$last: "$isStreakEnd"}},
          true
      ]}
  }},
  {$setWindowFields: {
      sortBy: {createdAt: 1},
      output: {index: {
          $sum: 1,
          window: {documents: ["unbounded", "current"]}
      }}
  }},
  {$match: {isStreakEnd: true}},
  {$setWindowFields: {
      sortBy: {createdAt: 1},
      output: {streakSize: {
          $push: "$index",
          window: {documents: [-1, 0]}
      }}
  }},
  {$project: {
      streakSize: {$cond: [
          {$eq: [{$size: "$streakSize"}, 2]},
          {$subtract: [
              {$last: "$streakSize"},
              {$first: "$streakSize"}
          ]},
          {$first: "$streakSize"}
      ]}
  }},
  {$sort: {streakSize: -1}},
  {$limit: 1}
])

See how it works on the playground example

This method allows to avoid grouping all documents into one large document (which has a size limit). It uses a pipeline that reduces the number of documents throughout the calculation.

If you can assume the maximal win streak is less than a reasonable number, (for example: 100), you can do it much simpler, using only one $setWindowFields step, by keeping the last 100 documents wins on each document...