Get records that have exactly (n) records in a row continuously

99 Views Asked by At

Please consider this records:

Name Year Season
Name1 2011 1
Name1 2011 2
Name2 2012 1
Name2 2012 2
Name2 2012 3
Name2 2012 4
Name3 2012 3
Name3 2012 4
Name4 2012 1
Name4 2012 4
Name5 2012 3
Name5 2012 4
Name5 2013 1
Name5 2013 2

I want to get records that have 4 record in 4 seasons in a row. For example for above records I want to get Name2 and Name5 because Name2 has 4 records in each season in 2012 and Name5 has 4 records for 4 seasons in a row (for years 2012 and 2013). I tried to partition these records with Row_Number but it failed when we have 4 records in different years.

How can I achieve my desire result?

If I have below records for Name3:

Name Year Season
...
Name3 2014 1
Name3 2014 2
...

then Name3 wouldn't add to result because it hasn't 4 records for 4 seasons continuously.

2

There are 2 best solutions below

3
Charlieface On BEST ANSWER

Assuming there are no duplicates:

  • You can assign a code to each row by multiplying the Year by 4 then adding the Season.
  • Then get the row 3 previous to this using LAG.
  • Then aggregate, and ensure using a HAVING clause that there is at least one row where the row 3 previous to it is exactly 3 seasons earlier.
SELECT
  Name
FROM (
    SELECT *,
      SeasonCode = Year * 4 + Season,
      Prev4SeasonCode = LAG(Year * 4 + Season, 3) OVER (PARTITION BY Name ORDER BY Year, Season)
    FROM YourTable t
) t
GROUP BY
  Name
HAVING COUNT(CASE WHEN Prev4SeasonCode = SeasonCode - 3 THEN 1 END) > 0;

db<>fiddle

Instead of using a SeasonCode, you could also do multiple conditions on Year and Season.

3
siggemannen On

You can solve this problem by using a technique called gaps and island:

SELECT  Name, groupFlag
,   MIN(Year) AS StartYear
,   MIN(Year * 10 + Season) % 10 AS StartSeason
,   MAX(Year) AS EndYear
,   MAX(Year * 10 + Season) % 10 AS EndSeason
FROM    (
    select  SUM(flag) OVER(PARTITION BY Name ORDER BY Year, Season ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS groupFlag
    ,   *
    from (
        select  case when Season = prevseason % 4 + 1 and year =  prevYear + case when prevseason = 4 then 1 else 0 end then 0 else 1 end as flag
        ,   *
        FROM    (
            select  *
            ,   lag(year) over(partition by name order by year, season) as prevYear
            ,   lag(Season) over(partition by name order by year, season) as prevSeason
            from (
                VALUES  (N'Name1', 2011, 1)
                ,   (N'Name1', 2011, 2)
                ,   (N'Name2', 2012, 1)
                ,   (N'Name2', 2012, 2)
                ,   (N'Name2', 2012, 3)
                ,   (N'Name2', 2012, 4)
                ,   (N'Name3', 2012, 3)
                ,   (N'Name3', 2012, 4)
                ,   (N'Name4', 2012, 1)
                ,   (N'Name4', 2012, 4)
                ,   (N'Name5', 2012, 3)
                ,   (N'Name5', 2012, 4)
                ,   (N'Name5', 2013, 1)
                ,   (N'Name5', 2013, 2)
            ) t (Name,Year,Season)
            ) prev
        ) flag
    ) grp
GROUP BY Name, groupFlag
HAVING COUNT(*) >= 4

The way i do it is to get previous row data with help of LAG, then compare if previous YearSeason + 1 matches the current YearSeason, and set a flag being 0 and 1. For matches i use 0, for mismatches i set 1.

Then by doing an aggregated SUM OVER Partition, one calculates sum of those flags. This creates islands of matching group rows.

Finally, you can do the actual logic by counting each Name and GroupFlag-group and make sure it's >= 4.

Output:

Name groupFlag StartYear StartSeason EndYear EndSeason
Name2 1 2012 1 2012 4
Name5 1 2012 3 2013 2