How to update older "duplicate" records (duplicate except for the date column)

76 Views Asked by At

We have a table that contains, for this example, links to demographic questions (questionID) for each subscriber, with a date indicating when the subscriber answered a particular demographic question. In some cases, a subscriber may have answered the same question again at a later date, and we now have multiple records for the same subscriber and questionID, but with different answer dates (see sample data):

subscriberID questionID  dateAnswered            isDeleted
------------ ----------- ----------------------- ---------
100          559         2015-07-29 13:07:26.153 0
100          560         2015-07-29 13:07:26.153 0
100          561         2015-07-29 13:07:26.153 0
100          562         2015-07-29 13:07:26.153 0
100          575         2015-07-29 13:07:26.153 0
102          559         2015-07-30 15:12:46.143 0
102          564         2015-07-30 15:12:46.143 0
102          588         2015-07-30 15:12:46.143 0
102          559         2015-07-31 16:11:53.323 0
114          575         2015-08-21 11:27:14.253 0
114          588         2015-08-21 11:27:14.253 0
114          560         2015-08-21 11:27:14.253 0
114          588         2015-08-24 05:44:42.030 0
114          562         2015-08-21 11:27:14.253 0
114          575         2015-08-24 05:44:42.030 0

The app that was storing the answers should have flagged the older records as "deleted" (set isDeleted = 1) but it did not do so, and I now need to clean up the older records.

This seems like it should be simple, but it's got me stumped. How do I (a) select any records where there are duplicate subscriberID and questionIDs but with different answer dates? And (b) how do I do an update to set all but the newest records for each subscriber to have isDeleted=1?

Any help would be appreciated! I suspect a self join may be in order, but I haven't figured it out yet. Thus the question!

2

There are 2 best solutions below

2
M.Ali On BEST ANSWER
;WITH X AS 
 ( 
  SELECT ROW_NUMBER() OVER (PARTITION BY 
                            subscriberID, questionID  
                            ORDER BY dateAnswered DESC) rn 
  , * 
  FROM TableName 
 )
UPDATE X 
 SET isDeleted = 1
WHERE rn > 1
1
Richard Vivian On

The select /update below will affect all records that are not marked as deleted , except for the last record by each subscriber for each question. Just another approach.

   WITH LastAnswers AS
    (
    SELECT    subscriberID ,questionID , MAX(dateAnswered) AS LastAnsweredDate
    FROM      TableName 
    GROUP BY  subscriberID ,questionID
    )

UPDATE TableName
   SET TableName.isDeleted = 1
FROM 
  TableName

LEFT JOIN LastAnswers 
ON  TableName.subscriberID = LastAnswers.subscriberID 
AND TableName.questionID   = LastAnswers.questionID 
AND TableName.dateAnswered = LastAnswers.LastAnsweredDate

WHERE  LastAnswers.LastAnsweredDate IS NULL AND TableName.isDeleted = 0