I have two tables with the following schema:
Table A:ColumnA, UserId,... - rest of the schema omitted for brevityTable B:ColumnB, UserId,... - rest of the schema omitted for brevity
The tables can have duplicate values between them. For e.g - Table A row (<some-columnA-value>, 1, ...) and Table B row (<some-columnB-value>, 1, ...), 1 being the UserId.
Now, I have an API which is used to fetch all the UserId values from both tables. With increasing data, I want to now use pagination for this API and would like to modify the queries accordingly. There should also not be any duplicates over the pages or within a page.
How do I achieve this? Also a requirement is that I need to use keyset pagination rather than offset pagination since offset pagination gets slower as and when the offset increases.
So far, I have thought of using indexed views since there is only 1 column that I require to fetch but since the data keeps changing quite frequently and in large volumes, the overhead of maintaining the indexed view is not optimal.
Table A:
| Column A | UserId |
|---|---|
| x | 1 |
| y | 2 |
| z | 3 |
| w | 4 |
Table B:
| Column B | UserId |
|---|---|
| a | 1 |
| b | 3 |
| c | 5 |
| d | 6 |
Result (if no page size):
| UserId |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
Result (if page size 3)
Page 1
| UserId |
|---|
| 1 |
| 2 |
| 3 |
Page 2
| UserId |
|---|
| 4 |
| 5 |
| 6 |
As I mention in my canonical post on Keyset Pagination (paging by key), the base query needs to be efficient.
The key (sic) in your case, is to use a Merge Union, group up by ID, and then take the top 3. This means that the grouping is over an already sorted set, and can use a Stream Aggregation.
So first
UNION ALLthe tables together, thenGROUP BY UserIdthe final result and takeTOP (3).Final query plan is nice and neat. Note that the union is keeping the ordering correct for the later stages.
As I mention, you need to keep the previous highest ID value, and pass it in to the next query. Put it into both halves of the union.
db<>fiddle