SQL query - getting distinct result from combined tables

51 Views Asked by At

I am trying to compose an SQL query resulting a distint combination of two tables, but without success.

DB example:

Need a result set where the Questions.Text and related Options.Text are unique when combined without duplications, but have to make sure if the subset of options with a different question works standalone, not to miss it from the results.

I tried using Distinct and Group by on the joined tables in multiple options, but none were successful. (I'm not an SQL expert unfortunately.)

1

There are 1 best solutions below

0
Rob Eyre On

First, prepare a unique 'signature' for each set of options that goes with each question. There are different algorithms you could use, here I'm calculating the CRC32 value for each option and then XORing them together:

    SELECT
        questions.id,
        questions.Text,
        BIT_XOR(CRC32(options.Text)) AS sig
    FROM questions
    INNER JOIN options ON questions.ID = options.QuestionID
    GROUP BY questions.id

Next we want to get the groupwise minimum for this signature. Again there are different approaches (see https://dev.mysql.com/doc/refman/8.0/en/example-maximum-column-group-row.html). Here I'm using a window function to calculate ranks per partition:

WITH
questionsigs AS (
    SELECT
        questions.id,
        questions.Text,
        BIT_XOR(CRC32(options.Text)) AS sig
    FROM questions
    INNER JOIN options ON questions.ID = options.QuestionID
    GROUP BY questions.id
)
SELECT id, RANK() OVER (PARTITION BY sig, Text ORDER BY id) AS rn
FROM questionsigs

Finally, enforce the constraint that we only want the first question with a given set of options, and retrieve the data we want:

WITH
questionsigs AS (
    SELECT
        questions.id,
        questions.Text,
        BIT_XOR(CRC32(options.Text)) AS sig
    FROM questions
    INNER JOIN options ON questions.ID = options.QuestionID
    GROUP BY questions.id
),
questionranks AS (
    SELECT id, RANK() OVER (PARTITION BY sig, Text ORDER BY id) AS rn
    FROM questionsigs
)
SELECT questions.Text AS questionText, options.Text as optionText
FROM questionranks
INNER JOIN questions ON questionranks.id = questions.id
INNER JOIN options ON questions.id = options.QuestionID
WHERE rn = 1
ORDER BY questions.id;

Produces

+--------------+------------+
| questionText | optionText |
+--------------+------------+
| A            | x          |
| A            | y          |
| A            | x          |
| A            | y          |
| A            | z          |
| B            | a          |
| B            | b          |
| B            | c          |
| B            | z          |
+--------------+------------+