Is redundant data an acceptable trade-off in a normalized database structure?


I'm considering the following problem in SQL.

I have a list of A_ids and a list of B_ids.

  • the number of unique A_ids is on the order of thousands
  • the number of unique B_ids is on the order of millions

The idea is that each A_id has an associated list of B_ids, potentially a long one (a many-to-many relationship).

I could simply store them in the following format:

| a_id | b_ids       |
|------|-------------|
| 1    | '1,2,3,4,5' |
| 2    | '1,2,4,5'   |
| 3    | '1'         |
| 4    | '1,2'       |
| 5    | '3,4'       |
| 6    | '2,3'       |
...
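
For concreteness, here is a rough sketch of that comma-separated design (table and column names are just placeholders, PostgreSQL-flavoured syntax). Note how finding which a_ids contain a given b_id is forced into string matching, which no ordinary index can support:

```sql
-- Denormalized sketch: one row per a_id, with b_ids packed into a string.
CREATE TABLE a_to_b_csv (
    a_id  integer PRIMARY KEY,
    b_ids text NOT NULL   -- e.g. '1,2,3,4,5'
);

-- Finding every a_id that references b_id = 3 requires string surgery;
-- delimiters are padded on so that '3' does not also match '13' or '34'.
SELECT a_id
FROM   a_to_b_csv
WHERE  ',' || b_ids || ',' LIKE '%,3,%';
```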

However, I have read that normalizing it, i.e. simply doing:

| a_id | b_id |
|------|------|
| 1    | 1    |
| 1    | 2    |
| 1    | 3    |
| 1    | 4    |
| 1    | 5    |
| 2    | 1    |
...

is better practice, but I fear the impact of having a huge number of rows (i.e. 1,000,000,000+).
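
For comparison, the normalized version would be a plain junction table, something like this (again, names are illustrative):

```sql
-- Normalized sketch: one row per (a_id, b_id) pair.
CREATE TABLE a_to_b (
    a_id integer NOT NULL,
    b_id integer NOT NULL,
    PRIMARY KEY (a_id, b_id)   -- doubles as the index for lookups by a_id
);

-- All b_ids for a given a_id: a simple index range scan.
SELECT b_id FROM a_to_b WHERE a_id = 1;
```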

I understand the drawbacks of either approach, but which is the better trade-off?


1 Answer

NickW (accepted answer):

Normalisation is the route to follow:

  1. For a modern DBMS, that’s not a particularly large number of rows.
  2. As long as you index the table appropriately, any query will touch only the rows it actually needs rather than performing a full table scan (unless the query genuinely requires one); see the sketch below.
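
As an illustration of point 2, assuming the a_to_b junction table sketched in the question, a secondary index covering the reverse direction lets both access patterns avoid full scans (the index name and column order are just one reasonable choice):

```sql
-- PRIMARY KEY (a_id, b_id) already indexes lookups by a_id.
-- A second index makes lookups by b_id equally cheap:
CREATE INDEX a_to_b_by_b ON a_to_b (b_id, a_id);

-- Each of these touches only the matching index entries,
-- not all 1,000,000,000+ rows:
SELECT b_id FROM a_to_b WHERE a_id = 42;     -- uses the primary key
SELECT a_id FROM a_to_b WHERE b_id = 12345;  -- uses a_to_b_by_b
```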