Trivial question about max number of distinct values in a B-tree index

114 Views Asked by At

I am trying to learn about indexing. I looked at the actual indexes used in the database at work.

I looked in to two random indexes:

SELECT
INDEX_NAME, INDEX_TYPE, LEAF_BLOCKS, DISTINCT_KEYS 
FROM ALL_INDEXES
WHERE TABLE_NAME = 'TRANS'
AND INDEX_NAME IN ('TRANS_PK','TRANS_ORD_NO')

This gives:

INDEX_NAME  | INDEX_TYPE | LEAF_BLOCKS | DISTINCT_KEYS |
TRANS_PK    | NORMAL     | 13981       |    3718619    | 
TRANS_ORD_NO| NORMAL     | 17052       |    43904      |

This is what makes no sense to me; shouldn't distinct(column_name) from the actual table yield the same number? It doesn't!

SELECT COUNT(DISTINCT ORD_NO) FROM trans

..gives 20273

TRANS_PK is an index for a column called NO.

SELECT COUNT(distinct NO) FROM trans

... gives 4 328 622

What am I not getting here? The select distinct should result in the same number as the "distinct keys" - column in the ALL_INDEXES- table?

3

There are 3 best solutions below

3
Barbaros Özhan On BEST ANSWER

Dictionary views ( including ALL_INDEXES ) don't have real-time data, but the new statistics values are refreshed during the analyze time, and become stale as the time passes.

0
Oly On

Check also here: https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-index-design-guide?view=sql-server-2017 "Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index." - so on top of data pointers you also have many cross-level leafs.

0
cdub On

Index statistics can be gathered using the DBMS_STATS.GATHER_INDEX_STATS() procedure. The example below shows how the DISTINCT_KEYS value is updated after rows are inserted and index statistics are gathered. Note that Oracle samples a percentage of the data in the index by default, so the DISTINCT_KEYS may not reflect the actual number of distinct keys in the index. See the estimate_percent parameter in the link below.

https://docs.oracle.com/database/121/ARPLS/d_stats.htm#ARPLS68575

SQL> CREATE TABLE foo AS SELECT LEVEL id FROM dual CONNECT BY LEVEL <= 5;

Table FOO created.

SQL> CREATE INDEX foo_idx ON foo(id);

Index FOO_IDX created.

SQL> SELECT * FROM foo;

        ID
----------
         1
         2
         3
         4
         5

SQL> SELECT distinct_keys FROM all_indexes WHERE table_name = 'FOO';

DISTINCT_KEYS
-------------
            5

SQL> INSERT INTO foo VALUES(6);

1 row inserted.

SQL> INSERT INTO foo VALUES(7);

1 row inserted.

SQL> COMMIT;

Commit complete.

SQL> 
SQL> SELECT distinct_keys FROM all_indexes WHERE table_name = 'FOO';

DISTINCT_KEYS
-------------
            5

-- Here's where we gather the statistics
SQL> EXECUTE DBMS_STATS.GATHER_INDEX_STATS('MYSCHEMA', 'FOO_IDX');

PL/SQL procedure successfully completed.

SQL> SELECT distinct_keys FROM all_indexes WHERE table_name = 'FOO';

DISTINCT_KEYS
-------------
            7