I have seen a number of general statements on how sys.dm_db_partition_stats.row_count can produce inaccurate results due to providing objects' statistics instead of actually doing a COUNT(). However, I have never been able to find any deeper reasons behind those statements or validate the hypothesis on my Azure SQL DB.

So I would like to learn -

  1. How inaccurate this method can actually be?
  2. Why exactly the results might be skewed?
    (e.g. stats are only recalculated once per day / on specific object operation).

Any related insight is much appreciated !



Several things I was able to find out on my own -- mostly by running various queries containing sys.dm_db_partition_stats.row_count, while knowing actual row counts in each table.

Here's a final query I came up with
This gets fast and (in my case) accurate row count for each table, sorted from high count to low.

SELECT 
    (SCHEMA_NAME(A.schema_id) + '.' + A.Name) as table_name,  
    B.object_id, B.index_id, B.row_count 
FROM  
    sys.dm_db_partition_stats B 
LEFT JOIN 
    sys.objects A 
    ON A.object_id = B.object_id 
WHERE 
    SCHEMA_NAME(A.schema_id) <> 'sys' 
    AND (B.index_id = '0' OR B.index_id = '1') 
ORDER BY 
    B.row_count DESC 

First line of WHERE clause is used to exclude system tables, e.g. sys.plan_persist_wait_stats and many others.

Second line takes care of non-unique non-clustered indexes (which are objects and apparently have their own stats) -> if you don't filter them out, you get double row count for indexed tables when using GROUP BY A.schema_id, A.Name or two records with the same table_name in the query output (if you don't use GROUP BY)

1

There are 1 best solutions below

1
Leon Yue On BEST ANSWER

We're glad that you found the solution and solved it by yourself. Your new edition should be an answer. I just help you post it as answer and this can be beneficial to other community members:

Several things I was able to find out on my own -- mostly by running various queries containing sys.dm_db_partition_stats.row_count, while knowing actual row counts in each table.

Here's a final query I came up with This gets fast and (in my case) accurate row count for each table, sorted from high count to low.

SELECT 
    (SCHEMA_NAME(A.schema_id) + '.' + A.Name) as table_name,  
    B.object_id, B.index_id, B.row_count 
FROM  
    sys.dm_db_partition_stats B 
LEFT JOIN 
    sys.objects A 
    ON A.object_id = B.object_id 
WHERE 
    SCHEMA_NAME(A.schema_id) <> 'sys' 
    AND (B.index_id = '0' OR B.index_id = '1') 
ORDER BY 
    B.row_count DESC 

First line of WHERE clause is used to exclude system tables, e.g. sys.plan_persist_wait_stats and many others.

Second line takes care of non-unique non-clustered indexes (which are objects and apparently have their own stats) -> if you don't filter them out, you get double row count for indexed tables when using GROUP BY A.schema_id, A.Name or two records with the same table_name in the query output (if you don't use GROUP BY)

Thanks for your sharing again.

And thanks for @conor's commnet: "If you want to see how far off the numbers can be, I suggest you try doing user transactions, inserting a bunch of rows, then roll back the transaction."