How can I find the mismatch between two similar PROD vs SIT datasets on a summary level and dive deeper into the row for inconsistencies? Alteryx?

23 Views Asked by sais At 24 July 2023 at 15:23

I would like to automate this processes for multiple datasets. I am not sure if there exists a tool like Alteryx or any software where I can do the following.

Compare 2 similar datasets on a block aggregate level.
If mismatch exists, zoom into that block to find where the inconsistencies occur.
Get rows of data with ID where inconsistency happens. The goal is to explore on a high level because of the huge amount of data and open a more comprenhesive view of the inconsistency to point out the error easily.

Here is the data that exists in both SIT and PROD:

ID	Block	Amt A	Amt B	Amt C	....
1	X	1000	400	1000	....
2	Y	2000	500	4000	....
3	Z	3000	500	3000	....
2	Y	2000	3000	2000	....
3	Z	3000	4500	3000	....

This is my PROD data after aggregating on 'block'

Block	Counts	Sum Amt A	Sum Amt B
X	1000	40000	10000
Y	2000	45000	20000
Z	3000	50000	30000

This is my SIT data after aggregating on 'block'

Block	Counts	Sum Amt A	Sum Amt B
X	1000	40000	15000
Y	2000	47000	20000
Z	3000	50000	30000

FYI - There are multiple columns like block that could be used on aggregation. One could choose any column, get an aggregation to find inconsistencies and then explore into that particular block.

Original Q&A

There are 1 best solutions below

Xue On 26 July 2023 at 03:23

It's a typical full join, then filter process, you can do it in SQL, Python dataframe or Alteryx, pretty much anything

`
WITH
sum_prod AS
(
 SELECT
  SUM(AMT_A) SUM_AMT_A,
  COUNT(1),..
 FROM prod
 GROUP BY ...
)
,
sum_SIT AS
(
 SELECT
  SUM(AMT_A) SUM_AMT_A,
  COUNT(1),..
 FROM sit
 GROUP BY ...
)

SELECT 
*
FROM sum_prod
FULL JOIN
sum_sit
ON
sum_prod.block = sum_sit.block
WHERE
 sum_prod.Sum_Amt_A <> sum_sit.Sum_Amt_A
 OR sum_prod.Sum_Amt_A IS NULL
 OR sum_sit.Sum_Amt_A IS NULL  -- always take care of one side missing
 --add however_your_condition_to_tell_diff

`

Same principle in Alteryx, just a join and a selection filter will do

How can I find the mismatch between two similar PROD vs SIT datasets on a summary level and dive deeper into the row for inconsistencies? Alteryx?

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in TESTING

Related Questions in AUTOMATED-TESTS

Related Questions in ALTERYX

Trending Questions

Popular # Hahtags

Popular Questions