How do I validate data mapping between 2 data frames in pandas

363 Views Asked by Betabr At 22 June 2021 at 14:41

I am trying to validate a data mapping between two data frames for specific columns. I need to validate the following:

if values in a specific column in df1 matches the mapping in a specific column in df2.
if values in a specific column in df1 does not match the specified mapping in a specific column in df2 - a different value in df2.
if values in a specific column in df1 does not have a match in in df2.

df1 looks like this:

cp_id	cp_code
2A23	A
2A24	D
3A45	G
7A96	B
2A30	R
6A18	K

df2 looks like like:

cp_type_id	cp_type_code
2A23	8
2A24	7
3A45	3
2A44	1
6A18	8
4A08	2

The data mapping constitutes of sets of values where the combination could match any values within the set, as following:

('A','C','F','K','M') in df1 should map to (2, 8) in df2 - either 2 or 8
('B') in df1 should map to 4 in df2
('D','G','I') in df1 should map to 7 in df2
('T','U') in df1 should map to (3,5) in df2 - either 3 or 5

Note that df1 has a cp_code as R which is not mapped and that 3A45 is a mismatch. The good news is there is a unique identifier key to use.

First, I created a list for each mapping set and created a statement using merge to check for each mapping. I ended up with 3 lists and 3 statements per set, which I am not sure if this is the right way to do it.

At the end I want to combine the matches into one df that I call match, all no_matches into another df that I call no_match, and all no_mappings into another df that I call no_mapping, like the following:

Match

cp_id	cp_code	cp_type_id	cp_type_code
2A23	A	2A23	8
2A24	D	2A24	7
6A18	K	6A18	8

Mismatch

cp_id	cp_code	cp_type_id	cp_type_code
3A45	G	3A45	3

No Mapping

cp_id	cp_code	cp_type_id	cp_type_code
7A96	B	NaN	NaN
NaN	NaN	2A44	1
2A30	R	NaN	NaN
NaN	NaN	4A08	2

I am having a hard time to make the no_match to work. This is what I tried for no match:

filtered df1 based on the set 2 codes
filtered df2 based on not in map 2 codes
for the no mapping, I did a df merge with on='cp_id'

    no_mapping_set2 = df1_filtered.merge(df2_filtered, on='cp_id', indicator = True)

With the code above, for cp_id = 'B', for example, instead of getting only 1 row back, I get a lot of duplicate rows with cp_id = 'B'.

Just to state my level, I am a beginner in Python. Any help would be appreciated.

Thank you so much for your time.

Rob

Original Q&A

How do I validate data mapping between 2 data frames in pandas

There are 0 best solutions below

Related Questions in PYTHON-3.X

Related Questions in PANDAS

Related Questions in DATA-MAPPING

Trending Questions

Popular # Hahtags

Popular Questions