I am doing some data quality checking.
How do I compare two StringType columns ('old_unmatch' and 'current_unmatch') and create new columns for the results ('new_unmatch' and 'missed_unmatch')?
| old_unmatch | current_unmatch | new_unmatch | missed_unmatch |
|---|---|---|---|
| ['121', '122'] | ['121', '123'] | ['123'] | ['122'] |
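To compare two string columns in PySpark and create new columns to show the differences, you can use a `udf` (User-Defined Function) along with the `array_except` function. Since `array_except` operates on array columns, each string first has to be converted into an array, which is where the `udf` comes in.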
Output-