I am trying to fill the blanks in a column that already has some data. I want to impute the remaining blank cells based on the other columns in the dataset. Sample data is included below; I am looking for a fuzzy approach to this.
Basically, I have a dataset of 500,000 rows, of which 6,800 rows have a blank.
Column c is mostly populated but has 6,800 blanks. These need to be filled based on the combination of columns a and b, not column d (apologies for the confusion).
If a combination of columns a and b has a populated value in column c, and the same combination appears with a blank in column c among the 6,800 rows, that blank should be filled with the populated value.
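To make the rule above concrete, here is a minimal Python sketch of the exact-match step. It assumes the rows have been loaded as dictionaries with keys a, b and c (for example via csv.DictReader) and that a blank in column c is an empty string. The function name fill_exact and the blank "Amount paid" row are only illustrative.

```python
def fill_exact(rows):
    """Fill blank 'c' values from rows that share the same (a, b) pair."""
    # Build a lookup of (a, b) -> c from rows where c is populated.
    # If a pair appears with several c values, the first one seen wins.
    lookup = {}
    for row in rows:
        if row["c"] != "":
            lookup.setdefault((row["a"], row["b"]), row["c"])

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if row["c"] == "":
            # Leave the cell blank when the pair was never seen with a value.
            new_row["c"] = lookup.get((row["a"], row["b"]), "")
        filled.append(new_row)
    return filled


sample = [
    {"a": "Amount paid", "b": "Amount paid", "c": "1025"},
    {"a": "Amount paid", "b": "Amount paid", "c": ""},  # hypothetical blank row
    {"a": "Bonus", "b": "Bonus", "c": ""},              # no populated match
]
for row in fill_exact(sample):
    print(row)
```

Rows whose (a, b) pair never appears with a value stay blank, which is where a fuzzy match would be needed.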
a | b | c
Pension ER A Sec | Pension ER A Sec 20% | 605
Period Base Sal BT | Period Base Sal BT | 704
myShare Plan | myShare Plan | 508
Transportation Allowance | Transportation Allowance | 227
Hypo Tax (Salary) | Hypo Tax (Salary) | 1001
Amount paid | Amount paid | 1025
Car assistance | Car assistance - cash allowance | 202
Hypo NI (Salary) | Hypo NI (Salary) | 908
Housing Allowance | Housing Allowance | (blank)
Pens Supp Allwnce | Pens Supp Allwnce | (blank)
Home Allowance | Home Allowance | (blank)
Assignment Allowance GRS | Assignment Allowance GRS | (blank)
Sal Supp Allowance GRS | Sal Supp Allowance GRS | (blank)
Accident Ins | Accident Ins | (blank)
Relocation Allowance GRS | Relocation Allowance GRS | (blank)
Bonus | Bonus | (blank)
Total | Total | (blank)
Sueldo | Sueldo | (blank)
RET. TAX INCOME | RET. TAX INCOME | (blank)
Plus vacacional | Plus vacacional | (blank)
I have tried imputing the values in the blank cells but have not been able to fill them. I am looking for fuzzy logic that can fill the missing values in column c based on columns a and b.
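One possible reading of "fuzzy" here is approximate string matching on the combined a and b text. The sketch below uses difflib.get_close_matches from the Python standard library. The function name fill_fuzzy, the 0.85 similarity cutoff, the " | " separator and the row with the truncated description are all assumptions chosen for illustration, not something derived from the real data.

```python
import difflib


def fill_fuzzy(rows, cutoff=0.85):
    """Fill blank 'c' values using the most similar populated (a, b) text."""

    def key(row):
        # Combine a and b into one lowercase string for comparison.
        return f"{row['a']} | {row['b']}".lower()

    # Map each populated combined key to its c value (first one seen wins).
    lookup = {}
    for row in rows:
        if row["c"] != "":
            lookup.setdefault(key(row), row["c"])
    candidates = list(lookup)

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if row["c"] == "":
            # Best candidate with similarity ratio >= cutoff, if there is one.
            best = difflib.get_close_matches(key(row), candidates, n=1, cutoff=cutoff)
            if best:
                new_row["c"] = lookup[best[0]]
        filled.append(new_row)
    return filled


sample = [
    {"a": "Car assistance", "b": "Car assistance - cash allowance", "c": "202"},
    {"a": "Amount paid", "b": "Amount paid", "c": "1025"},
    # Hypothetical blank row whose description is slightly truncated.
    {"a": "Car assistance", "b": "Car assistance - cash allow", "c": ""},
    {"a": "Sueldo", "b": "Sueldo", "c": ""},  # nothing similar, stays blank
]
for row in fill_fuzzy(sample):
    print(row)
```

An exact match has a similarity of 1.0, so this also covers the exact case. Each blank row is compared against every distinct populated key, so with many distinct keys it may be worth caching the result per key. A lower cutoff fills more blanks at the risk of wrong matches, so any filled values would need a manual spot check.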