I'm having trouble finding out if values partially match between 2 columns in different tables so i could update values accordingly. The problem is trying to match a longer value with smaller value and having multiple prefix values. I don't want to use any loops as my dataset is in the 1m range.
df1:
| PRODUCT_8 | VAT |
|---|---|
| 72093234 | 0,2 |
| 30023000 | 0,2 |
| 12093236 | 0,2 |
| 30066012 | 0,2 |
df2:
| KN |
|---|
| 30022000 |
| 30023000 |
| 300660 |
| 72 |
Desired output:
| PRODUCT_8 | VAT |
|---|---|
| 72093234 | 0,9 |
| 30023000 | 0,9 |
| 12093236 | 0,2 |
| 30066012 | 0,9 |
I have tried this:
df1[is.na(pmatch(df1$PRODUCT_8, df2$KN, duplicates.ok = TRUE)) == FALSE,
"VAT"] <- 0.09
but pmatch doesn't understand if I want to match longer numbers with smaller. It would be perfect if I could just use startswith(df1$PRODUCT_8, df2$KN), but sadly you can't have multiple values in the prefix argument. sqldf is not complying with my demands either, don't know what's the problem with that.