Use adist to determine which elements need only deletions


I have a vector of strings (y) and a single string (x). I want to compare them and find which element of y fits x best when only deletions are allowed.

x = "PCOR1"
y = c("PCor", "TCor", "TMMON", "INTMAX")

What I tried so far is to use adist but it leads to strange results:

adist(x,y,costs=c(substitutions = 0, insertions = 0, deletions = 1), ignore.case=TRUE)
      [,1] [,2] [,3] [,4]
[1,]    1    1    0    0

I can have a closer look at this doing:

 drop(attr(adist(x,y,costs=c(substitutions = 0, insertions = 0, deletions = 1), ignore.case=TRUE, counts=TRUE),"counts"))
     ins del sub
[1,]   0   1   0
[2,]   0   1   1
[3,]   0   0   5
[4,]   1   0   5

If I read this correctly, it tells me that I need one deletion to get from "PCOR1" to "PCor", one deletion and one substitution to get from "PCOR1" to "TCor", and so on.

Why does adist return this if I set insertions and substitutions to 0? Is there a way to only use deletions?

I would expect something like:

     [,1] [,2] [,3] [,4]
[1,]   1    0    0    0

Sotos

It seems you want to flag each element of y that occurs as a substring of x. In that case grepl() should suffice, i.e.

sapply(y, grepl, x, ignore.case = TRUE)
#  PCor   TCor  TMMON INTMAX 
#  TRUE  FALSE  FALSE  FALSE 

or

sapply(y, grepl, x, ignore.case = TRUE) * 1
#  PCor   TCor  TMMON INTMAX 
#     1      0      0      0 
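
Note that grepl() on the raw strings only finds contiguous matches, while deletions can also remove characters from the middle of x. A minimal sketch of a deletion-only check follows, assuming the elements of y contain no regex metacharacters. The helper name is_subsequence is hypothetical, not a base R function. It also cross-checks the result with adist() under its default unit costs: y can be reached from x by deletions alone exactly when the distance equals nchar(x) - nchar(y). With substitutions and insertions costing 0, as in the question, adist() uses them freely, so the distance collapses to the length difference.

```r
x <- "PCOR1"
y <- c("PCor", "TCor", "TMMON", "INTMAX")

# Hypothetical helper (not part of base R): TRUE when `pattern` can be
# obtained from `target` by deleting characters only, ignoring case.
# Assumes `pattern` contains no regex metacharacters.
is_subsequence <- function(pattern, target) {
  # "PCor" becomes "P.*C.*o.*r": the characters in order, with gaps allowed
  rx <- paste(strsplit(pattern, "")[[1]], collapse = ".*")
  grepl(rx, target, ignore.case = TRUE)
}

by_regex <- vapply(y, is_subsequence, logical(1), target = x)

# Cross-check with adist() and its default unit costs: any insertion or
# substitution pushes the distance above the plain length difference.
d <- drop(adist(x, y, ignore.case = TRUE))
by_adist <- d == nchar(x) - nchar(y)

by_regex * 1
```

For the example data both checks flag only "PCor", which matches the expected result in the question.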