I'm trying to apply the grabl function from the stringdist package to a large character vector "testref". I want to check whether the strings in another character vector "testtitle" can be found in "testref". However, grabl only accepts a single pattern string at a time.
How can I circumvent this limitation?
Example to reproduce:
#in reality each of the elements contains a full bibliography of a scientific article
testref <- c("asdfd sfgdgags dgsd.dsfas.dfs.f.sfas.f My beatiful title asfsdf dsf asfd dsf dsfsdfdsfsd, fdsf sdfdf: fsd fsdfafsd (2000) dsdfsf sfda", "sdfasfdsd, sdfsddf, fsagsg: sfds sfasdf sdfsdf", "sadfsdf: sdfsdf sdfggsdg another title here sdfdfsds, asdgasg (2021) blablabal")
#the pattern vector can contain up to 500 titles of scientific articles that contain typos or formatting mistakes. Hence, I need to use approximate matching
testtitle <- c("holy cow", "random notes", "MI beautiful title", "quantitative research is hard", "an0ther title here")
What I want to get out of this is a list of logical (TRUE/FALSE) vectors, one per title:
results_list
#[[1]]
#[1] FALSE FALSE FALSE
#[[2]]
#[1] FALSE FALSE FALSE
#[[3]]
#[1] TRUE FALSE FALSE
#[[4]]
#[1] FALSE FALSE FALSE
#[[5]]
#[1] FALSE FALSE TRUE
So far, I have tried to loop over the titles as per @Rui Barradas's suggestion. Technically it works, but it takes a very long time.
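# pre-allocate one list element per title instead of hard-coding 5
results_list <- vector("list", length = length(testtitle))
for (i in seq_along(testtitle)) {
  results_list[[i]] <- grabl(testref, testtitle[i], maxDist = 8)
}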
I was wondering whether it is possible to use lapply in combination with the grabl function.
results_list <- lapply(testtitle, function(testtitle) grabl(testref, testtitle[], maxDist = 2))
But I get this error: Error in grabl(testref, testtitle[], maxDist = 2) : could not find function "grabl"
I'm very grateful for your past suggestions and hope for more input!
Thank you!
Something like the following might do what you want. Untested, since there is no data.
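As a minimal sketch, assuming the stringdist package is installed in a version that provides grabl and afind: the "could not find function" error usually means the package was not attached in that session, so the sketch starts with library(stringdist). It then shows two options. The first is the lapply call with the pattern passed as a single string. The second relies on afind, which, as far as I recall from the stringdist documentation, accepts a whole vector of patterns and returns a distance matrix with one row per element of x and one column per pattern. The name max_dist and its value of 2 are only illustrative; tune it to your data.

```r
# Assumption: stringdist is installed; attaching it makes grabl() and afind() visible.
library(stringdist)

testref <- c(
  "asdfd sfgdgags dgsd.dsfas.dfs.f.sfas.f My beatiful title asfsdf dsf asfd dsf dsfsdfdsfsd, fdsf sdfdf: fsd fsdfafsd (2000) dsdfsf sfda",
  "sdfasfdsd, sdfsddf, fsagsg: sfds sfasdf sdfsdf",
  "sadfsdf: sdfsdf sdfggsdg another title here sdfdfsds, asdgasg (2021) blablabal"
)
testtitle <- c("holy cow", "random notes", "MI beautiful title",
               "quantitative research is hard", "an0ther title here")

# Illustrative threshold (hypothetical value, not taken from the question's data).
max_dist <- 2

# Option 1: lapply over the titles; each call receives exactly one pattern string.
results_list <- lapply(testtitle, function(title) {
  grabl(testref, title, maxDist = max_dist)
})

# Option 2: a single afind() call over all patterns at once.
# hits[i, j] says whether testtitle[j] was found in testref[i] within max_dist.
hits <- afind(testref, testtitle)$distance <= max_dist
```

Option 1 returns a list with one logical vector per title, which is the shape asked for in the question. Option 2 returns a logical matrix instead; if the list shape is needed, something like lapply(seq_len(ncol(hits)), function(j) hits[, j]) should convert it.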