I am trying to use a custom distance metric, specifically Jaro distance, with scikit-learn's NearestNeighbors, and I am getting an error. I searched online but could not find a solution. This is what I have done:
# libraries
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer
import jellyfish
# Jaro distance function
def jaro_distance(s1, s2):
    # jaro_similarity is 1.0 for identical strings, so 1 - similarity is a distance
    return 1 - jellyfish.jaro_similarity(s1, s2)
# create samples and a namelist to compare against samples
samples = pd.DataFrame({'NAME':['Saige Fuentes','Bowen Higgins','Kylan Gentry','Amelie Griffith','Jaylen Blackwell']})
namelist = pd.DataFrame({'NAME':['Bowen Higgins','Jaylen Blackwell','Marceline Avila']})
cvec = CountVectorizer(ngram_range=(1,4))
X_names = cvec.fit_transform(namelist['NAME'])
nbrs = NearestNeighbors(n_neighbors=1, metric=jaro_distance).fit(X_names)
input_vec = cvec.transform(samples['NAME'])
distances, indices = nbrs.kneighbors(input_vec, n_neighbors=1)
This last call is where I get the error: TypeError: 'csr_matrix' object cannot be converted to 'PyString'.
I would like to know how I can fix this. Thanks!
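
For reference, here is a minimal brute-force sketch of the result I am after, comparing the raw name strings directly instead of the CountVectorizer output. The jaro_similarity function below is my own plain-Python stand-in for jellyfish.jaro_similarity (an assumption, so that the snippet runs without third-party packages; it may differ from jellyfish in edge cases), and nearest_name is a hypothetical helper, not part of scikit-learn:

```python
def jaro_similarity(s1, s2):
    """Plain-Python Jaro similarity (stand-in for jellyfish.jaro_similarity)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def nearest_name(query, candidates):
    """Hypothetical helper: index and Jaro distance of the closest candidate."""
    distances = [1 - jaro_similarity(query, c) for c in candidates]
    best = min(range(len(candidates)), key=distances.__getitem__)
    return best, distances[best]


samples = ['Saige Fuentes', 'Bowen Higgins', 'Kylan Gentry',
           'Amelie Griffith', 'Jaylen Blackwell']
namelist = ['Bowen Higgins', 'Jaylen Blackwell', 'Marceline Avila']

# One (index into namelist, distance) pair per sample name.
results = [nearest_name(name, namelist) for name in samples]
for name, (idx, dist) in zip(samples, results):
    print(f"{name!r} -> {namelist[idx]!r} (distance {dist:.3f})")
```

This compares every sample against every name, so it only illustrates the output I expect (exact matches should come back with distance 0); what I am looking for is a way to get the same thing through NearestNeighbors.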