How to Implement Custom Distance Metrics in Sklearn Nearest Neighbor


I am trying to use a custom distance metric, specifically Jaro distance, with scikit-learn's NearestNeighbors, and I am getting an error. I have searched online but could not find a solution. This is what I have done:

# libraries
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer
import jellyfish

# Jaro distance function
def jaro_distance(s1,s2):
    return 1 - jellyfish.jaro_similarity(s1,s2)

# create samples and a namelist to compare against samples
samples = pd.DataFrame({'NAME':['Saige Fuentes','Bowen Higgins','Kylan Gentry','Amelie Griffith','Jaylen Blackwell']})
namelist = pd.DataFrame({'NAME':['Bowen Higgins','Jaylen Blackwell','Marceline Avila']})

cvec = CountVectorizer(ngram_range=(1,4))
X_names = cvec.fit_transform(namelist['NAME'])
nbrs = NearestNeighbors(n_neighbors = 1, metric = jaro_distance).fit(X_names)

input_vec = cvec.transform(samples['NAME'])
distances, indices = nbrs.kneighbors(input_vec, n_neighbors = 1)

This is where I get the error: TypeError: 'csr_matrix' object cannot be converted to 'PyString'.

I would like to know how I can fix this. Thanks!
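
A likely cause, as far as I can tell: a callable passed as `metric` is called with rows of the fitted data (here, sparse count vectors from CountVectorizer), not with the original strings, so jellyfish receives a `csr_matrix` where it expects a `str`. One possible workaround is to skip the vectorizer, compute the string distances yourself, and pass them in with `metric="precomputed"`. The sketch below assumes both name lists are small enough that a full distance matrix fits in memory. The helper `pairwise_string_distances` and the name `string_distance` are made up for illustration, and the `difflib` fallback is not Jaro distance; it is there only so the snippet runs without jellyfish installed.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

try:
    import jellyfish

    def string_distance(s1, s2):
        # Jaro distance, as in the question: 1 - Jaro similarity
        return 1 - jellyfish.jaro_similarity(s1, s2)
except ImportError:
    from difflib import SequenceMatcher

    def string_distance(s1, s2):
        # Stand-in only (NOT Jaro): lets the sketch run without jellyfish
        return 1 - SequenceMatcher(None, s1, s2).ratio()


def pairwise_string_distances(rows, cols, distance):
    """Hypothetical helper: matrix with entry [i, j] = distance(rows[i], cols[j])."""
    return np.array([[distance(r, c) for c in cols] for r in rows], dtype=float)


samples = ['Saige Fuentes', 'Bowen Higgins', 'Kylan Gentry',
           'Amelie Griffith', 'Jaylen Blackwell']
namelist = ['Bowen Higgins', 'Jaylen Blackwell', 'Marceline Avila']

# With metric="precomputed", fit() expects a square matrix of distances
# among the reference items themselves.
ref_dist = pairwise_string_distances(namelist, namelist, string_distance)
nbrs = NearestNeighbors(n_neighbors=1, metric="precomputed").fit(ref_dist)

# kneighbors() then expects distances of shape (n_queries, n_reference_items).
query_dist = pairwise_string_distances(samples, namelist, string_distance)
distances, indices = nbrs.kneighbors(query_dist, n_neighbors=1)

for name, d, i in zip(samples, distances[:, 0], indices[:, 0]):
    print(f"{name!r} -> {namelist[i]!r} (distance {d:.3f})")
```

With a precomputed matrix, NearestNeighbors only picks the smallest entries in each row, so for a single neighbor `query_dist.argmin(axis=1)` would return the same indices, barring ties. Keeping NearestNeighbors is mostly useful if the rest of your pipeline expects its interface.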
