postgresql Wildcard search and ranking not working as expected

208 Views Asked by At

I have this tsvector column for full-text search of two columns' name and address.

ALTER TABLE public.business_info ADD column search_vector tsvector NULL GENERATED ALWAYS AS (setweight(to_tsvector('simple'::regconfig, COALESCE(name, ''::text)), 'A'::"char") || setweight(to_tsvector('simple'::regconfig, COALESCE(address, ''::text)), 'B'::"char")) STORED

I am trying the below query to find records with full text wild card search

SELECT name,ts_rank_cd(search_vector, to_tsquery('Oklahoma&State:*')) as rank 
from business_info order by rank desc

But I am getting the result as below though I am expecting, records with the name "Oklahoma state" show a higher ranking. Any idea how to rank these results to get the results with a higher ranking on the inputs that the user has given?

enter image description here

2

There are 2 best solutions below

0
Marmite Bomber On BEST ANSWER

If you are interested on higher score in the starting positions of the string, simple define an expression for this part.

Here an example for the first two words

select 
coalesce((string_to_array(name, ' '))[1],' ')||' ' || coalesce((string_to_array(name, ' '))[2],' ')  as name_pfx,
name
from business_info;

name_pfx,           name
Southwest Oklahoma  Southwest Oklahoma State University at Oklahoma State University-OKC

Than define a higher weight for this prefix e.g.

ALTER TABLE  business_info 
ADD column search_vector tsvector NULL GENERATED ALWAYS AS 
 (setweight(to_tsvector('simple'::regconfig, 
       COALESCE((coalesce((string_to_array(name, ' '))[1],' ')||' ' || coalesce((string_to_array(name, ' '))[2],' ')), ''::text)), 'A'::"char") || 
  setweight(to_tsvector('simple'::regconfig, COALESCE(name, ''::text)), 'B'::"char")) stored;

Resut is as expected

SELECT name_pfx||' '||name name,ts_rank_cd(search_vector, to_tsquery('Oklahoma&State:*')) as rank 
from  business_info order by rank desc

"rank",  "name"                                      
2.1714287   Oklahoma State Oklahoma State University - Oklahoma City
2.0714285   Oklahoma State Oklahoma State University at Western Oklahoma
0.93333334  Southwest Oklahoma Southwest Oklahoma State University at Oklahoma State University-OKC
0.90000004  Southwest Oklahoma Southwest Oklahoma State University at Northwest Oklahoma State University-A
0.90000004  Western Oklahoma Western Oklahoma State College at Western Oklahoma State College
0
jjanes On

You could use strpos to find how soon in the string another string occurs. But if the substrig doesn't occur at all, you get 0, so you probably need to deal with that specially:

ORDER BY nullif(strpos(name,'Oklahoma state'),0);

Note that this will fail to rank somethings that your query still finds, like 'state quick red fox jumps over Oklahoma'. Using FTS for the match but substring queries for the rank is inconsistent, maybe you should use substring for both.