I have a table in Postgres where I store articles for multiple languages. For a full-text search, I have a separate column that is enabled for full-text search using GIN indexes. Depending on the article language, I store the corresponding body for the corresponding dictionary: German, English, Spanish etc
However, when I want to query the database, it makes a significant difference for which dictionary I search. Actually, I want to query all languages and not only specific ones. So I am curious about how I can achieve that.
This is my table:
create table if not exists public.article
(
id serial constraint "pk_article_id" primary key,
title varchar(1024),
body text,
lang varchar(2),
fts_dict regconfig generated always as (
CASE
WHEN ((lang)::text = 'en'::text) THEN 'english'::regconfig
WHEN ((lang)::text = 'de'::text) THEN 'german'::regconfig
WHEN ((lang)::text = 'es'::text) THEN 'spanish'::regconfig
ELSE 'simple'::regconfig
END) stored,
fts_body tsvector generated always as (
CASE
WHEN ((lang)::text = 'en'::text) THEN to_tsvector('english'::regconfig, body)
WHEN ((lang)::text = 'de'::text) THEN to_tsvector('german'::regconfig, body)
WHEN ((lang)::text = 'es'::text) THEN to_tsvector('spanish'::regconfig, body)
ELSE to_tsvector('simple'::regconfig, body)
END) stored
);
And here comes the GIN-Index:
create index if not exists idx_article_fts_body
on public.article using gin (fts_body);
So this table contains article in several languages. I have an auto-generated column that stores the body as tsvector in its specific dictionary, depending on the original language.
While searching for articles, I come across, that the dictionary matters heavily:
SELECT a.id, a.lang, a.title, a.body
FROM article a
INNER JOIN summaries s ON a.id = s.article_id
WHERE a.fts_body @@ plainto_tsquery('german', 'Grünheide')
delivers only all articles with a german locale, but not the english or spanish articles containing 'Grünheide'.
If I want to include all english or spanish articles, that contain the same word "grünheide", I need to query for the corresponding dictionary additionally:
SELECT a.id, a.lang, a.title, a.body
FROM article a
INNER JOIN summaries s ON a.id = s.article_id
WHERE a.fts_body @@ plainto_tsquery('english', 'Grünheide')
SELECT a.id, a.lang, a.title, a.body
FROM article a
INNER JOIN summaries s ON a.id = s.article_id
WHERE a.fts_body @@ plainto_tsquery('spanish', 'Grünheide')
Omitting the dictionary in the query seems to default to "simple" or "english", which makes the result similar to if I had asked for "english".
My question is: do I really need to iterate over all dictionary-types (languages) and re-query again and again to get the full list of all articles that contains a specific term?
Or is there any other technique I should know?
You must use the same language in the query as for the ts_vector