I am using citext in PostgreSQL for all text column types. I wonder about citext performance.
I performed simple WHERE statement benchmarks over text columns that have a b-tree index, but I couldn't see any differences in terms of query cost.
For example:
Select * From table_text where a = '1';
Select * From table_citext where a= '1';
These queries have identical query costs.
As far as I understand, citext stores the string as it is without converting it to lower case. So when a value is used in the WHERE clause, it uses the lower function for every comparison in each node of the b-tree index (I used a b-tree index).
If this is as I say, this should have caused performance problems, but it hasn't.
How does PostgreSQL achieve this?
How does PostgreSQL store citext column values in a b-tree index?
citextis stored as it is input, without any conversion to lower case. This also holds for storage as b-tree index keys.The magic happens in the comparison function for
citext:So yes, this should incur some overhead. How expensive it is will also depend on the default collation of the database.
I'll demonstrate this using a query without an index. I am using the German collation:
First using
text:Now the same experiment with
citext:So
citextis about seven times slower.But don't forget that each of these experiments performed a sequential scan with a million comparisons.
If you use an index, the difference will not be noticeable:
You see that
CREATE INDEXtakes much longer for thecitextcolumns (it has to perform a lot of comparisons), but the queries take about the same time.The reason is that you need only few comparisons if you use an index scan: for each of the 2-3 index blocks you access you perform a binary search, and you may have to re-check the table row found in the case of a bitmap index scan.