I have a complex SQL Server query that needs to process around 2000 parameters at a time. My query is complex, so for simplicity suppose we have:
select *
from sample_table
where id in (:listIds);
In the Java code, I insert listIds using from hibernate.query .setParameterList("listIds", listIds), where listIds is of type List<String> and has around 2000 entries. This query takes too long and will sometimes timeout.
I found this answer, where the author uses openjson and mentions improved performance. I followed a similar pattern:
select *
from sample_table
where id in (select cast(value as NVARCHAR(20)) from openjson(:listIds);
In the java code, I convert List<String> listIds into a JSON array of strings (e.g. ["a", "b", "c"]) of length 2000. I then use .setParameter("listIds", listIds) to insert listIds. This query finishes a lot faster than the previous approach.
My question is why does the latter approach seem to improve performance?
I tried searching for explanations on openjson vs .setParameterList(). I'm still unclear about why performance improved and was hoping someone could break this down for me.
Because JSON functions drive directly the parse of the serie of data into a table variable instaed of leaving the values into the query...
When you use a query with a IN, the optimizer transform the serie of IN values into OR statement. But when there is too muche values, this transform is not the best way to have fast performances.
Since SQL Server 2017, new improvment has been done by that is called "Intelligent Query Processor" that can estimate the number of row a table variable can have by stopping the query when a table variable is encountered then asking for the cardinality of the table (the number of rows) then restart the execution of the query with the right cardinality that the optimizer need to choose which algorithm will be use to solve a join...
But the best way is to avoid using several different layers of code and adopt best practices to write plain queries... In your case the top would be to use a stored procedure and use a temporary table with a key to store values of the IN before the final query