Using C# CLR in SELECT and WHERE at the same time

135 Views Asked by At

I have a C# CLR doing multiple string manipulations. It returns an int. I'm calling it with

SELECT func(col1, col2) AS Score
INTO #A
FROM table;

SELECT *
INTO #B
FROM #A
WHERE Score > -1

If I do something like

SELECT func(col1, col2) AS Score
INTO #A
FROM table
WHERE func(col1, col2) > -1;

Is the CLR called and executed two times? In terms of performance/speed is there a better way to get the same result?

col1 is nvarchar(max), col2 is nvarchar(800). The function contains business logic.

There are about 10 billion rows/calculations

4

There are 4 best solutions below

1
David Browne - Microsoft On

You should not assume that it will be run only once. It may be, but the behavior may be plan-dependent, and later devs will have the same question. Instead, as @Larnu suggests, push function into a subquery/cte.

with q as
(
  SELECT func(col1, col2) AS Score
  FROM table
),
select *
INTO #A
from q
WHERE Score > -1;
1
DueGe On

If your function is expensive and the number of unique pairs (col1, col2) is significantly less than the number of input records, try using this:

select B.score 
from 
(
    select func(A.col1,A.col2) as score
    from 
    (
        select distinct col1, col2 from table 
    ) A
    where func(A.col1,A.col2) > -1
) B
where B.score > -1

I think whether the function is deterministic or not - can be important when creating a query execution plan by the engine.

I know from experience that (with large tables) temporary tables can be a faster solution than subqueries and CTE's, you can try rewrite proposed query using #temptables.

0
DueGe On

I wrote simple function taking two parameters and returning an integer. The function writes to external text file that was called with specified parameters, I was able to track how many times it was called.

Conclusion is: total number of calls is equal to sum of number of input records and number of records that meet criteria specified in WHERE clause.

SELECT don't know what WHERE is doing. Deterministic / Nondeterministic function flags doesn't change anything.

Of course - it's just a simple exercise, and I'm not an outstanding/certified Microsoft systems theorist.

I'm afraid - if you are not able to rewrite your CLR - you won't get much more, though, maybe parallel queries could speed things up ?

0
Ben Thul On

If you can reasonably put the IsDeterministic and IsPrecise properties on your function (very sure about IsDeterministic, pretty sure about IsPrecise'; I'm unable to find the relevant documentation right now on what the requirements are), then you can add a computed column to your table that is defined as func(col1, col2) and index it. The act of indexing it will make it so that the function won't be called at query time, but rather when rows are inserted/updated. My recommendation is to try adding the computed column and indexing it on a small version of your table before doing it live. That is:

  • select top(100) * into dbo.TestTable from dbo.YourTable;
  • alter table dbo.TestTable add NewColumn as dbo.func(col1, col2);
  • create index FuncIndex on dbo.TestTable (NewColumn);

And/or, if you have a non-production environment, do it on the live table there.

If predicates for likely queries on that column make sense to be filtered, you can make the index filtered. But that's a general indexing concern and not specific to your situation.