Pentaho database join slower than single query in IDE

52 Views Asked by At

I'm using a PDI transformation which consists of different steps, and one of them is a database join who gets just 1 value as input, execute the query (only 1 time) then the results are manipulated and so on. That step is enormously and painfully slow (over 40 mins).

The same query via Toad by Oracle is executed in 1-2 seconds. I tried with and without each optimization hints, nothing changes, there is a lot of performance difference between PDI and the single execution via Toad.

SELECT /*+PARALLEL(4)*/
    COUNT(T.ID_VAL) AS number_of_occurences, T.name
    FROM TABLE T
    WHERE     1 = 1
    AND T.col1 = 'S'
    AND T.col2 IS NULL
    AND T.name IN
    (
        SELECT /*+INDEX OTHER_TABLE GROUP_ID_INDEX*/ DISTINCT TAB_O.name
        FROM OTHER_TABLE TAB_O
            WHERE     1 = 1
            AND TAB_O.GROUP_ID = ?--'10052023'--PASSED VALUE
            AND TAB_O.col1 = 'M'
    )
    GROUP BY T.name

where T.name is the same as TAB_O.name.

The structure of my transformation is:

enter image description here

The parameters of the step are:

enter image description here

Why there is this difference?

An example of the two tables coud be:

OTHER_TABLE:

GROUP_ID COL1 NAME
1 S John
1 M Mark
2 S Marc
2 M Jenny
2 S John
10052023 S Al
10052023 M John
10052023 M Jack

TABLE:

ID_VAL NAME
1 Al
2 Al
3 John
4 Jack
5 Jack
6 Jack

So I want as a result:

number_of_occurences name
1 John
3 Jack

I think the problem lies with pentaho preparing queries, or computational overhead, I don't know how Pentaho executes queries.

Adding some infos about the environment, I am using a local instance of a Pentaho on a VM, which connects to the repository hosted on another server.

0

There are 0 best solutions below