I'm using a PDI transformation which consists of different steps, and one of them is a database join who gets just 1 value as input, execute the query (only 1 time) then the results are manipulated and so on. That step is enormously and painfully slow (over 40 mins).
The same query via Toad by Oracle is executed in 1-2 seconds. I tried with and without each optimization hints, nothing changes, there is a lot of performance difference between PDI and the single execution via Toad.
SELECT /*+PARALLEL(4)*/
COUNT(T.ID_VAL) AS number_of_occurences, T.name
FROM TABLE T
WHERE 1 = 1
AND T.col1 = 'S'
AND T.col2 IS NULL
AND T.name IN
(
SELECT /*+INDEX OTHER_TABLE GROUP_ID_INDEX*/ DISTINCT TAB_O.name
FROM OTHER_TABLE TAB_O
WHERE 1 = 1
AND TAB_O.GROUP_ID = ?--'10052023'--PASSED VALUE
AND TAB_O.col1 = 'M'
)
GROUP BY T.name
where T.name is the same as TAB_O.name.
The structure of my transformation is:
The parameters of the step are:
Why there is this difference?
An example of the two tables coud be:
OTHER_TABLE:
| GROUP_ID | COL1 | NAME |
|---|---|---|
| 1 | S | John |
| 1 | M | Mark |
| 2 | S | Marc |
| 2 | M | Jenny |
| 2 | S | John |
| 10052023 | S | Al |
| 10052023 | M | John |
| 10052023 | M | Jack |
TABLE:
| ID_VAL | NAME |
|---|---|
| 1 | Al |
| 2 | Al |
| 3 | John |
| 4 | Jack |
| 5 | Jack |
| 6 | Jack |
So I want as a result:
| number_of_occurences | name |
|---|---|
| 1 | John |
| 3 | Jack |
I think the problem lies with pentaho preparing queries, or computational overhead, I don't know how Pentaho executes queries.
Adding some infos about the environment, I am using a local instance of a Pentaho on a VM, which connects to the repository hosted on another server.

