OK, we have one table with transaction data (TRANSACTIONS), and one of its fields is the number of the card used in the transaction. I have to get the card's country by looking it up in the BIC_COUNTRY_RANGES table, where we store ranges of card numbers together with their country. So we have to join both tables, matching each card number against the number ranges (along with an additional field, the card type). The TRANSACTIONS table has around 450k rows per day, and BIC_COUNTRY_RANGES has 170k static rows.
TRANSACTIONS
| OPERATION_ID | CARD_TYPE | CARD_NUMBER |
|---|---|---|
| 1234 | A | 411389999000000001 |
| 5678 | B | 451716303000000001 |
BIC_COUNTRY_RANGES
| CARD_TYPE | RANGE_START | RANGE_END | COUNTRY_ISO |
|---|---|---|---|
| A | 411389999000000000 | 411389999999999999 | US |
| B | 451716303000000000 | 451716303999999999 | AR |
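With the sample rows above, each transaction should resolve to the country of the range that contains its card number, i.e. the expected output is:

| OPERATION_ID | CARD_TYPE | CARD_NUMBER | COUNTRY_ISO |
|---|---|---|---|
| 1234 | A | 411389999000000001 | US |
| 5678 | B | 451716303000000001 | AR |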
The join takes around 30 minutes to complete with just one day of data, and we need to run it over a full month.
We have indexes on CARD_TYPE and CARD_NUMBER on TRANSACTIONS, and on CARD_TYPE, RANGE_START, and RANGE_END on BIC_COUNTRY_RANGES (a DDL sketch is shown after the query). The query used to join them is as simple as:
```sql
SELECT *
FROM TRANSACTIONS T
LEFT JOIN BIC_COUNTRY_CODES B
  ON  B.RANGE_START <= T.CARD_NUMBER
  AND B.RANGE_END   >= T.CARD_NUMBER
  AND B.CARD_TYPE    = T.CARD_TYPE;
```
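For reference, the indexes mentioned above were created roughly like this (a sketch of Teradata secondary-index DDL; the exact statements may differ in our environment):

```sql
-- Approximate DDL for the secondary indexes described above (sketch).
CREATE INDEX (CARD_TYPE, CARD_NUMBER) ON TRANSACTIONS;
CREATE INDEX (CARD_TYPE, RANGE_START, RANGE_END) ON BIC_COUNTRY_RANGES;
```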
Any idea why it takes so long to complete? We managed to reduce the BIC table from 1 million rows to 168k by merging ranges, and if we replace the non-equi join with an equi-join (just for testing), the query finishes in seconds. So it's something related to the number ranges, but we can't figure out what the problem is. We have checked the ranges and they don't seem to overlap.
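This is roughly the kind of check we used to look for overlapping ranges (a sketch; it assumes each range is a single row and that RANGE_START values are distinct within a card type):

```sql
-- A range whose start falls inside another range of the same card type
-- would indicate an overlap.
SELECT A.CARD_TYPE,
       A.RANGE_START, A.RANGE_END,
       B.RANGE_START, B.RANGE_END
FROM BIC_COUNTRY_RANGES A
JOIN BIC_COUNTRY_RANGES B
  ON  A.CARD_TYPE    = B.CARD_TYPE
  AND A.RANGE_START  < B.RANGE_START
  AND B.RANGE_START <= A.RANGE_END;
```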
Here is the EXPLAIN of the query:

```
1) First, we lock TRANSACTIONS in TD_MAP1 for read on a
reserved RowHash to prevent global deadlock.
2) Next, we lock BIC_COUNTRY_CODES in TD_MAP1 for read on a reserved
RowHash to prevent global deadlock.
3) We lock TRANSACTIONS in TD_MAP1 for read, and we lock
BIC_COUNTRY_CODES in TD_MAP1 for read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step in TD_MAP1 from BIC_COUNTRY_CODES
by way of an all-rows scan with a condition of ("NOT
(BIC_COUNTRY_CODES.CARD_TYPE IS NULL)") into Spool 2
(all_amps), which is redistributed by the hash code of (
BIC_COUNTRY_CODES.CARD_TYPE) to all AMPs in TD_Map1. Then
we do a SORT to order Spool 2 by row hash. The size of Spool
2 is estimated with low confidence to be 167,424 rows (
9,208,320 bytes). The estimated time for this step is 0.02
seconds.
2) We do an all-AMPs RETRIEVE step in TD_MAP1 from
TRANSACTIONS by way of an all-rows scan with no
residual conditions into Spool 3 (all_amps), which is
redistributed by the hash code of (
TRANSACTIONS.CARD_TYPE) to all AMPs in TD_Map1.
Then we do a SORT to order Spool 3 by row hash. The size of
Spool 3 is estimated with low confidence to be 475,776 rows (
16,176,384 bytes). The estimated time for this step is 0.03
seconds.
5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
way of a RowHash match scan, which is joined to Spool 3 (Last Use)
by way of a RowHash match scan. Spool 2 and Spool 3 are
right outer joined using a merge join, with condition(s) used for
non-matching on right table ("NOT (CARD_TYPE IS NULL)"),
with a join condition of ("(RANGE_START <= CARD_NUMBER) AND
((RANGE_END >= CARD_NUMBER) AND (CARD_TYPE =
CARD_TYPE ))"). The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 8,642,638 rows (
777,837,420 bytes). The estimated time for this step is 0.12
seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
   statement 1. The total estimated time is 0.15 seconds.
```
Thanks