I need to add a column with a progressive number to a table with two columns: ID and Amount. The resulting table should be like this:
| ID | Amount | Progressive n |
|---|---|---|
| 10 | 10000.00 | 1 |
| 21 | 9.67 | 1 |
| 21 | 11.27 | 2 |
| 21 | 9.46 | 3 |
| 21 | 12.27 | 4 |
| 21 | 90780.51 | 5 |
| 21 | 200.00 | 6 |
| 21 | 20.18 | 7 |
| 21 | 1045.04 | 8 |
| 21 | 3477.98 | 9 |
| 21 | 454390.00 | 10 |
| 21 | 24.50 | 11 |
| 21 | 54.76 | 12 |
| 21 | 12000.00 | 13 |
| 5 | 5.99 | 1 |
| 45 | 160000.00 | 1 |
| 45 | 29.00 | 2 |
| 45 | 1.98 | 3 |
| 45 | 2.56 | 4 |
| 56 | 659034.90 | 1 |
| 56 | 12000000.00 | 2 |
| 56 | 4781.35 | 3 |
| 56 | 1065.05 | 4 |
| 56 | 29.50 | 5 |
| 56 | 92000.00 | 6 |
| 56 | 10.00 | 7 |
| 56 | 4.35 | 8 |
| 56 | 52000000.00 | 9 |
| 56 | 8763.57 | 10 |
| 56 | 3065.05 | 11 |
But I get the wrong output. Something like this:
| ID | Amount | Progressive n |
|---|---|---|
| 10 | 10000.00 | 1 |
| 21 | 9.67 | 1 |
| 21 | 11.27 | 2 |
| 21 | 9.46 | 3 |
| 21 | 12.27 | 4 |
| 21 | 90780.51 | 5 |
| 21 | 200.00 | 6 |
| 21 | 20.18 | 7 |
| 21 | 1045.04 | 8 |
| 21 | 3477.98 | 9 |
| 21 | 454390.00 | 10 |
| 21 | 24.50 | 11 |
| 21 | 54.76 | 12 |
| 21 | 12000.00 | 13 |
| 5 | 5.99 | 1 |
| 45 | 160000.00 | 1 |
| 45 | 29.00 | 2 |
| 45 | 1.98 | 3 |
| 45 | 2.56 | 4 |
| 56 | 659034.90 | 5 |
| 56 | 12000000.00 | 6 |
| 56 | 4781.35 | 7 |
| 56 | 1065.05 | 8 |
| 56 | 29.50 | 9 |
| 56 | 92000.00 | 10 |
| 56 | 10.00 | 11 |
| 56 | 4.35 | 12 |
| 56 | 52000000.00 | 13 |
| 56 | 8763.57 | 14 |
| 56 | 3065.05 | 15 |
When ID = 56, the progressive number starts with 5, instead of 1.
I tried to add a Sort Stage, ordering the table by ID, but I get almost the same error. It changes the order of the ID column, but it still assign the wrong number
| ID | Amount | Progressive n |
|---|---|---|
| 5 | 5.99 | 1 |
| 10 | 10000.00 | 1 |
| 21 | 9.67 | 2 |
| 21 | 11.27 | 3 |
| 21 | 9.46 | 4 |
| 21 | 12.27 | 5 |
| 21 | 90780.51 | 6 |
| 21 | 200.00 | 7 |
| 21 | 20.18 | 8 |
| 21 | 1045.04 | 9 |
| 21 | 3477.98 | 10 |
| 21 | 454390.00 | 11 |
| 21 | 24.50 | 12 |
| 21 | 54.76 | 13 |
| 21 | 12000.00 | 14 |
| 45 | 160000.00 | 1 |
| 45 | 29.00 | 2 |
| 45 | 1.98 | 3 |
| 45 | 2.56 | 4 |
| 56 | 659034.90 | 1 |
| 56 | 12000000.00 | 2 |
| 56 | 4781.35 | 3 |
| 56 | 1065.05 | 4 |
| 56 | 29.50 | 5 |
| 56 | 92000.00 | 6 |
| 56 | 10.00 | 7 |
| 56 | 4.35 | 8 |
| 56 | 52000000.00 | 9 |
| 56 | 8763.57 | 10 |
| 56 | 3065.05 | 11 |
When ID = 21, the progressive number starts with 2, instead of 1.
I also tried to add the Amount column to the ordering in the Sort Stage, but it just changes the row where the error appears.
It seems to me that the Column Generator Stage gets the partition wrong, but I don't know how to fix it.
Just ask if any more information is required.
To add a column, I would suggest to use a
Transform Stageinstead of a.Column Generator StageYou can use
stage variableswithin the Transformer to generate the values. For simplification, set the transform stage (or the entire job) to run in sequential mode.The data on the input link needs to be sorted.
If you need to run in parallel, you could use
@PARTITIONNUM,@NUMPARTITIONSand@INROWNUMto deal with partitioned processing, but the following should also work in parallel without that, as long as the input link is partitioned on the ID column (only) and of course also sorted.Example
Add a sort stage followed by a transform stage to your job. in the sort stage, partition and sort the data by ID. Edit the transform stage like this:
Input Link
Stage Variables
Output Link
Process Explanation
First Row (
ID = 5):svIDis-1(init. value), setting thesvPreviousIDto-1.DSLink2.IDwill setsvIDto5svID = svPreviousIDisfalse, thussvProgNis set to1Second Row (ID = 10):
svID(from previous step) setssvPrevIDto5DSLink2.IDsetssvIDto10svProgNtosvProgN + 1which is2Third row (ID = 20):
svPreviousID:=svID(10)svID:=DSLink2.ID(20)svProgN:=1Fourth row (ID = 20):
svPreviousID:=svID(20)svID:=DSLink2.ID(20)svProgN:=svProgN + 1( which is2)and so on...