I have a CSV file
I want to group together similar DeptID and sort the MID in ascending order and assign the value of lowest MID to SID who have similar DeptID using talend open studio for data integration. If there is one DeptID, then assign the same value of MID to SID.
Input CSV :
| DeptID | SID | StudentName | MID |
|---|---|---|---|
| 111 | Nancy | C1 | |
| 111 | Nancy | B1 | |
| 111 | Nancy | A1 | |
| 222 | James | Z1 |
I have used tFileInputDelimited to read the input file, I have used tSortRow to sort MID. And I have used tAggregateRow to group the values.
I am getting output as:
| DeptID | SID | StudentName | MID |
|---|---|---|---|
| 111 | Nancy | [A1,B1,C1] | |
| 222 | James | [Z1] |
The Output CSV should be as follows:
| DeptID | SID | StudentName | MID |
|---|---|---|---|
| 111 | A1 | Nancy | A1 |
| 111 | A1 | Nancy | B1 |
| 111 | A1 | Nancy | C1 |
| 222 | Z1 | James | Z1 |
One simple solution would be to read twice your input file : one time as the main flow, to get detail of MID column , one time as the lookup to get MIN value of MID column as your SID column. Then join the 2 flows with a tMap, joining on deptID (with "all matches" join type).
Another solution could be to use internal variables of tMap to get the work done with fewer components :
Once you have sorted your data with tSortRow, create 2 variables in a tMap following your tSort component :
The 2 solutions work to get the SID value .