How to aggregate strings in multiple rows in IBM InfoSphere DataStage grouped by a given ID

223 Views Asked by At

I am given a table like the following of a attendece of employees at a company. The data should be extracted from a sequenctial file which has comma seperated values.

ID Weekday Attendence
1 Monday Yes
2 Monday Yes
3 Monday No
4 Monday Yes
1 Tuesday Yes
2 Tuesday No
3 Tuesday Yes
4 Tuesday Yes

ID column has the employee ID and Weekday and Attendence columns has the exact meaning. I wan to count how many days the employee has shown up and I want the Days he attended as a comma seperated list. Like below

ID Weekday_List Attendenc_Count
1 Monday,Tuesday 2
2 Monday 1
3 Tuesday 1
4 Monday,Tuesday 2
1

There are 1 best solutions below

1
MichaelTiefenbacher On

It can be done with a transformer stage as concat aggregation is not supported by the aggregator stage.

You could use stage variables to do it or a transformer loop. The loop example shown here can be changed to do a concat.

If you decide to try it without the loop you need:

  • sorted and partitioned input to the transformer
  • use LastRowInGroup() to reset the string after each ID (and a second stage variable)