Concat Column2 data in one record based on column1 through JCL

121 Views Asked by At

I have a file having parent and child data in column1 and column2 respectively. I have a requirement to concat all the child from same parent in one record. How can we achive this through JCL ?

Expected Result shoold be like this : Input file :

Parent    |Child  
ABCDEFAAAA|1233444111  
ABCWEEAAAA|3456544111  
ABCDEFAAAA|3435677111  
HFFDDDAAAA|6444554111  
ABCDEFAAAA|2424234111  
HFFDDDAAAA|4334456111  

Output should be like this

Parent    |Child  
ABCDEFAAAA|1233444111,3435677111,2424234111  
ABCWEEAAAA|3456544111  
HFFDDDAAAA|6444554111,4334456111  
2

There are 2 best solutions below

0
Kolusu On BEST ANSWER

As Martin packer pointed out, it can be done in a single pass of data and get the desired results. Here is a sample which can handle upto 99 children per parent, but only the first 10 are handled as OP mentioned that the max is only 8.

//STEP0100 EXEC PGM=SORT                       
//SYSOUT   DD SYSOUT=*                         
//SORTIN   DD *                                
ABCDEFAAAA|1233444111                          
ABCWEEAAAA|3456544111                          
ABCDEFAAAA|3435677111                          
HFFDDDAAAA|6444554111                          
ABCDEFAAAA|2424234111                          
HFFDDDAAAA|4334456111                          
//SORTOUT  DD SYSOUT=*                         
//SYSIN    DD *                                
  SORT FIELDS=(01,10,CH,A),EQUALS              
                                               
  OUTREC IFTHEN=(WHEN=GROUP,                   
             KEYBEGIN=(001,10),                
                 PUSH=(081:ID=8,               
                       090:SEQ=2)),            
                                               
         IFTHEN=(WHEN=GROUP,                   
                BEGIN=(090,2,ZD,EQ,01),        
                 PUSH=(094:12,10)),            
                                               
         IFTHEN=(WHEN=GROUP,                   
                BEGIN=(090,2,ZD,EQ,02),        
                  END=(090,2,ZD,EQ,01),        
                 PUSH=(105:12,10)),            
                                               
         IFTHEN=(WHEN=GROUP,                   
                BEGIN=(090,2,ZD,EQ,03),        
                  END=(090,2,ZD,EQ,01),        
                 PUSH=(116:12,10)),            
                                               
         IFTHEN=(WHEN=GROUP,                   
                BEGIN=(090,2,ZD,EQ,04),        
                  END=(090,2,ZD,EQ,01),        
                 PUSH=(127:12,10)),            
                                               
         IFTHEN=(WHEN=GROUP,                   
                BEGIN=(090,2,ZD,EQ,05),        
                  END=(090,2,ZD,EQ,01),        
                 PUSH=(138:12,10)),            

         IFTHEN=(WHEN=GROUP,                                         
                BEGIN=(090,2,ZD,EQ,06),                              
                  END=(090,2,ZD,EQ,01),                              
                 PUSH=(149:12,10)),                                  
                                                                     
         IFTHEN=(WHEN=GROUP,                                         
                BEGIN=(090,2,ZD,EQ,07),                              
                  END=(090,2,ZD,EQ,01),                              
                 PUSH=(160:12,10)),                                  
                                                                     
         IFTHEN=(WHEN=GROUP,                                         
                BEGIN=(090,2,ZD,EQ,08),                              
                  END=(090,2,ZD,EQ,01),                              
                 PUSH=(171:12,10)),                                  
                                                                     
         IFTHEN=(WHEN=GROUP,                                         
                BEGIN=(090,2,ZD,EQ,09),                              
                  END=(090,2,ZD,EQ,01),                              
                 PUSH=(182:12,10)),                                  
                                                                     
         IFTHEN=(WHEN=GROUP,                                         
                BEGIN=(090,2,ZD,EQ,10),                              
                  END=(090,2,ZD,EQ,01),                              
                 PUSH=(193:12,10)),                                  
                                                                     
         IFTHEN=(WHEN=(090,2,ZD,EQ,01),                              
              OVERLAY=(105:99X)),                                    
                                                                     
         IFTHEN=(WHEN=NONE,                                          
              OVERLAY=(094:94,110,SQZ=(SHIFT=LEFT,MID=C',')))        
                                                 
  OUTFIL REMOVECC,NODETAIL,                                          
  SECTIONS=(81,8,                                                    
  TRAILER3=(01,12,94,110))                                           
/* 

The output of this is

ABCDEFAAAA|21233444111,3435677111,2424234111       
ABCWEEAAAA|33456544111                             
HFFDDDAAAA|46444554111,4334456111                  
0
Srinivasan JV On

For the give input data, I've tried the following to achieve the expected results:


Step 1: I have used WHEN=GROUP with KEYBEGIN, PUSH and SEQ parameters.

//Z02923A JOB 1,NOTIFY=&SYSUID                               
//STEP01  EXEC PGM=SORT                                      
//SORTIN  DD *                                               
ABCDEFAAAA|1233444111                                        
ABCWEEAAAA|3456544111                                        
ABCDEFAAAA|3435677111                                        
HFFDDDAAAA|6444554111                                        
ABCDEFAAAA|2424234111                                        
HFFDDDAAAA|4334456111                                        
//SORTOUT DD DSN=Z02923.SO.OUT1,                             
//        DISP=(,CATLG,DELETE),                              
//        SPACE=(CYL,(2,0),RLSE),                            
//        DCB=(LRECL=80,RECFM=FB,BLKSIZE=0)                  
//SYSOUT  DD SYSOUT=*                                        
//SYSIN   DD *                                               
  OUTREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,11),PUSH=(50:SEQ=1)),
         IFTHEN=(WHEN=(50,1,CH,EQ,C'1'),BUILD=(1,11,         
                       12:12,10,                                      
                       22:Z,                                          
                       21Z)),                                         
         IFTHEN=(WHEN=(50,1,CH,EQ,C'2'),BUILD=(1,11,                  
                       12:11Z,                                        
                       23:12,10,                                      
                       33:11Z)),                                      
         IFTHEN=(WHEN=(50,1,CH,EQ,C'3'),BUILD=(1,11,                  
                       12:22Z,                                        
                       12,10,                                         
                       Z))                                            
  SORT FIELDS=(1,11,CH,A)                                             
/*  

KEYBEGIN=(1,11) means position 1 to 11 defines the key and any change in the value of the key implies the start of a new group. It is imperative to note that I'm including the 11th byte, which holds the pipe symbol, as part of the key.

PUSH=(50:SEQ=1) means position 50 contains a 1 byte sequence number for each record in a group. When the key value changes, the sequence number resets to 1.

Low value ain't an enemy all the time ;)

I've used the sequence number, that we PUSHed, in the IFTHEN clauses to arrange the fields by inserting binary zeros (aka Low value i.e., X'00') in between as placeholders so that I can coalesce similar group records into one. I've covered that in the later sections of this answer.

Please note that I intend to add a space between the Child fields. This becomes handy when I squeeze the output in the final step with a comma in between the fields separated by spaces.

Let's take a look at the output dataset created out of the first step.

***************************** Top of Data ******************************
ABCDEFAAAA|1233444111                                                   
ABCDEFAAAA|           2424234111                                        
ABCDEFAAAA|                      3435677111                             
ABCWEEAAAA|3456544111                                                   
HFFDDDAAAA|4334456111                                                   
HFFDDDAAAA|           6444554111                                        
**************************** Bottom of Data ****************************

With Hex mode ON, you will be able to see the binary zeros (X'00).

***************************** Top of Data ******************************
ABCDEFAAAA|1233444111                                                   
CCCCCCCCCC4FFFFFFFFFF000000000000000000000044444444444444444444444444444
1234561111F1233444111000000000000000000000000000000000000000000000000000
----------------------------------------------------------------------- 
ABCDEFAAAA|           2424234111                                        
CCCCCCCCCC400000000000FFFFFFFFFF0000000000044444444444444444444444444444
1234561111F0000000000024242341110000000000000000000000000000000000000000
----------------------------------------------------------------------- 
ABCDEFAAAA|                      3435677111                             
CCCCCCCCCC40000000000000000000000FFFFFFFFFF04444444444444444444444444444
1234561111F0000000000000000000000343567711100000000000000000000000000000
----------------------------------------------------------------------- 
ABCWEEAAAA|3456544111                                                   
CCCECCCCCC4FFFFFFFFFF000000000000000000000044444444444444444444444444444
1236551111F3456544111000000000000000000000000000000000000000000000000000
----------------------------------------------------------------------- 
HFFDDDAAAA|4334456111                                                   
CCCCCCCCCC4FFFFFFFFFF000000000000000000000044444444444444444444444444444
8664441111F4334456111000000000000000000000000000000000000000000000000000
----------------------------------------------------------------------- 
HFFDDDAAAA|           6444554111                                        
CCCCCCCCCC400000000000FFFFFFFFFF0000000000044444444444444444444444444444
8664441111F0000000000064445541110000000000000000000000000000000000000000
----------------------------------------------------------------------- 
**************************** Bottom of Data ****************************

Step 2: The output dataset from Step 1 is passed as input to Step 2.

//STEP02  EXEC PGM=SORT,COND=(0,NE)              
//SORTIN  DD DISP=SHR,DSN=Z02923.SO.OUT1         
//SORTOUT DD DSN=Z02923.SO.OUT2,                 
//        DISP=(,CATLG,DELETE),                  
//        SPACE=(CYL,(2,0),RLSE),                
//        DCB=(LRECL=80,RECFM=FB,BLKSIZE=0)      
//SYSOUT  DD SYSOUT=*                            
//SYSIN   DD *                                   
  SORT FIELDS=(1,11,CH,A)                        
  SUM FIELDS=(12,8,20,8,28,8,36,8),FORMAT=BI     
  ALTSEQ CODE=(0040)                             
  OUTREC FIELDS=(1,80,TRAN=ALTSEQ)               
/*     

As the input is sorted on the first 11 bytes, I've used SUM FIELDS to combine each records of a group into one. Binary addition takes place with SUM FIELDS=(12,8,20,8,28,8,36,8),FORMAT=BI statement.

What really happens with the SUM FIELDS and Binary zeros?

Binary addition operates on two bit patterns. Let's consider the following records (Hex mode is ON as I would like to show how Binary addition is being performed on the Hex values).

----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
***************************** Top of Data ******************************
ABCDEFAAAA|1233444111                                                   
CCCCCCCCCC4FFFFFFFFFF000000000000000000000044444444444444444444444444444
1234561111F1233444111000000000000000000000000000000000000000000000000000
----------------------------------------------------------------------- 
ABCDEFAAAA|           2424234111                                        
CCCCCCCCCC400000000000FFFFFFFFFF0000000000044444444444444444444444444444
1234561111F0000000000024242341110000000000000000000000000000000000000000
-----------------------------------------------------------------------

Hex value of the 12th byte in 1st record, X'F1' is being added with the binary zeros in the 12th byte in 2nd record as the keys are same.

 F1
+00
 --
 F1

X'F1' denotes number '1' in EBCDIC

If we change the Binary zero in the 12th byte of 2nd record to Binary one (X'01'), results will vary.

 F1
+01
 --
 F2

X'F2' denotes number '2' in EBCDIC.

With Binary zeros we can perform SUM FIELDS on Alphanumeric data items (EBCDIC characters a-z, A-Z, 0-9) to group them onto one single record.

ALTSEQ CODE=(0040) is used to convert the unused Binary Zeros (X'00) to spaces (X'40').

The output from Step 2 is shown below:

***************************** Top of Data ******************************
ABCDEFAAAA|1233444111 2424234111 3435677111                             
ABCWEEAAAA|3456544111                                                   
HFFDDDAAAA|4334456111 6444554111                                        
**************************** Bottom of Data **************************** 

Step 3: In this step, I'm left squeezing the output from Step 2 with MID=C',' to to create comma separated fields.

//STEP03  EXEC PGM=SORT,COND=(0,NE)             
//SORTIN  DD DISP=SHR,DSN=Z02923.SO.OUT2        
//SORTOUT DD DSN=Z02923.SO.OUT3,                
//        DISP=(,CATLG,DELETE),                 
//        SPACE=(CYL,(2,0),RLSE),               
//        DCB=(LRECL=80,RECFM=FB,BLKSIZE=0)     
//SYSOUT  DD SYSOUT=*                           
//SYSIN   DD *                                  
  SORT FIELDS=COPY                              
  OUTREC FIELDS=(1,80,SQZ=(SHIFT=LEFT,MID=C','))
/* 

Output from Step 3

***************************** Top of Data ******************************
ABCDEFAAAA|1233444111,2424234111,3435677111                             
ABCWEEAAAA|3456544111                                                   
HFFDDDAAAA|4334456111,6444554111                                        
**************************** Bottom of Data ****************************

and there you go! You can definitely scale up this solution to accommodate the max limit of 8 child fields.

Hope this helps!