AWK or sed to print data in table format

122 Views Asked by At

I've data like below.

Row: 1
Name    :   Ajay
Age     :   45
Address :   zyx

Row: 2
Name    :   Ajay
Age     :   45
Address :   zyx

Row: 3
Name    :   Ajay
Age     :   45
Address :   zyx

Row: 4
Name    :   Ajay
Age     :   45
Address :   zyx

I wanted output like below.

Name    Age     Address
-----   ----    -------
Ajay    45      xyz
Ajay    45      xyz
Ajay    45      xyz
Ajay    45      xyz

Name, Age & Address columns are just for example, it should be anything and of any number. So command should dynamically read the columns and print and same way read data and print

I tried below code

#!/bin/bash

get_column_names() {
    # Extract column names from Row: 1 and Row: 2
    sed -n '/Row: 1/{n;p;q}' "$1" | awk -F '[:\t]' '{for(i=1;i<=NF;i+=2) print $i}'
}

print_header() {
    printf "%s\t" "$@"
    printf "\n"
}

data_file="ab.txt"

# Get col names
header=$(get_column_names "$data_file")

print_header $header

# Read data from file and print each row
max_row=$(grep -c "Row:" "$data_file")
for ((i=1; i<=$max_row; i++)); do
    sed -n "/Row: $i/{n;p;}" "$data_file" | awk -F '[:\t]' '{for(i=2;i<=NF;i+=2) printf "%s\t", $i; printf "\n"}'
done
7

There are 7 best solutions below

3
anubhava On BEST ANSWER

You may use this awk:

awk -v OFS="\t" -F '[[:blank:]]*:[[:blank:]]*' '
NF && $1 != "Row" {
   row[++c] = $2
   if (!first)
      hdr[c] = $1
}
!NF {
   if (!first) {
      for (i=1; i<=c; ++i)
         printf "%s", hdr[i] (i < c ? OFS : ORS)
      for (i=1; i<=c; ++i) {
         gsub(".", "-", hdr[i])
         printf "%s", hdr[i] (i < c ? OFS : ORS)}
         first = 1
      }
      for (i=1; i<=c; ++i)
         printf "%s", row[i] (i < c ? OFS : ORS)
      c = 0
}
END {
   for (i=1; i<=c; ++i)
      printf "%s", row[i] (i < c ? OFS : ORS)
}' file | column -t -s $'\t'

Name  Age  Address
----  ---  -------
Ajay  45   zyx
Ajay  45   zyx
Ajay  45   zyx
Ajay  45   zyx
2
Ed Morton On

Using any awk:

$ cat tst.awk
BEGIN { RS=""; OFS="\t" }
NR == 1 {
    end = NF-2
    for ( i=3; i<=end; i+=3 ) {
        printf "%s%s", $i, (i<end ? OFS : ORS)
    }
    for ( i=3; i<=end; i+=3 ) {
        gsub(/./,"-",$i)
        printf "%s%s", $i, (i<end ? OFS : ORS)
    }
}
{
    for ( i=5; i<=NF; i+=3 ) {
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}

$ awk -f tst.awk file
Name    Age     Address
----    ---     -------
Ajay    45      zyx
Ajay    45      zyx
Ajay    45      zyx
Ajay    45      zyx
0
markp-fuso On

Assumptions:

  • input lines consist of just two :-delimited fields, and said fields do not include the : character
  • the 2nd field may contain white space
  • output is to be tab delimited (as opposed to pretty printed as fixed-width columns)
  • OP can get the column list into a format of Col1:Col2:...:Coln; this also represents the order in which the columns are to be displayed

Adding some variability to the input file:

$ cat ab.txt
Row: 1
Name    :   Ajay
Age     :   45
Address :   zyx

Row: 2
Name    :   Sarah Jane
Age     :   27
Address :   123 Bubble Ave, #315, Any City, Country

Row: 3
Name    :   Angelina
Age     :   57
Address :   does not matter

Row: 4
Name    :   Bobby
Age     :   22
Address :   parent's basement

One awk approach that replaces OP's (bash) functions and max_row/for loop:

awk -v clist="${list}" '                              # user provides list of columns; list == "Col1:Col2:...:Coln"

function print_row(arr) {
    if (1 in arr)                                     # if the "arr" array is populated then ...
       for (i=1; i<=n; i++)                           # loop through column counter and ...
           printf "%s%s", arr[i], (i<n ? OFS : ORS)   # print columns
}

BEGIN         { FS=":"; OFS="\t"                      # input/output field delimiters
                n=split(clist,a)                      # split column list in to separate fields, store in array a[]
                for (i=1; i<=n; i++)                  # convert array into ...
                    headers[a[i]] = i                 # associative array, storing order (i) in array
                print_row(a)                          # print our header line
              }

              # strip leading/trailing white space from our 2 input fields

              { for (i=1; i<=2; i++)                 
                    gsub(/^[[:space:]]+|[[:space:]]+$/,"",$i)
              }

$1=="Row"     { print_row(cols)                       # print previous row of data
                delete cols                           # clear the cols[] array
              } 

$1 in headers { cols[headers[$1]] = $2 }              # if 1st field is an index in headers[] array then save 2nd field in cols[] array

END     { print_row(cols) }                           # print last row of data

' ab.txt

Taking for a test drive:

When list='Name:Age:Address':

Name    Age     Address
Ajay    45      zyx
Sarah Jane      27      123 Bubble Ave, #315, Any City, Country
Angelina        57      does not matter
Bobby   22      parent's basement

When list='Row:Name:Address:Age':

Row     Name    Address Age
1       Ajay    zyx     45
2       Sarah Jane      123 Bubble Ave, #315, Any City, Country 27
3       Angelina        does not matter 57
4       Bobby   parent's basement       22

NOTE: I've opted to not print the line of hyphens since this won't look 'pretty' with tab-delimited output; while we could add a line of hyphens I'm guessing that would only be required if we are to also generate the output in fixed-width format ... doable but would require additional code to dynamically determine number of hyphens based on the widest string (header vs data) for each column

0
Kaz On

Using TXR:

$ txr table.txr data
Name    Age     Address
-----   ----    -------
Ajay    45      zyx
Ajay    45      zyx
Ajay    45      zyx
Ajay    45      zyx

Code in table.txr:

@(collect)
Row: @nil
Name    :   @na
Age     :   @ag
Address :   @addr
@(end)
@(output)
Name    Age     Address
-----   ----    -------
@  (repeat)
@{na 7} @{ag 7} @addr
@  (end)
@(end)

I used short variable names na and ag on purpose, because there is a code formatting trick I like to use when producing fixed column tables. Namely, I make the braced variable as wide as the column (minus separating space). For instance, suppose I have a 25 character column where the variable var is to be placed:

  @{var                 25}

The spaces within the brace notation do not matter; but I put enough of them so that the thing is 25 characters wide. Then the output template's layout mimics the exact columnar layout.

This is not just for looks; it helps you get the columns right.

Thus I chose the name na so that @{na 7} ends up 7 characters wide. If we had 15 spaces for a name, I would have made it @{name 15}, and that's what I would recommend if the real data has longer names, requiring wider columns.

P.S. If the translation of zyx in the data to the xyz in the desired output is not a typo, we can use `@(reverse addr) to achieve it:

@(output)
Name    Age     Address
-----   ----    -------
@  (repeat)
@{na 7} @{ag 7} @(reverse addr)
@  (end)
@(end)
1
karakfa On

another awk

$ awk -v RS= 'NR==1 {for(i=3;i<=NF;i+=3) 
                       h = h sprintf("%s ", $i); 
                     print h; 
                     gsub(/[^ ]/,"-",h); 
                     print h} 
                    {for(i=5;i<=NF;i+=3) 
                       printf "%s ",$i; print ""}' file | column -t


Name  Age  Address
----  ---  -------
Ajay  45   zyx
Ajay  45   zyx
Ajay  45   zyx
Ajay  45   zyx

not sure better but DRY

$ awk -v RS= 'function m(b,  h) {for(i=b;i<=NF;i+=3) h=h $i FS; return h}
              NR==1 {print h=m(3); gsub(/[^ ]/,"-",h); print h} 
                    {print m(5)}' file | column -t

Explanation: Assuming the same record structure with unknown number of key value pairs. Extract the header from the first record, use the headers to create underline, print corresponding values for each key from all records.

0
potong On

This might work for you (GNU sed):

sed -n 'n;:a;N;$!{/^$/M!ba}
        x;/./!{x;h;x;s/\s*:.*//mg;s/\n/\t/gp;s/\S/-/gp;};x
        s/.*:\s*//mg;s/\n/\t/gp' file

In essence, gather up each set and using multiline pattern matching fashion the desired result.

This is a filter operation so use the -n option to suppress implicit printing.

Throw away the first line of the set and then gather up the remainder in the pattern space.

If this is the first set, make a copy in the hold space and using multiline substitution remove all but the headings and replace newlines by tabs and print the result. Then replace that result by -'s and print that too.

For each set, remove the headings and replace newlines by tabs and print the result.

The solution may be piped into an invocation of column like so:

 sed -n 'n;:a;N;$!{/^$/M!ba}
        x;/./!{x;h;x;s/\s*:.*//mg;s/\n/\t/gp;s/\S/-/gp;};x
        s/.*:\s*//mg;s/\n/\t/gp' file |
        column -s'   ' -t

This will prettify the result if necessary.

N.B. The -s is the separator and a tab can be achieve by ctrl-v and then tab or for some shells $'\t'.

0
dawg On

Your example does not show what you expect if the width of the data is wider than the header -- All the headers ("Name", "Age", "Address") are either the same width or narrower than the example fields, which is an easier problem.

IMHO, this is only useful if the column widths are dynamic.

Suppose your example is similar to this:

$ cat file
Row: 1
Name    :   Ajay the elder
Age     :   65
Address :   123 elder street

Row: 2
Name    :   Ajay the younger
Age     :   56
Address :   456 younger street

Row: 3
Name    :   Ajay junior
Age     :   35
Address :   789 junior terrace

Row: 4
Name    :   Ajay III
Age     :   22
Address :   the last street

You can see that many of the data are wider than the associated header.

Here is a Ruby that correctly entables that data:

ruby -e '
data=$<.read.split(/\R{2,}/).
    map{|b| b.split(/\R/)}.
    map{|a| a[1..]}.
    map{|a| a.map{|s| s.split(/\s+:\s+/,2)}.to_h }
keys=data.map{|h| h.keys}.flatten.to_set
col_w=keys.zip([0]*keys.length).to_h
(data+[keys.zip(keys).to_h]).
    each{|h| h.each{|k,v| 
        col_w[k] = (v.length>col_w[k]) ? v.length : col_w[k] } 
}

puts keys.map{|k| k.center(col_w[k]) }.join(" ")
puts col_w.map{|k,v| "-"*v}.join(" ")
data.each{|h| puts h.map{|k,v| v.ljust(col_w[k]) }.join(" ") }
' file 

Prints:

      Name       Age      Address      
---------------- --- ------------------
Ajay the elder   65  123 elder street  
Ajay the younger 56  456 younger street
Ajay junior      35  789 junior terrace
Ajay III         22  the last street