awk not getting correct comparison result on data bash

Question

81 Views Asked by Shubham Saroj At 03 February 2022 at 13:53

I have a file, file.txt:

ch.qos.logback:logback-classic 1.2.0
ch.qos.logback:logback-core 1.2.0
com.fasterxml.jackson.core:jackson-databind 2.10.0
com.fasterxml.jackson.core:jackson-databind 2.6.7
com.fasterxml.jackson.core:jackson-databind 2.6.7
com.fasterxml.jackson.core:jackson-databind 2.7.9
com.fasterxml.jackson.core:jackson-databind 2.8.11
com.fasterxml.jackson.core:jackson-databind 2.8.11
com.fasterxml.jackson.core:jackson-databind 2.8.9
com.fasterxml.jackson.core:jackson-databind 2.9.10
com.fasterxml.jackson.core:jackson-databind 2.9.10
com.fasterxml.jackson.core:jackson-databind 2.9.10
com.fasterxml.jackson.core:jackson-databind 2.9.5
com.fasterxml.jackson.core:jackson-databind 2.9.7
com.fasterxml.jackson.core:jackson-databind 2.9.8
com.fasterxml.jackson.core:jackson-databind 2.9.9
com.h2database:h2 2.0.206
com.h2database:h2 2.1.210
com.thoughtworks.xstream:xstream 1.4.11
com.thoughtworks.xstream:xstream 1.4.16
commons-collections:commons-collections 3.2.2
commons-fileupload:commons-fileupload 1.3.3
handlebars 4.3.0
handlebars 4.7.7
io.dropwizard:dropwizard-validation 1.3.21
io.netty:netty-all 4.1.44
io.netty:netty-codec 4.1.66
io.netty:netty-codec-http 4.1.44
io.vertx:vertx-web 3.5.4
net.minidev:json-smart 2.4.1
org.apache.hadoop:hadoop-common 0.23.4
org.apache.logging.log4j:log4j-core 2.3.1
org.apache.shiro:shiro-core 1.7.1
org.apache.shiro:shiro-web 1.5.2
org.apache.shiro:shiro-web 1.5.3
org.apache.shiro:shiro-web 1.7.1
org.apache.tomcat.embed:tomcat-embed-core 7.0.89
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.eclipse.jetty:jetty-http 9.2.25
org.jasig.cas.client:cas-client-core 3.3.2
org.springframework.data:spring-data-commons 1.13.11
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.springframework:spring-web 5.3.0

I want to get the unique package names with the highest version mentioned in the file.

I tried awk as below, but the result is not as expected:

cat file.txt | awk '$2 > a[$1]{a[$1] = $2} END{for (i in a) print i, a[i]}'

RESULT:
org.eclipse.jetty:jetty-http 9.2.25
io.netty:netty-codec 4.1.66
com.h2database:h2 2.1.210
org.jasig.cas.client:cas-client-core 3.3.2
org.apache.logging.log4j:log4j-core 2.3.1
io.vertx:vertx-web 3.5.4
handlebars 4.7.7
com.thoughtworks.xstream:xstream 1.4.16
ch.qos.logback:logback-core 1.2.0
net.minidev:json-smart 2.4.1
org.apache.shiro:shiro-core 1.7.1
commons-fileupload:commons-fileupload 1.3.3
org.springframework:spring-web 5.3.0
commons-collections:commons-collections 3.2.2
org.apache.shiro:shiro-web 1.7.1
com.fasterxml.jackson.core:jackson-databind 2.9.9
io.netty:netty-all 4.1.44
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.apache.hadoop:hadoop-common 0.23.4
io.dropwizard:dropwizard-validation 1.3.21
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.springframework.data:spring-data-commons 1.13.11
ch.qos.logback:logback-classic 1.2.0
io.netty:netty-codec-http 4.1.44

But the result is not correct. For example, for jackson-databind it reports 2.9.9 rather than 2.10.0, treating 2.9 as greater than 2.10, which is not expected.

Could you please help?

**Ed Morton** · Answer 1 · 2022-02-03T14:07:08.540000

The main problem with your script is that $2 > a[$1] does a string comparison (alphabetic, character by character) instead of a version comparison (numeric, one dot-separated number at a time). As strings, 10 comes before 9 because the first characters compared are 1 and 9, and 1 is less than 9.
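
The difference can be seen in isolation with a quick check. This is only an illustrative sketch using the two versions from the question; the array names a and b are arbitrary:

```shell
# String comparison, as in the original script: prints 0 (false),
# because "2.1..." sorts before "2.9..." character by character.
awk 'BEGIN { print ("2.10.0" > "2.9.0") }'

# Comparing the second dot-separated component as a number: prints 1 (true).
awk 'BEGIN {
  split("2.10.0", a, "."); split("2.9.0", b, ".")
  print ((a[2] + 0) > (b[2] + 0))   # "+ 0" forces a numeric comparison
}'
```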

awk has no notion of "versions", so you would have to code a version comparison yourself in awk. GNU sort has one built in, though, so using GNU sort's -V ("version sort"):

$ sort -k2,2Vr file | awk '!seen[$1]++'
org.eclipse.jetty:jetty-http 9.2.25
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.springframework:spring-web 5.3.0
handlebars 4.7.7
io.netty:netty-codec 4.1.66
io.netty:netty-all 4.1.44
io.netty:netty-codec-http 4.1.44
io.vertx:vertx-web 3.5.4
org.jasig.cas.client:cas-client-core 3.3.2
commons-collections:commons-collections 3.2.2
com.fasterxml.jackson.core:jackson-databind 2.10.0
net.minidev:json-smart 2.4.1
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.apache.logging.log4j:log4j-core 2.3.1
com.h2database:h2 2.1.210
org.springframework.data:spring-data-commons 1.13.11
org.apache.shiro:shiro-core 1.7.1
org.apache.shiro:shiro-web 1.7.1
com.thoughtworks.xstream:xstream 1.4.16
io.dropwizard:dropwizard-validation 1.3.21
commons-fileupload:commons-fileupload 1.3.3
ch.qos.logback:logback-classic 1.2.0
ch.qos.logback:logback-core 1.2.0
org.apache.hadoop:hadoop-common 0.23.4

or if you care about the output being sorted alphabetically too you can do either of these (the former holds every unique $1 in memory in the awk script while the latter holds just 2 $1 values at a time in memory in the awk script):

sort -k1,1 -k2,2Vr file | awk '!seen[$1]++'
sort -k1,1 -k2,2Vr file | awk '$1!=prev{print; prev=$1}'

For example:

$ sort -k1,1 -k2,2Vr file | awk '!seen[$1]++'
ch.qos.logback:logback-classic 1.2.0
ch.qos.logback:logback-core 1.2.0
com.fasterxml.jackson.core:jackson-databind 2.10.0
com.h2database:h2 2.1.210
com.thoughtworks.xstream:xstream 1.4.16
commons-collections:commons-collections 3.2.2
commons-fileupload:commons-fileupload 1.3.3
handlebars 4.7.7
io.dropwizard:dropwizard-validation 1.3.21
io.netty:netty-all 4.1.44
io.netty:netty-codec 4.1.66
io.netty:netty-codec-http 4.1.44
io.vertx:vertx-web 3.5.4
net.minidev:json-smart 2.4.1
org.apache.hadoop:hadoop-common 0.23.4
org.apache.logging.log4j:log4j-core 2.3.1
org.apache.shiro:shiro-core 1.7.1
org.apache.shiro:shiro-web 1.7.1
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.eclipse.jetty:jetty-http 9.2.25
org.jasig.cas.client:cas-client-core 3.3.2
org.springframework.data:spring-data-commons 1.13.11
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.springframework:spring-web 5.3.0
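
If GNU sort is not available, coding the version comparison yourself in awk could look roughly like the sketch below. It assumes every version is made up only of dot-separated numbers (no suffixes such as -SNAPSHOT), and the names highest_versions, vergt and best are hypothetical helpers chosen for this example:

```shell
# Print each package once, with its highest version.
# Reads the named files, or standard input if none are given.
highest_versions() {
  awk '
    # Return 1 if version a is greater than version b, comparing the
    # dot-separated components numerically from left to right, else 0.
    function vergt(a, b,    i, n, m, x, y) {
      n = split(a, x, ".")
      m = split(b, y, ".")
      for (i = 1; i <= n || i <= m; i++) {
        # A missing component counts as 0; "+ 0" forces numeric comparison.
        if ((x[i] + 0) != (y[i] + 0))
          return ((x[i] + 0) > (y[i] + 0))
      }
      return 0
    }
    NF < 2 { next }   # skip blank or malformed lines
    !($1 in best) || vergt($2, best[$1]) { best[$1] = $2 }
    END { for (p in best) print p, best[p] }
  ' "$@"
}
```

Since for (p in best) visits keys in an unspecified order, pipe the result through sort if the order matters, e.g. highest_versions file | sort.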

**Shawn** · Answer 2 · 2022-02-03T15:29:40.583000

You could also do it with perl instead of awk; perl has native version types (via the version module) that can be compared sensibly:

$ perl -Mversion -lane '
my $v = version->parse("v$F[1]");
$packages{$F[0]} = $v if $v > $packages{$F[0]};
END {
  foreach my $p (sort keys %packages) {
    print $p, "\t", substr($packages{$p}, 1)
  }
}' input.txt
ch.qos.logback:logback-classic  1.2.0
ch.qos.logback:logback-core 1.2.0
com.fasterxml.jackson.core:jackson-databind 2.10.0
com.h2database:h2   2.1.210
com.thoughtworks.xstream:xstream    1.4.16
commons-collections:commons-collections 3.2.2
commons-fileupload:commons-fileupload   1.3.3
handlebars  4.7.7
io.dropwizard:dropwizard-validation 1.3.21
io.netty:netty-all  4.1.44
io.netty:netty-codec    4.1.66
io.netty:netty-codec-http   4.1.44
io.vertx:vertx-web  3.5.4
net.minidev:json-smart  2.4.1
org.apache.hadoop:hadoop-common 0.23.4
org.apache.logging.log4j:log4j-core 2.3.1
org.apache.shiro:shiro-core 1.7.1
org.apache.shiro:shiro-web  1.7.1
org.apache.tomcat.embed:tomcat-embed-core   9.0.31
org.eclipse.jetty:jetty-http    9.2.25
org.jasig.cas.client:cas-client-core    3.3.2
org.springframework.data:spring-data-commons    1.13.11
org.springframework.security.oauth:spring-security-oauth2   2.3.3
org.springframework:spring-web  5.3.0
