awk not getting correct comparison result on data (bash)


I have a file, file.txt:

ch.qos.logback:logback-classic 1.2.0
ch.qos.logback:logback-core 1.2.0
com.fasterxml.jackson.core:jackson-databind 2.10.0
com.fasterxml.jackson.core:jackson-databind 2.6.7
com.fasterxml.jackson.core:jackson-databind 2.6.7
com.fasterxml.jackson.core:jackson-databind 2.7.9
com.fasterxml.jackson.core:jackson-databind 2.8.11
com.fasterxml.jackson.core:jackson-databind 2.8.11
com.fasterxml.jackson.core:jackson-databind 2.8.9
com.fasterxml.jackson.core:jackson-databind 2.9.10
com.fasterxml.jackson.core:jackson-databind 2.9.10
com.fasterxml.jackson.core:jackson-databind 2.9.10
com.fasterxml.jackson.core:jackson-databind 2.9.5
com.fasterxml.jackson.core:jackson-databind 2.9.7
com.fasterxml.jackson.core:jackson-databind 2.9.8
com.fasterxml.jackson.core:jackson-databind 2.9.9
com.h2database:h2 2.0.206
com.h2database:h2 2.1.210
com.thoughtworks.xstream:xstream 1.4.11
com.thoughtworks.xstream:xstream 1.4.16
commons-collections:commons-collections 3.2.2
commons-fileupload:commons-fileupload 1.3.3
handlebars 4.3.0
handlebars 4.7.7
io.dropwizard:dropwizard-validation 1.3.21
io.netty:netty-all 4.1.44
io.netty:netty-codec 4.1.66
io.netty:netty-codec-http 4.1.44
io.vertx:vertx-web 3.5.4
net.minidev:json-smart 2.4.1
org.apache.hadoop:hadoop-common 0.23.4
org.apache.logging.log4j:log4j-core 2.3.1
org.apache.shiro:shiro-core 1.7.1
org.apache.shiro:shiro-web 1.5.2
org.apache.shiro:shiro-web 1.5.3
org.apache.shiro:shiro-web 1.7.1
org.apache.tomcat.embed:tomcat-embed-core 7.0.89
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.eclipse.jetty:jetty-http 9.2.25
org.jasig.cas.client:cas-client-core 3.3.2
org.springframework.data:spring-data-commons 1.13.11
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.springframework:spring-web 5.3.0

I want to get each unique package name together with the highest version of it mentioned in the file.

I tried awk as below, but the result is not as expected:

cat file.txt | awk '$2 > a[$1]{a[$1] = $2} END{for (i in a) print i, a[i]}'

RESULT:
org.eclipse.jetty:jetty-http 9.2.25
io.netty:netty-codec 4.1.66
com.h2database:h2 2.1.210
org.jasig.cas.client:cas-client-core 3.3.2
org.apache.logging.log4j:log4j-core 2.3.1
io.vertx:vertx-web 3.5.4
handlebars 4.7.7
com.thoughtworks.xstream:xstream 1.4.16
ch.qos.logback:logback-core 1.2.0
net.minidev:json-smart 2.4.1
org.apache.shiro:shiro-core 1.7.1
commons-fileupload:commons-fileupload 1.3.3
org.springframework:spring-web 5.3.0
commons-collections:commons-collections 3.2.2
org.apache.shiro:shiro-web 1.7.1
com.fasterxml.jackson.core:jackson-databind 2.9.9
io.netty:netty-all 4.1.44
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.apache.hadoop:hadoop-common 0.23.4
io.dropwizard:dropwizard-validation 1.3.21
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.springframework.data:spring-data-commons 1.13.11
ch.qos.logback:logback-classic 1.2.0
io.netty:netty-codec-http 4.1.44

The result is not correct. For example, for jackson-databind it reports 2.9.9 although 2.10.0 is in the file, so 2.9 is being treated as greater than 2.10, which is not what I expect.

Could you please help?

There are 2 best solutions below

Ed Morton On

The main problem with your script is that $2 > a[$1] does a string comparison (alphabetic, character by character) instead of a version comparison (numeric, one dot-separated number at a time). As a result 10 comes before 9: the first characters compared are 1 and 9, and 1 is less than 9.
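The difference can be checked directly in awk. This is only a minimal sketch: the variable names string_result and numeric_result are made up for illustration, and the numeric case compares just the second dot-separated field.

```shell
# String comparison: "2.10.0" and "2.9.0" first differ at "1" vs "9",
# and "1" sorts before "9", so the test is false.
string_result=$(awk 'BEGIN { print ("2.10.0" > "2.9.0") }')
echo "$string_result"    # prints 0

# Numeric comparison of the second dot-separated field: 10 > 9 is true.
numeric_result=$(awk 'BEGIN {
  split("2.10.0", a, /[.]/)
  split("2.9.0",  b, /[.]/)
  print ((a[2] + 0) > (b[2] + 0))
}')
echo "$numeric_result"   # prints 1
```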

awk doesn't have a notion of "versions", so you would have to code a version comparison yourself in awk, but GNU sort has one built in. So, using GNU sort's -V ("version sort") option:

$ sort -k2,2Vr file | awk '!seen[$1]++'
org.eclipse.jetty:jetty-http 9.2.25
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.springframework:spring-web 5.3.0
handlebars 4.7.7
io.netty:netty-codec 4.1.66
io.netty:netty-all 4.1.44
io.netty:netty-codec-http 4.1.44
io.vertx:vertx-web 3.5.4
org.jasig.cas.client:cas-client-core 3.3.2
commons-collections:commons-collections 3.2.2
com.fasterxml.jackson.core:jackson-databind 2.10.0
net.minidev:json-smart 2.4.1
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.apache.logging.log4j:log4j-core 2.3.1
com.h2database:h2 2.1.210
org.springframework.data:spring-data-commons 1.13.11
org.apache.shiro:shiro-core 1.7.1
org.apache.shiro:shiro-web 1.7.1
com.thoughtworks.xstream:xstream 1.4.16
io.dropwizard:dropwizard-validation 1.3.21
commons-fileupload:commons-fileupload 1.3.3
ch.qos.logback:logback-classic 1.2.0
ch.qos.logback:logback-core 1.2.0
org.apache.hadoop:hadoop-common 0.23.4

Or, if you care about the output being sorted alphabetically too, you can do either of these. The first holds every unique $1 in memory in the awk script, while the second holds just 2 $1 values at a time:

sort -k1,1 -k2,2Vr file | awk '!seen[$1]++'
sort -k1,1 -k2,2Vr file | awk '$1!=prev{print; prev=$1}'

For example:

$ sort -k1,1 -k2,2Vr file | awk '!seen[$1]++'
ch.qos.logback:logback-classic 1.2.0
ch.qos.logback:logback-core 1.2.0
com.fasterxml.jackson.core:jackson-databind 2.10.0
com.h2database:h2 2.1.210
com.thoughtworks.xstream:xstream 1.4.16
commons-collections:commons-collections 3.2.2
commons-fileupload:commons-fileupload 1.3.3
handlebars 4.7.7
io.dropwizard:dropwizard-validation 1.3.21
io.netty:netty-all 4.1.44
io.netty:netty-codec 4.1.66
io.netty:netty-codec-http 4.1.44
io.vertx:vertx-web 3.5.4
net.minidev:json-smart 2.4.1
org.apache.hadoop:hadoop-common 0.23.4
org.apache.logging.log4j:log4j-core 2.3.1
org.apache.shiro:shiro-core 1.7.1
org.apache.shiro:shiro-web 1.7.1
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.eclipse.jetty:jetty-http 9.2.25
org.jasig.cas.client:cas-client-core 3.3.2
org.springframework.data:spring-data-commons 1.13.11
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.springframework:spring-web 5.3.0
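
If GNU sort is not available, the version comparison can also be coded in awk itself, as mentioned above. The following is a minimal sketch, not a complete version parser: pick_highest and vergt are hypothetical names, versions are assumed to be dot-separated numbers, and any non-numeric suffix on a field (such as -RC1) is simply dropped by the string-to-number conversion.

```shell
# Hypothetical helper: print each package name with its highest version.
pick_highest() {
  awk '
    # Return 1 if version a is greater than version b, comparing the
    # dot-separated fields numerically from left to right.
    function vergt(a, b,    i, n, m, x, y) {
      n = split(a, x, /[.]/)
      m = split(b, y, /[.]/)
      for (i = 1; i <= n || i <= m; i++) {
        # Missing fields count as 0, so 1.2 and 1.2.0 compare as equal.
        if ((x[i] + 0) != (y[i] + 0))
          return (x[i] + 0) > (y[i] + 0)
      }
      return 0
    }
    # Store the version if the package is new or this version is higher.
    NF >= 2 && (!($1 in best) || vergt($2, best[$1])) { best[$1] = $2 }
    END { for (p in best) print p, best[p] }
  ' "$@"
}
```

For example, pick_highest file.txt | sort prints one line per package, sorted by name. The extra sort is there because the order of for (p in best) is unspecified in awk.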
Shawn On

You could also do it with perl instead of awk. Perl has native version types that can be compared sensibly:

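$ perl -Mversion -lane '
# Parse the second field as a dotted version; the "v" prefix forces dotted-decimal parsing.
my $v = version->parse("v$F[1]");
# Keep the highest version seen so far for each package name.
$packages{$F[0]} = $v if !exists $packages{$F[0]} || $v > $packages{$F[0]};
END {
  foreach my $p (sort keys %packages) {
    # The version object stringifies as "vX.Y.Z"; substr drops the leading "v".
    print $p, "\t", substr($packages{$p}, 1)
  }
}' input.txt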
ch.qos.logback:logback-classic  1.2.0
ch.qos.logback:logback-core 1.2.0
com.fasterxml.jackson.core:jackson-databind 2.10.0
com.h2database:h2   2.1.210
com.thoughtworks.xstream:xstream    1.4.16
commons-collections:commons-collections 3.2.2
commons-fileupload:commons-fileupload   1.3.3
handlebars  4.7.7
io.dropwizard:dropwizard-validation 1.3.21
io.netty:netty-all  4.1.44
io.netty:netty-codec    4.1.66
io.netty:netty-codec-http   4.1.44
io.vertx:vertx-web  3.5.4
net.minidev:json-smart  2.4.1
org.apache.hadoop:hadoop-common 0.23.4
org.apache.logging.log4j:log4j-core 2.3.1
org.apache.shiro:shiro-core 1.7.1
org.apache.shiro:shiro-web  1.7.1
org.apache.tomcat.embed:tomcat-embed-core   9.0.31
org.eclipse.jetty:jetty-http    9.2.25
org.jasig.cas.client:cas-client-core    3.3.2
org.springframework.data:spring-data-commons    1.13.11
org.springframework.security.oauth:spring-security-oauth2   2.3.3
org.springframework:spring-web  5.3.0