I have a file, file.txt:
ch.qos.logback:logback-classic 1.2.0
ch.qos.logback:logback-core 1.2.0
com.fasterxml.jackson.core:jackson-databind 2.10.0
com.fasterxml.jackson.core:jackson-databind 2.6.7
com.fasterxml.jackson.core:jackson-databind 2.6.7
com.fasterxml.jackson.core:jackson-databind 2.7.9
com.fasterxml.jackson.core:jackson-databind 2.8.11
com.fasterxml.jackson.core:jackson-databind 2.8.11
com.fasterxml.jackson.core:jackson-databind 2.8.9
com.fasterxml.jackson.core:jackson-databind 2.9.10
com.fasterxml.jackson.core:jackson-databind 2.9.10
com.fasterxml.jackson.core:jackson-databind 2.9.10
com.fasterxml.jackson.core:jackson-databind 2.9.5
com.fasterxml.jackson.core:jackson-databind 2.9.7
com.fasterxml.jackson.core:jackson-databind 2.9.8
com.fasterxml.jackson.core:jackson-databind 2.9.9
com.h2database:h2 2.0.206
com.h2database:h2 2.1.210
com.thoughtworks.xstream:xstream 1.4.11
com.thoughtworks.xstream:xstream 1.4.16
commons-collections:commons-collections 3.2.2
commons-fileupload:commons-fileupload 1.3.3
handlebars 4.3.0
handlebars 4.7.7
io.dropwizard:dropwizard-validation 1.3.21
io.netty:netty-all 4.1.44
io.netty:netty-codec 4.1.66
io.netty:netty-codec-http 4.1.44
io.vertx:vertx-web 3.5.4
net.minidev:json-smart 2.4.1
org.apache.hadoop:hadoop-common 0.23.4
org.apache.logging.log4j:log4j-core 2.3.1
org.apache.shiro:shiro-core 1.7.1
org.apache.shiro:shiro-web 1.5.2
org.apache.shiro:shiro-web 1.5.3
org.apache.shiro:shiro-web 1.7.1
org.apache.tomcat.embed:tomcat-embed-core 7.0.89
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.eclipse.jetty:jetty-http 9.2.25
org.jasig.cas.client:cas-client-core 3.3.2
org.springframework.data:spring-data-commons 1.13.11
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.springframework:spring-web 5.3.0
I want to get the unique package names with the highest version mentioned in the file.
I tried awk as below, but the result is not as expected:
cat file.txt | awk '$2 > a[$1]{a[$1] = $2} END{for (i in a) print i, a[i]}'
RESULT:
org.eclipse.jetty:jetty-http 9.2.25
io.netty:netty-codec 4.1.66
com.h2database:h2 2.1.210
org.jasig.cas.client:cas-client-core 3.3.2
org.apache.logging.log4j:log4j-core 2.3.1
io.vertx:vertx-web 3.5.4
handlebars 4.7.7
com.thoughtworks.xstream:xstream 1.4.16
ch.qos.logback:logback-core 1.2.0
net.minidev:json-smart 2.4.1
org.apache.shiro:shiro-core 1.7.1
commons-fileupload:commons-fileupload 1.3.3
org.springframework:spring-web 5.3.0
commons-collections:commons-collections 3.2.2
org.apache.shiro:shiro-web 1.7.1
com.fasterxml.jackson.core:jackson-databind 2.9.9
io.netty:netty-all 4.1.44
org.springframework.security.oauth:spring-security-oauth2 2.3.3
org.apache.hadoop:hadoop-common 0.23.4
io.dropwizard:dropwizard-validation 1.3.21
org.apache.tomcat.embed:tomcat-embed-core 9.0.31
org.springframework.data:spring-data-commons 1.13.11
ch.qos.logback:logback-classic 1.2.0
io.netty:netty-codec-http 4.1.44
But the result is not correct: for versions like 2.9.0 and 2.10.0 it treats 2.9 as the greater one, which is not what I expect.
Could you please help?
The main problem with your script was that $2 > a[$1] is doing a string (i.e. alphabetic, character by character) comparison instead of a version (i.e. numeric, dot-separated number by dot-separated number) comparison, and so 10 comes before 9 since the first chars compared are 1 vs 9 and 1 is less than 9.
awk doesn't have a notion of "versions" so you'd have to code a version comparison yourself in awk, but GNU sort has it built in as -V, "version sort", so you can let GNU sort do the ordering and use awk only to pick one line per package.
If you care about the output being sorted alphabetically too, there are two ways to do it: one holds every unique $1 in memory in the awk script, while the other holds just 2 $1 values at a time in memory in the awk script.
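A minimal sketch of what those two approaches might look like, assuming GNU sort (needed for -V) and any POSIX awk. The function names are illustrative only, and LC_ALL=C is an added assumption so that package names are grouped byte by byte regardless of locale:

```shell
# Variant 1: sort by version first, dedupe in awk, then sort by name.
# The awk array "seen" ends up holding every unique package name.
highest_versions_seen() {
    # -k2,2rV : order all lines by field 2, highest version first
    # !seen[$1]++ is true only the first time a package name appears,
    # so awk prints just the highest-version line for each package
    LC_ALL=C sort -k2,2rV "$1" |
        awk '!seen[$1]++' |
        LC_ALL=C sort -k1,1
}

# Variant 2: sort by name, then by version, so each package's lines are
# adjacent. awk only needs to remember the previous package name.
highest_versions_prev() {
    # -k1,1   : group lines by package name
    # -k2,2rV : within a package, highest version first
    LC_ALL=C sort -k1,1 -k2,2rV "$1" |
        awk '$1 != prev { print; prev = $1 }'
}
```

Either function can be called as highest_versions_prev file.txt. With the sample data, the jackson-databind line that survives should be the 2.10.0 one rather than 2.9.9, because -V compares 10 and 9 as numbers instead of character by character.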