I have a file named a.csv which contains:
100008,3
10000,3
100010,5
100010,4
10001,6
100021,7
After running this command:
sort -k1 -d -t "," a.csv
the result is:
10000,3
100008,3
100010,4
100010,5
10001,6
100021,7
This is unexpected, because 10001 should come before 100010.
I have been trying to understand why this happens for a long time, but couldn't find any answers.
$ sort --version
sort (GNU coreutils) 8.13
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
Some of the other responses have assumed this is a numeric sort vs dictionary sort problem. It isn't: even for an alphabetical sort, the output given in the question is incorrect.
The answer
To get the correct sorting, you need to change -k1 to -k1,1.
The reason
The -k option takes two numbers, the start and end fields to sort (i.e. -ks,e where s is the start and e is the end). By default, the end field is the end of the line. Hence, -k1 is the same as not giving the -k option at all. To show this, compare two sorts that use -k2, one with , as the field separator and one with ~ as the field separator.
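A minimal sketch of such a comparison. The input lines (1,a,1 and 2,aa,2, plus their ~ counterparts) are hypothetical, chosen only so that the keys starting at field 2 are a,1 / aa,2 and a~1 / aa~2. LC_ALL=C is also an assumption, used to force plain ASCII byte ordering.

```shell
# Hypothetical input lines; LC_ALL=C is an assumption to get plain ASCII ordering.
# -k2 means "from field 2 to the end of the line", so the separator
# character itself is part of the sort key.
comma_sorted=$(printf '1,a,1\n2,aa,2\n' | LC_ALL=C sort -t ',' -k2)
tilde_sorted=$(printf '1~a~1\n2~aa~2\n' | LC_ALL=C sort -t '~' -k2)

printf 'separator ,:\n%s\n\nseparator ~:\n%s\n' "$comma_sorted" "$tilde_sorted"
```

The only difference between the two runs is the separator character, yet the two lines come out in opposite orders.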
The first sorts a,1 before aa,2, while the second sorts aa~2 before a~1 since, in ASCII, , < a < ~.
To get the desired behaviour, therefore, we need to sort only one field. In your case, that means using 1 as both the start and end field, so you specify -k1,1. If you try the two examples above with -k2,2 instead of -k2, you'll find you get the same (correct) ordering in both cases.
Many thanks to Eric and Assaf from the coreutils mailing list for pointing this out.
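As a quick check against the data from the question, the sketch below runs both forms of the key side by side. It feeds the lines on stdin rather than reading a.csv, and sets LC_ALL=C as an assumption so the ordering is reproducible across locales.

```shell
# Data from the question, fed on stdin instead of a.csv.
# LC_ALL=C is an assumption to make the ordering reproducible across locales.
data='100008,3
10000,3
100010,5
100010,4
10001,6
100021,7'

# Key runs from field 1 to the end of the line (the command in the question).
whole_line=$(printf '%s\n' "$data" | LC_ALL=C sort -k1 -d -t ",")

# Key is field 1 only.
first_field=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -d -t ",")

printf 'with -k1:\n%s\n\nwith -k1,1:\n%s\n' "$whole_line" "$first_field"
```

With -k1,1, the line 10001,6 sorts ahead of both 100010 lines, which is the ordering the question expected.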