I have a text or log file which typically looks like this:
First line which is also a paragraph.
Another line that is its own paragraph.
etc. etc.
but occasionally it has some spill-over into a multi-line paragraph:
First line which is also a paragraph.
Another line that is its own paragraph.
Now, this paragraph encompasses more than a single line
  with its second line onwards being indented by spaces
  to distinguish it from the paragraph-opener, although
  it could just as well have been tabs etc.
And this is another paragraph.
I would like to sort these paragraphs lexicographically, and I don't mind whether it's by the first line only or by the entire paragraph. If these were all one-line paragraphs, then Bob's your uncle: we've got sort. But what do I do otherwise?
I know that, in principle, I could:
- Define an escaping scheme
- Escape newlines which are followed by white space (and escape the escaping character itself)
- Sort the resulting one-line-per-paragraph file
- Un-escape
but this seems a bit cumbersome. Can I do better?
Notes:
- I realize this is doable in a straightforward way using an awk or perl script, but the closer an answer is to a one-liner, the better.
- You may make reasonable assumptions in your answer, such as the GNU variants of certain tools, or POSIX compatibility, or minimum versions of tools etc. But please state them explicitly.
A few ways:
A one-liner pipeline that uses `perl` to read the entire file and insert a 0 byte between paragraphs (a paragraph break being defined as a newline immediately followed by a non-whitespace character), `sort` to sort them, and finally `tr` to remove those 0 bytes again from the final output. Basically a simpler version of your idea. (Does require a version of `sort` that understands the `-z` option.)
Or a pure `perl` one-liner that's a bit more verbose but does it all in one process, without a pipeline or needing non-POSIX `sort` options.
Similar approach using GNU `awk` instead.
Or if you're okay installing extra stuff, I found a nifty looking program written in `perl` called `ptp` (install through your OS package manager if available, or with `cpan App::PTP` / `cpanm App::PTP` / other preferred CPAN client).
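For the GNU `awk` route, one possible sketch is below. It relies on `asort()`, which is a gawk extension, and the function name `sort_paragraphs_gawk` and the array name `para` are made up for illustration.

```shell
# Sketch only: collect lines into paragraphs, then sort with gawk's asort().
sort_paragraphs_gawk() {
  gawk '
    /^[^[:space:]]/ { n++ }            # a non-indented line opens a new paragraph
    { para[n] = para[n] $0 "\n" }      # append each line to the current paragraph
    END {
      count = asort(para)              # gawk-specific: sort values, reindex from 1
      for (i = 1; i <= count; i++)
        printf "%s", para[i]
    }
  ' "$@"
}
```

Unlike the two `perl` sketches, this one always emits a trailing newline, even if the input lacked one.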