I am trying to make an awk-like tool that uses Rebol 3 to process larger text files together with bash pipes and tools. How can I read STDIN line by line in Rebol 3?
For example this shell command produces 3 lines:
$ (echo "first line" ; echo "second line" ; echo "third line" )
first line
second line
third line
But Rebol's input word reads all 3 lines at once. I would expect it to stop at each newline, as it does when you use input interactively:
$ r3 --do 'while [ x: input ] [ if empty? x [ break ] print x print "***" ]'
abcdef
abcdef
***
blabla
blabla
***
But when I pipe the lines in, it reads the whole input at once. I could read everything and then split it into lines, but I want it to work in a "streaming" manner, since I usually cat in many thousands of lines.
$ (echo "first line" ; echo "second line" ; echo "third line" ) \
| r3 --do 'while [ x: input ] [ if empty? x [ break ] print x print "***" ]'
first linesecond linethird line
***
I also looked at the source of input to see whether I could write a similar function. I could read character by character in a while loop and check for newlines, but that doesn't seem efficient.
I figured it out, and it seems to work well even on big files of 10,000 lines. It could be written more elegantly and improved, though.
The function r3awk takes STDIN and a block of code, and executes that block once per line with the variable line bound to the current line:
It works like this: read/lines reads a batch of characters from the stream and returns it as a block of lines. Each call reads the next batch, so the whole thing is wrapped in a while loop. The code block is executed (with do) for each line as the loop runs, not once at the end.
A batch does not necessarily end on a newline, so the last line of each batch is usually partial, and so is the first line of the next batch; the function therefore joins the two. After the loop it still has to process the final line, which by then is complete. The try is there because some lines caused UTF encoding errors.
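The batching idea described above can be sketched independently of Rebol. The following is a minimal illustration in Python, not the r3awk function itself: the names stream_lines and process and the chunk_size parameter are made up for this example, and it assumes a text stream with "\n" line endings.

```python
def stream_lines(stream, chunk_size=4096):
    """Yield complete lines from a text stream that is read in fixed-size batches."""
    partial = ""                       # unfinished tail of the previous batch
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:                  # empty read means end of input
            break
        # Prepend the leftover so a line split across two batches is rejoined.
        pieces = (partial + chunk).split("\n")
        # The last piece has not been terminated by a newline yet.
        # It is "" when the batch happened to end exactly on a newline.
        partial = pieces.pop()
        for line in pieces:
            yield line
    if partial:                        # final line without a trailing newline
        yield partial


def process(stream, action, chunk_size=4096):
    """Call action(line) for each line as it becomes available (hypothetical helper)."""
    for line in stream_lines(stream, chunk_size):
        action(line)
```

Calling process(sys.stdin, print) would echo the input line by line while holding only one batch plus one partial line in memory. Because the leftover is simply the empty string when a batch ends exactly on a newline, that case needs no special handling in this sketch.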
It can be used like this on the command line:
Things to improve: clean up the function in general and deduplicate some code. I also still need to check what happens when a read/lines batch ends exactly on a newline.