Overview
Recently Ken Fallon did a show on HPR, number
3962, in which he used a Bash
pipeline of multiple commands feeding their output into a
while loop. In the loop he processed the lines produced by
the pipeline and used what he found to download audio files belonging to
a series with wget.
This was a great show and contained some excellent advice, but the
use of the format:
pipeline | while read variable; do ...
reminded me of the "gotcha" I mentioned in my own show
2699.
I thought it might be a good time to revisit this subject.
So, what's the problem?
The problem can be summarised as a side effect of pipelines.
What are pipelines?
Pipelines are an amazingly useful feature of Bash (and other shells).
The general format is:
command1 | command2 ...
Here command1 produces output (on its standard output) which
is connected via the pipe symbol (|) to command2, where it
becomes that command's standard input. In Bash each command
in a pipeline runs in its own subshell. Many commands can be
linked together in this way to achieve some powerful combined effects.
A very simple example of a pipeline might be:
$ printf 'World\nHello\n' | sort
Hello
World
The printf command (≡'command1') writes two
lines (separated by newlines) on standard output and this is
passed to the sort command's standard input
(≡'command2') which then sorts these lines
alphabetically.
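As a small illustration of linking more than two commands, here is a minimal sketch (not from the original show) which adds uniq as a third stage. The repeated 'World' line in the input is an assumption made purely for the demonstration.

```shell
# Sketch: a third command added to the pipeline.
# sort brings identical lines together; uniq then drops the adjacent repeats.
printf 'World\nHello\nWorld\n' | sort | uniq
# Hello
# World
```

Each command only sees the output of the one before it, so uniq receives the already sorted lines.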
Commands in the pipeline can be more complex than this, and in the
case we are discussing we can include a loop command such as
while.
For example:
$ printf 'World\nHello\n' | sort | while read line; do echo "($line)"; done
(Hello)
(World)
Here, each line output by the sort command is read into
the variable line in the while loop and is
written out enclosed in parentheses.
Note that the loop is written on one line. The semi-colons are used
instead of the equivalent newlines.
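For comparison, the same loop can be laid out over several lines, as it might be in a script. This is only a sketch of the equivalent form. The -r option to read is an addition not used in the one-line version; it stops read from treating backslashes in the input specially.

```shell
# The same pipeline with the loop spread over several lines.
# Each newline here stands where a semi-colon stood in the one-line form.
printf 'World\nHello\n' | sort | while read -r line
do
    echo "($line)"
done
# (Hello)
# (World)
```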
Variables and subshells
What if the lines output by the loop need to be numbered?
$ i=0; printf 'World\nHello\n' | sort | while read line; do ((i++)); echo "$i) $line"; done
1) Hello
2) World
Here the variable 'i' is set to zero before the
pipeline. It could have been done on the line before of course. In the
while loop the variable is incremented on each iteration
and included in the output.
You might expect 'i' to be 2 once the loop exits but it
is not. It will be zero in fact.
The reason is that there are two 'i' variables. One is
created when it's set to zero at the start before the pipeline. The
other one is a "clone" which exists only in the subshell running the
loop. The subshell starts with a copy of the parent shell's variables,
so the expression:
((i++))
increments the copy and leaves the one in the parent shell untouched.
When the subshell in which the loop runs completes, it will delete
this version of 'i' and the original one will simply
contain the zero that it was originally set to.
You can see what happens in this slightly different example:
$ i=1; printf 'World\nHello\n' | sort | while read line; do ((i++)); echo "$i) $line"; done
2) Hello
3) World
$ echo $i
1
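One way to confirm that the loop really does run in a separate process is to print the process ID on each side of the pipe. The following is a minimal sketch, assuming Bash 4.0 or later for the BASHPID variable. It uses i=$((i + 1)) in place of ((i++)), which changes 'i' in the same way.

```shell
#!/usr/bin/env bash
# Sketch: show that the while loop at the end of a pipeline runs in its own process.
# Assumes Bash 4.0 or later, where BASHPID holds the PID of the current Bash process
# ($$ is no help here: it keeps the main shell's PID even inside a subshell).

i=0
echo "main shell PID $BASHPID: i=$i"

printf 'World\nHello\n' | sort | while read -r line; do
    i=$((i + 1))                        # changes only the subshell's copy of i
    echo "loop PID $BASHPID: i=$i ($line)"
done

# Back in the main shell the original i is untouched
echo "main shell PID $BASHPID: i=$i"    # reports i=0
```

The PID values will differ from run to run, but with Bash's default settings the two lines from the loop should share one PID while the main shell reports another, and the final line shows that 'i' is still zero.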
These examples are fine, assum