Process substitution
Remarks#
Process substitution is a form of redirection where the input or output of a process (some sequence of commands) appear as a temporary file.
Compare two files from the web
The following compares two files with diff
using process substitution instead of creating temporary files.
diff <(curl https://www.example.com/page1) <(curl https://www.example.com/page2)
Feed a while loop with the output of a command
This feeds a while
loop with the output of a grep
command:
while IFS=":" read -r user _
do
# "$user" holds the username in /etc/passwd
done < <(grep "hello" /etc/passwd)
With paste command
# Process substitution with paste command is common
# To compare the contents of two directories
paste <( ls /path/to/directory1 ) <( ls /path/to/directory1 )
Concatenating files
It is well known that you cannot use the same file for input and ouput in the same command. For instance,
$ cat header.txt body.txt >body.txt
doesn’t do what you want. By the time cat
reads body.txt
, it has already been truncated by the redirection and it is empty. The final result will be that body.txt
will hold the contents of header.txt
only.
One might think to avoid this with process substitution, that is, that the command
$ cat header.txt <(cat body.txt) > body.txt
will force the original contents of body.txt
to be somehow saved in some buffer somewhere before the file is truncated by the redirection. It doesn’t work. The cat
in parentheses begins reading the file only after all file descriptors have been set up, just like the outer one. There is no point in trying to use process substitution in this case.
The only way to prepend a file to another file is to create an intermediate one:
$ cat header.txt body.txt >body.txt.new
$ mv body.txt.new body.txt
which is what sed
or perl
or similar programs do under the carpet when called with an edit-in-place option (usually -i
).
Stream a file through multiple programs at once
This counts the number of lines in a big file with wc -l
while simultaneously compressing it with gzip
. Both run concurrently.
tee >(wc -l >&2) < bigfile | gzip > bigfile.gz
Normally tee
writes its input to one or more files (and stdout). We can write to commands instead of files with tee >(command)
.
Here the command wc -l >&2
counts the lines read from tee
(which in turn is reading from bigfile
). (The line count is sent to stderr (>&2
) to avoid mixing with the input to gzip
.) The stdout of tee
is simultaneously fed into gzip
.
To avoid usage of a sub-shell
One major aspect of process substitution is that it lets us avoid usage of a sub-shell when piping commands from the shell.
This can be demonstrated with a simple example below. I have the following files in my current folder:
$ find . -maxdepth 1 -type f -print
foo bar zoo foobar foozoo barzoo
If I pipe to a while
/read
loop that increments a counter as follows:
count=0
find . -maxdepth 1 -type f -print | while IFS= read -r _; do
((count++))
done
$count
now does not contain 6
, because it was modified in the sub-shell context. Any of the commands shown below are run in a sub-shell context and the scope of the variables used within are lost after the sub-shell terminates.
command &
command | command
( command )
Process substitution will solve the problem by avoiding use the of pipe |
operator as in
count=0
while IFS= read -r _; do
((count++))
done < <(find . -maxdepth 1 -type f -print)
This will retain the count
variable value as no sub-shells are invoked.