Tiny Code

Tiny Code

Sunday, May 4, 2008

I/O Redirection

One of the most useful features of the Linux, Unix, MacOS, and to a lesser extent Windows (more on that later) is the concept of I/O redirection. In this discussion, we'll restrict ourselves to the pipe form of redirection which is invoked by the vertical bar character "|". This tells the command interpreter to take all output from the first part of the command and send (or pipe) it to the second part. The other characters which invoke I/O redirection are the less than "<" and the greater than ">" characters. Those are primarily used to send output to a file or cause the program to take its input from a file.

If you're a long time computer user, you may want to skip ahead to the examples below. You may have heard the term "I/O redirection" before but what does it really mean? I/O redirection gives you the ability to chance where a program's input and/or output is bound for. Normal command line programs have both their input, aka stdin (standard input), and output, aka stdout (standard output), directed to the console (which is just shorthand for saying the input comes from your computer's keyboard and the output goes to the portion of the screen where you're running the program). Note that if one of the portions of the command line produces errors, you may be surprised to find the error messages may not get redirected with the pipe command. This is because many Unix/Linux style programs also make use of a third I/O stream called stderr. WIthout taking special action, stderr output is almost always directed to the console to bring the error condition to the user's attention.

The simple example

For this example, let's suppose that you've got a huge tar file, aka tarball) which is really an archive file containing many other files. Now suppose you want to look to see whether it contains a text file but you really don't recall the name of the text file. Perhaps you recall something else about the text file such as it was located in the /projects directory. You could always get a listing of all the files within the tar file and manually search through them but computers were created to relieve users of the need to do such labor intensive tasks. How about we put I/O redirection to work?

To start with, we need to obtain a listing of all the files in our tar file which for purposes of illustration we'll call sample.tar. That can be accomplished with the command below.

tar -tvf sample.tar

That gives you a complete listing of all the files but chances are if it's a big tar file, the names and details of the files scrolled off the screen and perhaps overwhelmed even the scroll back buffer of your terminal window (aka command interpreter or shell). In any case, being lazy computer types we don't feel like searching through this huge amount of data.

The first thing we want to do is to weed out all the non-text files. Hopefully we've been disciplined about our file naming conventions and have added a ".txt" file extension to all our text files. So let's show only the files which end with that file extension. We'll use our old friend "grep" to match just the output lines which contain the string ".txt". Note that we're using the "-i" parameter to specify that we want to match the string ignoring the case of the letters in the string. This may be important if not everyone adding files to the tar file was careful about adding a .txt and not a .TXT file extension. Toy OSes like Windows don't make this distinction but you'll find they don't handle I/O redirection properly either. The Windows shell is very simplistic so even if you've added Linux style utilities like "tar" to its repertoire, you may be disappointed to find that it doesn't do multitasking. A proper OS will handle I/O redirection real time so when you type the command below, you'll see output data appear quickly. Windows creates a temporary file containing all output from the first part of the command which it then sends to the second part of the command once the first is completed. It makes the Windows command line feel much slower than it is and believe me it doesn't need much help. If you've ever manipulated large tar files on both Windows and Linux systems, you'll quickly discover that the Windows command line isn't performance oriented by any stretch of the imagination.

tar -tvf sample.tar | grep -i ".txt"

That gives us a listing of just the files which contain the string ".txt" which hopefully only appears as a file extension.

Something we notice about the output which makes life a bit tougher is that the tar shows you the file names and other details in the order they were added to the tar file. It would be nice if we could see the files sorted by the directory names in which they appear. Fortunately there's a simple solution to that desire.

tar -tvf sample.tar | grep -i ".txt" | sort

This command does the trick but it also illustrates some odd behavior. The output doesn't appear piecemeal the way it had been doing previously. If we think about it, the reason becomes obvious. The sort command can't really sort correctly unless all input to be sorted is present. So it must wait until the commands up to that point in the command line are complete before starting to sort the output.

Since we might be fans of the graphical version of the vim (vi improved) editor, we can add another labor saving twist to our command line. We can send the output of our command to gvim. This has the advantage of being able to search the output using editor commands. Doing this in gvim will also cause the search terms to be highlighted within the text making it much easier to pick out from the surrounding text.

tar -tvf sample.tar | grep -i ".txt" | sort | gvim -

Obviously the commands above were simple examples to make explanation easier. I'll add a few slightly more advanced examples below with a brief explanation of what they do. Once you get the hang of it, you'll find you quickly come to rely upon this powerful feature. Most of the Unix/Linux style command line utilities are written so they can be easily combined to create more powerful command lines similar or more sophisticated than the ones we've been exploring.

A few more advanced examples

The command below uses the find command to search for all files which end with the ".txt" file extension. It then searches them to see which of them contain the string "project". Note the "-l" parameter causes grep to only output the filenames which match the search criteria. If you omit the "-l" you'll see a list of matching lines from within the files. Also note the use of the xargs command which may seem unfamiliar. It's a method of appending multi-line output from previous commands to form arguments for the command specified after xargs.

find . -name "*.txt" | xargs grep -l project

This command does essentially the same thing but sends the list of matching file names to the vim editor. It issues the command to search for the string "project" so that term will be highlighted in the file and the cursor will be placed on the first occurrence within the first file.

find . -name "*.txt" | xargs grep -l project | xargs vim -c /project

Try coming up with ways to use I/O redirection which make your time at the computer easier. You'll be glad you did.

No comments: