Tiny Code

Monday, June 2, 2008

quick and dirty shell command

Today I was working on some old code at work. I discovered at least one duplicate include file, which is a personal pet peeve. It's far too easy to let multiple copies of an include file get out of sync, so you end up with different versions for different source files.

What I needed was a quick way of finding all the duplicated include files within this project directory (and subdirectories). It turns out stringing together a few Unix/Linux/Mac OS commands with pipes makes this task pretty easy.

The first thing we need is to be able to locate all the include files. In the C programming language, these files typically end with the ".h" file extension. We can use the find command to give us a list of the files which end with .h.
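For example, run from the top of the project tree, the command below lists every matching header along with the directory it lives in. The paths shown here are just a made-up illustration:

find . -name "*.h" -print
./src/parser.h
./include/config.h
./legacy/parser.h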

The next problem is that find prints not just the filenames but also the directories they live in. So we need a way of extracting just the "base" filename. Fortunately there's an easy way to do that with the basename command.
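For instance, basename strips the directory portion from a single path (the path here is made up), and xargs -n1 lets us apply it to each filename coming out of find, one at a time:

basename ./src/parser.h
parser.h

find . -name "*.h" -print | xargs -n1 basename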

The next logical step in spotting duplicate filenames is to sort them with the sort command. Sorting groups identical names next to each other, which is exactly what the final step relies on.
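Tacking sort onto the pipeline groups identical names together. On the made-up listing from above, it would produce something like:

find . -name "*.h" -print | xargs -n1 basename | sort
config.h
parser.h
parser.h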

Finally we can use uniq -d to show just the filenames which appear more than once. The uniq command has other options too; uniq -u does the opposite and shows only the lines which never repeat.
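A quick way to see the difference, using a hypothetical three-line list:

printf 'config.h\nparser.h\nparser.h\n' | uniq -d
parser.h

printf 'config.h\nparser.h\nparser.h\n' | uniq -u
config.h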

Putting all the pieces together, we come up with the following command. It's doing a lot of work to save us the trouble of manually sifting through all the filenames ourselves. That's what computers are supposed to do for us, eh?

find . -name "*.h" -print | xargs -n1 basename | sort | uniq -d
