Bruised Fancy

This blog gives me a place to comment on things which strike my fancy, hence the title. Topics may include computer software/hardware, science, space, beer, books/movies/television programs of a geeky nature, or almost anything else. It is not marked as containing adult content but be warned that I occasionally post about beer and sometimes forget to watch my language. I've been writing systems software since the days of core memory, paper tape, and front panel lights/switches.

Sunday, July 06, 2008

Recovering data from a failing hard drive

I've been trying to recover the data off a failing hard drive for a family member. I've found a few programs which claim to be able to do just that but they always get hung up by the numerous retries the drive keeps doing in the failing areas. Then I came up with the idea of using the dd command to make a copy of the drive image. With the retries out of the way during the initial copy, I could then work on the image instead of the drive. I'd used dd pretty heavily during the development of an SD card driver I'd done at my last company. Once the drive image has been copied to a file, the resulting image file can be mounted using the mount command... well it can on Linux and Mac OS X at least. You poor folks running Windows are out of luck.
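Here's a minimal sketch of the imaging step using plain dd. It assumes a scratch file stands in for the failing drive (on a real system the input would be a device node, and the file names here are made up for illustration). The mount step needs root privileges and platform-specific options, so it's left out.

```shell
# Stand-in for the failing drive: a 64 KiB scratch file of random data.
# (Hypothetical names; a real run would read from a device node instead.)
workdir=$(mktemp -d)
src="$workdir/fake_drive.bin"
img="$workdir/drive.img"
dd if=/dev/urandom of="$src" bs=4096 count=16 2>/dev/null

# conv=noerror tells dd to keep going after a read error.
# conv=sync pads short reads with zeros so later data keeps its offset.
dd if="$src" of="$img" bs=4096 conv=noerror,sync 2>/dev/null

# Confirm the image matches the source byte for byte.
cmp "$src" "$img" && echo "image matches source"
```

A small block size keeps the amount of data lost around each bad spot small, at the cost of a slower copy.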

After looking around on the web, I discovered a great little program called dd_rescue which does intelligent retries if errors are encountered, slowly lowering the block size being requested to find the boundaries of the affected area. As far as I can tell, the standard dd command simply gives up at the first read error unless it's told to carry on (conv=noerror), and even then it has no notion of shrinking the block size around a bad area. dd_rescue also allows an offset to be specified when the command is invoked so the copy may be done in several stages. Since it's taken about 4 hours, off and on, to copy the first 33 GB from the failing 60 GB drive, I'm anticipating having to make heavy use of this feature to complete the copy process over the next day or two.
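The staged copy idea can be illustrated with plain dd, which takes block offsets through its skip= and seek= operands. This is only a sketch of the technique on a scratch file (hypothetical names); it is not the dd_rescue invocation, whose options differ.

```shell
# Scratch file standing in for the drive: 16 blocks of 4096 bytes.
workdir=$(mktemp -d)
src="$workdir/fake_drive.bin"
img="$workdir/staged.img"
dd if=/dev/urandom of="$src" bs=4096 count=16 2>/dev/null

# Stage 1: copy blocks 0 through 7.
dd if="$src" of="$img" bs=4096 count=8 conv=noerror,sync 2>/dev/null

# Stage 2: resume at block 8.
# skip= offsets the input, seek= offsets the output, and
# conv=notrunc leaves the blocks written by stage 1 in place.
dd if="$src" of="$img" bs=4096 skip=8 seek=8 conv=noerror,sync,notrunc 2>/dev/null

cmp "$src" "$img" && echo "staged image matches source"
```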

I made a few minor changes to the source to allow me to curtail the retries to speed up the copy. So far it's copied about 32 GB from the failing 60 GB drive. Once the data has been copied then I'll start trying to recover files from it. Wish me luck, I think I'm going to need it!

Monday, June 02, 2008

quick and dirty shell command

Today I was working on some old code at work. I discovered at least one duplicate include file, which is a personal pet peeve. It's far too easy to let multiple copies of an include file get out of sync so you end up with different versions for different source files.

What I needed was a quick way of finding all the duplicated include files within this project directory (and subdirectories). It turns out stringing together a few Unix/Linux/Mac OS commands with some I/O redirection makes this task pretty easy.

The first thing we need is to be able to locate all the include files. In the C programming language, these files typically end with the ".h" file extension. We can use the find command to give us a list of the files which end with .h.

The next problem to be solved is that the matching files will have not only their filenames but also the directory in which they're located printed out. So we need a way of extracting just the "base" filename. Fortunately there's an easy method of accomplishing this with the basename command.

The next logical step in figuring out whether there are duplicate filenames is to sort the matching filenames to make it easier to see matches with the sort command.

Finally we can use the uniq command to show just the filenames which appear more than once. The uniq command has other options. You can choose to show just items which are unique as well.
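A quick sketch of those uniq options on a made-up list of names (the file names are only for illustration). Note that uniq compares adjacent lines, which is why the list is sorted first.

```shell
# Sample list with two repeated names, sorted so duplicates are adjacent.
names=$(printf '%s\n' config.h util.h config.h types.h util.h | sort)

# -d shows one copy of each name which appears more than once.
dups=$(echo "$names" | uniq -d)
echo "$dups"

# -u shows only the names which appear exactly once.
singles=$(echo "$names" | uniq -u)
echo "$singles"

# -c prefixes every name with the number of times it appears.
echo "$names" | uniq -c
```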

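If we put all the portions of this command together, we come up with the following command. Note the "-n 1" passed to xargs: basename expects a single filename (plus an optional suffix to strip), so xargs has to run it once for each file rather than handing it the whole list at once. It's doing a lot of work to save us the trouble of manually sifting through all the filenames ourselves. That's what computers are supposed to do for us, eh?

find . -name "*.h" -print | xargs -n 1 basename | sort | uniq -d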
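To try the idea out safely, here's a sketch which builds a throwaway project tree (the directory and file names are invented) and runs the pipeline against it. It calls basename once per file via xargs -n 1, then goes one step further and lists where each duplicate lives.

```shell
# Throwaway project tree with one header name used twice.
proj=$(mktemp -d)
mkdir -p "$proj/include" "$proj/src/net" "$proj/src/ui"
touch "$proj/include/common.h" "$proj/src/net/common.h" \
      "$proj/src/net/socket.h" "$proj/src/ui/window.h"

# Find the header names which occur more than once.
dups=$(cd "$proj" && find . -name "*.h" -print | xargs -n 1 basename | sort | uniq -d)
echo "$dups"

# For each duplicated name, show every directory it appears in.
# (Assumes the names contain no spaces.)
for name in $dups; do
    (cd "$proj" && find . -name "$name" -print | sort)
done
```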

Sunday, June 01, 2008

Palm Centro

I've used a Palm PDA without interruption since the first one was introduced in 1996. I was working for U.S. Robotics which owned Palm at the time and the employee pricing helped me decide to take the plunge into PDA life. After all these years, I've come to rely heavily upon a few key PDA applications (in addition to the standard PDA applications).

I use SplashID to securely store the multitude of passwords I need to remember both at work and at home. Without it, I'd have to resort to using weak passwords in order to stand a chance of remembering them all which compromises security.

I use SplashMoney to record credit card transactions while I'm away from my computer. This ensures I stay within budget and helps guarantee that I recognize any erroneous charges which might pop up.

JFile is invaluable for storing databases I design myself. I use this to keep track of all manner of data such as books I've got and those I'm interested in reading. Before I did this, I occasionally bought multiple copies of a book.

SlovoEd is a portable dictionary which allows me to look up words I don't recognize when reading without a print dictionary handy. My Centro takes up a lot less space on my nightstand than a conventional dictionary.

Adobe Reader for Palm allows me to read PDF documents on my PDA. This is handy for reading books in non-traditional settings. It's nice always having a book handy for those occasions when I'm unexpectedly left with extra time to kill.

A couple months ago, the time seemed ideal to upgrade my phone and PDA. My wife's phone was acting up and my stepdaughter wanted a cheap PDA. So it made sense to get a device which fulfilled both those functions for me, freeing up my phone and PDA for them. This also had the added benefit of allowing me to pare down the devices I carried from two to just a single gadget.

The Palm Centro is smaller than I expected but the keyboard is surprisingly usable. The software upgrades make the smaller device more intuitive to use than older Palm devices. Is it perfect? No, but it does seem a better compromise device than the other affordable multi-use devices I've seen.

If you're interested in an affordable combination mobile phone and PDA device, check out this review from Engadget.

Sunday, May 04, 2008

I/O Redirection

One of the most useful features of Linux, Unix, Mac OS, and to a lesser extent Windows (more on that later) is I/O redirection. In this discussion, we'll restrict ourselves to the pipe form of redirection which is invoked by the vertical bar character "|". This tells the command interpreter to take all output from the first part of the command and send (or pipe) it to the second part. The other characters which invoke I/O redirection are the less than "<" and the greater than ">" characters. Those are primarily used to make a program take its input from a file ("<") or send its output to a file (">").

If you're a long time computer user, you may want to skip ahead to the examples below. You may have heard the term "I/O redirection" before but what does it really mean? I/O redirection gives you the ability to change where a program's input comes from and/or where its output goes. Normal command line programs have both their input, aka stdin (standard input), and output, aka stdout (standard output), directed to the console (which is just shorthand for saying the input comes from your computer's keyboard and the output goes to the portion of the screen where you're running the program). Note that if one of the portions of the command line produces errors, you may be surprised to find the error messages don't get sent through the pipe. This is because many Unix/Linux style programs also make use of a third I/O stream called stderr. Without taking special action, stderr output is almost always directed to the console to bring the error condition to the user's attention.
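The three streams can be seen in a few lines of shell. This is a small sketch, with a made-up helper function named emit which writes one line to stdout and one to stderr; the "2>&1" at the end is the "special action" which folds stderr into stdout.

```shell
work=$(mktemp -d)

# ">" sends stdout to a file instead of the console.
printf '%s\n' pear apple fig > "$work/fruit.txt"

# "<" makes a program read its stdin from a file instead of the keyboard.
sorted=$(sort < "$work/fruit.txt")
echo "$sorted"

# Hypothetical helper: one line to stdout, one line to stderr.
emit() {
    echo "normal output"
    echo "error output" >&2
}

# Only stdout goes through the pipe. The stderr line skips tr and
# shows up on the console unchanged.
only_stdout=$(emit | tr '[:lower:]' '[:upper:]')
echo "$only_stdout"

# 2>&1 merges stderr into stdout first, so both lines reach tr.
both=$(emit 2>&1 | tr '[:lower:]' '[:upper:]')
echo "$both"
```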

The simple example

For this example, let's suppose that you've got a huge tar file (aka tarball), which is really an archive file containing many other files. Now suppose you want to look to see whether it contains a text file but you really don't recall the name of the text file. Perhaps you recall something else about the text file such as it was located in the /projects directory. You could always get a listing of all the files within the tar file and manually search through them but computers were created to relieve users of the need to do such labor intensive tasks. How about we put I/O redirection to work?

To start with, we need to obtain a listing of all the files in our tar file which for purposes of illustration we'll call sample.tar. That can be accomplished with the command below.

tar -tvf sample.tar

That gives you a complete listing of all the files but chances are if it's a big tar file, the names and details of the files scrolled off the screen and perhaps overwhelmed even the scroll back buffer of your terminal window (aka command interpreter or shell). In any case, being lazy computer types we don't feel like searching through this huge amount of data.

The first thing we want to do is to weed out all the non-text files. Hopefully we've been disciplined about our file naming conventions and have added a ".txt" file extension to all our text files. So let's show only the files which end with that file extension. We'll use our old friend "grep" to match just the output lines which contain the string ".txt". Note that we're using the "-i" parameter to specify that we want to match the string ignoring the case of the letters in the string. This may be important if not everyone adding files to the tar file was careful about adding a .txt and not a .TXT file extension.

Toy OSes like Windows don't make this distinction in file names but you'll find they don't handle I/O redirection as well either. The Windows shell is very simplistic so even if you've added Linux style utilities like "tar" to its repertoire, you may be disappointed by how its pipes behave. A proper OS runs both sides of the pipe at the same time so when you type the command below, you'll see output data appear quickly. At least in the DOS-derived versions of the shell, Windows creates a temporary file containing all output from the first part of the command which it then sends to the second part of the command once the first is completed. It makes the Windows command line feel much slower than it is and believe me it doesn't need much help. If you've ever manipulated large tar files on both Windows and Linux systems, you'll quickly discover that the Windows command line isn't performance oriented by any stretch of the imagination.

tar -tvf sample.tar | grep -i ".txt"

That gives us a listing of just the files which contain the string ".txt" which hopefully only appears as a file extension.
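One catch is worth a sketch: grep treats "." as "any character", so the pattern ".txt" also matches names which merely contain "txt" after some other letter. Assuming a throwaway tarball with invented file names, and using "tar -tf" (names only) so the name sits at the end of each line, a tighter pattern escapes the dot and anchors it with "$".

```shell
# Build a small sample tarball with made-up contents.
work=$(mktemp -d)
mkdir -p "$work/projects"
touch "$work/projects/notes.txt" "$work/projects/README.TXT" \
      "$work/projects/ctxt_parser.c"
tar -cf "$work/sample.tar" -C "$work" projects

# Loose pattern: "." matches any character, so ctxt_parser.c sneaks in.
loose=$(tar -tf "$work/sample.tar" | grep -i ".txt" | sort)
echo "$loose"

# Strict pattern: a literal dot, and only at the end of the name.
strict=$(tar -tf "$work/sample.tar" | grep -i '\.txt$' | sort)
echo "$strict"
```

With the "-v" listing, symbolic links print their target after the name, so the "$" anchor is most dependable with the names-only form.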

Something we notice about the output which makes life a bit tougher is that tar shows you the file names and other details in the order they were added to the tar file. It would be nice if we could see the files sorted by the directory names in which they appear. Fortunately there's a simple solution to that desire.

tar -tvf sample.tar | grep -i ".txt" | sort

This command does the trick but it also illustrates some odd behavior. The output doesn't appear piecemeal the way it had been doing previously. If we think about it, the reason becomes obvious. The sort command can't really sort correctly unless all input to be sorted is present. So it must wait until the commands up to that point in the command line are complete before starting to sort the output.

Since we might be fans of the graphical version of the vim (vi improved) editor, we can add another labor saving twist to our command line. We can send the output of our command to gvim. This has the advantage of being able to search the output using editor commands. Doing this in gvim will also cause the search terms to be highlighted within the text making it much easier to pick out from the surrounding text.

tar -tvf sample.tar | grep -i ".txt" | sort | gvim -

Obviously the commands above were simple examples to make explanation easier. I'll add a few slightly more advanced examples below with a brief explanation of what they do. Once you get the hang of it, you'll find you quickly come to rely upon this powerful feature. Most of the Unix/Linux style command line utilities are written so they can be easily combined to create command lines similar to or more sophisticated than the ones we've been exploring.

A few more advanced examples

The command below uses the find command to search for all files which end with the ".txt" file extension. It then searches them to see which of them contain the string "project". Note the "-l" parameter causes grep to only output the filenames which match the search criteria. If you omit the "-l" you'll see a list of matching lines from within the files. Also note the use of the xargs command which may seem unfamiliar. It's a method of appending multi-line output from previous commands to form arguments for the command specified after xargs.

find . -name "*.txt" | xargs grep -l project
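By default xargs splits its input on whitespace, so a file name containing a space gets broken into pieces. A common workaround, sketched here on a scratch directory with invented file names, is to have find separate names with a NUL character (-print0) and tell xargs to expect that (-0). Both options are available in the GNU and BSD versions of these tools.

```shell
# Scratch directory; one file name contains a space.
work=$(mktemp -d)
printf 'project plan\n' > "$work/my notes.txt"
printf 'groceries\n'    > "$work/todo.txt"
printf 'project log\n'  > "$work/log.txt"

# -print0 and -0 keep "my notes.txt" together as a single argument.
matches=$(cd "$work" && find . -name "*.txt" -print0 | xargs -0 grep -l project | sort)
echo "$matches"
```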

This command does essentially the same thing but sends the list of matching file names to the vim editor. It issues the command to search for the string "project" so that term will be highlighted in the file and the cursor will be placed on the first occurrence within the first file.

find . -name "*.txt" | xargs grep -l project | xargs vim -c /project

Try coming up with ways to use I/O redirection which make your time at the computer easier. You'll be glad you did.

Friday, April 11, 2008

I knew it!

Here's an interesting article about Ernest Hemingway. The part I find most interesting occurs on page 4, where Hemingway says that the symbolism which English teachers so often attribute to stories is not premeditated. He states, "No good book has ever been written that has in it symbols arrived at beforehand and stuck in."

This quote supports my long held belief that the symbolism English teachers claim to find in books was usually not intended by the author and as such is entirely subjective. In school I always hated being criticized by an English teacher for not seeing the symbolism they claim is the "only" valid interpretation. Frequently these teachers would speak as if they had some sort of notebook from the author containing their secret thoughts about hidden subtext they had woven into their novel. What a crock!

Sunday, February 10, 2008

Backing up data

I've been struggling trying to find a decent solution for backing up my wife's laptop computer. The program I'd been using ended up not backing up some key files. We almost ended up losing all her photos when the hard disk in her laptop started dying recently. It's never a good sign when you hear clicking noises when you try to list a directory. Fortunately the use of SpinRite and a little luck allowed the drive to continue functioning long enough to manually copy the files to an external drive.

After trying a number of different backup programs and not being fully satisfied with any of them, I decided to use something simple. The tar program has been around since the early days of Unix. Tar stands for Tape ARchiver and it was originally used to archive a group of files to a tape device. The beauty of tar is there are versions available for almost any operating system you can think of. That makes it easy to examine the tar files on any system to verify I've backed up all the files which needed backing up. I'd long ago installed cygwin (a version of the most common Unix utilities for Windows) on her laptop so I was good to go. Cygwin can't fix all of Windows' shortcomings but it is able to make Windows much more useful. The Windows command line tools are so woefully underpowered that I no longer consider using them for anything.

Having decided to use tar, there were still a couple other problems which needed to be solved. The resulting tar file for even a partial backup is likely to be quite big. By default tar doesn't compress anything. It simply concatenates all the files along with some file information (file name, size, permissions, etc.) about each file added. So using gzip to compress the tar files is highly recommended to avoid using too much space on the backup device.

The gzipped tar file for the backup of just the data files on her laptop ends up having a size of just over 11 GB. The external USB hard drive and NAS (network attached storage) drive we have are both formatted as FAT32 to make them easy to use on Windows, Mac, and Linux systems. That presents a problem since FAT32 limits individual files to just under 4 GB. So I was forced to use the Unix split program to split the huge tar file into smaller files which can be copied to a FAT32 drive.
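Here's a scaled-down sketch of the tar, gzip, and split steps. The directory, file names, and the 100000 byte piece size are all stand-ins chosen for illustration; a real backup would use a piece size below the FAT32 limit. It also shows that the pieces can be glued back together with cat to list the archive.

```shell
# Scratch data standing in for the laptop's files (random, so it won't compress).
work=$(mktemp -d)
mkdir -p "$work/data/photos"
head -c 300000 /dev/urandom > "$work/data/photos/img1.jpg"
head -c 200000 /dev/urandom > "$work/data/report.doc"

# Create a gzipped tar on stdout and split it into 100000 byte pieces.
# split names the pieces ...part_aa, ...part_ab, and so on.
tar -czf - -C "$work" data | split -b 100000 - "$work/backup.tar.gz.part_"
ls "$work"/backup.tar.gz.part_*

# Reassemble the pieces in order and list the archive contents.
restored=$(cat "$work"/backup.tar.gz.part_* | tar -tzf - | sort)
echo "$restored"
```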

After building the tar file, I was able to dump a list of all the files contained in the tar file. I was also able to use the find command (the cygwin/GNU version, not the lame DOS/Windows version) to build a list of all the files on her hard disk. Then it was a simple matter to use grep to get a list of all the JPG, GIF, DOC, etc. files in both the listing of all files on the disk and the listing of all files in the tar file. That made it easy to verify that I've managed to back up all the data files.
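The verification step might look something like the sketch below, again on scratch data with invented names. The comm command at the end is my own addition to automate the comparison of the two listings; it prints the lines which appear in the first sorted file but not the second.

```shell
# Scratch data and a backup of it.
work=$(mktemp -d)
mkdir -p "$work/data/photos"
echo a > "$work/data/photos/a.jpg"
echo b > "$work/data/photos/b.gif"
echo c > "$work/data/letter.doc"
tar -czf "$work/backup.tar.gz" -C "$work" data

# Simulate a file which was added after the backup was made.
echo d > "$work/data/late.jpg"

# Listing 1: data files on disk. Listing 2: data files in the tar file.
(cd "$work" && find data -type f | grep -i -E '\.(jpg|gif|doc)$' | sort) > "$work/on_disk.txt"
tar -tzf "$work/backup.tar.gz" | grep -i -E '\.(jpg|gif|doc)$' | sort > "$work/in_tar.txt"

# comm -23 keeps lines unique to the first file: on disk but not backed up.
missing=$(comm -23 "$work/on_disk.txt" "$work/in_tar.txt")
echo "$missing"
```

An empty result means every matching file on disk also appears in the backup.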

Whew! It was a lot of work but now I can finally rest easier knowing all her data files are safely backed up.

Sunday, February 03, 2008

Bad advice from an IT guy

A relative called recently asking about her friend's laptop computer which was running slowly. Her friend had asked someone from the IT department at work how to make the laptop run quicker. The IT guy's response was to suggest they replace the hard disk with a faster model.

This is wrong on so many levels that it makes my head hurt. First, the IT tech didn't ask any questions to determine what the underlying cause of the slowdown might be. Laptop computers typically come with relatively slow hard drives since lower RPM drives create less heat. Laptops always have trouble dissipating heat because of the small cases. Chances are a faster drive may not be available or at least may not be affordable for the average user. A slow hard drive typically only causes delays in one of two circumstances: loading programs and reading or writing data files. Those two cases comprise a fairly small percentage of the overall usage time and will most likely not produce a noticeable delay.

A better approach to speeding up an older computer is to add more memory. Application software always seems to get larger over time. Data files also have a tendency to grow with use. Users also tend to use more applications simultaneously as they get more sophisticated. All of these conditions probably require more memory than originally came with the laptop. When the laptop doesn't have enough physical memory, Windows will be forced to swap unused applications and portions of the data files out to the swap file on the disk. Hard disk accesses are always much slower than memory accesses.

Sadly, bad advice like this is not at all uncommon. Hang around in the computer section at any big box electronics store and you'll undoubtedly hear something similar. Amazingly enough, $8.50 an hour and a few months of experience doesn't always produce quality technical advice. Imagine that...