Getting in Sync
Using recursion to compare directories in VB.NET
Introduction
One of the advantages of being a programmer is that when you don't have a program that does what you need, sometimes you can write one yourself. This is what happened the other day when I was going through my MP3 collection. I have most of my CDs ripped to disk and the collection is stored on more than one external drive as I've changed drives over the years.
The problem came up when I decided that I wanted to eliminate one of the copies. I wasn't quite sure when I had stopped using one drive and started using the other, so I was afraid that by deleting the older copy, I might lose some files that hadn't been transferred over. There are thousands of files in the collection, all grouped in folders by artist and album, so it would take a long time to compare them manually. I thought about how I might be able to use an FTP program to sync them but I wasn't sure that would work the way I wanted it to. Finally, I thought "Hey! It should be a simple and useful utility to write in .NET so why not write my own?"
Download Options and Requirements
This program was designed using Visual Basic 2005 and requires Microsoft's .NET 2.0 framework to run. This framework is available from Microsoft as an optional Windows update or as part of the installation package below. I've successfully tested the program on Windows 2000, XP and Vista.
For this project, I'm offering the following download options. For more information on these options, please see the download page.
- EXE Only (Download) 50.5 KB (51,770 bytes)
- Installation Package (Download) 273 KB (280,102 bytes)
- Program Code (Download) 87.3 KB (89,440 bytes)
The program code offered above is in the form of a Visual Studio 2005
solution which includes the main form for the application. This form
contains the bulk of the application code and the code package has been
tested in Visual Studio 2005 Express as well as Visual Studio 2010
Professional.
Project Design
I decided to do this as a Windows application with an interface that would require the user to enter a primary directory (e.g. F:\My Music) and a second directory (e.g. G:\My Music) to compare to the primary. The program would examine all the subfolders and files under the primary folder and check to see if there were corresponding files under the comparison directory.
Corresponding music stores on different drives. They look the same ... but are they?
In the screenshot above, you can see an example of the corresponding directories
that this program would handle. Both start with a "My Music" folder.
The program should even be able to compare the two structures if one of them exists
under another folder. For example, it should be able to compare C:\Data\My Music to
F:\Music Collection to see how the folders and files under these directories correspond.
Some programs like FTP programs might use graphical displays to show the primary
and secondary directories and highlight areas of the display to indicate the differences
between the two. I decided a listing of which files or folders were missing
in the secondary location would be more helpful for me and that this could be done
through a textbox on the same form where the user entered the directories to be
compared.
Recursion
When writing programs, it's often necessary to perform repeating operations on collections of items such as customer orders or invoices. Often, you can just iterate through the collection or count the items to determine how many times to perform the operation. When dealing with a hierarchy of items such as a directory structure, where you have an unknown and varying number of levels under each branch, it's a different story. For this, the typical method is to use recursive programming, often just called recursion. This is a method in which one routine is designed to analyze the items on one level of the hierarchy, look for any sublevels and then call itself to analyze each sublevel. Each time the routine calls itself, it creates another instance of itself, with its own copy of its local variables, that works independently until it's finished and then returns to the instance that called it.
To show how this program would use recursion, the hierarchy starts with the F:\My Music directory, for which the recursive subroutine is called first. The subroutine looks at this directory and finds a few files, which it processes by looking for corresponding files in the other music collection. Mostly, though, it finds subfolders because Windows Media Player (which I use) automatically groups songs by artist and then by album.

At some point while iterating through the folders under F:\My Music, the subroutine reaches the subfolder for ELO. The routine then calls itself to examine F:\My Music\Electric Light Orchestra. That instance doesn't find any files but it does find the subfolders for the two ELO albums I have so that instance of the subroutine calls itself and creates another instance to examine the first subfolder in the list.
At this point, we have three instances of the subroutine running; one for each of the following folders:
1.) F:\My Music (waiting ...)
2.) F:\My Music\Electric Light Orchestra (waiting ...)
3.) F:\My Music\Electric Light Orchestra\A New World Record (processing)
The third instance doesn't find any subfolders because there are none but it does find a file for each song on the album so it processes those, ends and returns to instance #2 which moves on to the next subfolder it finds (..\Electric Light Orchestra\ELO's Greatest Hits) and calls another instance of itself to analyze that folder. When instance #2 is finished, it will return to the first instance (F:\My Music) which will then look for another subfolder and the entire process repeats until there are no more folders to work with.
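The walkthrough above can be sketched as a very small recursive routine. This is only an illustration of the calling pattern, not the program's actual code: ListFolders, its parameters and the two-space indent are names and choices I've made up for the example, and it uses the standard System.IO classes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module RecursionSketch
    ' Hypothetical helper (not from the original program): adds one line per
    ' folder to "output", indented by two spaces for each level of depth.
    Sub ListFolders(ByVal folder As String, ByVal depth As Integer, _
                    ByVal output As List(Of String))
        ' Handle the current level: record this folder's own name.
        output.Add(New String(" "c, depth * 2) & Path.GetFileName(folder))

        ' Look for sublevels and let a new instance of the routine handle each one.
        For Each subfolder As String In Directory.GetDirectories(folder)
            ListFolders(subfolder, depth + 1, output)
        Next
        ' When the loop ends, this instance returns to whichever instance called it.
    End Sub
End Module
```

Calling ListFolders("F:\My Music", 0, lines) with an empty List(Of String) would fill the list with one line per folder. The depth value plays the role of the instance numbers in the list above: 0 for F:\My Music, 1 for an artist folder and 2 for an album folder.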
The Code
If you're able to examine the code for this program, you'll see that the bulk of the work is done in two procedures within the code for the main form: the click event for the Compare button shown in the screenshot above and a method which it calls named CompareDirectories. The rest of the code and much of the interface exists to support these two procedures and provide a few bells and whistles. CompareDirectories is the recursive routine that accepts a directory path as a string parameter, examines that directory and then calls itself to process any subdirectories. The number of instances active at any one time is limited only by the depth of the directory structure, since there is one instance for each level between the starting folder and the folder currently being examined.
The code makes extensive use of two elements in .NET:
1. The FileSystem class in the FileIO namespace (Microsoft.VisualBasic.FileIO), which offers a number of shared methods to work with files and directories on the user's machine. This resource was introduced in version 2.0 of the .NET framework and makes it very easy to manipulate files and folders, especially since the methods are shared and no instance of the class has to be created.
2. String manipulation. The FileSystem class has a method (GetName) to parse a filename from a full file path but when switching between the two directories specified by the user, I'm performing a function that's unique to this program so I still have to construct the file paths manually. The StringBuilder class is recommended when a string is built up through many repeated operations but in this case, I've stuck with the methods available through the String class since the operations are small enough to fit on a single line.
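To make these two elements concrete, here is a minimal sketch of how a routine like CompareDirectories might combine them. It is not the code from the download. The extra parameters, the MapToSecondary helper, the FS alias and the list used to collect results are all assumptions of mine; the real routine takes only a directory path and writes its findings to the form.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
' Alias for the FileSystem class; the fully qualified name avoids any clash
' with the older Microsoft.VisualBasic.FileSystem module.
Imports FS = Microsoft.VisualBasic.FileIO.FileSystem

Module CompareSketch
    ' Hypothetical helper: turns a path under primaryRoot into the matching
    ' path under secondaryRoot. Assumes fullPath starts with primaryRoot.
    Function MapToSecondary(ByVal fullPath As String, ByVal primaryRoot As String, _
                            ByVal secondaryRoot As String) As String
        Dim relative As String = _
            fullPath.Substring(primaryRoot.Length).TrimStart(Path.DirectorySeparatorChar)
        Return Path.Combine(secondaryRoot, relative)
    End Function

    ' Adds every file or folder that has no counterpart under secondaryRoot
    ' to the "missing" list.
    Sub CompareDirectories(ByVal folder As String, ByVal primaryRoot As String, _
                           ByVal secondaryRoot As String, ByVal missing As List(Of String))
        ' Files on this level: check each one for a counterpart.
        For Each filePath As String In FS.GetFiles(folder)
            If Not FS.FileExists(MapToSecondary(filePath, primaryRoot, secondaryRoot)) Then
                missing.Add(filePath)
            End If
        Next

        ' Subfolders: recurse only when the counterpart folder exists; otherwise
        ' report the folder once instead of listing everything inside it.
        For Each subfolder As String In FS.GetDirectories(folder)
            If FS.DirectoryExists(MapToSecondary(subfolder, primaryRoot, secondaryRoot)) Then
                CompareDirectories(subfolder, primaryRoot, secondaryRoot, missing)
            Else
                missing.Add(subfolder)
            End If
        Next
    End Sub
End Module
```

Because the mapping works on whatever follows the primary root, the two roots don't need matching names, so C:\Data\My Music can be compared to F:\Music Collection. Reporting a missing folder once rather than descending into it is a design choice in this sketch; the original program may handle that case differently.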
Validation and Error Handling
One of the important parts of putting together a user interface is validating the information that the user enters. Fortunately, in this program, there are only two fields to worry about, each with only two errors that are likely to happen: either the user hasn't entered a directory value or the directory entered doesn't exist. The ErrorProvider control makes it pretty easy to catch either of these and alert the user. Normally, this would be done in the Validating event of the fields as the data was entered but since this isn't really a data entry form, the fields might not even receive the focus and therefore the Validating event won't fire. So, I decided to write the ValidateInputs function to carry out the validation and call it from the Click event of the Compare button. Validation is performed before any other operations are carried out.
The ErrorProvider control makes it easier to validate user input and highlight problems.
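The two checks described above are simple enough to sketch without the form. The function below is a hypothetical helper, not the program's ValidateInputs function; its name and the wording of the messages are mine. It returns an empty string when the value is acceptable.

```vbnet
Imports System
Imports System.IO

Module ValidationSketch
    ' Hypothetical helper: returns an error message for one directory field,
    ' or an empty string when the value passes both checks.
    Function CheckDirectoryField(ByVal value As String) As String
        ' Error 1: nothing was entered.
        If value Is Nothing OrElse value.Trim().Length = 0 Then
            Return "Please enter a directory."
        End If

        ' Error 2: the directory entered doesn't exist.
        If Not Directory.Exists(value.Trim()) Then
            Return "This directory does not exist."
        End If

        Return ""
    End Function
End Module
```

On the form, the result could be handed straight to the ErrorProvider, along the lines of ErrorProvider1.SetError(txtPrimary, CheckDirectoryField(txtPrimary.Text)), where the control names are assumptions. Passing an empty string to SetError clears the error indicator, so the same call covers both the failing and the passing case.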
There's enough error handling in the code to keep the program from crashing if something bizarre happens and to provide enough material for the screenshot I hope to receive if this program actually does throw an error. I thought about a log file but decided that would be a bit much at this point.
Watching it Grow
I meant to make this a simple program but it's amazing how features seem to present
themselves during the course of development and suddenly the program doesn't seem
complete without them.
I wasn't intending to add a menu or status bar at first but the interface is
a lot more professional with them. Building a menu is ridiculously easy in
Visual Studio and it's also nice that the menu options and command buttons can be
wired to the same events. If you're looking at the code, notice the
same code handles both the click event for the Compare button and the File /
Compare menu option. The status bar also offered a nice place to display the activity
messages rather than putting them in a separate label control.
Then there was the ability to save the output to a file instead of limiting the user
to viewing it within the form. Once I thought of it, there just didn't seem
to be as much point to the program without it and since I was already using a Rich
Text Box control to display the output, it was just another couple lines of code to let
the user save the results.
There are also the small touches like the About screen and the custom program icon to replace that awful generic Windows icon that Visual Studio wants to assign to an EXE file; the icon that suggests to me a program that should only be run by the operating system itself, if at all.
Weaving new features into the code, like having the program count the files processed or track whether discrepancies were actually found and telling the user if they weren't, is part of the fun of programming for me. Still, you have to draw the line somewhere. You have to leave something for Version 2, after all.
It's not often that something works just right the first time but this program worked surprisingly well and it wasn't long before I was able to use it to compare the two music directories it was designed for. It quickly found the discrepancies that accounted for the size differences that Windows was reporting and I was happy. I was pleased at how fast it ran, too.
Version Two
Right now, the program only determines if the files exist where they are
expected to; it doesn't look at the file sizes or dates to determine if it's
looking at two different versions of the files. That's probably not a
huge change but it takes the program to a new level so it can wait for the
next version. Actually having the program enable the user to sync
directories by copying files would be another step in its evolution.
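As a rough idea of what the size and date check might look like, here is a sketch using the standard FileInfo class. It is speculation about a future version rather than existing code, and the function name and the choice to compare last-write times are my own assumptions.

```vbnet
Imports System
Imports System.IO

Module VersionTwoSketch
    ' Hypothetical helper: True when both files exist and have the same
    ' length and the same last-write time (compared in UTC).
    Function LooksLikeSameVersion(ByVal primaryFile As String, _
                                  ByVal secondaryFile As String) As Boolean
        Dim first As New FileInfo(primaryFile)
        Dim second As New FileInfo(secondaryFile)

        ' A missing file on either side can't be the same version.
        If Not first.Exists OrElse Not second.Exists Then
            Return False
        End If

        Return first.Length = second.Length AndAlso _
               first.LastWriteTimeUtc = second.LastWriteTimeUtc
    End Function
End Module
```

An exact timestamp match may be too strict in practice. FAT-formatted drives store write times in two-second steps, so a copy on such a drive can differ slightly from its NTFS original, and a real version would probably allow a small tolerance.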
See more articles on Drewslair.com
