Andrew Comeau ...
 Drewslair.com

 

 

 

Getting in Sync
Using recursion to compare directories in VB.NET
 

Introduction

One of the advantages of being a programmer is that when you don't have a program that does what you need, sometimes you can write one yourself.  This is what happened the other day when I was going through my MP3 collection.  I have most of my CDs ripped to disk and, because I've changed drives over the years, the collection is stored on more than one external drive.

The problem came up when I decided that I wanted to eliminate one of the copies.  I wasn't quite sure when I had stopped using one drive and started using the other, so I was afraid that by deleting the older copy, I might lose some files that hadn't been transferred over.  There are thousands of files in the collection, all grouped in folders by artist and album, so comparing them manually would take a long time.  I thought about how I might be able to use an FTP program to sync them but I wasn't sure that would work the way I wanted it to.  Finally, I thought "Hey!  It should be a simple and useful utility to write in .NET so why not write my own?"

Download Options and Requirements

This program was designed using Visual Basic 2005 and requires version 2.0 of Microsoft's .NET Framework to run.  The framework is available from Microsoft as an optional Windows update or as part of the installation package below.  I've successfully tested the program on Windows 2000, XP and Vista.

For this project, I'm offering the following download options.  For more information on these options, please see the download page.

The program code offered above is in the form of a Visual Studio 2005 solution which includes the main form for the application.  This form contains the bulk of the application code and the code package has been tested in Visual Studio 2005 Express as well as Visual Studio 2010 Professional.

Project Design

I decided to do this as a Windows application with an interface that would require the user to enter a primary directory (e.g. F:\My Music) and a second directory (e.g. G:\My Music) to compare to the primary.  The program would examine all the subfolders and files under the primary folder and check to see if there were corresponding files under the comparison directory.

Corresponding music stores on different drives. They look the same ... but are they?


In the screenshot above, you can see an example of the corresponding directories that this program would handle.  Both start with a "My Music" folder.  The program should even be able to compare the two structures if one of them exists under another folder.  For example, it should be able to compare C:\Data\My Music to F:\Music Collection to see how the folders and files under these directories correspond.

Some programs like FTP programs might use graphical displays to show the primary and secondary directories and highlight areas of the display to indicate the differences between the two.  I decided a listing of which files or folders were missing in the secondary location would be more helpful for me and that this could be done through a textbox on the same form where the user entered the directories to be compared.

Recursion

When writing programs, it's often necessary to perform repeating operations on collections of items such as customer orders or invoices.  Often, you can just iterate through the collection or count the items to determine how many times to perform the operation.  When dealing with a hierarchy of items such as a directory structure where you have an unknown and varying number of levels under each branch, it's a different story.  For this, the typical method is to use recursive programming, often just called recursion.  This is a method in which one routine is designed to analyze the items on one level of the hierarchy, look for any sublevels and then call itself to analyze each sublevel.  Each time the routine calls itself, it creates another instance of itself (a new call with its own parameters and local variables) that works independently until it's finished and then returns to the instance that called it.

To show how this program would use recursion, the hierarchy starts with the F:\My Music directory for which the recursive subroutine is called first.  The subroutine looks at this directory and finds a few files which it processes by looking for corresponding files in the other music collection.  Mostly, though, it finds subfolders because Windows Media Player (which I use) automatically groups songs by artist and then by album.



At some point while iterating through the folders under F:\My Music, the subroutine reaches the subfolder for ELO.  The routine then calls itself to examine F:\My Music\Electric Light Orchestra.  That instance doesn't find any files but it does find the subfolders for the two ELO albums I have so that instance of the subroutine calls itself and creates another instance to examine the first subfolder in the list. 

At this point, we have three instances of the subroutine running; one for each of the following folders:

1.)  F:\My Music  (waiting ...)
2.)  F:\My Music\Electric Light Orchestra  (waiting ...)
3.)  F:\My Music\Electric Light Orchestra\A New World Record  (processing)

The third instance doesn't find any subfolders because there are none but it does find a file for each song on the album so it processes those, ends and returns to instance #2 which moves on to the next subfolder it finds (..\Electric Light Orchestra\ELO's Greatest Hits) and calls another instance of itself to analyze that folder.   When instance #2 is finished, it will return to the first instance (F:\My Music) which will then look for another subfolder and the entire process repeats until there are no more folders to work with.
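
The walk-through above can be sketched in a few lines.  This is only a minimal illustration of the technique, not the program's actual code: the routine name ExamineFolder and the depth parameter are made up for the example, and it uses System.IO for brevity.  It prints a line when each instance starts and finishes so you can watch the instances stack up and unwind.

```vbnet
Imports System
Imports System.IO

Module RecursionSketch
    ' Hypothetical routine: reports each folder as an instance enters and leaves it.
    Sub ExamineFolder(ByVal folderPath As String, ByVal depth As Integer)
        Dim indent As New String(" "c, depth * 2)
        Console.WriteLine(indent & "Enter: " & folderPath)

        ' Process the files found on this level (here we only count them).
        Dim fileCount As Integer = Directory.GetFiles(folderPath).Length
        Console.WriteLine(indent & "  files on this level: " & fileCount.ToString())

        ' Call this same routine for each subfolder.  The current instance
        ' waits here until the new instance has finished.
        For Each subFolder As String In Directory.GetDirectories(folderPath)
            ExamineFolder(subFolder, depth + 1)
        Next

        Console.WriteLine(indent & "Leave: " & folderPath)
    End Sub

    Sub Main()
        ' The starting folder is an assumption taken from the example above.
        ExamineFolder("F:\My Music", 0)
    End Sub
End Module
```

The indentation in the output grows by one step for every level of nesting, so the deepest "Enter" line shows how many instances were active at once.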

The Code



If you're able to examine the code for this program, you'll see that the bulk of the work is done in two procedures within the code for the main form: the click event for the Compare button shown in the screenshot above and a method which it calls named CompareDirectories.  The rest of the code and much of the interface exists to support these two procedures and provide a few bells and whistles.  CompareDirectories is the recursive routine that accepts a directory path as a string parameter, examines that directory and then calls itself to process any subdirectories.  The number of instances that might be active at one time is limited only by the depth of the directory structure.

The code makes extensive use of two elements in .NET:

1.  The FileIO.FileSystem class (part of the Microsoft.VisualBasic.FileIO namespace) which offers a number of shared methods to work with files and directories on the user's machine.  This resource was introduced in version 2.0 of the .NET framework and makes it very easy to manipulate files and folders, especially since an instance of the class does not have to be declared.

2.  String manipulation.  The FileIO.FileSystem class has a method to parse a filename from a full file path but when switching between the two directories specified by the user, I'm performing a function that's unique to this program so I still have to construct the file paths manually.  The StringBuilder class is recommended for frequent string operations but in this case, I've stuck with the methods available through the String class since the operations are small enough to fit on a single line.
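
To show how these two elements might fit together, here is a rough console sketch of a recursive comparison.  It is not the code from the download: the ToSecondary helper, the FS alias, the hard-coded directories and the console output are all assumptions made for the example, and the real program works from the form's input fields and writes to a textbox.

```vbnet
Imports System
Imports FS = Microsoft.VisualBasic.FileIO.FileSystem

Module CompareSketch
    ' Assumed stand-ins for the two directories entered on the form,
    ' written without a trailing backslash.
    Private primaryRoot As String = "C:\Data\My Music"
    Private secondaryRoot As String = "F:\Music Collection"

    ' Hypothetical helper: swaps the primary root for the secondary root
    ' using nothing more than String methods.
    Function ToSecondary(ByVal primaryPath As String) As String
        Return secondaryRoot & primaryPath.Substring(primaryRoot.Length)
    End Function

    Sub CompareDirectories(ByVal currentDir As String)
        ' Check each file on this level for a counterpart in the other tree.
        For Each filePath As String In FS.GetFiles(currentDir)
            If Not FS.FileExists(ToSecondary(filePath)) Then
                Console.WriteLine("Missing file: " & ToSecondary(filePath))
            End If
        Next

        ' Recurse into subfolders that exist on both sides; report the rest.
        For Each subDir As String In FS.GetDirectories(currentDir)
            If FS.DirectoryExists(ToSecondary(subDir)) Then
                CompareDirectories(subDir)
            Else
                Console.WriteLine("Missing folder: " & ToSecondary(subDir))
            End If
        Next
    End Sub

    Sub Main()
        CompareDirectories(primaryRoot)
    End Sub
End Module
```

Because the shared methods return full paths, removing the primary root from the front of each path leaves the relative part, which can then be appended to the secondary root.  The sketch does not descend into a folder that is missing on the secondary side, since everything under it would be missing as well.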

Validation and Error Handling

One of the important parts of putting together a user interface is validating the information that the user enters.  Fortunately, in this program, there are only two fields to worry about, each with only two potential errors that are likely to happen: either the user hasn't entered a directory value or the directory entered doesn't exist.  The ErrorProvider control makes it pretty easy to catch either of these and alert the user.  Normally, this would be done in the Validating event of the fields as the data was entered but since this isn't really a data entry form, the fields might not even receive the focus and therefore the Validating event won't fire.  So, I decided to write the ValidateInputs function to carry out the validation and call it from the Click event of the Compare button.  Validation is performed before any other operations are carried out.

The ErrorProvider control makes it easier to validate user input and highlight problems.
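
A validation routine along these lines might look something like the sketch below.  The function name ValidateInputs comes from the article, but the control names, the error messages and the ValidateDirectory helper are hypothetical, and the form is built in code only so the example stands on its own.

```vbnet
Imports System
Imports System.IO
Imports System.Windows.Forms

Public Class ValidationSketchForm
    Inherits Form

    ' Control names are assumptions; the article does not list them.
    Private txtPrimary As New TextBox()
    Private txtSecondary As New TextBox()
    Private WithEvents btnCompare As New Button()
    Private errors As New ErrorProvider()

    Public Sub New()
        txtPrimary.SetBounds(10, 10, 250, 20)
        txtSecondary.SetBounds(10, 40, 250, 20)
        btnCompare.SetBounds(10, 70, 80, 25)
        btnCompare.Text = "Compare"
        Controls.AddRange(New Control() {txtPrimary, txtSecondary, btnCompare})
    End Sub

    ' Hypothetical helper: checks one field and flags it if it fails.
    Private Function ValidateDirectory(ByVal box As TextBox) As Boolean
        Dim message As String = ""
        If box.Text.Trim().Length = 0 Then
            message = "Please enter a directory."
        ElseIf Not Directory.Exists(box.Text) Then
            message = "This directory does not exist."
        End If
        ' An empty string clears any error icon left from an earlier attempt.
        errors.SetError(box, message)
        Return message.Length = 0
    End Function

    Private Function ValidateInputs() As Boolean
        ' Check both fields so that both problems are flagged in one pass.
        Dim primaryOk As Boolean = ValidateDirectory(txtPrimary)
        Dim secondaryOk As Boolean = ValidateDirectory(txtSecondary)
        Return primaryOk AndAlso secondaryOk
    End Function

    Private Sub btnCompare_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btnCompare.Click
        ' Validation runs before any other work is done.
        If Not ValidateInputs() Then Return
        MessageBox.Show("Inputs look valid; the comparison would start here.")
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New ValidationSketchForm())
    End Sub
End Class
```

Both fields are validated before the results are combined, so a user who leaves both boxes empty sees both error icons at once instead of fixing one problem only to be told about the next.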

There's enough error handling in the code to keep the program from crashing if something bizarre happens and to provide enough material for the screenshot I hope to receive if this program actually does throw an error.  I thought about a log file but decided that would be a bit much at this point.

Watching it Grow

I meant to make this a simple program but it's amazing how features seem to present themselves during the course of development and suddenly the program doesn't seem complete without them.

I wasn't intending to add a menu or status bar at first but the interface is a lot more professional with them.  Building a menu is ridiculously easy in Visual Studio and it's also nice that the menu options and command buttons can be wired to the same event handlers.  If you're looking at the code, notice that the same procedure handles both the click event for the Compare button and the File / Compare menu option.  The status bar also offered a nice place to display the activity messages rather than putting them in a separate label control.
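
For anyone who hasn't seen one procedure serve two controls, here is a small sketch of the idea.  The control names and the status text are made up, and I'm assuming the MenuStrip and StatusStrip controls that ship with Visual Studio 2005; the controls are created in code only to keep the example self-contained.

```vbnet
Imports System
Imports System.Windows.Forms

Public Class SharedHandlerForm
    Inherits Form

    ' Control names are assumptions made for this example.
    Private WithEvents btnCompare As New Button()
    Private WithEvents mnuCompare As New ToolStripMenuItem("Compare")
    Private statusLabel As New ToolStripStatusLabel("Ready")

    Public Sub New()
        ' Build a File menu containing a Compare option.
        Dim menuBar As New MenuStrip()
        Dim fileMenu As New ToolStripMenuItem("File")
        fileMenu.DropDownItems.Add(mnuCompare)
        menuBar.Items.Add(fileMenu)

        ' The status bar holds one label for activity messages.
        Dim statusBar As New StatusStrip()
        statusBar.Items.Add(statusLabel)

        btnCompare.Text = "Compare"
        btnCompare.SetBounds(10, 40, 80, 25)
        Controls.AddRange(New Control() {btnCompare, menuBar, statusBar})
        MainMenuStrip = menuBar
    End Sub

    ' One procedure, two events: the Handles clause lists both sources.
    Private Sub Compare_Click(ByVal sender As Object, ByVal e As EventArgs) _
        Handles btnCompare.Click, mnuCompare.Click
        statusLabel.Text = "Comparing directories..."
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New SharedHandlerForm())
    End Sub
End Class
```

Clicking the button or choosing File / Compare runs the same code, so there is only one place to change when the comparison logic changes.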

Then there was the ability to save the output to a file instead of limiting the user to viewing it within the form.  Once I thought of it, there just didn't seem to be as much point to the program without it and since I was already using a Rich Text Box control to display the output, it was just another couple of lines of code to let the user save the results.
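
As a rough illustration of how little code that takes, the sketch below saves the contents of a RichTextBox through its SaveFile method.  The control names, the file filter and the sample output line are assumptions for the example and the program's own save routine may differ.

```vbnet
Imports System
Imports System.Windows.Forms

Public Class SaveOutputForm
    Inherits Form

    ' Control names are assumptions made for this example.
    Private rtbResults As New RichTextBox()
    Private WithEvents btnSave As New Button()

    Public Sub New()
        rtbResults.Dock = DockStyle.Fill
        rtbResults.Text = "Missing file: (sample output line)"
        btnSave.Text = "Save results"
        btnSave.Dock = DockStyle.Bottom
        Controls.Add(rtbResults)
        Controls.Add(btnSave)
    End Sub

    Private Sub btnSave_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btnSave.Click
        Using dialog As New SaveFileDialog()
            dialog.Filter = "Text files (*.txt)|*.txt|Rich text (*.rtf)|*.rtf"
            If dialog.ShowDialog(Me) = Windows.Forms.DialogResult.OK Then
                ' FilterIndex is 1-based; the second entry is the RTF option.
                Dim streamType As RichTextBoxStreamType = RichTextBoxStreamType.PlainText
                If dialog.FilterIndex = 2 Then streamType = RichTextBoxStreamType.RichText
                rtbResults.SaveFile(dialog.FileName, streamType)
            End If
        End Using
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New SaveOutputForm())
    End Sub
End Class
```

The save itself is the single SaveFile call; the rest is just asking the user where to put the file.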
 


There are also the small touches like the About screen and the custom program icon to replace that awful generic Windows icon that Visual Studio wants to assign to an EXE file; the icon that suggests to me a program that should only be run by the operating system itself, if at all.

Weaving new features into the code, like having the program count the files processed or track whether discrepancies were actually found and telling the user if they weren't, is part of the fun of programming for me.  Still, you have to draw the line somewhere.  You have to leave something for Version 2, after all.

It's not often that something works just right the first time but this program worked surprisingly well and it wasn't long before I was able to use it to compare the two music directories it was designed for.  It quickly found the discrepancies that accounted for the size differences that Windows was reporting and I was happy.  I was pleased at how fast it ran, too.

Version Two

Right now, the program only determines whether the files exist where they are expected to be.  It doesn't look at the file sizes or dates to determine if it's looking at two different versions of the files.  That's probably not a huge change but it takes the program to a new level so it can wait for the next version.  Actually having the program enable the user to sync directories by copying files would be another step in its evolution.
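
For what it's worth, the size and date check could probably start from something like the sketch below.  None of this exists in the current program: the LooksIdentical function and the file paths are invented for the example, and it relies on the FileInfo class from System.IO.

```vbnet
Imports System
Imports System.IO

Module VersionCheckSketch
    ' Hypothetical helper: True when both copies exist and match on
    ' size and last-write time.
    Function LooksIdentical(ByVal primaryFile As String, ByVal secondaryFile As String) As Boolean
        Dim first As New FileInfo(primaryFile)
        Dim second As New FileInfo(secondaryFile)

        ' A missing file on either side can't be a match.
        If Not (first.Exists AndAlso second.Exists) Then Return False

        Return first.Length = second.Length AndAlso _
               first.LastWriteTimeUtc = second.LastWriteTimeUtc
    End Function

    Sub Main()
        ' These paths are placeholders, not real files from the collection.
        Console.WriteLine(LooksIdentical("F:\My Music\song.mp3", "G:\My Music\song.mp3"))
    End Sub
End Module
```

Matching sizes and dates don't prove that two files have the same contents, and timestamps may differ slightly between drives that use different file systems, so a real version would likely need some tolerance on the date comparison.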




© 2011, Andrew Comeau, except where otherwise noted. Drewslair.com content should not be republished without written permission from the author.

Microsoft is a registered trademark of Microsoft Corporation in the United States and other countries.