Microsoft Access for Beginners
Part I - Organizing the Data
Contents
Introduction
A Different Set of Tools
Tables and Relationships
Types of Relationships
Referential Integrity
Changing Direction
Remaining Tables
Conclusion
Introduction
Recently, a friend talked to me about setting up a database for tracking project information within her company. She had never used Access before and wasn't sure if she wanted to do it herself or ask me to take on the project. I always like to see people learn something new so I told her we could do it either way. A couple weeks later, she sent me her first attempt at a database, asking me not to laugh.
What my friend had designed was actually typical of many beginners' attempts at Access design. She had used Access in much the same way as she would use Excel; one table with fields for all of the information that needed to be tracked. Since more than one record will often refer to the same project, items such as customer and contractor names were repeated from one record to the next. In addition to taking up more space, this can also lead to errors and mis-spellings that will make it difficult to find the information needed for various reports.
It's understandable that many people automatically try to use Access this way as most people are familiar with the spreadsheet concept and many are unfamiliar with higher database concepts. Access presents tables that look something like the spreadsheet format that people are used to and, since it's offered as part of the Microsoft Office suite, it's easy to see it in relation to the other programs and overlook its power as a standalone product.
For this series, I've designed another example based on an employee information database. The information includes contact information for all employees, both past and present, as well as current job information and other data required by the company and various agencies.
(Note: I use the employee database example as something that is familiar to most people. This does not mean that I recommend using Access for the type of extremely sensitive information that can be stored in such an application. For more information on the limitations of Access security, please see Information Security in Microsoft Access , a supplemental article to this series.)
The Excel spreadsheet at the link shown below is what a beginning effort at tracking this information might look like. In this first chapter, I'll introduce some of the concepts involved in the process of translating this information into a well-designed Access database.
Sample Files
If you have not already done so, please download the demonstration files:
Demo.zip - Access demonstration database and original Excel spreadsheet (These files require Access and Excel 2000 or later.)
A companion slideshow is also available for this chapter.
A Different Set of Tools
Access is referred to as a relational database system for two reasons:
- Data is organized into tables which are also called relations. This is because each table has a subject (i.e. tblPersonnel for employees) and every field in a given table is supposed to be directly related to the subject of the table. If you look at the design of the Personnel table, you will see that each of the fields is a piece of information that pertains to a given employee. You also won't see more than one address for an employee or more than one job title.
- Tables can be linked or related to each other. In the sample database, this is demonstrated by the link between tblPersonnel and tblAbsences (shown below). Since there can and likely will be more than one absence for each employee, the absence information is placed in a separate table and the two tables are linked by a number assigned to the employee. This eliminates the duplicate entry of information such as the employee name and department. This, in turn, reduces the size of the database and the possibility of data entry error and corruption.

By selecting Relationships from the Tools menu in Access, you can view the links between the tables in the database. In the Personnel database, a relationship is defined between tblPersonnel and tblAbsences.
This is how Access differs from a program like Excel. While Excel organizes information and allows for calculations, Access takes it a step further and lets you define relationships between different categories of data. This also enables easier analysis of your information.
Let's start by looking at the original table as it's shown in the spreadsheet.
An Excel spreadsheet showing the original table listing the employee information.
The original spreadsheet shows the necessary information for each employee with separate pages for employee absences and lookup information for the different departments. This retains the information but it is tedious to use and mistakes are easy to make. There are no controls over the editing of information. Also note the mis-spellings and unlikely dates in certain records which can happen in a spreadsheet that is updated over months and years by a busy HR clerk.
One of the goals of a user-friendly Access database is to minimize the amount of work required to enter and retrieve the information, thereby reducing workload and the possibility of error. This is where the ability to group data into separate tables comes in handy. The tables you create and the relationships between them are the backbone of your database. Once you have the tables designed correctly, you're halfway there.
Tables and Relationships
After downloading the demonstration files, start by opening the demo database. Press the F11 key to show the database objects window and select Relationships from the Tools menu at the top of the screen. The database window will show you all the tables and other objects in the database. The Relationships window will show you the relationships between the tables.
In another chapter, I'll go into more detail about actually creating a table and using the settings available for fields and tables. My interest here is to show you the proper way to design what's called the database schema, which is the collection of related tables and other objects that form the structure of the database.
We'll start by looking at three of the tables in the database; the tblPersonnel, tblAddresses and tblAbsences tables. The central table in the database is tblPersonnel which contains the essential information for each employee. In accordance with good database design, it only contains the information that specifically relates to the employee such as the name, relevant dates and status. In this case, all of the other tables in the database link to it in one way or another.

If you view the design of tblPersonnel by selecting the table in the database window and clicking on the Design button, you will see that the EmpKey field is identified as the table's Primary Key. This means that the field contains a value that is unique for each record in the table, enabling the database to identify that particular record. Access encourages you to create a primary key for every table that you design.
In the second column of the design view, the data type assigned to the EmpKey field is called AutoNumber. This is a unique number assigned by the database to every record that the user enters and cannot be edited by the user. In most cases, an AutoNumber should not be of any significance to the people using the program and can even be hidden from the user's view.
For example, you would not want to use this number as an employee number within the company because there is no guarantee that it won't skip numbers if records are deleted or cancelled. It will also assign the numbers in the order that the records are entered while employee numbers are often assigned in order of hire. An AutoNumber used as a primary key only serves as a unique value and to allow for the linking of related tables as you'll see in a moment.
(It is possible to set other non-AutoNumber fields to be unique so that the database will not allow the same value to be entered in the field twice.)
You'll also notice a few extra fields in each table. Another advantage that comes with grouping information in separate tables is the ability to store more information without getting lost in it as you might with a large spreadsheet.
In the bottom half of the table design screen, you'll see some extra settings for each field. When adding fields to a table, in particular you should pay attention to settings such as field size, input mask, default value, the validation settings and required. Proper use of these settings will provide the controls needed to help ensure that data is accurately entered. I provide more information on these properties in the next chapter and you can get more information on any of them by placing the cursor in one of the setting boxes and pressing F1.
Access has a few rules concerning field names but is very flexible in what it will accept. I try to keep my field names relatively short but descriptive enough to avoid confusion. One rule that I follow is that while Access will allow for spaces in field names, I don't use them. As your databases become more sophisticated, you may find that they lead to confusion and possible errors. Also, be careful of using reserved words like "Date" which is an Access function. Access will allow you to use them but they can cause errors later that are hard to track down.
Both tblAbsences and tblAddresses also have fields titled EmpKey but the data type is Number instead of AutoNumber. In terms of the relationships between these tables and tblPersonnel, the EmpKey field in tblAbsences is referred to as the foreign key while it's called the primary key in tblPersonnel. This is because the value originates in tblPersonnel and corresponding records in tblAbsences will relate back to tblPersonnel by having the same value in the foreign key field. Another feature of tblAbsences is what's called a composite key, or a key that is made up of more than one field. In this case, you cannot have more than one record with the same combination of employee, date and pay status.
Substituting the EmpKey value for the employee name throughout the other tables not only saves space but eliminates misspellings of names and allows for additional report functions that you will see later. It also solves the problem of having more than one employee with the same name. By using the number, the database knows exactly who you're talking about.
At this point you may be thinking "Why not just use the SSN for the primary key, since it's unique and you're storing it in another table anyway?" I cannot stress it enough - NEVER use a value as confidential as a Social Security Number or credit card number as a table key. In another part of the series, I will talk about how to hide this information from those who should not have access to it. You cannot prevent users from viewing table keys if they wish, especially if they're referenced by every other table as in this database.
Types of Relationships
By looking at the relationships between these tables, you can see how these relationships are defined. Drawing a relationship between two tables or double-clicking on an existing relationship line will bring up the relationship edit screen where you can view the settings. In the relationship between tblPersonnel and tblAbsences, the 1and ∞ symbols above the line indicate a one-to-many relationship between the tables. This means that for every employee listed in tblPersonnel, there can be many absences recorded.

The relationship between tblPersonnel and tblAddresses is different. In this case, you'll see a 1 on both sides of the relationship. There is only one address recorded for every employee and it is called a one-to-one relationship. The same is true with tblEEOCData, tblNumber and tblPastEmployees.
There are a number of reasons why information may be split off into another table like this. In this case, it's mainly to reduce the size of tblPersonnel by keeping a number of fields related to the same aspect of employee information (i.e. the contact info) in another table. The information is still just as accessible by the use of queries and forms which I'll talk about in later chapters. Another reason is for security. Access to things like Social Security Numbers often needs to be limited to certain personnel. Putting it into a separate table is the first step. With appropriate assignment of user rights and design of the program interface, you can place a certain degree of security on this and other sensitive information. Those are also topics for later chapters but when designing a database, it is important to anticipate security needs and other concerns from the very start.
( Click here for more information about Access security )
A third type of relationship - a many-to-many relationship - is sometimes needed. While it is not shown in this database, an example would be if the database tracked employee training classes. Each employee might take many classes and there would be many employees per class. Many-to-many relationships are not directly possible in Access. To create them, it is necessary to create a third intervening table that breaks the relationship down into two one-to-many relationships. In the employee training class example, you might have a registration table that would include fields for the Employee ID and the Class ID as well as any other information pertinent to that employee's registration in the class. This table would then link to both the employee and class tables on those fields.

Back in the relationships window, you'll also notice that the direction of the arrow points to tblAbsences. If you click on the Join Type button on the Edit Relationships panel, you can see that the relationship is set to show all records from tblPersonnel and then any records in tblAbsences that match those records. This will pull data from tblPersonnel for employees even if are no records for that employee in tblAbsences and is referred to as a left outer join. The first option would pull only the records where the value for this field is present in both tables and this is called an inner join. In this case, employees with no absences would not be listed which may or may not be what you want. You'll see the importance of using the proper join type when you start designing queries to work with your table data.
Going back to the original spreadsheet, notice the last page
titled "Departments". This reference information is brought into the database
through the tblDepartments lookup table shown in the relationships window.
If you view the table design, you'll notice that the primary key, DeptNo, is not
an AutoNumber field. I left it as a number field since many companies assign
their own numbers to various internal departments. The table is only linked to
tblPersonnel. Again, a lookup field will be provided so that when entering
the department information for an employee, the user selects from a list of department
names and the department number is substituted in tblPersonnel.
In this case, referencing the department number instead of the name has another advantage. If the company should reorganize, as companies often do, and a department name changes, the name only has to be changed once in the departments lookup table and all of the employees records for that department will reference the new name. If departments merge or are eliminated, simple functions can be designed to change one number to another in all the records. For more information, see my article on creating a lookup field.
By using its relationship to tblDepartments, the personnel table stores a number corresponding to a given department.
Because of the lookup function added to the Department field of tblPersonnel for this example, the table displays the appropriate title even though only its number is stored.
Referential Integrity

In the bottom half of the Edit Relationships box, you'll see a section dealing with something called Referential Integrity. This is a method for ensuring that data from one table that is supposed to match data in another table is not changed on only one side of the relationship by accident.
An example of this would be if the number assigned to a department in tblDepartments was changed or a department was deleted. If there were employees assigned to that department, this could cause a problem. Without referential integrity, you would be left with employees who were not assigned to an existing department. It would get even worse if a number of departments were changed at once. With referential integrity activated, Access protects records and primary keys that are referenced by records in other tables. If you try to delete a record that is referenced by another table, you might get a message like this one:
On the other side of the example, it would not be possible for someone to enter a non-existent department number in tblPersonnel because the database would not find a corresponding department in the tblDepartments.
Of course, in some cases, you do need to be able to delete or change records, even if they are referenced in other tables. The two options under referential integrity for 'cascade' updating and deleting of records address this. Using cascade update, if a record's primary key is changed, that value is also changed throughout the database in any records that reference it. If cascade delete is active, then instead of getting the message shown above, you'll get something like this:
If you select Yes, then any records in other tables relating to that record would be deleted. In the case of the relationship between tblPersonnel and tblDepartments, a cascading update would change the department number in the appropriate records in tblPersonnel. If a department was deleted, a cascading delete would delete any employee records for that department. The options for referential integrity are not automatically appropriate for every situation. Always take the nature and needs of the data being stored into account when you set options for tables and relationships.
Changing Direction
The skills table (tblSkills) demonstrates another significant difference between common spreadsheet and database design. In the original spreadsheet, you'll notice the last three columns on the main page list the significant skills for each employee. Since there are only three columns, the assumption is that the listing is limited to three skills per employee and not much detail is provided.
Another problem is that while the arrangement allows for some information to be entered on each employee, it doesn't accommodate searches for specific skills. Just from this small sampling of employees, notice that there is more than one person with an MOS (Microsoft Office Specialist) certification. If there were a couple hundred employees listed here and you wanted to do a search for everyone with this or other qualifications, you would have to do a manual search on all three columns. This is not such a big deal in Excel but if this is transferred to an Access table, it's a different story.
The solution in Access is to actually change the direction of the data. Instead of listing the skills horizontally, tblSkills lists a record for each instance of a skill along with the appropriate EmpKey (employee) value and a field for additional information such as years of experience or type of degree. In the screenshot above, you can see the employee name substituted for the number by the lookup function along with the skill description and details. In a later chapter, you'll see how this makes it easy to search on a specific skill and pull up a printable list of all qualified employees.
Remaining Tables
The last tables in the database are tblNotes and tblPastEmployees which may be self-explanatory. The notes table is a generic storage place for any comments and notes that may need to be entered for each employee. Its unique feature is that while it references the EmpKey field in tblPersonnel, it does not use it as part of its own key as no restriction is needed concerning the number of notes entered for a given employee. The primary key is an AutoNumber field that only serves as a unique identifier. When I discuss the design of forms, you will see how the data from this table can be displayed in the same form as the rest of the employee data while taking up a minimum of space on the form and still allowing the user to see all available records.
The past employees table is another one-to-one relationship where the data is split off from the main employees table. In this case it is done to make specific reporting on past employees a little easier. Another reason is to avoid a lot of blank records in the main table pertaining to information that is only available for a small percentage of people who are no longer with the company. Back to Top
Conclusion
This process of organizing data in Access or any other type of database is called data normalization. A normalized database follows a set of rules known as Normal Forms. While I've provided a practical example of this process, Microsoft's support site also offers a couple of very good articles which detail the rules of normalization and provide more examples. For more information, please refer to the following Microsoft Knowledge Base articles:
283878 - Database Normalization
Basics
283698 - ACC2002:
"Understanding Relational Database Design"
As I discussed the various tables in this database, you may have found yourself thinking that it looks like a lot more data entry than is shown in the original spreadsheet. If the tables were actually intended for data entry, that would be true. This is why I almost never allow users to enter data directly into the tables - that's what forms are for! Properly designed forms use the relationships between the tables to bring much of the data entry work together in one place. They can automatically supply many of the values required in the tables and can also include interesting features like drop-down lists and default values that eliminate a lot of typing and help to ensure correct data entry. I have a couple of chapters later in the series on form design that will show you how to create a good user interface for your application.
In the next chapter , I'll go into greater detail about how you can create tables like the ones you've seen here.
For more information:





