©Mike Moxcey 2001

An Introduction to Relational Database Design

When many folks first build a "database," they'll set up some kind of spreadsheet and start loading all their data into the various columns. Each row contains all the stuff you need to know about that particular item. At first, it seems to work well. But if the "database" lives for a long time, then it will have a lot of rows and if you need to run reports or change data, you'll quickly run into problems. The principles of good relational database design takes care of these problems and allows your database to

run quickly
maintain accurate data at all times, and
hold as much information as you need.

For example, suppose a folk musician wanted to keep a list of songs he played. At performances, he likes to talk about the songs a little so he'd want the Title and the Author and perhaps a little History. If he just jumped right in, he could make a table (or spreadsheet) with those three columns.

Title Author History

Suppose he entered a bunch of songs and had three by Stephen Foster. If he wanted to run a report of all songs by an author, it should list out the three songs in order. However, what if he hadn't done all the data entry himself, but instead had the guitar player enter some and the fiddler enter some others? Now the database has one song by Stephen Foster, one by S. Foster and one by Foster, Stephen Collins. The report won't group them correctly.

One fix for this problem is to split the Author field into separate First Name and Last Name fields. You might even want a Middle Initial Field.

Title Author History
F_Name MI L_Name

This is better (although the names can still be misspelled). If you haven't already put a lot of data in, making this change isn't too hard. You'll have to go back and redo all the Author data you've already entered. Perhaps you can let the mandolin picker do it.

Duplicated Data

As you're going along, you realize that some of the history is about the song—such as how Old Susannah was the number one hit of 1849 and was stolen from Foster by a record company. But some of the history information is actually about Stephen Foster—how he was a bookkeeper who won a song writing contest—and that information could be contained in the History of any of the songs by Foster. You haven't been too worried about repeating the First and Last Names over and over, but all this history information is time consuming and you don't want to repeat it, so now you think about splitting the Songs and Authors apart.

This is the first step in designing a database.

Entities

Entities are separate things that each have their own data. When you first start thinking about keeping data, you should try to think of all the various things you might want to keep info about. These come in 5 varieties: Person, Place, Thing, Concept or Event. We don't have to get into the distinctions between them, but in a well designed database, all your main tables are based on entities.

When you first begin to design a database, write down all the possible nouns you might want to track. These are your candidate entities. Define them and see which ones group together, which split into new entities, and which disappear. An example of this brainstorming is in the next section .

In this exercise we now have two tables:

    Song                   Author
    Title                    F_Name
    History                  MI
                             L_Name
                             History

Each Song has its own Title and its own History. Each author has his or her own Name and a separate History. This has simplified data entry considerably. The mandolin player is happy.

Relationships and Indexes

Now we need a way to connect the information in Songs with the appropriate information in Authors. You could carry the Author Last Name in Song, but that could get confusing if you had the authors Stephen Foster, Barry Foster, and Suzy Foster. You could also carry the first name and even the middle initial, but that's starting to double up your data entry and if you misspell something, then everything is trashed.

Another problem is that a song title is not necessarily unique. You can copyright words and music, but the title can be used again by anyone. There could be twenty songs called "I Love You" with different Histories and Authors, so you need some way of making each record unique. A standard technique is to create a meaningless number for each row in the table. This Index number helps keep all the data distinct. The actual number is irrelevant, but the fact that each record is unique because of this number is crucial.

Primary Key

In our Music database, we'll call this number a sequence number (Seq_NR) and put one in each table.

    Song                    Author
    SEQ_NR                   SEQ_NR
    Title                    F_Name
    History                  MI
                             L_Name
                             History

Now that we have a Unique Index or Primary Key, we can use that to keep all our rows separate. Both tables have primary keys and we can refer to them using the notation Table_name.Column_name as in song.seq_nr and author.seq_nr.

Foreign Key

Now we still don't have a connection between the two tables. What we need to do is carry the primary key of one of the tables inside the other one. Which one goes where is discovered by creating an Entity-Relationship Diagram (E/RD). For this example, we will just put the Author key into the Song table and call it a Foreign Key. (You don't need to remember the name).

Song
Seq_NR
Title
History
Author_Seq_NR

Now, if we want to find the author of a song, we pull up the Song record, look at the Author_Seq_NR and then match that number to the specific Seq_NR in the Author table.

2. When designing your own database, after you've got all the nouns turned into entities, start drawing lines to signify relationships between pairs of entities. Some pairs will not be related directly to each other but that's okay. Label each line with a meaningful name.

Relationship Types

The relationship rules of an E/RD are discovered by asking two questions.

1. Is the relationship mandatory or optional?
2. How many of one entity instance (or record) are related to another entity instance?

Mandatory or Optional

Sometimes, there is no relationship, so those entities aren't connected at all. But if there is a connection, then we need to know whether it is mandatory or not. For example, at first glance we'd say every song must have an author, and we'd be technically or philosophically correct. Someone somewhere at some point in time had to write the song. However, we may have no idea who the author is. Therefore, the actual answer is that a Song in our database won't necessarily have an Author. Thus, the relationship from Song to Author is optional.

Now we could enter Authors in our database who don't have songs, but why bother? We've defined Author as the author of a Song so we'll only put in people who've written songs that are entered in our database. Therefore, our business rule will say that every Author must have at least one Song, so the relationship from Author to Song is mandatory.

One or Many

In relational theory, there are only two sizes of a relationship we're worried about: One or Many. If there could ever be more than one instance (record) of an entity related to another one, then we just say there are many. It doesn't matter whether there are two or ten or a bazillion, we just call it many because the design principles are the same: If an Author can write multiple Songs, then you put the Author's primary key inside the Song table as a foreign key.

3. Check each of your relationships for Optionality and Cardinality. Check them in each direction for both rules. That means there will be four separate checks for each relationship. You will probably discover a few Many-to-Many relationships which will have to be converted into Associative Entities.

Associative Entities

One-to-One and One-to-Many relationships can be programmed into a database. Unfortunately, there are a lot of relationships that are Many-to-Many. Just as an Author can write many Songs, a Song can have many (at least two) Authors. Just think of the classic teams of Rogers and Hammerstein or Lennon and McCartney.

There is no way to set up two tables for two entities with a Many-to-Many relationship between them. What you do is create a third table in between that carries the foreign key of each of the two main tables for each record. This third table, called an Associative Entity, also carries its own unique key (Seq_NR) to keep its records separate.

Song         Song_Author_Association       Author
                   Seq_NR
Seq_NR.............Song_Seq_NR
Title              Author_Seq_NR.............Seq_NR
History                                      F_Name
                                             L_Name
                                             MI
                                             History

There are many associative entities in any database). If you're writing SQL statements, these entities seem like a big waste of time, but actually, they're what gives databases most of their power.

How this works is that if there are two authors for a song, then there will be two records in the middle table. Each record will have the same song number (Song_Seq_NR) and will have a different author number (one Author_Seq_NR for each unique author). Similarly, if an author wrote two songs, there would also be two records in the associative entity. These two records would have the same author number but different song numbers. And if two authors wrote 4 songs together, there would be 8 entries in the middle table (2 x 4 = 8).

Here are some made-up examples from the music database:

Here are some relevant Song records:

Seq_Nr Title               History
----- -----               -------
22   Oh Susannah         Number one hit of 1849
462   Old Kentucky Home   Foster never lived in Kentucky
468   Yellow Submarine    Written about a school bus
5760   Let It Be
7001   Can't Buy Me Love

And here are some relevant Author Records:

Seq_Nr F_Name L_Name    MI History
------ ------ ------    -- -------
    7   Stephen Foster     C Died penniless
   30   John    Lennon     W Murdered in 1980
   31   Paul    McCartney     Owns most copyrights

The associative records joining Stephen Foster (7) to the two songs Oh Susannah (22) and Old Kentucky Home (462) would look like this:

Seq_Nr Song_Seq_Nr Author_Seq_Nr
------ ----------- -------------
1 22 7
2 462 7

Looking at just those records makes it seem confusing, but computers work better than we do at making links. If I want to find out who wrote Oh Susannah, I look in the Song table and find its number is 22. Then I go to the associative table and find all the records that have 22 in the Song_Seq_NR column (because there could be more than one author). Then I take the relevant Author_Seq_NRs and look through the Author table for the Name and whatever other information I'm looking for.

Here are the records for the three Beatles tunes and their two authors:

Seq_Nr Song_Seq_Nr Author_Seq_Nr
------ ----------- -------------
    3       468          30
    4       468          31
    5      5760          30
    6      5760          31
    7      7001          30
    8      7001          31

You need to remember that every record will have a meaningless number used just by the database to track it. And many tables will just have a jumble of these numbers to relate information together.

Creating Attributes

After you've got the Entities defined, you will put Attributes under each one. You will actually put in attributes as you go along just to keep your thoughts straight and that's what we've done. Seq_NR, Title, History, and F_Name are all attributes. These will become the columns in the tables.

Put in all the attributes you can think of under each table. Chances are, every single table will have a Seq_NR and a comments/remarks block.

Domains

At this time, you'll probably also want to think about Domains. A Domain contains characteristics about an attribute or collection of attributes. You can have a the characteristics apply just to an attribute, but if you have them in a domain, then they are easier to locate. You need to be concerned with the characteristics Type and Length. Most fields in a database can be either Character, Number, or Date. Within each broad grouping, there are lots of subgroups. For characters, Oracle has CHAR and VARCHAR2 and for Numbers it has Number, Integer, Binary Integer, Decimal, Float, Double Precision, Real, Long, and SmallInt. Having a good type allows the database to verify some data entry. You'll also want to know about how long each field is both for creating the initial tables and for putting on the form.

It is useful to think about the actual data entry at this time. For example, I've been using MI for Middle Initial of a name, but what if I wanted to use a full middle name or had someone who used their name different such as M. Stanton Evans? Turning MI into MNAME at this point in the process is a lot easier than redoing the tables and forms later on.

Home Index Next