When many folks first build a "database," they'll set up some kind of spreadsheet
and start loading all their data into the various columns. Each row
contains all the stuff you need to know about that particular item.
At first, it seems to work well. But if the "database" lives for a
long time, then it will have a lot of rows and if you need to run reports
or change data, you'll quickly run into problems. The principles of
good relational database design takes care of these problems and allows your
database to
For example, suppose a folk musician wanted to keep a list of songs he played.
At performances, he likes to talk about the songs a little so he'd want the
Title and the Author and perhaps a little History. If he just jumped
right in, he could make a table (or spreadsheet) with those three columns.
Title Author History
Suppose he entered a bunch of songs and had three by Stephen Foster.
If he wanted to run a report of all songs by an author, it should list out
the three songs in order. However, what if he hadn't done all the data
entry himself, but instead had the guitar player enter some and the fiddler
enter some others? Now the database has one song by Stephen Foster,
one by S. Foster and one by Foster, Stephen Collins. The report won't
group them correctly.
One fix for this problem is to split the Author field into separate First Name and Last Name fields. You might even want a Middle Initial Field.
Title Author History
F_Name MI L_Name
This is better (although the names can still be misspelled). If you
haven't already put a lot of data in, making this change isn't too hard.
You'll have to go back and redo all the Author data you've already entered.
Perhaps you can let the mandolin picker do it.
This is the first step in designing a database.
When you first begin to design a database, write down all the possible
nouns you might want to track. These are your candidate entities.
Define them and see which ones group together, which split into new entities,
and which disappear. An example of this
brainstorming
is in the next section
.
In this exercise we now have two tables:
Song Author
Title F_Name
History MI
L_Name
History
Each Song has its own Title and its own History. Each author has his
or her own Name and a separate History. This has simplified data entry
considerably. The mandolin player is happy.
Another problem is that a song title is not necessarily unique.
You can copyright words and music, but the title can be used again by anyone.
There could be twenty songs called "I Love You" with different Histories and
Authors, so you need some way of making each record unique. A standard
technique is to create a meaningless number for each row in the table.
This Index number helps keep all the data distinct. The actual number
is irrelevant, but the fact that each record is unique because of this number
is crucial.
Song Author
SEQ_NR SEQ_NR
Title F_Name
History MI
L_Name
History
Now that we have a Unique Index or Primary Key, we can use that to keep all
our rows separate. Both tables have primary keys and we can refer to
them using the notation Table_name.Column_name as in song.seq_nr and
author.seq_nr.
SongNow, if we want to find the author of a song, we pull up the Song record, look at the Author_Seq_NR and then match that number to the specific Seq_NR in the Author table.
Seq_NR
Title
History
Author_Seq_NR
2. When designing your own database, after you've got all the
nouns turned into entities, start drawing lines to signify relationships
between pairs of entities. Some pairs will not be related directly
to each other but that's okay. Label each line with a meaningful
name.
1. Is the relationship mandatory or optional?2. How many of one entity instance (or record) are related to another entity instance?
Now we could enter Authors in our database who don't have
songs, but why bother? We've defined Author as the author of a Song
so we'll only put in people who've written songs that are entered in our
database. Therefore, our business rule will say that every Author must
have at least one Song, so the relationship from Author to Song is mandatory.
3. Check each of your relationships for Optionality and Cardinality.
Check them in each direction for both rules. That means there will be
four separate checks for each relationship. You will probably discover
a few Many-to-Many relationships which will have to be converted into Associative
Entities.
There is no way to set up two tables for two entities with
a Many-to-Many relationship between them. What you do is create a third
table in between that carries the foreign key of each of the two main tables
for each record. This third table, called an Associative Entity, also
carries its own unique key (Seq_NR) to keep its records separate.
Song
Song_Author_Association
Author
Seq_NR
Seq_NR.............Song_Seq_NR
Title
Author_Seq_NR.............Seq_NR
History
F_Name
L_Name
MI
History
There are many associative entities in any database). If you're writing SQL statements, these entities seem like a big waste of time, but actually, they're what gives databases most of their power.
How this works is that if there are two authors for a song,
then there will be two records in the middle table. Each record will
have the same song number (Song_Seq_NR) and will have a different author
number (one Author_Seq_NR for each unique author). Similarly, if an
author wrote two songs, there would also be two records in the associative
entity. These two records would have the same author number but different
song numbers. And if two authors wrote 4 songs together, there would
be 8 entries in the middle table (2 x 4 = 8).
Here are some made-up examples from the music database:
Here are some relevant Song records:
Seq_Nr Title
History
----- -----
-------
22 Oh Susannah
Number one hit of 1849
462 Old Kentucky Home Foster
never lived in Kentucky
468 Yellow Submarine
Written about a school bus
5760 Let It Be
7001 Can't Buy Me Love
And here are some relevant Author Records:
Seq_Nr F_Name L_Name
MI History
------ ------ ------ --
-------
7 Stephen Foster
C Died penniless
30 John Lennon
W Murdered in 1980
31 Paul McCartney
Owns most copyrights
The associative records joining Stephen Foster (7) to the two songs Oh Susannah (22) and Old Kentucky Home (462) would look like this:
Seq_Nr Song_Seq_Nr Author_Seq_Nr
------ ----------- -------------
1 22
7
2 462
7
Looking at just those records makes it seem confusing, but computers work better than we do at making links. If I want to find out who wrote Oh Susannah, I look in the Song table and find its number is 22. Then I go to the associative table and find all the records that have 22 in the Song_Seq_NR column (because there could be more than one author). Then I take the relevant Author_Seq_NRs and look through the Author table for the Name and whatever other information I'm looking for.
Here are the records for the three Beatles tunes and their two authors:
Seq_Nr Song_Seq_Nr Author_Seq_Nr
------ ----------- -------------
3
468 30
4
468 31
5 5760
30
6 5760
31
7 7001
30
8 7001
31
You need to remember that every record will have a meaningless
number used just by the database to track it. And many tables will
just have a jumble of these numbers to relate information together.
After you've got the Entities defined, you will put Attributes under each
one. You will actually put in attributes as you go along just to keep
your thoughts straight and that's what we've done. Seq_NR, Title, History,
and F_Name are all attributes. These will become the columns in the
tables.
Put in all the attributes you can think of under each table.
Chances are, every single table will have a Seq_NR and a comments/remarks
block.
It is useful to think about the actual data entry at this time. For example, I've been using MI for Middle Initial of a name, but what if I wanted to use a full middle name or had someone who used their name different such as M. Stanton Evans? Turning MI into MNAME at this point in the process is a lot easier than redoing the tables and forms later on.