Skip to content

Nermalisation

Photo: MySQL-dump cat
An introduction to the normalisation of databases that requires no prior knowledge and serves as an excuse to introduce cat content into MySQL-dump.

Simply speaking, normalization is about flexibility and avoiding redundantly storing information, that is, storing the same piece of knowledge in more than one place.

Consider some cats and their owners (or, in Cat, our "tin-openers"):

catownercity
AdaAndreaAnchorage
ShelleySarahSan Diego
LynnLouiseLos Angeles

Photo: MySQL-dump cat

So far, so good. Now another cat strays to Louise:

catownercity
AdaAndreaAnchorage
ShelleySarahSan Diego
Lynn, LaraLouiseLos Angeles

Photo: MySQL-dump cat

It's pretty obvious that we'll be in a world of pain once we also wish to store more attributes for each cat (date of birth, vaccination, ...).
Moreover, cat would be a string in databases that do not support multi-valued fields (columns), so looking up Lara would be awkward ("show me all rows that have 'Lara' as a whole word anywhere in the field cat_name"). This is awkward, error-prone, and slow, so let's discard the idea while we can:

catownercity
AdaAndreaAnchorage
ShelleySarahSan Diego
LynnLouiseLos Angeles
LaraLouiseLos Angeles

Photo: MySQL-dump cat

Now we stored "Louise" and "Los Angeles" twice. This is somewhat awkward, we'd rather only change one entry if Louise moves or changes her name:

ownercity
AndreaAnchorage
SarahSan Diego
LouiseLos Angeles

catowner (referencing other table, "foreign key")
AdaAndrea
ShelleySarah
LynnLouise
LaraLouise

Photo: MySQL-dump cat

We have eliminated the redundant city, but if Louise's name should change, we'll have to update three entries (her name in the owners table, and two references to that entry from the cats table). Since in the real world, people do move and do get married (and sometimes just change their names because, well, they wish to), maybe the name is not a very good key. We'll make up an arbitrary one that will never change because it does not rely on the actual payload data:

owner_noownercity
1AndreaAnchorage
2SarahSan Diego
3LouiseLos Angeles

catowner_no (referencing other table, "foreign key")
Ada1
Shelley2
Lynn3
Lara3

Photo: MySQL-dump cat

Now a new cat strays to Sarah. Like Louise, Sarah fancies the name Lara:

owner_noownercity
1AndreaAnchorage
2SarahSan Diego
3LouiseLos Angeles

catowner_no (referencing other table, "foreign key")
Ada1
Shelley2
Lynn3
Lara3
Lara2

Photo: MySQL-dump cat

The cats are still unique — Sarah's Lara is Lara/2 in our cats table, Louise's is Lara/3. At this point however, word gets around in cat circles that Louise rocks — Ada escapes from Andrea-world and strays to Louise! (cat_name,owner_no) no longer considered stable key!

owner_noownercity
1AndreaAnchorage
2SarahSan Diego
3LouiseLos Angeles

cat_nocat
1Ada
2Shelley
3Lynn
4Lara
5Lara

cat_nolives_with_owner_nofromuntil
1119982002
132002NULL (NULL: still living there, makes looking up current location nice and easy)
221997NULL
331850NULL
431999NULL
522002NULL

Photo: MySQL-dump cat

Sarah's Lara heard the news as well, and also strays to Louise:

owner_noownercity
1AndreaAnchorage
2SarahSan Diego
3LouiseLos Angeles

cat_nocat
1Ada
2Shelley
3Lynn
4Lara
5Lara

cat_nolives_with_owner_nofromuntil
1119982002
132002NULL (NULL: still lives there, makes looking up current location nice and easy)
221997NULL
331850NULL
431999NULL
5220022002
532002NULL

Photo: MySQL-dump cat

Louise is unaware of the newcomer's name and calls her Callisto — apparently, a cat's name is not a constant. Once more, we fix our tables:

owner_noownercity
1AndreaAnchorage
2SarahSan Diego
3LouiseLos Angeles

cat_name_nocat_name
1Ada
2Shelley
3Lynn
4Lara
5Callisto

lives_with_owner_nofromuntilunder_the_name
1199820021
32002NULL1
21997NULL2
31850NULL3
31999NULL4
2200220024
32002NULL5

Photo: MySQL-dump cat


Now, a cat can have any number of names (one after the other), and each owner can feed any number of cats. We can even make overlapping entries in our latest table to account for cats sneakily eating in several places. So far, so normal. Or is it?
What happens if Louise moves, but not with all "her" cats?

What changes will we need if twin cats arrive that get the same name because Sarah can't tell them apart, anyway? (Rows must be unique!)

Denormalisation:

In some cases, it may be worthwhile to separate a table's history from its status quo (to keep the table holding the current data small, to speed up lookups in it, or to put the history table in one of MySQL's merge tables):

owner_noownercity
1AndreaAnchorage
2SarahSan Diego
3LouiseLos Angeles

cat_name_nocat_name
1Ada
2Shelley
3Lynn
4Lara
5Callisto

lived_with_owner_nofromuntilunder_the_name (history)
1199820021
2200220024

lives_with_owner_nofromunder_the_name (current)
320021
219972
318503
319994
320025

In this example, we'll have to insert the relevant line from the current table (using the current date for the until value) before updating the line in the current table.

Photo: MySQL-dump cat

If in doubt, use consecutive integers as a (primary) key (integer primary key auto_increment not null etc.); do not use strings as keys if you can avoid it, if the above didn't convince you, maybe this will: string lookups aren't exactly fast. Do not use "match-codes" or any other keys that encode payload data; keys should be opaque.

Trackbacks

Katze mit Wut on : Nermalisierung

Eine Einführung in die Normalisierung von Datenbanken, die kein Vorwissen voraussetzt und Katzencontent liefert. ASCII version, English versionStark vereinfacht geht es bei Normalisierung um Flexibilität und Redundanzvermeidung. Gegeben seien Katzen u

The Quiet Earth on : Databases and Normalization

Relational Database Management Systems (RDBMSs) use special forms to keep data from being kept redundant: normalized tables. If you’re like me, you haven’t memorized or completely understood normalization up to now. If you do, however, wan...

MySQL-dump on : Annotations to "Five Common PHP database problems"

In Five Common PHP database problems, Jack Herrington writes about five common database problems that occur in PHP applications -- including database schema design, database access, and the business logic code that uses the database -- as well as their s

Comments

Display comments as Linear | Threaded

eckes on :

I normally use fixed char(36) primary keys containing an UUID. If collation is binary this is not much slower than int (of course it is since it is larger), but it has so many advantages. You get a unique id for the lifetime of the record (including exports, backups and syncronisation). This is good for auditing and merging of data streams.

Azundris on :

While this extends beyond the scope of an introduction, it might be as good a time as any to draw everyone's attention to the *auto_increment_increment* and *auto_increment_offset* server-options (both added in 5.0.2). These might be relevant depending on one's use case.

http://dev.mysql.com/doc/mysql/en/server-system-variables.html

Azundris on :

On a semi-related note, there's also the "serial" shorthand now.

fdask on :

Nice little article, and I like how you've used cats in your examples :)

I am a little unclear what the point here is though... break tables up?

Azundris on :

The point is that you have the option to not do, but then you'll be in a world of pain. The point is also to show that if you do it, how to do it. A more technical approach is here:
http://en.wikipedia.org/wiki/Database_normalization

Max Sinister on :

I can't help but miss one table, or at least one column. Because the last two arrangements don't state WHICH cat lives with which owner under which name.

Kimbo Slice on :

@Max Sinister: Yes, there are many possible problems that could arise from this. The tables should be: #catID; #cat_nameID, name, .catID; #ownerID; #locationID, location; #arrangementID, .catID, .ownerID, .locationID, from, until. The other good thing about having locationID as its own table is that it doesn't turn on the owner. A cat can have its own residence (i.e., Stray). Owners can change residences, and the 'lives with' would be implicated by the catIDxownerIDxlocationID. You can just access it via your scripting language: "Lives with" + SELECT catIDxownerIDxlocationID;.

Kimbo Slice on :

How do I auto-update the dates on changes? Is it best to use a view for the final table, or can we set up a dependency in a better way? I've seen that this can be very complicated. Also, wouldn't it be better to use a tighter grain for the dates? We may as well because there's a time stamp option.... Finally, don't you create a primary key in the lives_with table for the contingency that the cat has two owners (for example, if divorced parents send their child's cat back and forth between houses with the child).

Add Comment

E-Mail addresses will not be displayed and will only be used for E-Mail notifications.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.
CAPTCHA

BBCode format allowed