I want to present to you a brief and shameful history of how I’ve mishandled SQL lookup tables over the years. I hope it’s useful to anyone wrestling with the issue of massive and proliferating lookup tables.
Years ago, when I first started out with Zend Framework, I took a look at a Magento installation and was horrified by how many SQL tables it required. I didn’t want to have to deal with dozens of tables, and lookup tables were the main thing I thought I could eliminate. Then I had a project that required a couple of dozen fields in a single table, many of which would be empty for many of the rows. This made the issue even more glaring: SQL Lookup tables are useful, but is there a better way to lookup data in PHP? In my hubris I thought there was. Say you have a customer table in your database, and you want to mark each customer with a status. There were 3 approaches that I looked at:
- Classic lookup tables: Use a foreign key in the status field of the customer table, referencing the primary key of a lookup table.
PRO: This is the classic solution. It’s a very common approach because it will work in any RDBMS software: reference tables are a fundamentally relational solutionPRO: Each lookup column is lookup-able with a single JOIN statementPRO: All logic is performed inside SQL. This is already in place and does not require much in the way of code or configuration.CON: If we have a big database these lookup tables will proliferate. Given how individually trivial they are (the customer status example might have 5 states at most), complicating our SQL with these JOINs doesn’t sit right. This is the reason I went looking for another solution…
- Config option: Use a config file read by PHP at runtime to load application reference tables, mapping numbers to an int in the customer table status column.
PRO: I have done this and it works pretty well: you can load it once and have it available globally. If you store the statuses as a simple key-value pair array, you can easily lookup the statuses with little overhead.CON: This solution breaks the RDBMS model since the config file becomes in a sense an external database containing multiple tables, and the statuses in the database are foreign keys but cannot leverage the integrity constraints of the db. So the logic to handle integrity must be built into the application, adding development overheadCON: Distributing the lookups to the application also burdens the PHP with knowledge of the database, and makes it responsible for this data.This is an ugly solution, and it’s what I did in at least two projects, patting myself on the back all the while and congratulating myself on my ingenuity. Then I discovered this way:
- Use enums in our customer table status field.
This was a revelation. Up to this point I’d been adhering to what I knew about SQL standards, and hadn’t looked at what MySQL offered.
PRO: This is a much more attractive solution in that it takes the JOINs out of the SQL entirely. It abstracts the mapping of terms to ids by making MySQL take care of the implementation, and leaves us with a much more readable value in the status column.
PRO: Again, this is handled within SQL so the strengths of the platform are used fully
CON: It’s not supported in all DBMSes, mainly just PostgreSQL and MySQL.
CON: It isn’t trivial to query enums. So we can’t easily retrieve a list of valid customer statuses. But we will need them for form elements and validation; do we store them in the config of our application? Then we have duplicated data, and the potential for the database and the application to drift. The more I consider this the more of a problem it seems.
- Another possibility: kvp lookup tables
Then I started looking at how WordPress does things. If you’re not familiar with it, WordPress uses a table called wp_options that stores attribute/value (or key/value) pairs. Built-in Wp functions provide access to this as a kind of abstracted storage, so Wp and third-party plugins can read and write their configurations to it.
PRO: The kvp model is very flexible. you don’t need to alter your SQL tables to add a new ‘field,’ instead you just add it as a new key. This is great for plugin authors especially.
CON: Again, this places the responsibility for data integrity on PHP, and leaves the database ignorant.
CON: Speed might be a concern with this solution
CON: The fact that ‘joins’ (are these joins?) are done in PHP promotes general poor coding. I have seen this kind of logic appearing in, for example, page templates. Ugh.
I have used this system myself on a couple of small projects, and it is convenient, but it is not good practise for the reasons outlined above.
So I’ve tried lots of different ways of doing this and I came full circle. Lookup tables are the best way of implementing lookup data. I think my informal education in programming led me down this wrong-headed path: if I’d studied Computer Science at University I would never have considered the config option. Even the case of the table with two dozen fields could be resolved more simply with some forethought: perhaps the data type that I’m looking at there is made up of several sub-types? Defining the data more clearly might have allowed me to abstract the data from its storage to a degree (using ‘field_6′ instead of ‘preferred_city’, for example).
Anyway, I hope this has been of use. I’m keen to hear how others have approached this… let me know!
Good stuff!
I was reading out your take on lookup tables, and wonder your thoughts on a good solution. For example it seems like you come to the conclusion that the basic lookup table is the way to go.
What is your thought on say a persons name field? For example a lookup table has a persons full name in it with an ID that is reference in other tables. How do you lesson the load on mysql when there is many tables that lookup the full name value since those tables onlynhave the foreign key.store the table as session array son that it is a one time call for the data. Some sort of master class that creates it, issue though of per page having itnrun again.
Any thoughts?
Chris
Sorry I missed this reply earlier: I get lots of spam comments and it can be hard to tell the real ones sometimes!
I don’t think a name field is a good fit for a lookup column. Good lookups have these attributes are not unique, and are one of a limited group. Title is a good example: for each individual the possible titles are probably limtied to Dr, Mr, Ms, Miss, Mrs, and perhaps Sir. First names have such a variety of spellings and so on that they don’t make good lookups. The exception might be if you have massive scale: for example if you have millions of rows. But even then, there will be many non-English names.
OK, I just re-read your question and I think I understand what you mean now. I’ll try to paraphrase it:
You have a PHP web app were you make use of many pages where you use a list of People. You want to lookup the names frequent in your code, and you want to lessen the load on SQL by caching the people table. If you use the list of names multiple times in a single page, you can store it in some kind of global array or say in a singleton class. If you want to persist it across pages then yes an array that you store in the session could work nicely. I do this by buildign a lean array (ie just the fields I actually need – in your case ID and whatever name fields). Then I serialize it to the session on __destroy() and unserialize it on the next startup – this depends on whether it’s objects or just an array.
Hope that helps!