I want to present to you a brief and shameful history of how I’ve mishandled SQL lookup tables over the years. I hope it’s useful to anyone wrestling with the issue of massive and proliferating lookup tables.
Years ago, when I first started out with Zend Framework, I took a look at a Magento installation and was horrified by how many SQL tables it required. I didn’t want to have to deal with dozens of tables, and lookup tables were the main thing I thought I could eliminate. Then I had a project that required a couple of dozen fields in a single table, many of which would be empty for many of the rows. This made the issue even more glaring: SQL Lookup tables are useful, but is there a better way to lookup data in PHP? In my hubris I thought there was. Say you have a customer table in your database, and you want to mark each customer with a status. There were 3 approaches that I looked at:
- Classic lookup tables: Use a foreign key in the status field of the customer table, referencing the primary key of a lookup table.
PRO: This is the classic solution. It’s a very common approach because it will work in any RDBMS software: reference tables are a fundamentally relational solutionPRO: Each lookup column is lookup-able with a single JOIN statementPRO: All logic is performed inside SQL. This is already in place and does not require much in the way of code or configuration.
CON: If we have a big database these lookup tables will proliferate. Given how individually trivial they are (the customer status example might have 5 states at most), complicating our SQL with these JOINs doesn’t sit right. This is the reason I went looking for another solution…
- Config option: Use a config file read by PHP at runtime to load application reference tables, mapping numbers to an int in the customer table status column.
PRO: I have done this and it works pretty well: you can load it once and have it available globally. If you store the statuses as a simple key-value pair array, you can easily lookup the statuses with little overhead.CON: This solution breaks the RDBMS model since the config file becomes in a sense an external database containing multiple tables, and the statuses in the database are foreign keys but cannot leverage the integrity constraints of the db. So the logic to handle integrity must be built into the application, adding development overheadCON: Distributing the lookups to the application also burdens the PHP with knowledge of the database, and makes it responsible for this data.
This is an ugly solution, and it’s what I did in at least two projects, patting myself on the back all the while and congratulating myself on my ingenuity. Then I discovered this way:
- Use enums in our customer table status field.
This was a revelation. Up to this point I’d been adhering to what I knew about SQL standards, and hadn’t looked at what MySQL offered.
PRO: This is a much more attractive solution in that it takes the JOINs out of the SQL entirely. It abstracts the mapping of terms to ids by making MySQL take care of the implementation, and leaves us with a much more readable value in the status column.
PRO: Again, this is handled within SQL so the strengths of the platform are used fully
CON: It’s not supported in all DBMSes, mainly just PostgreSQL and MySQL.
CON: It isn’t trivial to query enums. So we can’t easily retrieve a list of valid customer statuses. But we will need them for form elements and validation; do we store them in the config of our application? Then we have duplicated data, and the potential for the database and the application to drift. The more I consider this the more of a problem it seems.
- Another possibility: kvp lookup tables
Then I started looking at how WordPress does things. If you’re not familiar with it, WordPress uses a table called wp_options that stores attribute/value (or key/value) pairs. Built-in Wp functions provide access to this as a kind of abstracted storage, so Wp and third-party plugins can read and write their configurations to it.
PRO: The kvp model is very flexible. you don’t need to alter your SQL tables to add a new ‘field,’ instead you just add it as a new key. This is great for plugin authors especially.
CON: Again, this places the responsibility for data integrity on PHP, and leaves the database ignorant.
CON: Speed might be a concern with this solution
CON: The fact that ‘joins’ (are these joins?) are done in PHP promotes general poor coding. I have seen this kind of logic appearing in, for example, page templates. Ugh.
I have used this system myself on a couple of small projects, and it is convenient, but it is not good practise for the reasons outlined above.
So I’ve tried lots of different ways of doing this and I came full circle. Lookup tables are the best way of implementing lookup data. I think my informal education in programming led me down this wrong-headed path: if I’d studied Computer Science at University I would never have considered the config option. Even the case of the table with two dozen fields could be resolved more simply with some forethought: perhaps the data type that I’m looking at there is made up of several sub-types? Defining the data more clearly might have allowed me to abstract the data from its storage to a degree (using ‘field_6′ instead of ‘preferred_city’, for example).
Anyway, I hope this has been of use. I’m keen to hear how others have approached this… let me know!