OOP and Relational Databases
Posted by in Programming onWhy do programmers choose relational databases instead of object-oriented database systems?
The short answer is that there is a fundamental difference between storing data and manipulating data. Code should be written to manipulate data, and databases should be designed to store the data which code manipulates. Object-oriented databases conflate these two concepts in such a way that data anomalies become an inevitability for many projects; which in turn renders them unsuitable for the primary data-store.
Object Oriented Programming
Object Oriented Programming is a solution to a question of efficiency and conceptualization. A well-designed Object-Oriented system — one which follows SOLID design principles to the greatest extent practical — will use OOP to create classes which create a digital taxonomy of traits and behaviors.
For example, a door and a window both create holes in a wall. However, a door is meant to be walked through, and a window is not. A door and a window can both inherit from a class that binds them to a wall, but can be distinct in their behavior.
What you absolutely do not want to do, is store the size, position, or number of doors and windows directly in the binary to your program. That is the main difference between data and code: your code should be written in a way that manipulates the data, but the data should exist separately from the code which manipulates it.
Which brings me to…
Data
I broadly group data into three categories. Which category a piece of information falls into will usually determine how I store it, if I store it at all. The first category would be Transient, short-lived information. At the high-end of the persistence scale, this will include things like the password of a login-form — I can discard the data as soon as I know whether the login passed or failed.
The second category would be Long-Term information that is closely-tied to my program. This includes things like video-game save-files, crash-dumps, cache-files, or other proprietary information that is only relevant in the immediate context of the application that uses it. The third category — the one relevant to this conversation — is Long-Term information that is Loosely-Tied to my program. This is arguably the only category where databases are a reasonable storage media. Relational Models reign supreme for this because they have stable, well-defined, and well-tested behavior. Object-Oriented Databases do not.
Relational Databases also tend to be primarily concerned with ACID compliance (which can be loosely described as an obsession with long-term reliability) and the relationships between data, while Object-Oriented Databases tend to be primarily concerned with performance (For a chuckle, reference the “MongoDB is webscale” meme). Your priorities here will probably, more than anything else, determine which solution you reach for.
Why is it important to separate business logic from data? You (probably)
I’m glad you asked!
When you design a database, you are making a bet that you can account for every possible use-case of your data. Every database is a balancing act between speed and quality. If you don’t normalize well enough, the quality and reliability of the stored data will inevitably degrade over time; yet if you normalize too aggressively, your database will become frustrating to program around and will be sluggish.
This problem is only exacerbated by the tendency for business logic to evolve over time. If your database structures are too tightly bound to your business logic (as is almost guaranteed to be the case with an object-oriented database), then changes to the business logic will necessitate changes to the structure of your database. (To be fair, this is the case regardless of whether you choose a relational database or an object-oriented database; but relational databases do not encourage this particular behavior.)
It also cannot be understated that relational databases do an extremely good job at representing the relationships between atoms of information in a way that object oriented databases do not. A well-normalized relational database allows you to store a unit of information in exactly one place. You can then use indexes to reference that information in multiple places, and use schemas to define the relationships between those units of information. Business logic for these programs is typically written so that these units of information can be reconstituted ad-hoc into whatever form you need them for any given task.
By contrast, object-oriented databases tend to duplicate information for the sake of retrieval speed. A typical object-oriented database won’t be able to perform cross-entity joins of any sort; so the only practical way to fetch large amounts of data at once is to store all of the data you may need into one object. The result is that, when using an object-oriented database, people tend to either create a pseudo-relational database (at which point, you may as well use a system like PostgreSQL), or duplicate information all over the place (in which case, data anomalies are an inevitability). If you care about the integrity of your data — and let’s be honest, you probably care a lot about whether your data is correct — then this creates a very narrow scope of work where an object-oriented database is suitable.
This is not to say that object-oriented databases are never useful. Just that an object-oriented paradigm can simultaneously be a great abstraction tool, and also a terrible data-storage tool.