Database management systems (DBMS) based on the relational model have been the dominant type of DBMS for about four decades now. But today the common wisdom is that relational database systems are ageing and that there may be better approaches to managing data. Is the reign of the relational database nearing its end?
We’ll examine this question by reviewing the history of relational, its climb to dominance, and the current challenges it faces.
History of Relational
The relational model of data upon which the modern DBMS industry is based, has its origins in the 1970 paper A Relational Model of Data for Large Shared Data Banks published by Edgar F. Codd. Codd worked for IBM at the time and IBM research developed prototypes of database systems based on the relational model, notably System R.
But actual working, production-capable software based on the relational model was slow to come. In the early 1970s, working at the University of California, Berkeley, Michael Stonebraker based the development of Ingres (Interactive Graphics and Retrieval System) on Codd’s relational model.
Nevertheless, both System R and Ingres were essential first steps to prove the commercial viability of relational database systems. Both were forerunners of relational products still popular today: System R eventually gave us Db2… and Ingres over time gave us PostgreSQL.
But it really was not until the early 1980s that commercial relational software started to become available.
Popularity of Relational
The first commercial relational DBMS was Oracle, which was first released in 1979. It was quickly followed by a commercial release of Ingres and IBM’s SQL/DS. A few years later, IBM released DB2 for its mainframe computer systems.
Although slow to catch on at first, relational databases soon became the norm for most new software development. From the late 1980s through the early 2000s the three most popular DBMS offerings were Oracle, Microsoft SQL Server, and IBM DB2. By 2013 the global market for database software was worth almost $30 billion, and it was dominated by relational DBMS.
What is a Relational DBMS?
Before moving on we should pause momentarily to define just what a relational database system is. At a high-level, it is one that is implemented based on Codd’s 12 rules. Relational is traditionally defined as adhering to these rules, as well as Codd’s original 1970 paper.
But the first thing we have to admit is that there are no “true” relational database management systems. Although IBM Db2, Oracle, Microsoft SQL Server, PostgreSQL, and many others are based on the relational model, none of them support all the features that would make them truly relational. It is probably better to refer to them as “SQL DBMS products.” However, popular nomenclature does not always adhere strictly to scientific research principles, so using the term “relational” to define database systems that adhere at least mostly to Codd’s 12 principles is not going to go away.
Furthermore, it is worth discussing SQL, at least a little bit here. SQL, or Structured Query Language, is the standard query language for interacting with a relational database management system. Indeed, every RDBMS — and many non-relational DBMS products — supports SQL as the method for accessing data. Although a query language is a requirement for relational, it need not be SQL. Nevertheless, although SQL has its critics, it remains an easy-to-use, effective language for manipulating and accessing data. The popularity of SQL, and its relative portability from one DBMS to another, is likely one of the primary reasons that relational became and stayed popular for as long as it has.
What About Now?
These days, circa 2023, relational databases are viewed by some as ageing, impractical, and costly. New types of database systems are regularly popping up claiming to be better than relational. But is relational technology really outdated? There are all sorts of pretenders to the throne for data management including Hadoop, and the various types of NoSQL and NewSQL database products. But what are the actual shortcomings in the relational way of doing things that would require its wholesale replacement?
Let’s discuss some of the new types of database systems. The first and most pervasive “new” type of DBMS is the NoSQL database system. Unlike relational, there is no specific, defined model of data for NoSQL. Indeed, there are multiple different types of NoSQL databases, so the term NoSQL probably is not a very useful one, other than to lump a bunch of technologies together into a single bucket and make some generalisations about it.
At a high-level, there are four different types of “NoSQL” database systems:
- Document database systems store data in document-like structures using formats such as JSON, BSON, and XML. MongoDB and RavenDB are examples of document database systems.
- Key-value databases store pairs of unique keys and associated values. The data is essentially the key and a block of data related to the key. Redis and Aerospike are examples of key-value database systems.
- Graph database systems are based on graph theory. Data domains are described as a connected graph of nodes and relationships with properties and labels. Neo4j and Amazon Neptune are examples of graph database systems.
- Wide-column store databases maintain data in tables that contain large numbers of columns (in a flexible schema). Cassandra and HBase are examples of wide-column store database systems.
Stating the obvious, each type of NoSQL DBMS behaves differently and works better for different use cases. That said, there are some high-level issues pertaining to NoSQL DBMS offerings that need to be recognised:
- Although the name is NoSQL, most of these products have adopted SQL, or a SQL-like language to make it easier to access their data. This speaks loudly to the pervasiveness of SQL and its efficacy.
- Most of the NoSQL offerings are promoted as schema-less or as providing flexible schema. All this is saying is that you do not define a rigid schema in the database. In order to use the data, you must have a schema though. This means that you will have to provide a schema (data type, length, definition, etc.) to access the data even though none is available in the DBMS. Or, if the schema is flexible, then you must programmatically access the schema because every iteration of the data may have a different structure. If you truly think about this, lacking a predefined, rigid schema is only beneficial in a few, particular use cases. Enforcing the schema in the database helps to enforce data quality and to simplify data access.
- Many of the NoSQL DBMS products do not provide ACID transaction consistency. ACID ensures that transactions are always consistent, thereby protecting the quality of data in the database. Instead, many NoSQL offering deliver eventual consistency, which means that the data, for a period of time anyway, can be inconsistent. See ACID vs. BASE for more details.
The thing to takeaway here is that a NoSQL database may be a reasonable choice for a particular use case, whereas a relational database is a general good choice for most use cases.
Then there are the NewSQL DBMS products, which are basically a new class of modern relational DBMS. These database systems are engineered to deliver improved scalability for online workloads but also with ACID compliance for data consistency. Basically, a NewSQL DBMS can be thought of as a relational, SQL database system with a distributed, fault-tolerant architecture. So, this is more a case of modernising relational, than replacing it.
An additional point of confusion comes into play with the cloud. But the cloud is not a type of DBMS, it is a deployment platform. So, a cloud DBMS can be relational, NoSQL, or indeed, any other type of DBMS. But the cloud is important as it is currently the predominant method of deployment for applications.
What this means is that you have to work your way through the marketing hype to reveal the truth hidden underneath. Consider, for example, Snowflake, which at its heart is a relational DBMS. It is built for the cloud, with a data platform for data governance, data protection, and collaboration. Why do I mention Snowflake here? Because of the hype surrounding it.
The company named Snowflake, which was founded based on its Snowflake cloud database platform, went public in 2020. The stock began trading at $345/share which valued the company at almost $68 billion! For a company generating about $500 million in annualised revenue. To put that into perspective, the market cap of IBM is about $120 billion on annual revenue of around $57 billion. Clearly, this is an example of hype over value. Of course, a more recent (February 2023) share price for Snowflake is around $163/share.
So, Is Relational Over?
Let’s circle back to the genesis of this article and ask the question “Is the relational database dead or dying?”
My answer is: absolutely not! Relational database systems are all around us, powering the infrastructure of modern business. Furthermore, they offer a well-defined, general-purpose mechanism for data storage, management, and access. Not to mention, they have a history that spans 50 years of research and development, making it a tried-and-true technology for today, and well into the future
And let’s not forget that relational databases use SQL. And there are many SQL developers and coders “out there” who can build and create systems using relational database systems.
Whither goest relational? It goes with us, bravely into the future!