Clean Architecture Series: The Database
Because it holds the physical representation of a Model (which in turn is critical to the existence of a software system), the Data is certainly very important. When Edgar Codd defined the principles of Relational Databases back in 1970, everybody was excited about this new cool technology that provides an elegant, disciplined and robust way of accessing data. But no matter how good or brilliant a technology may be, it remains technology - and that means it’s just a detail.
While Data is critical to the existence of a software system, how the Data is accessed is not. One of the biggest mistakes made over the last few decades was to put Databases at the core of software systems, which means that the system relies on details (i.e. how the data is accessed) rather than on policies (i.e. what the data actually is). If we really think about it, there is nothing exciting about Data being arranged into rows within tables. At the end of the day, the Database is just a piece of software and thus, by definition, it should be easy to change or replace. Treating database rows as the model of our Application is an architectural error that causes business rules, use cases and even some parts of the UI to be tightly coupled to the relational structure of the Data. The Database is just a mechanism for accessing data in persistent storage (i.e. non-volatile memory), and from an architectural point of view this is a low-level detail that should be deferred for as long as possible. The choice of Database should therefore be made later and should not pollute the early architecture of a system.
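To make the separation between policy and detail concrete, here is a minimal Python sketch. All the names in it (Customer, CustomerGateway, InMemoryCustomerGateway, can_place_order) are hypothetical and chosen only for illustration. The use case depends on an interface, so the decision about which Database sits behind it can be deferred.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Customer:
    """Business entity: knows nothing about rows, tables, or SQL."""
    customer_id: int
    name: str
    credit_limit: float


class CustomerGateway(Protocol):
    """Policy-side interface; any storage mechanism may implement it."""

    def find(self, customer_id: int) -> Optional[Customer]: ...

    def save(self, customer: Customer) -> None: ...


class InMemoryCustomerGateway:
    """Detail: one possible implementation, enough to defer the database choice."""

    def __init__(self) -> None:
        self._customers: Dict[int, Customer] = {}

    def find(self, customer_id: int) -> Optional[Customer]:
        return self._customers.get(customer_id)

    def save(self, customer: Customer) -> None:
        self._customers[customer.customer_id] = customer


def can_place_order(gateway: CustomerGateway, customer_id: int, amount: float) -> bool:
    """Use case (hypothetical rule): depends only on the interface, not on storage."""
    customer = gateway.find(customer_id)
    return customer is not None and amount <= customer.credit_limit
```

A relational implementation of the same interface could be added later without touching Customer or can_place_order. That is the sense in which the Database stays a detail.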
Why do Relational Database Systems seem to be everywhere?
Database systems are so prevalent today because of rotating magnetic disks, which have been (and to a certain degree, still are) the industry standard for persisting data for more than five decades. While these disks have evolved over the years, allowing more data to be stored in less space, one trait of the technology has remained the same: disks are slow (i.e. at least 100,000 times slower than RAM when comparing access times). To mitigate the delay imposed by disks, you need indexes, caches, optimized execution plans and query schemes - in short, a Relational Database Management System (RDBMS - e.g. Microsoft SQL Server, MySQL, Oracle DB). These systems are content-based, which gives them a natural and convenient way to find records based on their content. Additionally, each of these systems eventually brings some data into memory, where it can be quickly manipulated.
But what if there were no Disks? Then we wouldn’t need SQL at all! As popular as they have been, disks will soon go the same way as floppies and CDs did. They will eventually be replaced by RAM. And when this happens (which will be pretty soon, given AMD EPYC servers holding up to 4TB of RAM and Intel Optane SSDs offering non-volatile, RAM-like performance), RDBMS will die along with their beloved Relational Model. And what kind of data structures will programmers use then? As surprising as it might sound, programmers will use the same data structures they have been using all along, namely lists, hash sets, stacks, queues, etc. The reason is that we, as programmers, rarely leave the data in the form of rows or tables - we usually load it into memory and rearrange it according to our needs. At the end of the day, databases are all about moving data between RAM and disks as efficiently as possible. From an architectural point of view, this is irrelevant, and indeed we should not care about the existence of disks at all. As important as the performance of a system may be, it should be clearly separated from the business rules, as it has nothing to do with the overall architecture of that system. To conclude, the Business Data Model is architecturally important, while the technology and systems that manipulate data on rotating magnetic surfaces are not. Data is significant - SQL is not.
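The habit described above, loading rows into memory and rearranging them into ordinary data structures, can be sketched in a few lines of Python. The rows and the arrange function are made up for illustration and stand in for whatever a database driver might return.

```python
from collections import defaultdict

# Hypothetical rows as a driver might return them: (order_id, customer, total)
rows = [
    (1, "alice", 30.0),
    (2, "bob", 12.5),
    (3, "alice", 7.5),
]


def arrange(rows):
    """Rearrange flat rows into the structures the application actually uses."""
    orders_by_id = {}                       # hash map: direct lookup by order id
    totals_by_customer = defaultdict(list)  # lists grouped per customer
    customers = set()                       # hash set: distinct customers
    for order_id, customer, total in rows:
        orders_by_id[order_id] = (customer, total)
        totals_by_customer[customer].append(total)
        customers.add(customer)
    return orders_by_id, totals_by_customer, customers


orders_by_id, totals_by_customer, customers = arrange(rows)
alice_total = sum(totals_by_customer["alice"])
```

Once the data is in this shape, the rest of the program works with dictionaries, lists and sets, and never sees that the data started out as rows.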