As we approach the end of one year and anticipate the next, it is always a good time to take stock of things and plan for an effective and constructive new year. With that in mind, let’s take a look at the realm of data management and how it is being used advantageously by businesses circa 2022/2023.
Data will continue to be a focal point for every organization. Data is required to conduct business, and focusing on its effectiveness, accuracy, and usability will continue to deliver benefits. Without data you cannot reliably make informed decisions. As W. Edwards Deming famously said, “Without data you’re just another person with an opinion.”
So, what are the trends that will impact how data is managed and used circa 2023?
Digital Transformation
The first thing to recognize is the impact of digital transformation and the pandemic on not just business, but life in general. These had a transformative effect on how we manage and process data.
Customer behavior has changed, and so has employee behavior. As such, the trend is for data to be made available for access outside of the traditional methods. This means remote access and remote work must continue to be accommodated. It also means that mobile access to data of all types must be provided.
Cloud-First Delivery
Undoubtedly, cloud computing is one of the most powerful trends ever to impact information technology. The general idea of cloud computing is using a network of remote servers hosted on the Internet rather than a local server or a personal computer. Analysts at Gartner estimate that as much as 95% of new digital workloads will be deployed on cloud-native platforms by 2025.
It is clear that the emphasis has shifted to the cloud and many DBMS providers have shifted to a cloud-first delivery strategy. This means that features and new capabilities show up first in the cloud DBMS offering, and later in the on-premises offering. This shift requires attention to the details of the new capabilities in terms of applications that access the cloud offering, because the changes may impact application behavior. But it also requires DBAs to monitor and track when the capabilities make their way to the on-premises software. More about on-premises databases in a moment.
So, a lot of data will be hosted in the cloud or accessed by cloud services. This means that organizations need to have sufficient capabilities to manage cloud data and cloud database implementations. What database services are provided by the cloud service provider and what must still be provided by in-house database administrators and developers? And even for completely managed cloud databases, organizations would be wise to staff professionals who can oversee those services and ensure that they are performing optimally and within budget.
Finally, we can expect most organizations to adopt a hybrid multicloud approach, at least for the foreseeable future. This means supporting applications and databases that work both on-premises and off-premises, preferably managed in an integrated fashion. Why? Well, it is improbable that all of an organization’s databases will move to the cloud, at least initially. Literally decades of work went into building the existing infrastructure at many organizations and it cannot be magically switched to the cloud without effort and cost. That means we must work in a hybrid environment, maintaining traditional skillsets for on-prem data, while embracing and extending our capabilities to manage the cloud data.
Improved Data Governance Efforts
Organizations are more actively embracing the truism that you cannot effectively manage and take advantage of that which you do not understand. Practicing data governance helps organizations to understand and effectively utilize their data through activities such as assigning data stewardship, establishing data quality initiatives, and adopting master data management (MDM) programs.
According to TechTarget, “Data governance (DG) is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn’t get misused.”
As the amount of data continues to grow, and its importance to the organization increases, effective governance of data becomes increasingly essential. Data governance, properly implemented, can ensure not only that data is understood properly, but also that it is managed to improve its accuracy and that it is used appropriately and consistently throughout the organization.
One of the factors driving the adoption of data governance programs is the need to comply with industry and governmental regulations. Regulations like GDPR and CCPA, and standards like PCI-DSS require strict data management procedures be implemented to ensure compliance. Data governance practices like data mapping and classification to ensure appropriate data usage, protection, and disposition will benefit any compliance-related effort. Without effective data governance and management in place compliance will be difficult, if not impossible.
Additionally, some of these trends are interconnected. For example, consider that an organization with solid data governance in place will have an easier time migrating data to the cloud.
Data governance is something that all organizations do, at least to some extent, to enable data to be utilized. Of course, the actual name “data governance” may not be used to describe the practice. For example, you may know that Joe in accounting is the go-to guy to get information about your accounts. That means he is likely the data steward, even if he does not have that title. Nevertheless, more organizations are embarking on establishing a more comprehensive and controlled program for data governance.
The Rise of the Data Catalog
And that brings us to the data catalog, which extends the concept of metadata capture and management further through automation and modern discovery techniques. Gartner has defined a data catalog as a tool that “creates and maintains an inventory of data assets through the discovery, description and organization of distributed datasets.”
The ability to keep the information in a data catalog up-to-date using automated discovery and search techniques solves the greatest failure point of earlier repository and data dictionary products: keeping the data fresh and useful without requiring tedious manual processes. In many cases, AI and machine learning capabilities are being built into data catalogs for scanning and automatically discovering metadata and meaningful relationships among and between the data elements. By using machine learning to simplify and automate the discovery and maintenance of metadata, a searchable data catalog can act as a recommendation engine for data.
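To make the idea of automated metadata discovery concrete, here is a minimal sketch in Python. It assumes SQLite as a stand-in for a real DBMS, and the table names, the `build_catalog` function, and the `search` function are all hypothetical names invented for illustration. Commercial data catalogs do far more (profiling, lineage, ML-based classification); this only shows the basic pattern of scanning system metadata instead of documenting it by hand.

```python
import sqlite3


def build_catalog(conn):
    """Scan SQLite's own metadata and return one catalog entry per column."""
    catalog = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, col_type, _notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            catalog.append(
                {"table": table, "column": column,
                 "type": col_type, "primary_key": bool(pk)}
            )
    return catalog


def search(catalog, term):
    """Return entries whose table or column name contains the search term."""
    term = term.lower()
    return [e for e in catalog
            if term in e["table"].lower() or term in e["column"].lower()]


# Hypothetical sample schema, used only to have something to discover.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, email TEXT, country TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")

catalog = build_catalog(conn)
hits = search(catalog, "customer_id")
for entry in hits:
    print(entry["table"], entry["column"], entry["type"])
```

Because the inventory is rebuilt from the database's own metadata, rerunning the scan after a schema change refreshes the catalog without manual effort, which is the property that earlier repository products lacked.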
It has long been the goal of data professionals to create and maintain an inventory of all the data assets of an organization, but the goal has proven to be difficult and costly. Data silos, immature tools, rapid data growth, and multiple copies of data have contributed to this failure. But now with the data catalog and its improved capabilities for automation, discovery, and classification, the ability to create and maintain an inventory of your organization’s data looms as a possibility.
Look for a steady increase in data catalog usage as the product category proves its ability to deliver context to data. As such, it can better enable data scientists, developers, data analysts, and other business data consumers to find the data they need and understand the meaning of the data they are using.
Big Data… Yes, Big Data!
Big data is definitely still a trend but not the pervasive term that it used to be. By this I mean that the volume, variety, and velocity of data creation and storage have not slowed down. More data is being created and stored than ever before.
Analysts at IDC estimate that from 2020 to 2025, new data creation will expand at a compound annual growth rate (CAGR) of 23 percent. They further estimate that by 2025 there will be around 175 zettabytes of data created. Double-checking with analysts at Statista, their estimate is that the total amount of data to be created, captured, copied and consumed globally in 2022 was 97 zettabytes, growing to 181 zettabytes by 2025.
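As a rough cross-check, the growth rate implied by the two Statista figures can be computed with the standard CAGR formula. The sketch below is illustrative only: the function name `cagr` is my own, and the Statista period (2022 to 2025) is shorter than the IDC period (2020 to 2025), so the two rates are not strictly comparable.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Statista estimates quoted in the text, in zettabytes.
statista_2022_zb = 97
statista_2025_zb = 181

implied = cagr(statista_2022_zb, statista_2025_zb, years=3)
print(f"Implied CAGR 2022-2025: {implied:.1%}")
```

The result comes out at about 23 percent a year, which is in line with the IDC estimate.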
We are still squarely in the age of Big Data… don’t let anyone tell you otherwise. This can be verified further by examining the technologies in Gartner’s “Hype Cycle for Data Management, 2022.” This study shows several technologies reaching the plateau of productivity this past year, including multimodel DBMS, wide-column DBMS, and data preparation and integration tools. In-DBMS analytics is climbing the slope of enlightenment and should reach the plateau of productivity in 2023. So, we should continue to expect growing adoption of and reliance upon technologies that were spawned from the Big Data era, including data lakes and other types of DBMS offerings.
The trend here is that what we used to call “Big Data” is now just an overall part of the entire data management spectrum. And data management practices will continue to adopt and embrace as mainstream the things that were once seen as dramatically different.
Data Fabric and Data Mesh
Yet another significant trend that will continue to expand in 2023 is the adoption of data fabric and data mesh technologies.
The goal of data fabric and data mesh is to provide better control over your data, and to deliver better understanding and shareability of information about what data resources are available. Most organizations have adopted data technologies such as DBMS products, data movement software, data catalogs, data modeling tools, MDM solutions, and so on. But these solutions were adopted from different vendors over a long period of time.
So, there was really no integration between them. Data fabric and data mesh can help to deliver an integrated framework to enable organizations to better manage and use data, wherever it resides, on-premises and in the cloud. Both focus on the delivery of self-service data and are built on modern technologies such as AI and machine learning.
Additional Trends
Of course, there are many additional trends that will impact data management in the upcoming years. These include:
Automation, which has been trending upward continuously for some time now. More processes, automated more efficiently, can help to improve data consumption, data administration, and data movement.
Artificial Intelligence, which has also been trending upward for the past few years, is becoming more capable and more effective. It will continue to find its way increasingly into more applications and databases. And we will start to see sub-categories of AI, such as Natural Language Processing (NLP), begin to become more mainstream and useful.
And again, there is interplay between the trends. As AI and machine learning (ML) capabilities mature and get embedded into automation tools, DBAs can offload some forms of performance management, change management, and database provisioning to the tooling. The technology is becoming more robust, so DBAs and their organizations will begin to prepare for and invest in automated intelligence to take advantage of it as it matures.
The trend to provide continuous delivery of improved software will continue as DevOps and DataOps become more widely adopted. This means that many DBAs are working in teams with developers instead of in teams of other DBAs, at least for periods of time when development projects are very active. Working together in teams improves software delivery, and DevOps, implemented effectively with a view of data and database management, is an effective way to achieve this.
Furthermore, negative trends are not going away. Consider that data breaches continue to happen, even though there has been considerable attention paid to them, at least in terms of the IT news cycle. The Privacy Rights Clearinghouse, which began keeping records on data breaches back on February 15, 2005, contains data on over 9000 data breaches impacting over 10.5 billion total records that have occurred since then. This is an average of 12 data breaches a week! So even though I’d like to report that data breaches will decline, “all signs point to no” as the Magic 8 Ball might say!
Which brings me to the last of the data management trends I want to mention: data protection and security. More needs to be done to better protect and secure sensitive corporate data. This means that organizations will continue to look for improved technologies to secure data. This includes spending on technologies like improved data encryption, data masking, identity and access management, intrusion detection, and database auditing.
The Bottom Line
That is a lot of different trends impacting data and database management. Some organizations will adopt these technologies more rapidly than others, but data professionals would be wise to dig in and begin to understand these trends and how they impact their jobs.