If you have been paying attention to the recent trends in data architecture and data management, then you have undoubtedly heard the terms “Data Mesh” and “Data Fabric.” But it is also likely that you do not understand what these terms mean. And that is to be expected, because vendors and analysts throw them around a lot without always providing explicit definitions or explanations.
But there is no denying that the current trend is to better control and manage data and to automate data management processes. Both data mesh and data fabric are methods of achieving these goals.
Why Now?
We must raise the question “Why are we seeing an increased interest in concepts like data mesh and data fabric?” I mean, data management and administration have been goals of IT for decades now. But it always seems like organisations give up on fully implementing data management, including things like properly defining metadata, defining data usage rules, establishing data lineage, and so on. Sure, company executives mouth platitudes like “We treat data as a corporate asset,” but if you look at what is being implemented, it is clear that they don’t really.
So why now? Well, the vast amount of data being created and accumulated makes it imperative for organisations to implement better data infrastructure and management capabilities. Current trends like analytics and AI are driven by data. But if you don’t know what data you have, how to get access to it, and how to ensure its quality, then any attempt to use your data for analytical or AI purposes will surely fail. The bottom line is that putting AI on top of bad data just results in bad decisions!
Furthermore, data infrastructure at most organisations is simply too complex to understand without documentation and assistance. There is structured and unstructured data, production and test data, streaming data, and more. And we are creating so much new data daily that it is difficult to track. Once data arrives within an organisation, it is not static. It gets ingested, modified, and then moved all over the place. Without a means of controlling and managing this labyrinth of data there is no hope for understanding what data you have and how to use it.
Additionally, with more data analysts and data scientists requiring access to data, proper information about what data is available is necessary… as well as ways of providing self-service access to that data. I mean, do you really want to task your DBAs with procuring every piece of data that these data professionals require?
How Can We Do It?
So, the goal is to provide better control over your data, but also to better understand and share information about what data resources are available. Accurate data definitions, automated management and procurement, and self-service are the current goals of most organisations’ data management programs.
And that brings us to Data Mesh and Data Fabric, two competing but complementary frameworks (or concepts) for data management. What are the differences between the two? Well, I won’t dig into all the details here; after all, there are numerous definitions and explanations “out there on the web.” I will, however, offer up my high-level overview.
The goal of both is to enable organisations to better manage and use data, wherever it resides, on-premises and in the cloud. Both focus on the delivery of self-service data and are built on modern technologies such as AI and machine learning. All of these are laudable goals.
Data Fabric generally includes an architecture combined with services that enable orchestration and management of data. With a Data Fabric approach, there is typically a single unified data architecture with an integrated set of technologies and services on top of that architecture. Data may be all over the place, but it is integrated by the technologies and services of the Data Fabric.
These technologies and services exist to define, describe, and enrich the data with the goal of ensuring its quality and accessibility. The Data Fabric provides the capabilities for data management and data governance, as well as self-service to data across the organisation. Usually, the Data Fabric provides a data catalog, data pipeline management, and other key aspects of data management, all accessible via a unified architecture.
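To make the idea a little more concrete, here is a minimal Python sketch of a fabric-style layer: one catalog and one access path sitting on top of data that lives in different places. It is an illustration only, not how any particular product works. The names `FabricEntry`, `DataFabric`, `reader`, and `required_fields` are all hypothetical, and the "quality rule" is just an assumed check that required fields are present.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class FabricEntry:
    """Catalog record for one dataset known to the fabric (hypothetical shape)."""
    name: str
    location: str                      # e.g. "on-prem" or "cloud"
    description: str
    reader: Callable[[], List[Row]]    # how the fabric fetches the rows
    required_fields: List[str] = field(default_factory=list)


class DataFabric:
    """One integration layer over data that lives in many places."""

    def __init__(self) -> None:
        self._catalog: Dict[str, FabricEntry] = {}

    def register(self, entry: FabricEntry) -> None:
        self._catalog[entry.name] = entry

    def describe(self, name: str) -> str:
        entry = self._catalog[name]
        return f"{entry.name} ({entry.location}): {entry.description}"

    def read(self, name: str) -> List[Row]:
        """Self-service read; the same quality rule applies to every source."""
        entry = self._catalog[name]
        rows = entry.reader()
        return [r for r in rows
                if all(r.get(f) is not None for f in entry.required_fields)]


fabric = DataFabric()
fabric.register(FabricEntry(
    name="customers",
    location="on-prem",
    description="Customer master records",
    reader=lambda: [{"id": 1, "email": "a@example.com"},
                    {"id": 2, "email": None}],
    required_fields=["id", "email"],
))
fabric.register(FabricEntry(
    name="clickstream",
    location="cloud",
    description="Web click events",
    reader=lambda: [{"id": 10, "page": "/home"}],
    required_fields=["id"],
))

# The row with a missing email is filtered out by the shared quality rule.
clean_customers = fabric.read("customers")
```

The point of the sketch is the shape: consumers ask the fabric by name and never need to know whether the data is on-premises or in the cloud, while cataloging and quality checks live in one shared layer.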
Data Mesh grew out of the data warehousing and data lake world, and it is more of an API-driven approach. It focuses more on people than technology, relying on subject matter experts who administer domains within the Data Mesh. A domain, in this context, is essentially a group of microservices that facilitate access to data via APIs.
Experts with a core understanding of the data within their domain are responsible for establishing and ensuring all the ongoing management needs of their data, including data standards, data governance, and all things associated with data. The mesh is intended to extend across all data sources, locations, and types, delivering access to consumers of the data.
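Here is a comparable sketch for the mesh side, again in Python and again only an illustration under my own assumptions. Each domain has a named owner who decides what to publish, and the mesh itself does little more than route a request to the right domain. The names `Domain`, `DataMesh`, `publish`, `request`, and the "domain/product" address format are hypothetical, and plain function calls stand in for what would be real APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Domain:
    """A domain run by subject matter experts who own its data (hypothetical shape)."""
    name: str
    owner: str
    standards: List[str] = field(default_factory=list)
    _products: Dict[str, Callable[[], List[Row]]] = field(default_factory=dict)

    def publish(self, product: str, endpoint: Callable[[], List[Row]]) -> None:
        # The domain team decides what it exposes and how.
        self._products[product] = endpoint

    def get(self, product: str) -> List[Row]:
        return self._products[product]()


class DataMesh:
    """Thin routing layer; management and governance stay with each domain."""

    def __init__(self) -> None:
        self._domains: Dict[str, Domain] = {}

    def add_domain(self, domain: Domain) -> None:
        self._domains[domain.name] = domain

    def request(self, address: str) -> List[Row]:
        # Addresses look like "domain/product" in this sketch.
        domain_name, product = address.split("/", 1)
        return self._domains[domain_name].get(product)

    def owner_of(self, address: str) -> str:
        return self._domains[address.split("/", 1)[0]].owner


sales = Domain(name="sales", owner="sales-data-team", standards=["amounts in USD"])
sales.publish("orders", lambda: [{"order_id": 100, "amount": 25.0}])

mesh = DataMesh()
mesh.add_domain(sales)
orders = mesh.request("sales/orders")
```

Notice where the responsibility sits: the mesh knows how to find a domain, but the standards and the published data products belong to the people who run that domain.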
Of course, there are additional concepts behind all this, for example, continuous data quality improvement, populating and maintaining the data catalog, and so on. But there are beneficial qualities to both approaches. And the two can interact with, and augment, each other.
What Does It Mean to You?
But the terms “data fabric” and “data mesh” are not used consistently in the market. In some cases, analyst groups try to put together a concise definition and list features or aspects that are required to conform to their definition. In some cases, vendors claim a product or service delivers a data fabric or a data mesh. In other cases, I’ve seen data fabric used as a synonym for data architecture.
“So what?” you may ask. And that is a great question. I don’t care if it is a mesh or a fabric or an architecture. The bottom line is that if it has “data” in its title, I like it! As long as it helps to facilitate the management, sharing, and usage of data, it can be helpful!
It’s all data management. And it needs to be done better, and more consistently, throughout all industries. Ensuring data quality, establishing data stewardship, governing your data, ensuring compliance, and cataloging your data are long-overdue requirements that I’m pleased to see being taken more seriously.
Consider the data catalog, a needed component of data architecture that enables the creation and management of the metadata that allows us to actually use data. It sounds a lot like a data dictionary circa 1985, doesn’t it? Or a repository circa 1995? I guess if you wait around long enough, things have a way of resurrecting themselves… at least if they are useful. But they almost always have a new name!
So, now we call it a data catalog. Oh sure, there are differences, such as the incorporation of machine learning to improve the metadata, a social media-like capability for users to define and improve metadata (like a Wiki), and the ability to perform semantic Google-like searches on the metadata. But it is still the same concept. We needed it then, and we need it even more now!
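For the curious, the core of a catalog can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of any real catalog product: `DataCatalog`, `CatalogEntry`, `annotate`, and `search` are hypothetical names, the table names are invented, and the search is plain keyword overlap, not the semantic or machine-learning-assisted search that modern catalogs offer.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class CatalogEntry:
    name: str
    description: str
    notes: List[str] = field(default_factory=list)   # user-contributed, wiki-style

    def tokens(self) -> Set[str]:
        return tokenize(" ".join([self.name, self.description, *self.notes]))


class DataCatalog:
    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def add(self, name: str, description: str) -> None:
        self._entries[name] = CatalogEntry(name, description)

    def annotate(self, name: str, note: str) -> None:
        # Any user can enrich the metadata, much like editing a wiki page.
        self._entries[name].notes.append(note)

    def search(self, query: str) -> List[str]:
        """Rank entries by how many query words appear in their metadata."""
        wanted = tokenize(query)
        scored = [(len(wanted & e.tokens()), e.name) for e in self._entries.values()]
        hits = [(score, name) for score, name in scored if score > 0]
        return [name for score, name in sorted(hits, key=lambda h: (-h[0], h[1]))]


catalog = DataCatalog()
catalog.add("cust_master", "Customer master table")
catalog.add("ord_hdr", "Order header table")
catalog.annotate("ord_hdr", "Join to cust_master on customer id for revenue reports")

results = catalog.search("customer revenue")   # ["ord_hdr", "cust_master"]
```

The user-contributed note is what lets `ord_hdr` rank first for a revenue question, which is exactly the kind of enrichment a 1985 data dictionary did not make easy.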
Some Useful Advice
OK, with all the above in mind, what steps make sense for your organisation to pursue? After all, the confusion surrounding “all things data” is real, but so are the challenges of managing data, as well as the potential benefits of gaining more control over your data assets.
The first thing to keep in mind is that you do not have to choose between data mesh and data fabric. Not only can a data mesh integrate and complement a data fabric (and vice versa), but both can also integrate and augment data management software already in use, thereby increasing its effectiveness.
Worry less about what is a mesh and what is a fabric and more about what you are attempting to accomplish. Work toward automating data management, assuring data governance, and establishing as much self-service to data as possible. And feel free to use concepts, processes, and tools from any source or framework that works for you and your organisation.
Think more about your data architecture and how you want to build and manage it than about whether you are implementing a data fabric or a data mesh, or indeed, any future “data thingie” that shows up.
As you automate, look to adopt and embrace AI and machine learning capabilities to bolster and improve your data architecture.
Finally, do not forget people and processes. You cannot solve all your data management issues with technology alone. Transforming your people and processes is just as crucial as transforming your data!