Database

IBM Db2 13 for z/OS SQL Data Insights

On May 31, 2022, IBM released its latest and greatest model of the mainframe, the IBM z16. And there was a lot of good “stuff” that came with that announcement, so much so that other things announced the same day were a bit overshadowed… like Db2 13 for z/OS!

This is unfortunate for many reasons. First of all, it is the first new version of Db2 in a long time. IBM switched to a continuous delivery model with Db2 12 for z/OS, which has been generally available since 2016. And so it makes sense that there would be fewer new versions because new capabilities were regularly added to Db2 via function levels. Nevertheless, it would have been nice for IBM to make a big deal about Db2 13… at least for long-time users.

But there is also a lot of new functionality worth promoting and learning in this new version! Perhaps the biggest new capability in Db2 13 for z/OS is SQL Data Insights. This feature delivers new AI functions built right into Db2 and accessible using Db2 SQL queries. By combining deep learning in AI with the new IBM Z processor, SQL Data Insights enables users to write SQL-based semantic queries on their Db2 tables and views.

Let’s face it, though, there is a lot of noise “out there” about AI and machine learning and deep learning, but very little actually being delivered as a core function at the DBMS level. But SQL Data Insights is a built-in capability of Db2 delivered with the new release at no additional cost. Because it is part of Db2, there is no need to move data around (such as with ETL) before you can perform AI functions on it. The data lives on the mainframe, Db2 lives on the mainframe, and the AI delivered with SQL Data Insights is part of Db2!

SQL Data Insights should be helpful to organizations looking to uncover heretofore unknown relationships in their data. And because it uses built-in functions, you can use it anywhere you use SQL!

From a developer perspective, the basic functionality of SQL Data Insights is delivered via three new built-in functions:

The first function, AI_SIMILARITY, computes a score that can be used to compare data for similarity. For example, you could specify a customer and ask Db2 to return the other customers that are most similar to it, or most dissimilar. That can be quite useful for organizations looking to improve their understanding of their market and customers. And this is just one use case. Any data stored in Db2 is fair game.
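To make the customer example concrete, here is a minimal sketch in Python that builds the text of such a query. The table CHURN, the column CUSTOMERID, and the helper name are hypothetical, and the two-argument form of AI_SIMILARITY shown here is my understanding of the function rather than something confirmed in this article, so check the Db2 13 SQL reference before relying on it. The query would only work against a table or view that has already been enabled for AI queries.

```python
def similar_customers_sql(customer_id: str, limit: int = 10,
                          most_similar: bool = True) -> str:
    """Build a query that ranks other customers by AI_SIMILARITY score.

    CHURN and CUSTOMERID are hypothetical names used for illustration.
    """
    # Double any single quotes so the value is a valid SQL string literal.
    safe_id = customer_id.replace("'", "''")
    # Highest scores first for "most similar", lowest first for "most dissimilar".
    direction = "DESC" if most_similar else "ASC"
    return (
        "SELECT * FROM (\n"
        "  SELECT DISTINCT C.CUSTOMERID,\n"
        f"         AI_SIMILARITY(C.CUSTOMERID, '{safe_id}') AS SIMILARITY\n"
        "  FROM CHURN C\n"
        f"  WHERE C.CUSTOMERID <> '{safe_id}'\n"
        ") AS T\n"
        f"ORDER BY SIMILARITY {direction}\n"
        f"FETCH FIRST {int(limit)} ROWS ONLY"
    )


if __name__ == "__main__":
    print(similar_customers_sql("3668-QPYBK"))
```

Passing most_similar=False flips the sort order, which corresponds to asking for the most dissimilar customers instead.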

Another new function delivered with SQL Data Insights is AI_SEMANTIC_CLUSTER. This function computes a semantic clustering score of a member argument against a set of clustering arguments. For example, you could specify a set of customers and ask Db2 to return other customers that best belong to that set.
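A clustering query of that kind might be sketched as follows, again as Python that only builds the SQL text. The table CHURN, the column CUSTOMERID, and the helper name are hypothetical. The limit of one to three clustering arguments is my assumption about the function's signature, not something stated in this article, so verify it against the Db2 documentation.

```python
def cluster_candidates_sql(seed_ids: list[str], limit: int = 10) -> str:
    """Build a query that scores customers against a small set of seed customers.

    CHURN and CUSTOMERID are hypothetical names used for illustration.
    """
    # Assumption: the function takes between one and three clustering arguments.
    if not 1 <= len(seed_ids) <= 3:
        raise ValueError("expected between one and three seed customer IDs")
    # Quote each seed as a SQL string literal, doubling embedded single quotes.
    literals = ", ".join("'" + s.replace("'", "''") + "'" for s in seed_ids)
    return (
        "SELECT * FROM (\n"
        "  SELECT DISTINCT C.CUSTOMERID,\n"
        f"         AI_SEMANTIC_CLUSTER(C.CUSTOMERID, {literals}) AS SCORE\n"
        "  FROM CHURN C\n"
        f"  WHERE C.CUSTOMERID NOT IN ({literals})\n"
        ") AS T\n"
        "ORDER BY SCORE DESC\n"
        f"FETCH FIRST {int(limit)} ROWS ONLY"
    )


if __name__ == "__main__":
    print(cluster_candidates_sql(["0280-XJGEX", "6467-CHFZW", "0093-XWZFY"]))
```

The seed customers are excluded from the result so that the ranking contains only new candidates for the set.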

And finally, we have the AI_ANALOGY function, which computes an analogy score between two sets of values. This function works like an analogy: A:B as X:?. You can use analogy queries to determine whether a relationship between a pair of entities applies to a second pair of entities. For example, if a customer prefers certain products, can we find another customer with a similar preference perhaps for other products?
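The product-preference question could be expressed along these lines. This is a sketch only: the table PURCHASES, its columns, and the helper name are hypothetical, and the four-argument form with USING MODEL COLUMN reflects my understanding of AI_ANALOGY rather than anything confirmed in this article. The first pair (a customer and a product that customer prefers) defines the relationship, and the query scores every product as the missing half of the second pair.

```python
def analogy_sql(source_col: str, source_1: str,
                target_col: str, target_1: str,
                source_2: str, limit: int = 10) -> str:
    """Build a query for: source_1 is to target_1 as source_2 is to ?

    PURCHASES is a hypothetical table name used for illustration.
    """
    # Column names go into the SQL text unquoted, so accept only plain identifiers.
    for col in (source_col, target_col):
        if not col.isidentifier():
            raise ValueError(f"not a simple column name: {col!r}")

    def lit(value: str) -> str:
        # Quote a value as a SQL string literal, doubling embedded single quotes.
        return "'" + value.replace("'", "''") + "'"

    return (
        "SELECT * FROM (\n"
        f"  SELECT DISTINCT P.{target_col},\n"
        f"         AI_ANALOGY({lit(source_1)} USING MODEL COLUMN {source_col},\n"
        f"                    {lit(target_1)} USING MODEL COLUMN {target_col},\n"
        f"                    {lit(source_2)} USING MODEL COLUMN {source_col},\n"
        f"                    P.{target_col}) AS SCORE\n"
        "  FROM PURCHASES P\n"
        ") AS T\n"
        "ORDER BY SCORE DESC\n"
        f"FETCH FIRST {int(limit)} ROWS ONLY"
    )


if __name__ == "__main__":
    print(analogy_sql("CUSTOMERID", "C1001", "PRODUCT", "Widget", "C2002"))
```

The highest-scoring rows would be the products that relate to the second customer in roughly the way the first product relates to the first customer.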

Keep in mind that SQL Data Insights is free with Db2 13, but it is also optional, so it must be installed before you can use it. Furthermore, before any of these AI functions can be used, it is necessary to first train an AI model by collecting key statistics and building metric scores for the functions to use. An embedded Apache Spark cluster is used for training the machine learning model during the AI query enablement process. If you have ever built such AI models you will know that the process can be lengthy and consume a lot of CPU resources. Fortunately, zIIPs can be used to build the models.

So, there will be some work to do before you can start using the AI capabilities of SQL Data Insights. The manuals use the term “enabling an AI query” and it is relatively easy to request that a table or view be made ready for use with the AI query functions. After the object has been added, you can further refine your request, choosing columns to include and exclude as needed, filtering out any values not needed, and then enabling the query. Because the enabling process involves training a machine learning model and loading the model into Db2, it can take some time before the data can be queried. Still, it can all be queried right there on the mainframe, where the data lives!

The Bottom Line

With SQL Data Insights and Db2 13 you can extend your queries into the realm of AI, which is exciting because it can help you gather heretofore undiscovered insight into your data.

Of course, this is a high-level overview of SQL Data Insights. Consult the Db2 documentation for details on requirements, installation, enabling queries, and the actual formulation of queries using these functions.

Data Mesh? Data Fabric? I Don’t Care What You Call It, You Need It!

If you have been paying attention to the recent trend in data architecture and data management, then you have undoubtedly heard the terms “Data Mesh” and “Data Fabric.” But it is also likely that you do not understand what these terms mean. And that is to be expected, because they are thrown around a lot by vendors and analysts without always providing explicit definitions or explanations.

But there is no denying that the current trend is to better control and manage data and to automate data management processes. Both data mesh and data fabric are methods of achieving these goals.

Why Now?

We must raise the question “Why are we seeing an increased interest in concepts like data mesh and data fabric?” I mean, data management and administration have been long-term goals of IT for decades now. But it always seems like organisations give up on fully implementing data management, including things like properly defining metadata, defining data usage rules, establishing data lineage, and so on. Sure, company executives mouth platitudes like “We treat data as a corporate asset,” but if you look at what is being implemented, it is clear that they don’t really.

So why now? Well, the vast amount of data being created and accumulated makes it imperative for organisations to implement better data infrastructure and management capabilities. Current trends like analytics and AI are driven by data. But if you don’t know what data you have, how to get access to it, and how to ensure its quality, then any attempt to use your data for analytical or AI purposes will surely fail. The bottom line is that putting AI on top of bad data just results in bad decisions!

Furthermore, data infrastructure at most organisations is simply too complex to understand without documentation and assistance. There is structured and unstructured data, production and test data, streaming data, and more. And we are creating so much new data daily that it is difficult to track. Once data arrives within an organisation, it is not static. It gets ingested, modified, and then moved all over the place. Without a means of controlling and managing this labyrinth of data there is no hope for understanding what data you have and how to use it.

Additionally, with more data analysts and data scientists requiring access to data, proper information about what data is available is necessary… as well as ways of providing self-service access to that data. I mean, do you really want to task your DBAs with procuring every piece of data that these data professionals require?

How Can We Do It?

So, the goal is to provide better control over your data, but also to better understand and share information about what data resources are available. Accurate data definitions, automated management and procurement, and self-service are the current goals of most organisations’ data management programs.

And that brings us to Data Mesh and Data Fabric, two competing but complementary frameworks (or concepts) for data management. What are the differences between the two? Well, I won’t dig into all the details here. After all, there are numerous definitions and explanations “out there on the web.” I will, however, offer up my high-level overview.

The goal of both is to enable organisations to better manage and use data, wherever it resides, on-premises and in the cloud. Both focus on the delivery of self-service data and are built on modern technologies such as AI and machine learning. All of these are laudable goals.

Data Fabric generally includes an architecture combined with services that enable orchestration and management of data. With a Data Fabric approach, there is typically a single unified data architecture with an integrated set of technologies and services on top of that architecture. Data may be all over the place, but it is integrated by the technologies and services of the Data Fabric.

These technologies and services exist to define, describe, and enrich the data with the goal of ensuring its quality and accessibility. The Data Fabric provides the capabilities for data management and data governance, as well as self-service to data across the organisation. Usually, the Data Fabric provides a data catalog, data pipeline management, and other key aspects of data management, all accessible via a unified architecture.

Data Mesh grew more out of the data warehousing and data lake world, and it is more of an API-driven approach. It focuses more on people than technology, relying on subject matter experts who administer domains within the Data Mesh. A domain, in this context, is essentially a group of micro-services that facilitate access to data via APIs.

Experts with a core understanding of the data within their domain are responsible for establishing and ensuring all the ongoing management needs of their data, including data standards, data governance, and all things associated with data. The mesh is intended to extend across all data sources, locations, and types, delivering access to consumers of the data.

Of course, there are additional concepts behind all this, such as continuous data quality improvement and populating and maintaining the data catalog. But there are beneficial qualities to both approaches. And the two can interact with, and augment, each other.

What Does It Mean to You?

But the terms “data fabric” and “data mesh” are not used consistently in the market. In some cases, analyst groups try to put together a concise definition and list features or aspects that are required to conform to their definition. In some cases, vendors claim a product or service delivers a data fabric or a data mesh. In other cases, I’ve seen data fabric used as a synonym for data architecture.

“So what?” you may ask. And that is a great question. I don’t care if it is a mesh or a fabric or an architecture. The bottom line is that if it has “data” in its title, I like it! As long as it helps to facilitate the management, sharing, and usage of data, it can be helpful!

It’s all data management. And it needs to be done better, and more consistently, throughout all industries. Ensuring data quality, establishing data stewardship, governing your data, ensuring compliance, and cataloging your data are long-overdue requirements that I’m pleased to see being taken more seriously.

Consider the data catalog, a needed component of data architecture that enables the creation and management of the metadata that allows us to actually use data. It sounds a lot like a data dictionary circa 1985, doesn’t it? Or a repository circa 1995? I guess if you wait around long enough, things have a way of resurrecting themselves… at least if they are useful. But they almost always have a new name!

So, now we call it a data catalog. Oh sure, there are differences, such as the incorporation of machine learning to improve the metadata, a social media-like capability for users to define and improve metadata (like a Wiki), and the ability to perform semantic Google-like searches on the metadata. But it is still the same concept. We needed it then, and we need it even more now!

Some Useful Advice

OK, with all the above in mind, what steps make sense for your organisation to pursue? After all, the confusion surrounding “all things data” is real, but so are the challenges of managing data, as well as the potential benefits of gaining more control over your data assets.

The first thing to keep in mind is that you do not have to choose between data mesh and data fabric. Not only can a data mesh integrate and complement a data fabric (and vice versa), but both can also integrate and augment data management software already in use, thereby increasing its effectiveness.

Worry less about what is a mesh and what is a fabric and more about what you are attempting to accomplish. Work toward automating data management, assuring data governance, and establishing as much self-service to data as possible. And feel free to use concepts, processes, and tools from any source or framework that works for you and your organisation.

Think more about your data architecture and how you want to build and manage it than about whether you are implementing a data fabric or a data mesh, or indeed, any future “data thingie” that shows up.

As you automate, look to adopt and embrace AI and machine learning capabilities to bolster and improve your data architecture.

Finally, do not forget people and processes. You cannot solve all your data management issues with technology alone. Transforming your people and processes is just as crucial as transforming your data!

Copyright © 2021 ELNION ONLINE - All rights reserved.