Many organizations struggle to manage their data in a way that enables efficient processing and activation of insights from different sources and applications to meet their data and analytics (D&A) requirements. Why? The rise of data and application silos over the last decade is old news. Less frequently discussed is the fact that the number of data engineers capable of prying data loose from its hiding places and deriving insights has stagnated or dropped.
More importantly, pre-AI, the technology to derive semantic insights (patterns from natural language, i.e., what people actually care about, rather than highly reduced automated heuristics like survey scores) didn't exist. The outcome: the time between the initiation of a data integration request and its realization has reached an unprecedented peak. Growing adoption of cloud solutions for data handling only intensifies the challenge of establishing and upholding a harmonized data management blueprint that addresses both current and future data integration demands.
A few terms and ideas have begun floating around to address this dual challenge of integrating data to derive holistic insights and making those insights easily available in business tools. The language is similar but the forms are different: data fabric refers to a data management design, data mesh to an organizational and architectural approach, and data orchestration bridges the gap between the two. Below, a primer on these emerging technologies that represent the billion-dollar intelligence layer.
What is Data Fabric?
Gartner defines data fabric as “an emerging data management design that enables augmented data integration and sharing across heterogeneous data sources.” In this model, data fabric is a knowledge discovery pipeline. It bridges data from varied applications, no matter their original design or location.
The design aims at flexible, reusable, and augmented data integration processes and services that cater to a range of operational and analytic needs and are available on various deployment and management platforms. Data fabrics merge multiple data integration approaches and employ active metadata, knowledge graphs, semantics, and machine learning to bolster the design and implementation of data integration.
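To make the active-metadata idea concrete, here is a toy sketch in Python of how a fabric might mine a small knowledge graph of metadata triples to suggest candidate integrations. The dataset and column names are hypothetical, and real fabrics do this over far richer metadata and at much larger scale.

```python
# A toy sketch of active metadata: a small knowledge graph of
# (subject, predicate, object) triples that a fabric could mine to
# suggest candidate integrations. Dataset and column names are hypothetical.
triples = [
    ("crm.accounts",     "has_column", "account_id"),
    ("billing.invoices", "has_column", "account_id"),
    ("billing.invoices", "has_column", "invoice_id"),
    ("crm.accounts",     "updated_by", "nightly_sync"),
]

def suggest_joins(graph):
    """Suggest joins wherever two datasets share a column name."""
    datasets_by_column = {}
    for subject, predicate, obj in graph:
        if predicate == "has_column":
            datasets_by_column.setdefault(obj, set()).add(subject)
    return {col: ds for col, ds in datasets_by_column.items() if len(ds) > 1}

print(suggest_joins(triples))
# -> {'account_id': {'crm.accounts', 'billing.invoices'}}
```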
The intertwined data within this fabric gives rise to dynamic interactions stemming from current and emerging data elements. This dynamic nature is a departure from the conventional static views seen in reports and dashboards. For instance, by employing a data fabric, supply chain experts can correlate supplier lag times with production hold-ups, capturing these connections as soon as the data surfaces. As such, data fabric empowers these experts to spot budding threats and make real-time, informed choices.
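As a rough illustration of that real-time correlation, the sketch below wires two hypothetical event feeds (supplier lead times and production hold-ups, keyed by part number) so that a flag is raised the moment both show elevated values. The part numbers and thresholds are invented for illustration.

```python
# A rough sketch of fabric-style real-time correlation, assuming two
# hypothetical event feeds keyed by part number. Thresholds are invented.
from collections import defaultdict

supplier_lag = defaultdict(list)      # part number -> lead times (days)
production_delay = defaultdict(list)  # part number -> hold-ups (days)

def flag_if_correlated(part: str) -> None:
    """Raise a flag the moment both feeds show elevated values."""
    lags, delays = supplier_lag[part], production_delay[part]
    if lags and delays and lags[-1] > 14 and delays[-1] > 2:
        print(f"risk: {part} lead time {lags[-1]}d, hold-up {delays[-1]}d")

def on_supplier_event(part: str, lead_time_days: float) -> None:
    supplier_lag[part].append(lead_time_days)
    flag_if_correlated(part)

def on_production_event(part: str, delay_days: float) -> None:
    production_delay[part].append(delay_days)
    flag_if_correlated(part)

on_supplier_event("PN-1042", 21.0)
on_production_event("PN-1042", 3.5)  # flags the risk as soon as the data surfaces
```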
Through timely and credible suggestions, data fabric propels data integration, assures business personnel of data reliability, and empowers citizen developers by broadening their capabilities in data integration and modeling.
What is Data Mesh?
"Data fabric" and "data mesh" are often, and understandably, used interchangeably. The differences are subtle but important, and primarily concern implementation readiness around metadata availability.
Data fabric is an emerging data management design: primarily a technology pattern that uses metadata to automate data management tasks. Data mesh, as defined by Gartner, is an architectural approach driven by the federation of data management and governance responsibilities.
Both are designed to democratize access to data, but a data mesh is a strategic, deliberately technology-agnostic approach to creating data-driven products for businesses, whereas a data fabric is a flexible technology framework that can be employed for various purposes and outputs.
One interpretation of the core concept involves using domain-driven design and product-oriented approaches to tackle issues within the realm of data and analytics. Similar to how a DevOps culture is built, fostering a data mesh culture revolves around connecting individuals, fostering empathy, and establishing a framework of shared responsibilities. In doing so, it becomes possible to scale the sustainable generation of business value from data.
The ultimate objective is close collaboration between data creators and data users. Ideally, the same team would both generate and utilize the data, aligning interests, responsibilities, and capabilities within a single group. In practice, however, this often isn't feasible: data-producing teams already carry numerous responsibilities in their specific domain, making it challenging to also manage data consumption applications. A significant step in the right direction, then, is to split these roles into two teams that can communicate directly, without intermediaries.
The primary aim of a data-producing team should be to present their data in a way that allows others to extract value without requiring intricate domain knowledge; in other words, data producers should shield consumers from the behind-the-scenes technical details. Such a team could simultaneously assume the role of data consumer. Some data domains are inherently complex and necessitate an entire team of domain experts who, in turn, draw data from a source-aligned data domain.
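A minimal sketch of what that shielding looks like in practice, assuming a hypothetical "orders" domain: the producing team publishes a stable, documented schema plus a read interface, and everything behind it remains free to change.

```python
# A minimal sketch of a data product contract for a hypothetical
# "orders" domain: a stable, documented schema plus a read interface.
# Storage, pipelines, and source systems stay hidden behind it.
from dataclasses import dataclass
from datetime import date
from typing import Iterator, Protocol

@dataclass(frozen=True)
class OrderRecord:
    order_id: str
    customer_id: str
    order_date: date
    total_usd: float  # currency already normalized upstream

class OrdersDataProduct(Protocol):
    """The published contract consumers code against."""
    def records(self, since: date) -> Iterator[OrderRecord]: ...
```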
What is Data Orchestration?
Orchestration refers to a technology layer that harmonizes a multitude of systems, applications, and services, linking data together to generate and activate robust, on-demand insights wherever they're needed, without drawing on engineering resources.
As McKinsey puts it, “Only a fraction of data from connected devices is ingested, processed, queried, and analyzed in real time due to the limits of legacy technology structures, the challenges of adopting more modern architectural elements, and the high computational demands of intensive, real-time processing jobs. Companies often must choose between speed and computational intensity, which can delay more sophisticated analyses and inhibit the implementation of real-time use cases.”
Data orchestration securely activates unified intelligence from disparate data in a self-service manner, enabling a holistic, robust view of customer and business data across the enterprise that supports applications and insights.
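At its core, orchestration is dependency management: each step runs only after the systems it depends on have delivered. The bare-bones sketch below uses hypothetical extract/unify/activate steps to capture that model; production orchestrators layer scheduling, retries, and observability on top of the same idea.

```python
# A bare-bones sketch of orchestration as dependency management, with
# hypothetical extract/unify/activate steps. Real orchestrators add
# scheduling, retries, and observability on top of the same model.
from graphlib import TopologicalSorter  # Python 3.9+

def extract_crm():     print("pull accounts from the CRM")
def extract_billing(): print("pull invoices from billing")
def unify():           print("resolve identities into one customer view")
def activate():        print("push the unified view back into business tools")

pipeline = {
    extract_crm: set(),
    extract_billing: set(),
    unify: {extract_crm, extract_billing},  # unify waits on both extracts
    activate: {unify},
}

for step in TopologicalSorter(pipeline).static_order():
    step()  # each step runs only after everything it depends on
```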