Why it is time for Data Mesh to fuel your NextGen SAP Analytics Strategy

Gaining Speed and Lowering TCO with SAP Datasphere Supporting Your Data Mesh

Companies often struggle with the choice of which enterprise platform(s) should support their future analytics initiatives without first cementing how they want to drive analytics.

Data Fabric vs Data Mesh

Two concepts that companies frequently try to mirror in their setup are data fabric and data mesh. According to Gartner’s definition, a data fabric is an emerging data management design for attaining flexible, reusable, and augmented data integration pipelines, services, and semantics (often via a universal layer supported by IT), whilst Gartner defines data mesh as a cultural and organizational shift for data management, focusing on federation technology and emphasizing the authority of localized data.

So, do companies need to choose one over the other? Absolutely not; in fact, the data fabric and data mesh concepts can very much be complementary. It is, however, important to understand 1) how the underlying technologies supporting localized data, and federation of the same, are rapidly maturing, and 2) whether one’s organization is ready to support the data mesh concept, as it demands a higher degree of business ownership than data fabric.

Data Federation Maturity (in an ERP Context)

Let’s address the maturity question in the context of ERP data, where SAP plays an important role for roughly 80% of Fortune 500 companies. Companies that have already upgraded to S/4HANA (SAP’s NextGen ERP platform on HANA) or are in the process of upgrading can tap into thousands of pre-delivered Fiori apps and underlying business models (known as ABAP CDS views) supporting all areas of their enterprise processes.

Although S/4 does offer customization options via both Fiori apps and developer tools, it is not positioned to support more complex customizations, e.g. linking up to other SAP enterprise products or to external data. SAP is positioning its NextGen analytics offering, SAP Datasphere (with SQL and Python support), to service those scenarios. What sets SAP Datasphere apart from other enterprise analytics platforms when modelling with ERP data is its ability to model across more complex S/4 ABAP CDS views in federated mode, using the proprietary ABAP SQL service to push down filters, joins, and more complex logic. Combined with its semantic onboarding concept, this allows business owners to rapidly develop ERP Data Products in their own spaces, in a matter of minutes or hours, which are then consumable by other enterprise analytics platforms via its open JDBC access protocol.
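To make the federated pattern concrete, below is a minimal sketch of a SQL view one might define in a Datasphere space on top of a remote ABAP CDS view. The connection name "S4_REMOTE" and the view and column names are illustrative assumptions, not a definitive implementation:

    -- Minimal sketch: a Datasphere SQL view over a federated S/4 CDS view.
    -- "S4_REMOTE" and the object/column names are illustrative assumptions.
    -- The column list and WHERE clause are pushed down to S/4 via the
    -- ABAP SQL service, so only the reduced result set leaves the source.
    CREATE VIEW "Open_Sales_Items" AS
    SELECT "SalesDocument",
           "SoldToParty",
           "NetAmount",
           "TransactionCurrency"
    FROM   "S4_REMOTE"."I_SalesDocumentItem"
    WHERE  "SDDocumentRejectionStatus" <> 'C';

A view like this can then be registered as a Data Product and consumed from other platforms over JDBC without replicating the underlying ledger.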

So, what will the architecture look like with SAP Datasphere? Think of SAP Datasphere as a proxy representing ERP data products (not as a separate platform), as it is deeply integrated with S/4.

The failure of data lakes from an ERP standpoint

Over the past five years, most companies have, from an ERP data standpoint, practiced more of a data fabric concept: replicating data in change/delta capture mode from both SAP ECC and SAP S/4 into file storage layers such as Amazon S3 and ADLS Gen2, only to then start the journey of re-composing the vast business models (ABAP CDS views), which is a lengthy and expensive proposition. That is not to say that file storage is irrelevant for ERP data, just that the semantic consumption of that ERP data in the file storage layer is better orchestrated via Datasphere, at least for reporting scenarios; use cases for ML and other AI scenarios could of course warrant keeping a company’s corporate memory layer in file storage.

Organizational readiness

Let’s address the organizational readiness question, assuming companies buy into the concept of data mesh, now that we have seen that the technology maturity question is largely addressed. What is required here? Often, companies without a data product mindset, and with access to a universal, consistent (file storage) data layer, develop identical or similar ERP analytical solutions in stovepipes, increasing development costs manyfold. That symptom is less likely to occur if companies 1) place ownership with the right business unit, which develops data products based on the rich standard-delivered S/4 models, 2) immediately register those data products in the data catalog within Datasphere, and 3) determine their approach to sharing those data products with the right security and workflow in mind.

How to Succeed with Data Mesh using Datasphere

A few important principles, soft as well as hard, are key to succeeding with Data Mesh when placing Datasphere at the centre of producing ERP Data Products:

Organizational mindset

For a long time, Power BI has across many companies been seen as the only self-service tool. Whilst autonomy is great when you have all data imported into Power BI, this approach has significant disadvantages: 1) high cost in getting data ready, 2) results that are not easily sharable onwards as data products in new modelling scenarios (stovepipes), and 3) limits to computing at scale. So, what is the alternative? Data products need to be orchestrated in domains with clear ownership; in Datasphere, this is referred to as spaces, or space organization. Companies will need to buy into the concept that Data Products are constructed in a modelling layer tightly integrated with the runtime (in Datasphere, SAP HANA Cloud; in Databricks, a SQL warehouse cluster), whilst only the visualization (interaction with the data), and perhaps simple calculations, remain with Power BI.
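As a minimal sketch of this division of labour, assuming illustrative table and column names, a shared calculation can live once in a Datasphere SQL view executed on SAP HANA Cloud, while Power BI consumes the view live and only adds visualization on top:

    -- Minimal sketch: the margin calculation is owned once in the modelling
    -- layer and executed on SAP HANA Cloud. The source table and columns
    -- are illustrative assumptions.
    CREATE VIEW "Sales_Margin_By_Org" AS
    SELECT "SalesOrganization",
           SUM("NetAmount" - "CostAmount") AS "Margin"
    FROM   "SALES"."BillingDocumentItem"
    GROUP  BY "SalesOrganization";

Every Power BI report (or any other consumer) then reuses the same measure definition instead of re-implementing it per dataset.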

Modelling with federation in mind

One may wonder: how do we model Data Products with federation in mind? Whether in Datasphere, Databricks, or Snowflake, the same sound modelling principles prevail when you want to execute Data Products on the fly. It boils down to calculating on small datasets, which includes 1) filtering fast, as close to the table as possible, and ensuring that dynamic filters are mapped from the consumption layer down to table level; 2) modelling by exception, i.e. when calculating, only bringing in the minimal source columns required (not full ledger columns) and deriving what you need; 3) evaluating where those derived columns are calculated (inbound, harmonization, or propagation layer); and 4) understanding the difference between joins and associations from a runtime execution standpoint. Most mistakes occur when logical models are implemented 1:1 in physical models.
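As a minimal sketch, again with illustrative table and column names, the first view below implements a logical model 1:1 and filters late, whilst the second filters fast and models by exception:

    -- Anti-pattern: the full ledger is exposed and filtered only at the
    -- consumption layer, so nothing useful is pushed down to the source.
    CREATE VIEW "Revenue_All" AS
    SELECT * FROM "S4_REMOTE"."BillingDocumentItem";

    -- Federation-friendly: filter as close to the table as possible and
    -- bring in only the minimal source columns the data product needs.
    CREATE VIEW "Revenue_EMEA_2024" AS
    SELECT "BillingDocument",
           "SoldToParty",
           "TransactionCurrency",
           "NetAmount"
    FROM   "S4_REMOTE"."BillingDocumentItem"
    WHERE  "SalesOrganization" IN ('EMEA1', 'EMEA2')  -- pushed down to S/4
      AND  "BillingDocumentDate" >= '2024-01-01';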

Modelling patterns with the use case in mind & ensuring minimal regression impact

Thanks to powerful compute platforms, we are past the days of traditional data warehousing principles where sub-products were calculated at the EDW layer (in between the inbound and consumption layers) for re-use. Re-use is now promoted by having those sub-products available either as more complex standard-delivered ERP business models (ABAP CDS views) or in central repositories. It can sound a bit counter-intuitive, but companies do not want re-use of persisted, all-purpose sub-products, because ownership is unclear and data products built on identical artefacts, e.g. the combination of billing and financial ledger from S/4, often want to attack their analytics from a different angle. For example, a sales team wants to understand how sales orders are shipped; hence, narrowing the dataset starts from the sales invoice ledger. Other business units, like accounts receivable, want to see how those sales orders are being paid, and hence want to start from the financial ledger and then link back to select attributes in the sales invoice ledger. With Data Mesh, we are past the days where linked datasets are persisted and used for all-purpose use cases.
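As a minimal sketch, with illustrative ledger and column names, each use case starts from its own ledger instead of sharing one persisted, pre-joined dataset:

    -- Sales angle: start from the sales invoice ledger, shipment focus.
    CREATE VIEW "Sales_Shipment_Status" AS
    SELECT "BillingDocument", "SalesOrder", "DeliveryStatus"
    FROM   "S4_REMOTE"."BillingDocumentItem";

    -- Accounts receivable angle: start from the financial ledger and link
    -- back to selected attributes of the sales invoice ledger only.
    CREATE VIEW "AR_Payment_Status" AS
    SELECT f."AccountingDocument",
           f."ClearingStatus",
           i."SalesOrder"
    FROM   "S4_REMOTE"."JournalEntryItem" AS f
    LEFT JOIN "S4_REMOTE"."BillingDocumentItem" AS i
           ON f."ReferenceDocument" = i."BillingDocument";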

Promoting Data Ownership & Sharing Data Products

Often, companies will have two or three major analytics platforms, as each has its own strengths, and that is exactly what Data Mesh, when executed correctly, supports. It is pivotal that companies already have co-located, or have a 2–4-year plan to co-locate, their major sources of data on a single hyperscaler of choice (be that AWS, Azure, or Google Cloud) to ensure minimal network latency, ease of security design, and integration. Secondly, vendor partnerships increasingly play an important role when it comes to supporting best-practice modelling principles. For example, when operating multiple analytics platforms, companies typically want a centralized data catalog (in practice, local platform data lineage tools remain relevant for deeper technical insight), a centralized security platform from which to push unified data policies, and centralized modelling principles so that data duplication across analytics platforms stays minimal. Often, datasets are blindly copied in full where it may only be warranted to copy a select set of columns for the more complex scenarios. From a vendor partnership context, SAP has signed long-term commitments with the likes of Collibra (for a centralized data catalog) and Databricks (for supporting cross-platform Data Product scenarios and centralized modelling principles). It would be good to see SAP also initiate a partnership with a leading security platform such as Immuta, although leading platforms all promote open API principles these days.

Where do companies start?

Every company has a different analytics legacy setup and organizational maturity (read: culture), so it is important to always do a baseline assessment to understand whether Data Mesh is right for your company. If done right, what companies will harvest with Data Mesh, particularly for ERP Data Products, is much lower TCO (Total Cost of Ownership): not just in terms of platform run cost, with significantly less batch processing, but also in the speed of delivering new ERP Data Products. It does, however, require proactive thinking on how to centralize the data catalog, security policies, and modelling principles.

In essence, SAP Datasphere is not just a tool; it is a paradigm shift in moving data closer to the business user, helping them deliver ERP Data Products at speed, and openly sharing those ERP Data Products across a company’s various enterprise platforms leveraging the Data Mesh concept.
Mogens Madsen, Partner Nordics