baluvenkata
ServiceNow Employee

baluvenkata_2-1688373203780.png

 

Data is the Key

Data-driven decision-making has become a ubiquitous practice across the tech industry. It is rare to find a tech company that does not leverage data and analytics to inform their decision-making. This is because data-driven analytics is widely recognized as the key to company success. It allows companies to make more informed decisions, which is critical for staying competitive in today's rapidly changing business landscape. One of the most significant advantages of data-driven analytics is that it enables companies to be more agile and responsive to changing market conditions. By constantly monitoring and analyzing data, businesses can quickly identify emerging trends and adjust their strategies accordingly.

 

 

Power of Enterprise Self-Serve

Enterprise Self-Serve refers to a system or platform that allows employees to access and use various services, tools, and resources without requiring assistance from IT or central analytics teams. The key idea behind Self-Serve is to give employees greater autonomy and flexibility while reducing the burden on IT departments and other technical staff. By allowing employees to self-serve, companies can increase efficiency, reduce costs, and empower their workforce to be more productive and innovative.

 

How did we get here? A bit of History.

 

Let's draw parallels between data platforms and data engineering and examine their evolution.

 

baluvenkata_0-1688374498433.png

(Source: Internet)

A couple of decades ago, data was primarily stored centrally to derive insights. This was achieved through a well-established Data Warehouse, where data signals were sourced and stored in a central location using Extract, Transform, and Load (ETL) flows; data was curated with transformation logic, and reporting tools consumed this data to present appealing visualizations like charts. With advancements in technology and growing demands, our data platforms evolved from legacy systems to massively parallel processing (MPP) systems. However, this approach had its setback: CPU and disk were always bundled, so one could not scale either independently when needed.

On the data engineering side, schema-on-write was the crux: the source data format had to be known up front and kept in sync with the ETL/ELT/data pipeline for a job to run, and many ETL tools, even now, still work this way. With changing data, maintainability was an issue, and we had a dedicated data foundation team to bridge source and sink so data could replicate.

 

Fast forward: on the data platform side, cloud computing took shape, where disk and CPU were separated and one could choose either based on demand, showing us the true elastic nature of our data platforms. On the data engineering side, our quest never stopped. Data lakes were introduced as an easier way to dump all data files of interest in one place without worrying about data structure; one can then read only the specific data attributes of interest, a schema-on-read approach, allowing users to consume just the necessary data and write it back to the Data Warehouse. This was a significant improvement, reducing the time data traveled from the data lake to the Data Warehouse. It also allowed data science teams to explore the data directly, reducing their dependence on IT or data foundation teams.
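To make the schema-on-read idea concrete, here is a minimal sketch (the record fields and "lake" contents are invented for illustration, not our actual pipeline): raw records are dumped with no schema enforced at write time, and structure is applied only when a consumer reads the attributes it cares about.

```python
import json

# Hypothetical raw "data lake" content: heterogeneous JSON records dumped
# as-is, with no schema enforced at write time (field names are illustrative).
raw_lake = [
    '{"event": "login",  "user": "alice", "region": "EMEA", "device": "mobile"}',
    '{"event": "search", "user": "bob",   "query": "reports"}',
    '{"event": "login",  "user": "carol", "region": "AMER"}',
]

def read_attributes(lines, attributes):
    """Schema-on-read: apply structure only at read time, keeping just
    the attributes of interest and ignoring everything else."""
    for line in lines:
        record = json.loads(line)
        yield {attr: record.get(attr) for attr in attributes}

# A downstream consumer picks only what it needs, e.g. user and region;
# records missing an attribute simply yield None for it.
curated = list(read_attributes(raw_lake, ["user", "region"]))
print(curated)
```

Contrast this with schema-on-write, where the loader would reject or break on the second record because it lacks a `region` field agreed on up front.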

 

However, data lakes come with inherent issues: files stored in lakes have no ACID guarantees, making it difficult for teams or applications to consume the data directly. To address these issues, the Data Lakehouse/Delta Lake/modern Data Warehouse architecture was introduced, making it easier to identify and manage changes in data files. This further improved the efficiency of data management and ensured that teams could consume data without issues, leading to more accurate and timely insights.
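The core trick behind lakehouse table formats is an append-only transaction log that records which data files make up each version of a table, so readers always see a consistent snapshot. Here is a deliberately toy sketch of that idea (class and file names are invented; real formats like Delta Lake are far more sophisticated):

```python
# Toy sketch of a lakehouse-style transaction log. Each commit atomically
# records which data files enter/leave the table; replaying the log yields
# a consistent snapshot at any version ("time travel").
class TransactionLog:
    def __init__(self):
        self._commits = []  # append-only list of committed actions

    def commit(self, add=(), remove=()):
        """Atomically record file additions/removals as one commit."""
        self._commits.append({"add": list(add), "remove": list(remove)})

    def snapshot(self, version=None):
        """Replay the log to compute the set of live files at a version."""
        commits = self._commits if version is None else self._commits[:version]
        live = set()
        for c in commits:
            live |= set(c["add"])
            live -= set(c["remove"])
        return sorted(live)

log = TransactionLog()
log.commit(add=["part-000.parquet", "part-001.parquet"])
log.commit(add=["part-002.parquet"], remove=["part-000.parquet"])

current = log.snapshot()    # latest consistent table state
previous = log.snapshot(1)  # state as of version 1
```

Because readers consult the log rather than listing raw files, a half-finished write (files present but not yet committed) is simply invisible to them, which is what restores ACID-like behavior on top of plain files.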

 

With that said, in the data platform and engineering world, nothing is perfect, and things keep moving forward as our demands increase. In recent times, Data Fabric (a term coined by Forrester analysts) has been resounding in a big way; many enterprises are pursuing this path, and our company is no different.

You can always google the definition coined by Gartner, but in simple terms, it is about providing universal data access to the business teams that need data to churn out insights, despite the growing complexity of multiple data storage systems, databases, cloud platforms, data processing technologies, and so on. In a nutshell, data fabric is not a tech stack but rather an architectural approach or framework that aims to provide a unified and integrated view of data across various sources and formats within an organization, supported by key components like data integration, metadata management, data governance, and data virtualization.

And the last one, data mesh, has been around for the last three years, and I believe very few companies have set out on this path. It is a decentralized approach to data architecture and governance that aims to improve scalability and flexibility, and it is very interesting. It advocates for domain-driven ownership of data, treating data as a product, and enabling self-serve capabilities for domain teams. By distributing data responsibilities and promoting a data-driven culture, it lets organizations scale their analytics efforts beyond a single central team.

 

Enabling Self-Serve Analytics: Empowering Business Teams for Data-Driven Insights

Wow, we have reached the halfway point and still haven’t talked about the actual Self-Serve architecture. Well, all this time, we were setting the premise. Now, let’s dive deep. At ServiceNow, we are big fans of data-driven decision-making, and our Data & Analytics team sits right at the crux, building and managing analytical products for our internal business domains, like Sales, Marketing, Finance, Customer, Partner, Product, People, etc. An analytics product could be as simple as an operational report, a complex recommendation dashboard, a research & insights paper, or a data signal to other applications that helps the target business domain make data-driven decisions. Our landscape is powered by an enterprise data platform with rich data signals from heterogeneous systems, curated and ready to consume for insights.

As we grew, more business team members leaned toward data for decisions and wanted to explore enterprise data to run their own ad-hoc analyses or insights reports. As the Enterprise D&A team, we loved this theme, as it reduced business teams’ reliance on us; they can cater to these ad-hoc demands themselves while we focus on enterprise analytics initiatives. With that, it became clear to us that enabling a Self-Serve journey for our business teams was the right way to go, thereby promoting data democratization across the enterprise.

 

Know what you are trying to solve

While we recognize the benefits of Self-Serve, it is essential to first understand the challenges we aim to address. Below are the most crucial ones:

  • Facilitate access to a comprehensive and unified dataset for seamless data exploration.
  • Ensure data is governed.
  • Provide flexibility for business teams to curate the data, i.e., run their own DDL and DML.
  • Allow teams to bring their own data to the enterprise data platform.
  • Provide a reliable, scalable, and secure data platform.
  • Educate teams on processes, data, and the data platform tech stack.

 

Our Approach (Solution Design)

Now that we know what needs to be solved: at ServiceNow, we went with a hybrid data architecture approach, the best of Data Fabric + Data Mesh, which has given us the needed flexibility.

 

We are big fans of Data Fabric, a centralized theme that provides a unified architecture layer for data management, allowing various systems, whether cloud, multi-cloud, or on-prem source systems, to leverage this data for analytics. As we transition to Data Fabric, various subsystems are being built to make it more relevant, like a data catalog, lineage, a metadata framework, a knowledge graph, MDM governance, etc.

 

This centralized approach works very well for us, the Enterprise Analytics team, as we need that data in our data platform for churning out insights as dashboards or research & insights papers for the target business teams.

 

Let’s look at the high-level solution design, where each business domain team will be provisioned with a dedicated workspace to:

  1. Access the rich data (raw/enriched).
  2. Curate the data in their workspace (build tables/views/data pipelines with business rules) and leverage our secured ML workspace for predictions.
  3. Consume the data through rich BI tools via their power user groups.
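To make the "curate the data" step concrete, here is a minimal, hypothetical sketch: SQLite stands in for the workspace's SQL engine, and the table, view, and column names are invented, not our actual enterprise schema. A domain team runs its own DDL/DML over shared raw data, encoding a business rule in a curated view that BI tools could then consume.

```python
import sqlite3

# SQLite stands in for the workspace's SQL engine; all names are
# hypothetical and purely illustrative.
conn = sqlite3.connect(":memory:")

# Step 1: raw, shared data provisioned into the workspace.
conn.executescript("""
    CREATE TABLE raw_opportunities (
        opp_id INTEGER PRIMARY KEY,
        region TEXT,
        amount REAL,
        stage  TEXT
    );
    INSERT INTO raw_opportunities VALUES
        (1, 'EMEA', 120000, 'Closed Won'),
        (2, 'AMER',  80000, 'Open'),
        (3, 'EMEA',  45000, 'Closed Won');
""")

# Step 2: the domain team's own DDL, a curated view encoding a
# business rule (only count revenue from won deals).
conn.execute("""
    CREATE VIEW won_revenue_by_region AS
    SELECT region, SUM(amount) AS total_won
    FROM raw_opportunities
    WHERE stage = 'Closed Won'
    GROUP BY region
""")

# Step 3: BI tools would consume the curated view.
rows = conn.execute(
    "SELECT region, total_won FROM won_revenue_by_region ORDER BY region"
).fetchall()
print(rows)
```

The point of the workspace is exactly this autonomy: the team defines and evolves its own views and pipelines against governed raw data without filing a request with the central team.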

baluvenkata_1-1688373004965.png

 

Let's evaluate this solution design against the principles of Data Mesh to determine which key principles hold good.

 

  • Domain Ownership – Domain ownership is a fundamental principle of Data Mesh that emphasizes giving ownership of domain-specific data to the team primarily responsible for producing, consuming, and sharing it. This approach aligns with the above self-serve model, where we empower the business team to access and curate data according to their changing business needs, so our design supports this principle.

 

  • Data as a Product – Treating data as a product is a powerful concept within Data Mesh. Instead of considering data a byproduct of an event or action, it is viewed as an asset that adheres to product standards like discoverability, addressability, security, trustworthiness, and accessibility. With the above solution design, we advocate a self-serve theme: the business team, with access to rich data, can build analytical assets on their own, and these can be shared across teams. The possibilities are endless, making this principle highly effective. A tick mark for this principle.

 

  • Self-Serve Data Platform – In simple terms, data platforms serve as common ground for data producers and consumers to interact and exchange data for analytics. In Data Mesh, a decentralized approach is taken, where the respective teams own and run the data platform. With the changing technology landscape, this principle may not align well for us: business teams could end up juggling multiple roles like admins, architects, and developers, which would eventually derail the team's focus, with more time spent managing the platform than using it.

 

  • Federated Computational Governance – Data governance is crucial in any organization, and Data Mesh advocates maintaining a balance between the team that owns the data and the central governance team. It is important to ensure autonomy while adhering to governance standards, enabling teams to work at a quick pace. This approach is supported by our self-serve design: domain experts have the autonomy to define governance on the data within their workspace, but must follow enterprise data governance guidelines when sharing data insights.

 

In a Nutshell

ServiceNow, a growing enterprise, keeps data at the center of all data-driven decisions. Our leaders, including the CEO, the C-suite, and Data & Analytics leaders, strongly advocate the Self-Serve theme, which supports individual business teams running on their own. Our Enterprise Self-Serve design combines the best principles of the Data Fabric and Data Mesh methodologies, and we have designed a product that excels at serving self-serve data management needs effectively.

 

Thanks to our team: Naveen Sanka, Staff Data Engineer; Sunil Saini, Senior Staff Architect; and Pradeep Reddy Anireddy, Senior Staff BI Engineer, for their contributions.