Data is the lifeblood of businesses today. It’s used to make informed decisions on everything from what products to work on to where to allocate resources. However, if that data is inaccurate or low quality, it can lead to disastrous consequences for companies. In this blog post, we’ll explore how poor-quality data can impact businesses and some steps you can take to mitigate those risks.
Let’s start from the beginning.
What does being a data-driven business mean?
Data-driven businesses are those that base their decisions on data and analytics. When you use data to guide your business decisions, you are using observable facts rather than your gut feeling. Sounds easy, right? Far from it. EY recently conducted research on how companies are using data across functional areas. The results showed that while 81% of organizations think data should be at the core of business decisions, they are still approaching it in a non-optimized way — therefore limiting the business value of the gathered information. In fact, the same report showed that only 31% of the organizations included in the research significantly restructured their operations to accommodate the new needs that large amounts of data bring.
Adopting a data-driven approach can be beneficial for organizations in multiple ways. Some of the benefits of being a data-driven organization are:
Outperforming competition: data-driven companies can make predictions, gaining insight into what is likely to happen in the coming months before it does. This information can give businesses a great head start over the competition. For instance, by understanding what customers want and need, companies can provide them with solutions that do not yet exist in the market.
Increasing customer retention: data allows companies to identify both happy and unhappy customers. This is extremely powerful. On the one hand, this information allows businesses to understand what makes customers happy and use it to further improve their offerings. On the other hand, data enables companies to identify unhappy customers before they leave. This allows them to provide remedies before it’s too late.
Overall, evidence-based decisions can make companies more confident in their choices and lead them to create better products and services. So, if data has become one of the most powerful tools companies can use, what is stopping them from fully adopting a data-driven approach?
Why is it so difficult to become a data-driven organization?
There are many challenges that companies face when trying to adopt a data-driven approach. They can depend on company size, data governance, culture, and data literacy among employees. Let’s go through these challenges in detail.
Complexity & scale: the complexity and scale of many enterprises today create a major challenge when it comes to adopting a data-driven approach. Once the data is available to the business, the next step is to make sense of it all. The incredibly large volume of data today can make this task difficult and overwhelming.
Data governance: when data governance is absent or poor, data is lost. Data governance helps to ensure that data is usable, accessible, and protected. Appropriate data governance leads to better data analytics and, consequently, to better decision-making.
Culture and executive disinterest: when changing established data behaviors, organizations may face cultural challenges. Driving a small business forward with data is far easier than embedding a data culture in larger organizations. In addition to this, if the executive team has a bad habit of ignoring or distrusting provided information, successfully leveraging the potential of data can become extremely difficult.
Data literacy: data literacy can be defined as the ability to work with, analyze, and communicate data. Decision-makers must acquire these skills before they can successfully leverage the data they are provided with.
Data quality: 50% of the companies included in a study conducted by EY cite poor data quality as a major obstacle to delivering actionable insight. One of the main reasons why organizations struggle with data quality is because there is no ownership for quality, and enterprises do not treat data as a business asset equal to others.
Let’s dive deeper into the issue of data quality.
How is poor data quality impacting your business?
The specific cost of data quality issues varies from organization to organization. But according to a 2021 Gartner report, bad data costs businesses around 13 million dollars annually on average. In other words, every year, companies waste valuable resources because of data quality problems. This is a huge amount of money that could be used to improve the business in other ways rather than dealing with data quality issues.
What is data quality? And why is it so important?
Data quality is a measure of how accurate and consistent your data is. Data quality issues can occur at any stage of the data pipeline, from ingestion to BI tools.
As data consumption increases within organizations, it becomes more and more crucial that the data can be trusted to be reliable. Unreliable data can quickly become detrimental to the business: it leads to missed opportunities, financial costs, customer dissatisfaction, regulatory compliance failures, inaccurate decision-making, and more. There are many ways to assess and improve data quality, but it ultimately comes down to ensuring that your data is clean, complete, and consistent. There are a few metrics you can use to understand whether you can trust your data:
Accuracy: is the data correct, duplicate-free, and in the expected format?
Completeness: are there missing values, missing data records, or incomplete pipelines?
Freshness/Timeliness: is the data up-to-date?
Relevance: is the data relevant to the business need it is meant to serve?
Consistency: is the data homogeneous across the organization?
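Several of these dimensions can be expressed as concrete, automated checks. As a minimal sketch in plain Python (the record layout and field names here are invented for illustration, not taken from any particular tool), completeness, accuracy, and freshness might look like this:

```python
from datetime import datetime, timedelta

# Hypothetical order records; the fields are illustrative assumptions.
records = [
    {"order_id": "A-001", "email": "a@example.com", "updated_at": datetime.now()},
    {"order_id": "A-001", "email": "a@example.com", "updated_at": datetime.now()},  # duplicate key
    {"order_id": "A-002", "email": None, "updated_at": datetime.now() - timedelta(days=3)},
]

def completeness(rows, field):
    """Completeness: share of rows where `field` is present and non-null."""
    return sum(r.get(field) is not None for r in rows) / len(rows)

def duplicate_count(rows, key):
    """Accuracy: number of rows whose key value repeats an earlier row."""
    seen, dupes = set(), 0
    for r in rows:
        if r[key] in seen:
            dupes += 1
        seen.add(r[key])
    return dupes

def is_fresh(rows, field, max_age=timedelta(days=1)):
    """Freshness: True only if every row was updated within `max_age`."""
    now = datetime.now()
    return all(now - r[field] <= max_age for r in rows)
```

On this toy dataset, the email column is only two-thirds complete, one order ID is duplicated, and the stale third row fails the freshness check, exactly the kinds of signals the metrics above are meant to surface.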
Failing to assess and improve data quality can have many negative effects on the business. Some examples are:
Faulty decision-making: The primary role of business data is to enable better decision-making. Utilizing inaccurate or poor data leads business leaders to come up with faulty conclusions and, therefore, make bad decisions.
Increased costs: Inaccurate decision-making derived from bad data can cause a wide variety of mistakes that, in turn, can lead to increased costs.
Damaged reputation: Decisions taken based on faulty data can lead to problems like reduced productivity, compliance issues, and poor customer support, directly affecting customer satisfaction and customers’ perception of your products and services.
Teams losing data trust: Having inconsistent data leads business leaders and practitioners to stop trusting the data they have to make the right decisions for the organization.
Beyond data quality issues: Data entropy
At Sifflet, we have come up with a concept called Data Entropy, which captures all the chaos and disorder that many data practitioners have to deal with, especially given the growing complexity of data platforms and the growing expectations from the business in terms of data and data infrastructure.
How does Data Entropy manifest itself?
Entropy in data can manifest itself in different ways:
Entropy can mean that within an organization, data consumers cannot find their data assets or have discoverability and traceability issues.
Another way in which data entropy manifests itself is when data engineers spend half of their time troubleshooting data issues rather than focusing on value-creating initiatives.
Entropy can also mean that many processes that rely on data are completely broken because the data quality is poor and hence trust in data is low among data practitioners.
And finally, and most importantly, data entropy means that an organization is in a state where it wants to use data and keeps investing in data platforms, infrastructure, tooling, and people, yet still struggles to become data-driven.
To sum up, Data Entropy is caused by a mix of technology and people/culture in organizations.
So, is data entropy inevitable in a data platform?
The short answer to this question is yes. In the past few years, the data ecosystem has undergone the revolution brought by the adoption of the modern data stack, which can be defined as a collection of different technologies used to transform raw data into actionable business insights (e.g., data warehouse, ELT, transformation, BI, reverse ETL). The complexity of the modern data stack inevitably creates room for data entropy to increase.
The modern data stack gives data practitioners more flexibility to do more with data. But to do so, you need to control, or at least reduce, the entropy in the workflows surrounding that data. There are two main ways to do that:
People. Start by making sure that everyone is aligned in terms of objectives.
Best practices and tools. Adopt the best practices and tooling that are becoming increasingly available to navigate the complexity of data platforms. Data observability is a good example of a technology that can help reduce data entropy.
What are the types of data errors and issues that can be prevented?
There are plenty of data errors and issues that can be prevented. But rather than the errors themselves, what really matters is the stage at which you catch them, before they propagate downstream and cause negative outcomes for the business and the data team. The stage at which you catch these anomalies enormously affects their impact on the rest of the organization.
In an ideal world, everything would be preventable. In this ideal world, you would have a 360 view of all of your data assets, you would know who is using what data, who changed what at all times, etc. But the reality is that, even in the most modern organizations, data platforms have become too complex to obtain this kind of overview.
But what are the ways in which you can start preventing data issues?
The most basic way to go about this is to implement manual checks to get ahead of data incidents: testing at the orchestration layer, checking ingestion patterns, looking at schemas, and so on. Obviously, the earlier the checks are implemented in the data lifecycle, the better, because catching a problem at the source avoids further propagation. Unfortunately, this is not always straightforward, and in most cases it is not enough. In the current environment, where data consumers want more and more control over their data assets, catching problems at the source is only half of the job. You still have to follow the whole workflow of the data and understand how an issue is likely to propagate downstream.
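An ingestion-time schema check of the kind described above can be sketched in a few lines of plain Python. This is an illustrative toy, not a specific tool's API; the expected schema and record shapes are invented for the example:

```python
# Hypothetical expected schema for an incoming record (field names assumed).
EXPECTED_SCHEMA = {"user_id": int, "signup_date": str, "plan": str}

def validate_schema(record, schema=EXPECTED_SCHEMA):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"bad type for {field}: {type(record[field]).__name__}")
    for field in record:
        if field not in schema:
            # An unexpected field often signals an upstream schema change.
            problems.append(f"unexpected field: {field}")
    return problems

good = {"user_id": 1, "signup_date": "2023-01-05", "plan": "pro"}
bad = {"user_id": "1", "plan": "pro", "referrer": "ad"}
```

Run at the orchestration layer before data is loaded, a check like this rejects the `bad` record (wrong type, missing field, unexpected field) instead of letting the problem propagate into downstream tables and dashboards.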
Who is responsible for identifying when problems (might) occur? Who should be responsible for data quality?
It’s important to start by saying that there is no single right approach to tackle this. Different practices work for different organizations. For instance, at GoCardless, the data team implemented the concept of Data Contracts, an example of data quality checks implemented early in the data lifecycle.
There are also other examples in which companies adopt a fully decentralized approach, implementing concepts like the Data Mesh. And here, ensuring data quality becomes the responsibility of the data consumer.
As previously mentioned, there is no right or wrong approach. The best practice to adopt depends on the organization’s resources, how the team is set up, and the ratio of data engineers to data consumers. But it is important to keep in mind that the best data quality programs are the ones adopted by every data practitioner, from data producers to data consumers.
So, what are some practical steps that organizations can take to reduce data entropy?
In the current economic environment, where companies are downsizing in one way or another while facing a lot of crucial decisions, it becomes essential to eliminate any decision that is not backed by reliable data.
On top of that, as expectations around data, data platforms, and data teams increase, companies show less and less tolerance for data incidents. Therefore, data entropy (the uncertainty and disorder within the data team) needs to be reduced to realize the full potential of what data can help the business achieve, while fostering and nurturing a data-driven culture.
These challenges can be very overwhelming for businesses. However, there are some actions every enterprise can take to successfully start embedding data in every business decision:
Plan carefully and set appropriate goals.
Bring decision-makers on board and evangelize data quality within your teams. Data quality needs to be cultural and respected “religiously”.
Improve your organization’s internal understanding of data and define business needs clearly. In order to make good decisions, businesses need to have a clear understanding of what data they have and how it can be used. Too often, data is siloed within departments or even individual business units, which makes it difficult to get a holistic view of the organization’s data landscape.
Don’t lose momentum. Bad data is expensive, and the cost of poor data quality is only increasing. Use this sense of urgency to bring decision-makers and teams on board.
Invest in data literacy. Data literacy is the ability to source, interpret and communicate data. To put it simply, data literacy enables decision-makers to know how they can translate data into real business value.
Invest in the right tools to monitor data quality at scale. Data observability tools enable organizations to automatically monitor data across the critical parts of their ecosystem, allowing data teams to identify and troubleshoot data quality issues and prevent them from breaking analytics dashboards. Adopting a data observability tool is one of the best ways to unlock the possibilities of data-driven decision-making without constantly worrying about the reliability of the data.
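To make the monitoring idea in the last point concrete, here is a deliberately simplified sketch of the kind of automated freshness and volume checks an observability tool runs continuously. The table names, thresholds, and metadata layout are all invented for illustration; real tools collect this metadata automatically and at far greater depth:

```python
from datetime import datetime, timedelta

# Toy metadata snapshot a monitor might collect per table
# (names and numbers are illustrative assumptions).
tables = {
    "orders": {"last_loaded": datetime.now() - timedelta(hours=2), "row_count": 10_250},
    "events": {"last_loaded": datetime.now() - timedelta(days=2), "row_count": 120},
}

def detect_incidents(snapshot, max_staleness=timedelta(hours=24), min_rows=1_000):
    """Flag tables that look stale or unexpectedly small."""
    now = datetime.now()
    incidents = []
    for name, meta in snapshot.items():
        if now - meta["last_loaded"] > max_staleness:
            incidents.append((name, "stale"))
        if meta["row_count"] < min_rows:
            incidents.append((name, "low volume"))
    return incidents
```

In this snapshot, the `orders` table passes both checks while `events` is flagged as stale and low-volume, so the data team is alerted before a consumer discovers a broken dashboard.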
Although data entropy/data quality issues are inevitable in modern data ecosystems, there are ways to ensure that bad-quality data does not impact your business. Starting with a data governance framework, setting appropriate goals, and investing in data literacy are necessary steps for any organization that wants to become fully data-driven. Ultimately, the key to success lies in making sure that good data quality practices are adopted and followed by all the members of the team — from data engineers to decision-makers.