This blog post showcases key trends we think will shape the way organizations engage with data in 2024.
AI Is Here to Stay and Reshape the Industry
In 2023, AI took center stage with the advent of Generative AI and Large Language Models (LLMs). As we head into 2024, the momentum in AI shows no signs of waning.
New applications will likely keep transforming both personal and corporate life, reshaping jobs as we know them. As AI applications move from experimentation to actual production deployments, the skill sets expected from technical roles are bound to shift. Deploying AI applications successfully requires a hybrid skill set, something we experienced firsthand several times over the past year while shipping our own AI features to production. You need either someone who knows enough about machine learning, data engineering, software engineering, and infrastructure management, or experts from each of these fields working in the same room to get things done. Our prediction is that traditional roles will therefore most likely be transformed, and that new positions may emerge specifically to bridge these interdisciplinary gaps.
Additionally, while more technologies and players may enter the scene, we also expect the AI landscape to become clearer in 2024, with distinct winners emerging.
The Rise of Unstructured Data
Entering 2024, we anticipate a surge in unstructured data. This type of data does not have a predefined data model or organization and is commonly found in the form of text, images, audio, and video.
Why? Mostly because we foresee extensive adoption of AI use cases, for which unstructured data is an abundant and valuable source. The evolution of digital communication (the continuous growth of online platforms and messaging apps), the proliferation of multimedia content, and the steady expansion of IoT devices will also very likely contribute to the exponential growth of unstructured data.
This rise in unstructured data usage will keep fostering the development of specialized technologies that handle the complexity inherent to this form of data. Vector databases, notably, will most likely keep gaining traction.
More generally though, we expect tools designed to operate at every stage of the lifecycle of unstructured data to keep popping up, mirroring the comprehensive suite of solutions available today for structured data within the Modern Data Stack.
More Data Streaming and Instant Value
The rise of AI unlocks significant potential for real-time applications, making it possible to act on intelligence instantly and at scale. Consider how real-time, AI-based products can assist with crowd management in high-density areas like airports or stadiums, enable instantaneous fraud detection in financial services, or sharpen the precision of recommendations in online shopping. These are just a few of the use cases that excited us while discussing 2024 trends, but they share a common denominator: they require real-time access to high volumes of data and events. Because we think these types of applications will pick up in 2024, we also expect organizations' adoption of data streaming technologies to accelerate as a consequence.
On a different note, if these streaming technologies manage to meet organizations' requirements more cost-effectively, we see a possible opportunity to bring real-time data into conventional data products such as Business Intelligence (BI).
Surge of the Semantic Layer
The adoption of the semantic layer fell somewhat short of expectations in 2023, leaving the data community curious about its trajectory for the upcoming year. Once again, though, we would not be surprised if the expansion of AI use cases ended up acting as a catalyst, triggering a surge in semantic layer implementations. There is a growing consensus that semantic layers could be a missing link, particularly in scenarios where raw data proves insufficient for robust AI insights. Semantic layers could add context to data, providing the depth, meaning, and relationships that can significantly enhance the capabilities of AI algorithms.
Data Privacy Still Top-of-Mind Amid the Evolving Regulatory Landscape
In 2024, we expect the democratization of data, driven by self-service capabilities, and the emergence of new data use cases to keep making the flow of personal information within organizations more complex and opaque. Organizations will need to establish strict policies on who can access and use personal information, amid a regulatory landscape expected to evolve with a particular focus on innovative technologies. Artificial intelligence, positioned at the heart of these concerns, will likely be a central focus of emerging data regulations, reflecting the imperative to balance technological advancement with privacy, safety, and security considerations.
Industry Leaders Consolidating Their Offerings Towards E2E Data Solutions
Mirroring the movement we have been witnessing for several years in the infrastructure and software industry, where cloud providers progressively broaden their offerings to make them ever more comprehensive, leaders in the cloud database management industry will keep consolidating their capabilities into more end-to-end data solutions. We therefore expect more strategic partnerships and acquisitions in the coming year, certainly within the data ecosystem, but potentially also involving adjacent categories such as cloud monitoring players or the cloud providers themselves.
Data Products: Data Mesh's Big Winners
Although the community seems increasingly skeptical about real-world implementations of data mesh, the need that triggered this paradigm will not go away in 2024: teams need autonomy and agility in their data initiatives. 2024 will tell us how this need might actually get addressed, if not through data mesh. Some predict that a hybrid approach, in which some functions are centralized and others decentralized, might be the key (see Unveiling the Crystal Ball: 2024 Data and AI Trends by Sanjeev Mohan), and we do not disagree.
While implementing the data mesh concept has proven challenging, we have nonetheless been seeing growing adoption of data products among our prospects and customers. In 2024, we believe teams will keep focusing on creating tangible value from data by developing “data products” and treating them as strategic assets that require reliability, accountability, and support.
We therefore also expect organizations to keep seeking new technologies that help them integrate these paradigms fully and efficiently into their operations.
Catalogs of Catalogs for E2E Data Enrichment and Discovery
In 2024, despite the built-in cataloging capabilities of data platforms, such as Snowsight for Snowflake or Unity Catalog for Databricks, we expect demand for standalone data catalogs to persist. We believe the key lies in the ability of these independent catalogs to cover the entire data lifecycle, consolidating existing assets in a unified, defragmented way. These "catalogs of catalogs" offer a panoramic view of data, tracking its journey from source through ETL processes, storage in the data warehouse, and transformations, all the way to final visualizations in BI tools or AI products. We are convinced (although we might admittedly be slightly biased on this one) that this end-to-end approach will remain valuable to data stakeholders in the upcoming year, not only to help them navigate and understand their data assets but also to strengthen data governance by exposing dependencies and silos within an increasingly complex data ecosystem.
Data Observability Need Intensifying
Last but not least, and admittedly we might be biased again on this one, we strongly believe that in 2024 the need for centralized data observability will intensify as organizations seek to reduce data chaos and improve the reliability of their various data products.
We think that data observability will extend beyond traditional data products, like business intelligence dashboards, to encompass the complexities of new AI-based products. Its significance will also go beyond reliability alone: data observability will start to play a crucial role in optimizing the efficiency, privacy, and security of data platforms. This includes observing pipeline resources and costs and monitoring sensitive data.