If you missed it live, the rewind is available here. What follows is our take on the conversation, with the moments that stuck.
Apache Iceberg: the promise, and the fine print
Salma opened by asking Tristan to reintroduce Apache Iceberg to the audience, given that the topic has gone from niche to inescapable in about two years. His framing was clean: instead of loading your data into a proprietary database and paying a vendor forever for the privilege of accessing it, you store data in an open table format and let any compliant compute engine read it. True data portability, in theory.
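The "any compliant compute engine" claim rests on Iceberg's design: a table is just data files on object storage plus a chain of metadata files that any reader can follow. A toy sketch of that resolution chain, with plain Python dicts standing in for the real JSON metadata and Avro manifest lists (the paths and structure here are illustrative, not the actual spec objects):

```python
# Toy model of how an Iceberg reader resolves a table. The engine only
# needs to follow a metadata pointer chain, which is why any compliant
# engine can read the same data. Real Iceberg stores JSON metadata and
# Avro manifest lists on object storage; dicts stand in for them here.

# Table metadata: points at the current snapshot.
table_metadata = {
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "manifest-list": "s3://bucket/ml-1.avro"},
        {"snapshot-id": 2, "manifest-list": "s3://bucket/ml-2.avro"},
    ],
}

# Manifest lists: each snapshot enumerates the data files it covers.
manifest_lists = {
    "s3://bucket/ml-1.avro": ["s3://bucket/data/a.parquet"],
    "s3://bucket/ml-2.avro": [
        "s3://bucket/data/a.parquet",
        "s3://bucket/data/b.parquet",
    ],
}

def data_files_for(metadata, snapshot_id=None):
    """Resolve the data files a reader should scan for a snapshot."""
    sid = snapshot_id if snapshot_id is not None else metadata["current-snapshot-id"]
    snap = next(s for s in metadata["snapshots"] if s["snapshot-id"] == sid)
    return manifest_lists[snap["manifest-list"]]

print(data_files_for(table_metadata))     # current snapshot: both files
print(data_files_for(table_metadata, 1))  # older snapshot: one file
```

Note that nothing in the chain is engine-specific: Snowflake, Spark, or DuckDB all resolve the same pointers and land on the same Parquet files, which is the whole portability argument.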
What gave Iceberg its recent momentum was a moment Tristan described as unexpectedly decisive:
"Both within a week of each other, both CEOs were on stage saying we're all in on Iceberg and may the best engine win. I don't know what made that happen, but I think it is one of the most foundational things to happen in the data processing world in many decades."
That convergence from Snowflake and Databricks did not happen by accident. Tristan's read: the largest enterprise customers demanded it. The vendors followed.
Salma pushed back immediately, and it was the right push. She raised Hadoop: same promise, same language around openness and interoperability, same energy. We all know how that ended. Is Iceberg different, or just better-funded?
Tristan's answer was that the cloud changes the calculus entirely. Hadoop was complex to own and operate. Running it yourself required serious infrastructure and serious people. Iceberg lives on S3 or GCS, catalog management is handled by a vendor or managed service, and the easy buttons exist now in a way they simply did not in 2008.
That said, Iceberg is not a free ride from a reliability standpoint. When metadata becomes the source of truth, traditional monitoring misses the gaps entirely. Sifflet has written in detail about what it actually takes to close the Iceberg observability gap, and why active metadata observability becomes the architectural requirement that holds an open data stack together.
The vendor velocity trap: you're still locked in, just differently
This is where the conversation got genuinely sharp. Salma named the elephant in the room: every vendor now has an Iceberg story, but they are not moving at the same speed. Snowflake has invested aggressively. BigQuery, at the time of the session, was still largely read-only on external Iceberg tables. The gap in feature completeness means that in practice, customers end up defaulting to whoever moved fastest because that is where their use cases are actually supported.
The irony is real. Iceberg was supposed to liberate enterprises from vendor lock-in. Instead, as Salma put it, customers risk a new kind of trap:
"You are still gonna be locked in to the vendor that moves fastest. I call it feature velocity lock-in, for lack of better words."
Tristan's take was pragmatic. What actually matters, he argued, is reads, writes, and efficiency. Those are the core capabilities that determine whether Iceberg is genuinely working for a given organization. Proprietary extensions and feature velocity races are noise if the fundamentals are solid. And usage data from the dbt community showed healthy, exponential adoption in practice. The curve is real, even if the promises need calibrating.
Governance: the gap nobody wants to own
Both Salma and Tristan circled back to governance multiple times, and each time they landed in the same place: the technical governance that Iceberg enables is not the same as the semantic governance enterprises actually need.
Time travel, versioning, schema tracking: Iceberg handles those well. But Salma pushed further:
"We need semantic governance and we need to bring in more business context into how we look at governance and observability broadly speaking. I think today it's still not tackled properly by the different vendors that have been active in the Iceberg movement."
Tristan agreed, and framed it structurally. There are at least two distinct layers: the technical catalog at the storage and file level, and a higher-order control plane where semantics, permissions, and business context live. The first is largely solved. The second is still wide open.
This is precisely the gap Sifflet has been talking about for years. A broken table is only a problem once you know which KPI it affects, which team is consuming it, and what the downstream business cost actually looks like. That kind of context does not emerge from the Iceberg spec. It has to be built deliberately into the observability layer. Our complete guide to data observability goes deep on why the technical pillars alone are never enough without business context sitting on top.
The multi-persona problem: why trust gets siloed
One of the most honest exchanges in the session happened when Tristan reflected on what nobody tells founders building in the data space:
"One of the hard things about building in data is that you have to build for multiple humans. There are so many different humans that need to be in the loop of a working end-to-end data system."
He mapped it out: platform engineers keeping the catalog running, data engineers building pipelines, analytics engineers adding business context, analysts querying, executives consuming. Each of them needs to trust data differently. Each speaks a different language.
In theory, a thumbs-up from a dashboard consumer should propagate as a trust signal back through the pipeline. In practice, because every tool in the stack is optimized for its own primary persona, those signals rarely cross boundaries. Trust ends up siloed by role rather than shared across the organization.
Salma put it directly: making data quality everyone's business is not a tagline. It is an architectural and organizational challenge with no clean solution yet. And it is one of the reasons data observability tools that serve only engineers consistently underdeliver on their broader promise.
The control plane: where it all has to land
As the session moved toward closing, Tristan articulated what amounts to the strategic thesis of the whole conversation. In a world where compute is heterogeneous and data lives in open formats across multiple engines, something has to provide centralization. His position:
"The control plane layer, orchestration, observability, metadata, all the roads lead there. And I think that layer is stronger when it has the ability to actually do things as opposed to just watching."
Salma agreed immediately. And then said something that took some courage:
"I do realize that collectively as a data observability category we are falling short of the promises we made to our customers. It's no longer about just telling you what's wrong. In today's world, that is useless information. That is harmful information."
It is a hard thing to say on a public stage about your own category. But it is also exactly right. The standard has shifted. In a world where AI-native tools cut task time by 80%, a product that surfaces alerts and stops there will not survive. People expect tools to act, not just observe.
That is the direction Sifflet is building with AI Agents for Data Observability: not just detection, but reasoning, triage, and guided resolution. And it is the same logic behind business-aware observability more broadly: connecting incidents to the outcomes that actually matter, so teams know what to fix first and why.
Iceberg and AI: more connected than they look
One of the underrated moments in the session was when Salma asked Tristan directly about the relationship between Iceberg adoption and AI readiness. His answer reframed the question:
"AI and Iceberg are two different threads, but they can add fuel to each other's fire. When you start thinking about your data infrastructure feeding not just analytics use cases but AI use cases that are literally going to determine your success or failure as a company over the next decade, then all of a sudden the carrot gets a lot bigger."
The implication is significant. Organizations that have been slow to migrate to open table formats now have a more urgent reason to move: proprietary data locked in vendor formats is a liability when you are trying to build AI systems that depend on that data being accessible, governable, and auditable. Iceberg, done right, makes AI more auditable because you can snapshot every version of every dataset that fed a model. That is not just a data engineering win. It is a governance and compliance win.
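That auditability claim is concrete: every Iceberg snapshot carries a commit timestamp, so you can ask exactly which version of a table was current when a model trained. A minimal sketch of that lookup, using a toy snapshot log rather than the real pyiceberg API (engines expose the same idea as SQL time travel, e.g. `FOR TIMESTAMP AS OF`):

```python
# Toy illustration of Iceberg-style time travel for AI auditability:
# given a model's training timestamp, find the snapshot that was
# current, i.e. the exact dataset version the model saw. The snapshot
# log is modeled as a plain list; timestamps are ISO 8601 strings,
# which compare correctly as strings in this format.

snapshots = [
    {"snapshot-id": 101, "committed-at": "2025-03-01T00:00:00"},
    {"snapshot-id": 102, "committed-at": "2025-03-05T00:00:00"},
    {"snapshot-id": 103, "committed-at": "2025-03-09T00:00:00"},
]

def snapshot_as_of(snapshots, ts):
    """Return the latest snapshot committed at or before ts, or None."""
    eligible = [s for s in snapshots if s["committed-at"] <= ts]
    return max(eligible, key=lambda s: s["committed-at"]) if eligible else None

# Which dataset version fed the model trained on March 7?
trained_on = snapshot_as_of(snapshots, "2025-03-07T12:00:00")
print(trained_on["snapshot-id"])  # 102
```

For compliance purposes, pinning a model run to a snapshot ID like this is what makes "what data did the model see?" answerable after the fact, rather than reconstructed from logs.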
Where things stand
Tristan closed with something that felt genuinely optimistic rather than performed:
"There was this explosive period in data from 2020 to 2022. And then there was this period of retrenchment. But I think there's more positive movement in the space than there has been in several years, and I'm excited about the next few."
Salma called the earlier period "the malaise." As a French person, she added, she appreciated the precision of the word. And as someone building in this space every day, she clearly felt that the malaise is lifting.
If you want to go deeper on what a reliable, AI-ready data stack actually requires, the Data Observability Buyer's Guide is the right starting point. And if you missed Signals25, the full rewind is here.