Customer story

Automating Data Quality at Scale: Inside Penguin Random House’s Sifflet Implementation

Penguin Random House needed a smarter way to ensure reliable data across hundreds of imprints and millions of titles. With Sifflet, the team replaced manual checks with real-time monitoring, giving everyone instant visibility into data health. The result: faster insights and stronger trust in every decision.

Industry
Media
Headcount
1 000-10 000
Headquarters
London, UK
Implementation time
3 months

Penguin Random House (PRH) is the world’s largest trade book publisher, with hundreds of imprints producing thousands of new titles each year and distributing to more than 100 countries. Data fuels every part of that mission. The Data team aims to empower the organisation to achieve its strategic goals, using data wherever possible to inform and guide better, faster decision-making.

Global leader in trade publishing:

  • 100+ imprints across fiction, non-fiction, and children’s books
  • Tens of thousands of active titles, millions of ISBNs
  • Distribution network spanning 100+ countries

Manual Data Checks Slowed Down Decision-Making

The Data team at PRH relied mostly on manual checks and pipeline monitoring to keep massive dbt-modelled workflows healthy. Those checks flagged freshness, volume, and schema issues. Data issues were communicated to Data Owners and Stewards solely through manual updates, leaving Stewards with no way to receive real-time updates.
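To make the three kinds of check concrete, here is a minimal Python sketch of what freshness, volume, and schema checks typically test. It is an illustration only, not PRH's pipeline code or Sifflet's implementation; the `TableSnapshot` type, the function names, the table columns, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableSnapshot:
    """Hypothetical summary of a table at check time (not a Sifflet or dbt type)."""
    last_loaded_at: datetime
    row_count: int
    columns: dict[str, str]  # column name -> data type


def check_freshness(snapshot, now, max_age):
    """Pass if the table was loaded within max_age of now."""
    return now - snapshot.last_loaded_at <= max_age


def check_volume(snapshot, expected_rows, tolerance=0.2):
    """Pass if the row count is within +/- tolerance of the expected count."""
    return abs(snapshot.row_count - expected_rows) <= tolerance * expected_rows


def check_schema(snapshot, expected_columns):
    """Return a list of schema differences (empty if the schema matches)."""
    issues = []
    for name, dtype in expected_columns.items():
        if name not in snapshot.columns:
            issues.append(f"missing column: {name}")
        elif snapshot.columns[name] != dtype:
            issues.append(f"type changed: {name} {dtype} -> {snapshot.columns[name]}")
    for name in snapshot.columns:
        if name not in expected_columns:
            issues.append(f"unexpected column: {name}")
    return issues


# Illustrative data only: a table loaded 27 hours ago with fewer rows than usual.
now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
snapshot = TableSnapshot(
    last_loaded_at=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    row_count=700,
    columns={"isbn": "string", "title": "string", "price": "string"},
)
expected = {"isbn": "string", "title": "string", "price": "decimal", "imprint": "string"}

fresh = check_freshness(snapshot, now, max_age=timedelta(hours=24))
volume_ok = check_volume(snapshot, expected_rows=1000)
schema_issues = check_schema(snapshot, expected)
```

In this example all three checks fail: the load is older than 24 hours, the row count is more than 20% below the expected 1,000, and the schema has a changed type and a missing column. Running checks like these by hand, table by table, is the kind of manual effort the team wanted to remove.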

Automating Data Quality with Sifflet for End-to-End Integrity

The use of Sifflet at PRH so far has demonstrated that the team can troubleshoot faster and collaborate more closely with business users. Sifflet also improves decision-making, increases efficiency, and enhances the experience of PRH data users by providing a clearer view of data health and performance. It has given the publisher a platform to demonstrate end-to-end integrity and put structure around the integral subject of Data Quality.

Faster Troubleshooting and Real-Time Collaboration

  • The PRH Data Team can perform daily checks at a glance from the results in Sifflet, a true efficiency improvement for Engineering
  • Less need for direct communication, as stakeholders can follow the progress of incidents in Sifflet through Teams channels
  • Opportunity for latency monitoring to ensure PRH is meeting its business SLAs for data availability
  • Monitors built as code, embedding best practice into future Engineering activities
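As a rough illustration of the last two points, a monitor defined as code is simply a declarative definition that lives in version control and can be reviewed like any other change. The Python sketch below pairs such a definition with a daily availability-SLA check. It does not use Sifflet's actual monitor format; the `LatencyMonitor` type, dataset names, deadlines, and channel name are all hypothetical, and timestamps are assumed to be timezone-aware UTC.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone


@dataclass(frozen=True)
class LatencyMonitor:
    """Hypothetical monitor definition kept in version control."""
    dataset: str
    available_by: time   # daily SLA deadline, in UTC
    notify_channel: str  # where a breach would be announced


# Hypothetical definitions; real names and deadlines would come from the business SLAs.
MONITORS = [
    LatencyMonitor("daily_sales", time(7, 0), "data-quality-alerts"),
    LatencyMonitor("title_metadata", time(6, 0), "data-quality-alerts"),
]


def sla_breached(monitor, landed_at):
    """Return True if the data landed after that day's SLA deadline."""
    deadline = datetime.combine(
        landed_at.date(), monitor.available_by, tzinfo=timezone.utc
    )
    return landed_at > deadline


late = sla_breached(MONITORS[0], datetime(2024, 1, 2, 7, 30, tzinfo=timezone.utc))
on_time = sla_breached(MONITORS[0], datetime(2024, 1, 2, 6, 45, tzinfo=timezone.utc))
```

Here a 07:30 arrival breaches the 07:00 deadline while a 06:45 arrival does not. Keeping definitions like these in a repository means a new dataset inherits the same review and deployment practice as the rest of the Engineering codebase.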

Delivering Trusted Data for Better Business Decisions

  • Data monitor outputs have been surfaced via Teams since the implementation of Sifflet, which Senior Data Stewards highlighted as a success at a recent Data Steward network event
  • Sifflet allows for a thorough review of the monitoring requirements and recommends changes to monitors through AI observations; this is to be explored in Q4
  • Near-real-time alerts now reach both engineers and business analysts, closing the gap between data creation and decision-making
  • Delivering ‘for the business’: listening to Data Stewards’ pain points and identifying the data quality opportunities that address them, e.g. removal of manual checks