Secure by design: How Sifflet keeps your data safe

Before granting a service such as Sifflet access to critical company data, data practitioners tend to ask the same question: what is the risk of my data being accessed by unauthorized parties? And even if the service seems safe, will the compliance or security teams be okay with installing it?

If you're considering adopting Sifflet’s data observability platform and want to know how we keep your data secure, this article is for you!

ISO 27001 and SOC 2 compliance

Let's start with the fundamentals: Sifflet has earned ISO/IEC 27001 certification and SOC 2 attestation of compliance from an independent auditor.

Meeting the compliance requirements of these standards is only the first step. A well-known limitation of such audits is that they evaluate a company's security practices, not the security of the written and hosted software. So let's delve into the technology and how Sifflet keeps your data secure.

How Sifflet protects your data

Sifflet is designed around a few overarching security principles:

Least privilege: Sifflet cannot alter your data and operates with as few permissions as possible.
No storage: Sifflet does not keep a copy of your data.
Single tenancy: Sifflet enforces strong isolation between customers.

Let's explore each one of these principles in detail.

Least privilege

Sifflet cannot alter your data in any way.

Sifflet needs to be able to fetch information from your data sources. A data source is any system monitored by Sifflet, be it a database (PostgreSQL, MySQL, Oracle, etc.), a data warehouse (Snowflake, Redshift, etc.), a pipeline (Airflow), a business intelligence tool (PowerBI, Tableau, etc.), or any of the other integrations supported by Sifflet.

Our documentation carefully lists the minimal set of privileges Sifflet needs to be granted on each data source. In every case, Sifflet only needs read-only access. We strongly discourage granting Sifflet more permissions than the documentation describes. You should also avoid reusing Sifflet's account in any other system, so that the audit trail for that account reflects only what Sifflet does.
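
As a rough illustration of the read-only principle, the sketch below flags any privilege granted to the Sifflet account that goes beyond a read-only allowlist. The allowlist contents and the function name are assumptions made for this example; the actual minimal privileges for each data source are the ones listed in the Sifflet documentation.

```python
# Hypothetical allowlist for illustration only. The real minimal privileges
# for each data source are listed in Sifflet's documentation.
READ_ONLY_PRIVILEGES = frozenset({"SELECT", "USAGE"})


def excess_privileges(granted):
    """Return the granted privileges that go beyond the read-only allowlist."""
    normalized = {privilege.strip().upper() for privilege in granted}
    return sorted(normalized - READ_ONLY_PRIVILEGES)


# Example: an account that was accidentally given write access.
print(excess_privileges(["select", "usage", "INSERT", "delete"]))
```

An empty result means the account holds nothing beyond the allowlist; anything else is a candidate for revocation.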

In some cases, Sifflet doesn't require permissions to your data systems. For example, you can directly push dbt run results and Airflow job results to Sifflet.

No storage

Sifflet does not keep a copy of your data, not even single rows or columns.

When the Sifflet application displays previews of a data source, the data is discarded from our systems as soon as it's displayed. In other words, should an attacker somehow be able to read from the Sifflet internal database, they wouldn't gain access to your data. In this scenario, the attacker wouldn't even get access to your database credentials (more on this below). This also means that the few Sifflet support engineers with privileged access to production databases can't see customer data either, making GDPR compliance easier for customers operating in the European Union.

If you ever choose to stop working with Sifflet, you can revoke the credentials used by Sifflet, and your Sifflet instance will immediately lose access to your data.

Let's dive into an example of Sifflet's no-storage security principle. Sifflet can display rows that fail a data quality rule. To implement this, Sifflet doesn't store the actual failing row. Instead, it stores a reference to the data: the data source in which the row is located, the table, and the position. The row is fetched through this reference only when it is displayed and is never kept in any Sifflet system.
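
The reference-only approach can be sketched as follows. This is a minimal illustration, not Sifflet's implementation: the `RowReference` class and `fetch_row` function are hypothetical names, an in-memory SQLite database stands in for a customer data source, and SQLite's `rowid` stands in for "the position".

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class RowReference:
    """What gets persisted: where the row lives, never its contents."""
    data_source: str  # hypothetical identifier of the monitored system
    table: str
    position: int     # here: the SQLite rowid, standing in for "the position"


def fetch_row(connection, ref):
    """Fetch the referenced row on demand; the caller displays and discards it."""
    # Table names cannot be bound as query parameters, so quote the identifier.
    quoted = '"' + ref.table.replace('"', '""') + '"'
    cursor = connection.execute(
        f"SELECT * FROM {quoted} WHERE rowid = ?", (ref.position,)
    )
    return cursor.fetchone()


# Stand-in for a customer data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, -5.0)])

# A rule such as "amount >= 0" fails for the second row.
# Only the reference is kept, not the row itself.
failing = RowReference(data_source="warehouse", table="orders", position=2)
print(fetch_row(source, failing))
```

Because the stored object holds no column values, reading it reveals where a failing row is, not what it contains.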

Additionally, Sifflet doesn't store any credentials required to connect to data sources in the central database. A separate secret manager is used instead. The secret manager enforces a robust audit trail and tight access control.
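
To make the separation concrete, here is a small sketch in which the central database record holds only a pointer to a secret, while the secret itself lives in a separate store that records every read. The `SecretManager` class, the identifiers, and the record layout are all assumptions for illustration; they do not describe the secret manager Sifflet actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class SecretManager:
    """In-memory stand-in for a separate secret manager with an audit trail."""
    _secrets: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def put(self, secret_id, value):
        self._secrets[secret_id] = value

    def get(self, secret_id, requester):
        # Every read attempt is recorded before the lookup happens.
        self.audit_log.append((requester, secret_id))
        return self._secrets[secret_id]


secrets = SecretManager()
secrets.put("datasource/warehouse", "s3cr3t-password")

# The central database record holds only a pointer to the secret.
central_db_row = {"name": "warehouse", "secret_id": "datasource/warehouse"}

# Credentials are resolved only at connection time, and the read is audited.
password = secrets.get(central_db_row["secret_id"], requester="ingestion-worker")
print(secrets.audit_log)
```

Under this layout, dumping the central database yields secret identifiers but no credentials.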

The only thing Sifflet stores is the metadata related to your data sources, such as table and column names.

Single tenancy

Sifflet customers are isolated from one another.

This is one of the unique aspects of Sifflet's security model. Sifflet supports several isolation models between customers. By default, Sifflet uses what we call a "cell architecture". This topic deserves its own blog post, but here we’ll focus on a high-level overview.

Sifflet groups customers into "cells". A cell is a set of infrastructure components (such as a network or virtual machines) used to host the Sifflet application for a subset of Sifflet customers. Cells are isolated from each other: no matter how the network permissions are configured, a cell can’t connect to another cell.

Even inside a cell, Sifflet enforces strong isolation between customers. Sifflet deploys one application instance and one database per customer. In other words, no single “Sifflet application” is used by all customers.

Additionally, Sifflet further isolates applications in the same cell using a variety of technical solutions. For example, network micro-segmentation is enforced using AWS security groups.
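
The isolation rule described above can be written down as a simple predicate. The sketch below is only a model of the rule, with hypothetical names and identifiers; the article states that enforcement relies on mechanisms such as AWS security groups, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    cell: str      # hypothetical cell identifier
    customer: str  # customer owning this application instance or database
    kind: str      # "app" or "database"


def may_connect(source, target):
    """Allow traffic only between one customer's components inside one cell."""
    return source.cell == target.cell and source.customer == target.customer


acme_app = Component("cell-1", "acme", "app")
acme_db = Component("cell-1", "acme", "database")
globex_db = Component("cell-1", "globex", "database")
initech_db = Component("cell-2", "initech", "database")

print(may_connect(acme_app, acme_db))     # same customer, same cell
print(may_connect(acme_app, globex_db))   # same cell, different customer
print(may_connect(acme_app, initech_db))  # different cell
```

Only the first call returns True: an application instance reaches its own database and nothing else.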

Sifflet's sophisticated infrastructure brings our customers several advantages:

A Sifflet customer cannot access data belonging to another customer. Therefore, even in the worst-case scenario, data cannot leak from one customer to another.
Similarly, if attackers gained access to one Sifflet application instance, they could not access any other Sifflet application instance.
Finally, a customer who makes heavy usage of Sifflet does not impact the performance of the Sifflet application for any other customers.

The Sifflet application is simple to manage, making it easy to deploy to your infrastructure. This topic brings us to the last section on deployment models.

Deployment models: we run it, or you run it

So far, this article has assumed that Sifflet manages your application, like a typical SaaS (Software-as-a-Service) application. In most cases, Sifflet can provide your security team with the documentation they need to approve the SaaS version of Sifflet.

There are, however, other ways we make Sifflet available for your data observability. For example, organizations with stricter compliance and security requirements can self-host Sifflet.

SaaS deployment

If you choose the default SaaS deployment, Sifflet manages the application for you. Upgrades (including security updates), backups, monitoring, capacity planning, and everything that is required to use the Sifflet application are handled by us.

Note: With this model, you can still choose the region of the world in which your Sifflet instance is deployed, which can help with regulatory requirements. European customers generally want their Sifflet instances in the European Union (which makes GDPR compliance easier), while North American customers prefer a deployment in the United States.

Self-managed deployment

You can also deploy the Sifflet platform inside your production environment. With this solution, Sifflet runs entirely inside your network. You can deploy the application to any cloud provider and in any region of the world.

As mentioned above, this option meets the needs of organizations with strict security, regulatory and compliance requirements. If you opt for the self-managed deployment, you can apply any additional security requirements, such as scanning the provided containers for vulnerabilities (Sifflet already does this).

You can deploy Sifflet to a private corporate network. Sifflet doesn't require you to expose any port to the Internet. No data ever leaves your environment. You can optionally send application logs to Sifflet so we can help you operate the solution. These logs don't contain any data coming from your databases.

Conclusion

Sifflet is designed with three overarching security principles: least privilege, no storage, and single tenancy. Together, these principles help keep your data secure. Additionally, your team can adopt a SaaS or self-managed deployment, depending on your organization's security needs. If you want to learn more about Sifflet's security approach, we'd love to chat: contact@siffletdata.com.
