With STRM, we aim to shift left with data privacy and to make building with sensitive data much easier. Not by classifying after the fact or synthesizing data, but by embedding privacy (policies) inside your data: privacy by design for data.
When we set out to achieve this, one of our first important realizations was that we would already need to design and build a set of complex technologies. Coping with the intricacies of different environments (like AWS or Azure in any possible configuration) would add further complexity. As that would only push the “minimal working” threshold further away, we decided to design STRM for portability, but to launch as SaaS only.
And that’s what we did, with our SaaS platform supporting the most critical functions of a privacy by design data platform:
- Data contracts to govern data shape and privacy implications
- A set of drivers and a gateway to receive data and embed the data purpose
- A processing engine to transform and split data according to purpose and consent into privacy streams, and connect back to existing data stacks
- With support for both streaming and batch modes (or stream-in, batch-out as a great middle ground)
The STRM Privacy platform (with example input and output for medical applications)
All quick and easy to set up through console and CLI as a SaaS solution.
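To make the idea of splitting data by purpose and consent a bit more concrete, here is a toy sketch in Python. It is not STRM's API or implementation: the `FIELD_PURPOSE` mapping (standing in for a data contract), the `Event` class, the integer purpose levels and the `privacy_stream` function are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a data contract: each field is tagged with the
# lowest purpose level that may see it. Level 0 means "not sensitive".
FIELD_PURPOSE = {
    "timestamp": 0,
    "patient_id": 1,
    "heart_rate": 2,
}


@dataclass(frozen=True)
class Event:
    payload: dict
    consent: frozenset  # purpose levels the data subject agreed to


def privacy_stream(events, purpose):
    """Yield the view of each event that may be used for `purpose`.

    Events without consent for the purpose are left out entirely; fields
    tagged above the purpose level (or missing from the contract) are dropped.
    """
    for event in events:
        if purpose not in event.consent:
            continue
        yield {
            name: value
            for name, value in event.payload.items()
            if name in FIELD_PURPOSE and FIELD_PURPOSE[name] <= purpose
        }


events = [
    Event({"timestamp": 1, "patient_id": "p-1", "heart_rate": 72}, frozenset({1, 2})),
    Event({"timestamp": 2, "patient_id": "p-2", "heart_rate": 80}, frozenset({1})),
]

level_1 = list(privacy_stream(events, 1))  # both events, without heart_rate
level_2 = list(privacy_stream(events, 2))  # only the first event, all fields
```

In this sketch the second patient only consented to purpose 1, so their event never shows up in the level 2 stream, and the level 1 stream never carries the heart rate field.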
Splitting the STRM Data and Control Plane
Being in “privacy” means we deal with sensitive data, often in sensitive domains like health. While a SaaS solution is a great way to leap ahead on the privacy dimension, data is often regarded as so sensitive and strategic that many prospects prefer (or simply require) to keep customer data inside their own cloud subscriptions.
As we learned more about customer demands and processes, we could also fill in the important conditions for taking that next step: running our data plane inside a (for us) foreign cloud/VPC.
Which, surprise surprise, we’ve tested across a bunch of clouds by now and are launching today 🎂 (officially as beta).
Whole lotta benefits
There are important benefits that come with the self-hosted deployment as compared to SaaS. You get Privacy by Design for Data that is…
- More secure: data does not leave your environment
- Simpler to implement: security and privacy policies do not have to be extended or assessed (e.g. we’re not a data processor for customer data anymore!). Strict security and vendor requirements apply as if it were an internal service (which it effectively is)
- Easier to integrate: existing security policies and configurations apply, the data plane ties directly into existing data storage, or directly reads and writes on existing Kafka topics without extra roundtrips.
- Cheaper to operate: no extra bandwidth and ingress/egress costs; the Data Plane runs on existing committed use, and the discounts and benefits of your current provider apply
- Easier to verify: Of course you can trust us. But it certainly helps that we’ve open sourced the Helm chart.
The only drawback: you need a sysadmin, SRE or DevOps team to set it up. So we’re still offering SaaS 😉
How the STRM data plane works
With STRM’s self-hosted option, all components that touch your customers’ data are split from other platform components. Only (meta)data on system health and configurations like data contracts, input streams and privacy streams are retrieved and stored on our control plane (for sign-up your own email is still required and of course needs to be stored).
An overview of the components in the platform and how the STRM control and data plane are split
This means you can run the STRM Privacy platform without data of your customers leaving the environment, as long as you run a cloud that supports
We verified the Data Plane on AWS, GCP, Azure and OVH Cloud (with storage on Clever cloud) already. Technical note: Redis and Kafka (necessary for streaming mode) are packaged, or you can hook them up to existing instances.
In order to run the Data Plane, STRM needs:
- An existing or new Kubernetes cluster with access to the internet to connect to the control plane
- Helm (helm.sh) must be installed, and must have access to the cluster
- The bare minimum even runs locally with k3s
- For testing: at least two k8s nodes with ~8 GB of memory and two cores are recommended
- For production we recommend at least 8 nodes, 4 CPU cores and 16 GiB per node of available memory
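As a quick sanity check against the sizing recommendations above, a small script along these lines could compare a list of node specs with the two profiles. This is only a sketch: the `Node` class, `PROFILES` table and `meets_profile` function are hypothetical names, it reads “~8 GB” as 8 GiB, and it assumes the core and memory figures are meant per node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cpu_cores: int
    memory_gib: float


# Thresholds taken from the recommendations in this post:
# (minimum node count, minimum cores per node, minimum GiB per node).
# Assumption: "~8 GB" is treated as 8 GiB and all figures are per node.
PROFILES = {
    "testing": (2, 2, 8),
    "production": (8, 4, 16),
}


def meets_profile(nodes, profile):
    """Return True if enough nodes satisfy the per-node minimums."""
    min_nodes, min_cores, min_gib = PROFILES[profile]
    suitable = [
        node
        for node in nodes
        if node.cpu_cores >= min_cores and node.memory_gib >= min_gib
    ]
    return len(suitable) >= min_nodes


small_cluster = [Node(cpu_cores=2, memory_gib=8)] * 2
print(meets_profile(small_cluster, "testing"))     # True
print(meets_profile(small_cluster, "production"))  # False
```

In practice you would fill the node list from your own cluster inventory; the check itself does not talk to Kubernetes.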
The testing specs will handle a few hundred requests per second. Of course we scale horizontally, and the data plane is autoscalable with any good k8s scaling supervision. Good to know: from our experience, in production settings we can run in the headroom next to existing committed-use workloads (especially for batch, as that’s spin-up, spin-down with very few resources idling).
Setup and quickstart
Setup is straightforward if you’re familiar with
- Make sure your STRM subscription is upgraded to self-hosting (please request the upgrade if not)
- Retrieve the necessary Helm chart and/or values from the installations pane
- Submit the Helm chart to your existing
- Watch the magic: our control plane will instruct and set up everything, and one by one all Data Plane containers will turn from red to green, ready to receive and process the data.
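For teams that script their deployments, submitting a chart usually boils down to a single `helm upgrade --install` call. The sketch below only builds that command line in Python. The release name, chart reference, namespace and values file are placeholders, not the real STRM names; take the actual chart and values from the installations pane and the documentation.

```python
import shlex


def helm_install_command(release, chart, values_file, namespace):
    """Build a `helm upgrade --install` invocation as an argument list."""
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace, "--create-namespace",
        "--values", values_file,
    ]


# Placeholder names for illustration only.
cmd = helm_install_command(
    release="my-release",
    chart="my-repo/my-chart",
    values_file="values.yaml",
    namespace="my-namespace",
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it, provided Helm is installed and has access to the cluster.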
A more comprehensive technical explanation is included in the documentation on Customer Cloud Deployments.
You can see it in action in the following video, where Bart walks you through the quickstart:
Like magic 🪄
I believe this is an important milestone for STRM and an impressive achievement by our team, and I’m excited we can deliver even more trust to our customers and prospects with all the benefits of privacy by design for data.
I’d like to end on a small personal reflection: I come from computational linguistics and led an applied ML group in e-commerce for quite some time, often with amazing outcomes. We never sold STRM as being machine learning or data science, although its place in ML and DS stacks is very clear to us. But seeing the STRM Data Plane being pulled up like magic once the Helm chart is submitted is the closest thing to “artificial intelligence” I have personally been part of. 😇
Request a demo!
Curious and ready to test or run a self-hosted STRM deployment?
Get in touch to request a demo!
PS We’re hiring!
Want to see your code inside the Data- and Control Planes too? Come build STRM to help data teams deliver data products without sacrificing privacy in the process. We are hiring!