Cloud Confidentiality

Today, no cloud service provider (CSP) can guarantee that your data will remain "For Your Eyes Only." Encryption algorithms and compliance policies can only achieve so much.

From the CSP's perspective, we have to take reasonable measures to ensure the provider does not use customer data in any way the customer did not intend. To mitigate exposure of customer data, some CSPs encrypt data at rest using encrypted hard drives or encrypted file systems. The other part of the risk-mitigation equation is proper device destruction, either logically, using an appropriate sanitization method such as DoD 5220.22-M, or physically, following guidance such as the DSS Clearing and Sanitization Matrix and NIST Special Publication 800-88: Guidelines for Media Sanitization.

And then there are the backups. CSPs keep several copies of data in both onsite and offsite facilities to guard against total failure, and more than likely the data stored on tape or on other hard drives is encrypted.

Once you have encrypted drives and encrypted backups, you have to deal with a little problem called key management. It turns out to be a problem because some CSPs use only a handful (if that many) of encryption keys, known to a select few of the system administrators in charge of operations and compliance. That raises the question: if a system administrator leaves (of their own free will or not), are the keys changed? More than likely, yes. But what about the backups? How long is the window during which the now-departed administrator can still access customer data? I usually hear crickets when I ask CSPs this question, even those with plenty of acronyms and certifications behind their names.
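
To put a rough number on that time window, here is a minimal sketch under loudly hypothetical assumptions: every backup is encrypted with whichever key was active when the backup ran, the key is rotated once after the administrator leaves, and each backup is deleted after a fixed retention period. Under that model the old key keeps working until the last pre-rotation backup expires. The function name, schedule, and retention value are illustrative only, not any CSP's actual policy.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def old_key_exposure_end(
    backup_times: Iterable[datetime],
    rotation_time: datetime,
    retention: timedelta,
) -> Optional[datetime]:
    """Return when the last backup encrypted under the pre-rotation key expires.

    Hypothetical model: each backup uses the key that was active when it ran
    and is deleted exactly `retention` after it was taken.
    """
    old_key_backups = [t for t in backup_times if t < rotation_time]
    if not old_key_backups:
        return None  # no backup was ever written with the old key
    return max(old_key_backups) + retention


# Assumed values: nightly backups for 30 days, kept for 90 days.
start = datetime(2024, 1, 1)
backups = [start + timedelta(days=i) for i in range(30)]
departure = datetime(2024, 1, 10, 12)  # administrator leaves at noon
rotation = datetime(2024, 1, 10, 18)   # keys are rotated the same evening

end = old_key_exposure_end(backups, rotation, timedelta(days=90))
print(f"Old key can read backups until {end:%Y-%m-%d}")
print(f"Exposure window after departure: {end - departure}")
```

With these assumed numbers the script prints 2024-04-09 and an exposure window of `89 days, 12:00:00`: even with same-day key rotation, the window is dominated by the backup retention period, not by how quickly the keys were changed.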

From the customer's perspective, I don't want any of my data readable by anyone except me. If my CSP encrypts my data at rest, I still want to make sure no one can access it in other ways, such as through the backups my CSP performs regularly. To me, this means that before data leaves my control for my CSP, all of it must be encrypted using standard encryption algorithms and key(s) that I manage.

For example, think of S3 as an HDFS-like system made up of NameNodes (which hold the metadata) and DataNodes (which hold the data blocks).

Currently, Amazon offers Server-Side Encryption (SSE) for S3. When data is put into S3, a process splits it into blocks (X MB chunks) and encrypts them with a key that the process chooses. Every object gets its own key, and that key is in turn encrypted with a master key that is rotated after every period T.

Except that we don't know where the keys are stored. Are there backups of the keys for disaster recovery and high availability (think of HDFS NameNodes)? We are back to the conundrum we started with: encrypted backups.
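
A per-object key protected by a rotated master key is usually called envelope encryption. The sketch below is not Amazon's implementation: it assumes the third-party Python `cryptography` package, uses AES-GCM for both layers, keeps master keys in a plain dict standing in for wherever they really live, and every name in it (`put_object`, `get_object`, `rewrap`, `EncryptedObject`) is hypothetical. It illustrates why the location and backups of the keys matter: rotating the master key only re-wraps the small data key, so a copy of the object taken before rotation is still readable by anyone who kept the old master key.

```python
import os
from dataclasses import dataclass
from typing import Dict

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Illustrative envelope-encryption model; not how any real CSP does it.
NONCE_SIZE = 12  # 96-bit nonce, the usual size for AES-GCM


def _seal(key: bytes, data: bytes, aad: bytes) -> bytes:
    nonce = os.urandom(NONCE_SIZE)
    return nonce + AESGCM(key).encrypt(nonce, data, aad)


def _open(key: bytes, blob: bytes, aad: bytes) -> bytes:
    return AESGCM(key).decrypt(blob[:NONCE_SIZE], blob[NONCE_SIZE:], aad)


@dataclass(frozen=True)
class EncryptedObject:
    master_key_id: str  # which master key wrapped the data key
    wrapped_key: bytes  # per-object data key, encrypted under that master key
    body: bytes         # object contents, encrypted under the data key


def _unwrap(master_keys: Dict[str, bytes], obj: EncryptedObject) -> bytes:
    # The key id is bound as associated data, so a wrapped key cannot be
    # silently relabeled with a different master key id.
    return _open(master_keys[obj.master_key_id], obj.wrapped_key, obj.master_key_id.encode())


def put_object(master_keys: Dict[str, bytes], key_id: str, plaintext: bytes) -> EncryptedObject:
    data_key = AESGCM.generate_key(bit_length=256)  # unique key per object
    return EncryptedObject(
        master_key_id=key_id,
        wrapped_key=_seal(master_keys[key_id], data_key, key_id.encode()),
        body=_seal(data_key, plaintext, b""),
    )


def get_object(master_keys: Dict[str, bytes], obj: EncryptedObject) -> bytes:
    return _open(_unwrap(master_keys, obj), obj.body, b"")


def rewrap(master_keys: Dict[str, bytes], obj: EncryptedObject, new_key_id: str) -> EncryptedObject:
    """Rotate the master key: re-encrypt the data key only; the body is untouched."""
    data_key = _unwrap(master_keys, obj)
    return EncryptedObject(
        master_key_id=new_key_id,
        wrapped_key=_seal(master_keys[new_key_id], data_key, new_key_id.encode()),
        body=obj.body,
    )


master_keys = {"m1": AESGCM.generate_key(bit_length=256)}
current = put_object(master_keys, "m1", b"customer record")
backup = current  # a backup taken before rotation keeps the m1-wrapped key

master_keys["m2"] = AESGCM.generate_key(bit_length=256)
current = rewrap(master_keys, current, "m2")

# Someone who kept only the old master key can still read the old backup.
old_key_only = {"m1": master_keys["m1"]}
assert get_object(old_key_only, backup) == b"customer record"
```

Re-wrapping touches only the 32-byte data key, which makes rotation cheap; the flip side is that the old master key, and every copy of it, stays as sensitive as the newest one for as long as any object or backup wrapped under it exists.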


What Can Be Done Today


Customers have two options, depending on their tolerance for risk:
  • Leave data unencrypted at rest and trust the CSP
  • Encrypt all data before it is sent to the CSP
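
For the second option, here is a minimal sketch of encrypting on the customer side, with a key the customer manages, before anything is handed to the CSP. It assumes the third-party Python `cryptography` package and picks AES-256-GCM as the standard algorithm; the helper names `encrypt_for_upload` and `decrypt_after_download` and the object name are hypothetical, and key storage is deliberately left out.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the usual size for AES-GCM


# Hypothetical helpers; only the blob returned here is ever sent to the CSP.
def encrypt_for_upload(key: bytes, plaintext: bytes, object_name: str) -> bytes:
    """Encrypt locally with a customer-managed key."""
    nonce = os.urandom(NONCE_SIZE)  # must never repeat for the same key
    # Binding the object name as associated data means a blob moved to a
    # different object name fails to decrypt (an illustrative design choice).
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, object_name.encode())
    return nonce + ciphertext


def decrypt_after_download(key: bytes, blob: bytes, object_name: str) -> bytes:
    """Reverse of encrypt_for_upload; raises InvalidTag if the blob was altered."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, object_name.encode())


# The key is generated and kept on the customer side, never sent to the CSP.
key = AESGCM.generate_key(bit_length=256)
blob = encrypt_for_upload(key, b"quarterly numbers", "reports/q1.csv")
assert decrypt_after_download(key, blob, "reports/q1.csv") == b"quarterly numbers"
```

Whatever the CSP does afterwards (its own at-rest encryption, replication, tape backups) only ever copies `blob`, so a leaked provider key or a restored backup exposes ciphertext rather than data.
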

In this post, I wanted to highlight the open issues around keeping customer data confidential in the cloud, because this is not a solved problem. Encryption is only meant to provide some assurance that no one other than the key holders can view the data.
