
Rethinking the way we build on the cloud: Part 2: Environments on the Cloud


The newly launched Mingle SaaS offering runs entirely on the AWS cloud. As discussed in our earlier blog on Layering the Cloud, because there is no existing system that we have to modify or integrate with, we've got the freedom to design the architecture from scratch. This has led us to rethink the role of environments in our development and deployment process.

What’s wrong with the traditional approach to environments?

In traditional data-center-based applications there are usually a small, fixed number of environments into which the application is deployed. For example, there might be a production environment, a staging environment where candidate builds are deployed before they go into production, a test environment where new work is verified, and a development environment where developers can run new code as part of a full system.
The availability and nature of these environments is strongly constrained by the availability of hardware and infrastructural systems for them to run on. We would like all our environments to be as similar to production as possible, but physical hardware and traditional infrastructural systems like databases are expensive and slow to provision.
In practice this means that there is a sliding scale of realism in the environments as you go from development to production. Development environments tend to be smaller than production (load-balanced applications may have only a single server when there are a dozen in production), they often substitute alternatives for some components (for example, a different database server), and they frequently share components (fileservers, databases, monitoring systems) that are dedicated in production. Similar differences exist for test and even staging environments, although of course the hope is that they become more realistic the closer they are to production.
This variation between environments causes several problems. The most obvious problem is that some bugs are found further down the pipeline, rather than being identified by developers as they are working on the code; this increases the cost of fixing the bugs. Another problem is that supporting the variation in environments increases the complexity of the code. And, more subtly, we end up making decisions that don't cause outright bugs but which cause our architectures not to be optimized for the real deployment environment, because developers are divorced from the reality of running the system in that environment.
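To make the code-complexity point concrete, here is a minimal sketch of how environment variation tends to leak into application code. The module, function, and environment names are hypothetical illustrations, not taken from the Mingle codebase: each branch is a place where development or staging behaves differently from production, and therefore a place where bugs can hide.

```python
# Hypothetical configuration module illustrating how supporting
# several non-identical environments adds branching to the code.

def storage_backend(env: str) -> dict:
    """Pick storage infrastructure per environment.

    Every branch below encodes a difference from production --
    exactly the kind of variation that lets bugs slip through
    until later stages of the pipeline.
    """
    if env == "production":
        # Dedicated bucket, full replica count.
        return {"kind": "s3", "bucket": "app-prod", "replicas": 12}
    if env == "staging":
        # Shared bucket, fewer replicas: close to production, not identical.
        return {"kind": "s3", "bucket": "app-shared", "replicas": 2}
    # Development substitutes the local filesystem entirely.
    return {"kind": "local", "path": "/tmp/app-data", "replicas": 1}
```

A developer working against the `local` branch never exercises the S3 code path, which is precisely the "divorced from the reality of running the system" problem described above.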
The inevitable restriction on the number of environments also causes problems. Availability of environments is yet another dependency to be juggled, which can cause delays or push us to skip testing we would like to do. And maintaining the environments (cleaning up after running stress tests, for example) also takes time.

How have we approached environments in the cloud?

We have found that building a system that runs entirely on the cloud has enabled us to reconsider our use of environments and ensure that we don't fall foul of any of these problems.
Stay tuned for our next blog, where we discuss the principles we used, such as ad-hoc environments, shared-nothing environments, and cookie-cutter environments, to optimize our use of environments in the cloud.
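As a rough preview of the "cookie-cutter" idea named above, one way to think about it is that every environment, production included, is stamped from a single template and differs only in its name. The sketch below is a hypothetical illustration of that principle, not the actual Mingle tooling; the class and field names are assumptions.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Environment:
    """A complete, shared-nothing environment description."""
    name: str
    web_servers: int
    db_instance_type: str

# The single "cookie cutter": production and every ad-hoc
# environment are cut from the same template.
TEMPLATE = Environment(name="template", web_servers=12,
                       db_instance_type="m1.large")

def cut(name: str) -> Environment:
    """Stamp out a new environment identical to the template."""
    return replace(TEMPLATE, name=name)

production = cut("production")
perf_test = cut("perf-test")  # ad-hoc and disposable
# Because both come from one template, they cannot drift apart:
assert production.web_servers == perf_test.web_servers
```

Because every environment is provisioned from the same description, the sliding scale of realism described earlier disappears by construction.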
