Persistent test environments create a bottleneck for software teams, one that on-demand Preview Environments can remove.
The goal of any software development and test process is to take code that starts out buggy and refine it through a series of testing steps, from left to right, so that it ends up as a well-functioning application for your end users. Making this process as efficient as possible is critical to team performance as measured by three common metrics: lead time (how long it takes a newly designed feature to reach end users), code stability (how often bugs delay releases or negatively impact end users), and overall development velocity (how many new features you can release to end users in a given period). In this article we'll look at the critical role your test environments play in how well your team performs against these common Key Performance Indicators (KPIs) when producing new software.
The long-running industry standard is the Persistent Environments Model. In this model, software teams use one or a series of persistent, static test environments. Individual developers develop and test on their laptops, then merge into a persistent QA environment where additional tests (manual, automated, or both) are executed. In many cases there is also a Staging or User Acceptance environment where final testing is completed before code is shipped to end users in a Production environment.
While this strategy has worked for many companies for a long time, it is inherently flawed, so much so that every team has a process for countering its ill effects. For many teams it has been the way business is conducted for so long that the core problem has become an accepted reality rather than something that needs active revision.
Persistent Environments create a bottleneck: multiple changes to a code base arrive, often simultaneously, with little prior or formal testing. This leads to bugs being introduced into a trunk branch and, as often happens, to environments "breaking" so that testing cannot continue until the bugs are repaired. While the resulting "code freezes" are a major drag on productivity, the real problem is that bugs are introduced in the first place.
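To make the bottleneck concrete, here is a toy model, not a measurement. It assumes each pending change is simply clean or buggy; that in the persistent model all of them land in one shared environment, where a single bug halts testing for everyone; and that with per-change Preview Environments a bug blocks only the change that carries it. The `Change` type and the example change names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A hypothetical pending change (e.g. a pull request)."""
    name: str
    buggy: bool


def blocked_in_shared_env(changes: list[Change]) -> list[str]:
    """Persistent model: every change is merged into one shared environment,
    so a single buggy change breaks testing for all of them."""
    if any(c.buggy for c in changes):
        return [c.name for c in changes]
    return []


def blocked_in_preview_envs(changes: list[Change]) -> list[str]:
    """Preview model: each change is tested in its own environment before
    merge, so a buggy change blocks only itself."""
    return [c.name for c in changes if c.buggy]


if __name__ == "__main__":
    batch = [
        Change("login-form", buggy=False),
        Change("search-api", buggy=True),
        Change("billing", buggy=False),
    ]
    print(blocked_in_shared_env(batch))    # ['login-form', 'search-api', 'billing']
    print(blocked_in_preview_envs(batch))  # ['search-api']
```

The model deliberately ignores revert time and partial breakage; its only point is that isolation changes the blast radius of a bug from the whole batch to a single change.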
For every bug introduced into a trunk branch, your team lead has to choose the lesser of two evils. They can try to revert the change, which carries its own set of risks, or they can wait until the bug is repaired. Either way, the whole team is slowed down until the problem is addressed. You can ship features or changes in iterative steps, but you can't ship them alongside a significant bug. The key is keeping bugs from being introduced in the first place, and we'll get to that "Golden Path" shortly. But first, let's understand how bugs show up in a trunk branch.
Let's start with local development. When developers work locally, they are working in an environment that does not look or act like production. While tools like Docker and Docker Compose have made this much less of a problem than it was for a previous generation, a local environment is still not accessible to anyone else or to most forms of automated testing. Simply put, there's no URL endpoint for the application running on your laptop, and, of course, your laptop is not the cloud.
For new code to be tested by anyone other than the developer who wrote it, it must be deployed to an environment. For it to be deployed to an environment, it needs to be merged so that your Continuous Integration system can build and deploy the updated branch. This is how bugs are introduced: testing prior to merging is limited because the only pre-merge environment available is the developer's local environment.
You can always pull a teammate's branch, compile it, and run it yourself in your local environment. But this takes time and it can be difficult to guarantee that your environment configuration is the same as theirs - and the more complex your dependency map the harder this becomes.
Some tools have tried to address the local development accessibility problem, with generally disappointing results. For example, if you want to test code running on someone else's computer, tools like ngrok can expose their local application through a public URL over a secure tunnel. Of course, their machine must be up, actively connected to the internet, and running the application.
Another option, which Docker introduced as Dev Environments, is to take a snapshot of your running environment and share it with a teammate. By and large, however, these tools do little to improve the testability of locally running code.
Teams have been forced into the Persistent Test Environment model because, historically, tools to address the core problem have not been readily available. Environments have always been challenging to set up, maintain, and protect. As Vlad Rusu, Head of DevOps at Lola Tech, puts it:
"10-11 years ago we needed weeks to create a new (moderately complex) environment. I remember configuring every single server and application by hand, following my own notes from a google doc."
Even with the overwhelming adoption of cloud services and Infrastructure-as-Code over the last decade, creating and managing environments is still quite challenging and requires skill sets that are in very high demand. As a result, Preview Environments, more literally referred to as Ephemeral or On-Demand Environments, have been accessible only to teams with the resources and skill sets to build their own solution.
A great example is the team at Stack Overflow. With limited options available to them, they set out to build their own Preview Environment solution, or, as they call it, PR Environments. You can read about it on their blog, but here's why they did it:
"We’re using Kubernetes to host what we call PR Environments. Every pull request can be run in an isolated test environment at the push of a button . . . This isn’t something we invented; other organizations have been using this concept. The idea is that every code change goes into a version control system like Git through a pull request. Other developers will review the code, but the code won’t tell the whole story. You want to see the code in action. Normally, you’d have to download all the code locally, compile, and run it. That could be simple, but if you’re running a large application that draws code from multiple repos or—have mercy—a microservice architecture, then you may run into several hours of debugging. Even better, let’s say you’ve squashed all of the commits for a new feature into a single one and are committing it as a single PR. Send that PR environment to sales or marketing as a single link so that they can preview the feature in action. If your sales team wants to demo the app with specific features or custom builds, send them a PR environment link. You won’t have to spend time walking your less technical colleagues through the build process."
Of course, every organization would love to have the talent and resources to build an internal solution like Stack Overflow's. Across the industry, a handful of large, industry-leading organizations (typically with 50 or more software engineers) have dedicated a three- to four-person "platform team" to building an internal Preview Environment solution. This means the capability has been limited to the few, even though its benefits should be available to the many.
To put into perspective just how valuable Preview Environments are to an organization, consider what a platform team costs. Given the experience and skill sets such an undertaking requires, an organization with just three engineers on its platform team would spend well over $500k annually on their combined salaries and benefits. When Preview Environments can improve a team's development velocity by nearly 50%, organizations like these still find the high cost worthwhile.
Fast-forward to 2022: everything I've just explained is why the great majority of teams have historically been forced into a Persistent Test Environment model. Preview Environments have been accessible only to large organizations with the resources needed to build an in-house solution.
What's different now, compared with even a year ago, is that Preview Environment solutions have become commercially available, and Uffizzi is even available as an open source solution. As often happens in technology, the economics have been turned on their head: what a year ago would have cost an organization over $500k annually in manpower alone is now available at roughly one-twentieth of the cost.
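The cost comparison in this section reduces to simple arithmetic. The sketch below uses only the article's own figures (three engineers, "well over $500k", and a twentyfold reduction), treating $500k as a floor, so every number it prints is a floor as well rather than a price quote.

```python
# Figures taken from the article itself; $500k is a lower bound
# ("well over $500k"), so every derived number is a lower bound too.
PLATFORM_TEAM_SIZE = 3               # engineers on the in-house platform team
PLATFORM_TEAM_ANNUAL_COST = 500_000  # USD per year, salaries and benefits combined
COST_REDUCTION_FACTOR = 20           # the article's "20x cheaper"


def annual_cost_per_engineer(total: float, engineers: int) -> float:
    """Average fully loaded annual cost of one platform engineer."""
    return total / engineers


def commercial_equivalent(total: float, factor: float) -> float:
    """Annual cost after applying the stated reduction factor."""
    return total / factor


if __name__ == "__main__":
    per_engineer = annual_cost_per_engineer(PLATFORM_TEAM_ANNUAL_COST, PLATFORM_TEAM_SIZE)
    commercial = commercial_equivalent(PLATFORM_TEAM_ANNUAL_COST, COST_REDUCTION_FACTOR)
    print(f"Per engineer: over ${per_engineer:,.0f} per year")          # over $166,667
    print(f"Commercial equivalent: over ${commercial:,.0f} per year")  # over $25,000
```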
Front-end preview solutions have been available for the last two to three years on platforms like Netlify and Vercel. Then there are solutions like Heroku and Render that provide partial Preview Environments (what Heroku calls Review Apps); that is, they'll spin up a branch but not a full stack with a per-environment database and other key dependencies.
What's really exciting about Uffizzi is that it's a purpose-built, full-stack Preview Environment solution based on the highly popular Docker Compose specification. Uffizzi is available both as open source and as a SaaS solution, and any organization can readily adopt it to instantly upgrade its test environment strategy.
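This article doesn't describe Uffizzi's internals, so what follows is not its implementation. It is a minimal sketch of why a Compose file is a natural unit for full-stack previews: Docker Compose's `-p` (project name) option namespaces a stack's containers, networks, and named volumes, so the same file can be brought up once per pull request, each copy with its own database. The `pr-<number>` naming scheme and the `docker-compose.yml` file name are hypothetical, and the approach assumes the Compose file doesn't publish fixed host ports, which would collide between stacks.

```python
def project_name(pr_number: int) -> str:
    """Hypothetical naming scheme: one Compose project per pull request."""
    return f"pr-{pr_number}"


def up_command(pr_number: int, compose_file: str = "docker-compose.yml") -> list[str]:
    """Command that starts an isolated copy of the whole stack for one PR.

    `-p` prefixes containers, the default network, and named volumes
    (e.g. a database volume) with the project name, so stacks created
    for different pull requests don't share state.
    """
    return ["docker", "compose", "-f", compose_file,
            "-p", project_name(pr_number), "up", "-d"]


def down_command(pr_number: int, compose_file: str = "docker-compose.yml") -> list[str]:
    """Command that tears the PR's stack down; `-v` also removes its volumes,
    so nothing persists once the pull request is closed."""
    return ["docker", "compose", "-f", compose_file,
            "-p", project_name(pr_number), "down", "-v"]


if __name__ == "__main__":
    # Prints: docker compose -f docker-compose.yml -p pr-42 up -d
    print(" ".join(up_command(42)))
```

To actually run one, pass the list to `subprocess.run(up_command(42), check=True)` on a machine with Docker installed. A hosted service adds what this sketch leaves out: remote infrastructure, a public URL per environment, and CI triggers on pull request events.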
Uffizzi is designed so that advanced teams can simply plug a Preview Environment capability into their existing CI/CD workflow. There are no changes to your core infrastructure, just a valuable addition that helps your team merge clean code in half the time. If you'd like to get started for free with an example application, you can spin up your first Preview Environment in just a few minutes with our quickstart application. You don't even need to log in to Uffizzi; you can do it all from GitHub.