In this article we discuss two strategies for seeding database instances for each of your ephemeral Preview Environments.
Depending on your application and test data requirements, you can use one of the following methods to seed relational database instances for each of your Preview Environments:
With either of these methods, the test data can be stored in your repository or object storage, depending on the size of your data set and how frequently it gets updated.
Previews simplify the testing process by replacing a polluted QA environment with a clean and ephemeral test environment for every feature branch. But what if your application depends on a stateful service like a database? How can you include persistent test data or other stateful configurations in each of your Preview Environments? In this guide we discuss important design considerations for database seeding in ephemeral environments along with specific techniques you might consider to add seed/test data to your Preview Environments.
This article assumes that your application or service uses an SQL relational database management system (RDBMS), such Postgres, MySQL, or MariaDB. Non-relational (NoSQL) databases and other types of datastores can also be seeded but are outside the scope of this article.
Before discussing specific seeding methods, it's first worth considering where to store data before it gets seeded into a database.
One option is to store your seed data in your Git repository alongside your code. GitHub warns about files larger than 50MB and blocks files larger than 100MB, so this isn't a good option for larger data sets. For files larger than this, GitHub offers Large File Storage (LFS), which uses a pointer file committed to your repository that references your actual data, stored outside your repository. However, if your seed data changes frequently, such as when it's mirrored from production, storing within Git will generate many commits, each with a full duplicate. Therefore, storing seed data in your repository is only recommended for small datasets that change infrequently.
If your seed data is larger than 100MB or changes often, you should consider storing it in an object storage service, such Amazon S3 or Google Cloud Storage. In addition to being more scalable, object storage is also simpler and more cost effective than storing within a Git repository.
Of the seeding methods discussed, loading an SQL dump file upon container initialization is the most broadly applicable, as it will work with any application that uses SQL like Postgres, MySQL, or MariaDB. This is because we rely on the database container to seed itself, instead of an application container performing the seed.
The official Postgres, MySQL , and MariaDB images published on Docker Hub include facilities for populating the database on start-up: these containers will automatically execute *.sql, *sql.gz, and *.sh files stored in the /docker-entrypoint-initdb.d directory.
Important
As of 1 May 2023, there is a known issue where Uffizzi will not mount individual files as volumes. This is currently being fixed. As a workaround you should mount the SQL dump as a directory that contains your *.sql, *sql.gz, or *.sh file(s).
Therefore, if you mount a directory containing one of these file types as a volume in /docker-entrypoint-initdb.d, you will have a seeded database when the container starts up. Note that this method will work whether your SQL dump is mounted directly from your Git repositiory or whether a directory containing a script is mounted that fetches the dump file from object storage. See the following official Docker documentation for details:
Postgres initialization documentation
MySQL initialization documentation
MariaDB initialization documentation
As described in the links above, the database container will execute these initialization files in sorted name order and only if the data directory in the container is empty, i.e. if there are no pre-existing databases. This is useful for Uffizzi Preview Environments since a freshly seeded database will be created when a pull request is first opened, but any new data will not be overwritten when the environment is refreshed upon new commits to the branch. When the environment is destroyed (by merging or closing the pull request), the database and volumes are destroyed too. This allows teams to persist data during the lifetime of the Preview Environment, while ensuring that new pull requests get a freshly seeded database without data accumulation from the last environment.
To store initialization files (*.sql, *sql.gz, or *.sh ) in the /docker-entrypoint-initdb.d directory of your database container, you will need to mount them as Docker volumes. In Uffizzi, you can define a host mount in Docker Compose as follows:
Figure 1: An example Docker Compose file that mounts a directory ./seed/ as a host mount volume that contains a seed.sh or *.sql.gz file. This script might fetch an SQL dump file (.sql) from object storage.
As noted above, currently you must mount a directory containing your SQL dump file or script, instead of the individual file(s). In Figure 1, a shell script seed.sh (stored in the ./seed/ directory) is mounted to the /docker-entrypoint-initdb.d directory of the Postgres container. When Uffizzi creates a Preview Environment from this Docker Compose file, the Postgres container will execute seed.sh to fetch seed data from object storage.
If your application language or framework includes facilities for database seeding—such as an object-relational mapper (ORM), you may want to choose this method. Whereas in the previous method an SQL dump file is loaded by the database container itself, in this method the application container seeds the database. An advantage of this method is that if your schema changes, your application can easily perform a migration to update the database schema. Database migration and seed commands can be added to your Docker Compose file using a command definition, so that your application performs these steps before the application itself starts.
Below are a selection of popular lanaguages/frameworks and their tools for database seeding. We plan to release detailed guides for these options soon.
In Django you can load data using the framework's built-in object-relational mapper (ORM). The manage.py loaddata command lets you load a fixture in JSON format. To ensure that the seeding step happens before your application starts up on Uffizzi, include this command in your Docker Compose file.
Figure 2: An example Docker Compose file for a Django application that performs a database migration before seeding data from a JSON fixture.
Atlas CLI is a declarative migrations tool for Go applications. You can declare the end-state schema you want using atlas scheme apply or perform versioned migrations using atlas migrate apply. Install atlas in you application container, then run migrations and seeding steps before starting your application.
Rails has good support for bootstrapping databases, as well as defining and customizing fixtures for testing, using the built-in migrations features of Active Record. To add seed data to your preview databases, you can define a db/seeds.rb file and run rails db:seed after your application starts.
Figure 3: An example Docker Compose file for a Rails application that performs a database migration before seeding test data from a db/seeds.rb.
If your application language or framework uses an ORM or declarative migration tool, you should consider using it for its dual ability to handle both schema migrations and database seeding. For a more generic solution that will work with any application that leverages Postgres, MySQL, or MariaDB, you can instead choose to load a SQL dump file from the /docker-entrypoint-initdb.d repository.
To store your seed data, it is recommended to use an object storage service such as Amazon S3 or Google Cloud Storage if your data is larger than 100MB or changes regularly. If your test data is small and changes infrequently, colocating your seed data with your code in your repository is probably sufficient.