GitHub Actions has become many teams’ go-to solution for building code and shipping containers. With its support for chained workflows, a commit to the main branch can run your tests, trigger a container build and push, kick off an HCP Terraform run so your infrastructure picks up that latest build, and finally run a set of acceptance tests to confirm everything is still healthy.
However, teams can get tripped up when working out how to roll out database migrations when their architecture involves a networked data store. A classic example is a Postgres instance running on RDS. Any mature production deployment keeps that instance inside a private subnet with no internet ingress. This raises the question of how a team can roll out schema updates in an observable and managed way while keeping to a single release pipeline tool when that tool has no network path to the instance.
There is a streamlined strategy I’ve adapted over time with an eye to:
- Keeping GitHub Actions as the deployment orchestrator
- Keeping the RDS instance off the internet
- Minimizing new infrastructure
- Managing schema migrations with effective rollback strategies
Using dbmate to manage schema migrations
A key component of this strategy is reliably deploying schema migrations at arm’s length. To do this, my go-to tool has become dbmate. It’s a flexible CLI tool that lets you express your schema upgrades and downgrades in pure SQL and fails predictably when a migration goes wrong. There isn’t any real secret sauce here; it simply makes it easy to source control your schema changes while using them in tests and standing up developer environments. However, the consistency of the experience makes it ideal for this strategy: it has a short learning curve, and it is straightforward for automation to parse success or failure states. Additionally, dbmate is designed to be usable as a Go library, which allows for even more control flow options.
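To give a feel for the library route, here is a minimal sketch of a Go wrapper around dbmate. It assumes dbmate v2’s library API (dbmate.New plus a registered driver), a DATABASE_URL environment variable, and the default ./db/migrations directory; treat it as illustrative rather than authoritative.

```go
package main

import (
	"log"
	"net/url"
	"os"

	"github.com/amacneil/dbmate/v2/pkg/dbmate"
	// Register the Postgres driver; other databases follow the same pattern.
	_ "github.com/amacneil/dbmate/v2/pkg/driver/postgres"
)

func main() {
	// dbmate reads a standard connection URL, e.g. postgres://user:pass@host:5432/app
	u, err := url.Parse(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("invalid DATABASE_URL: %v", err)
	}

	db := dbmate.New(u)

	// Apply any pending migrations; a non-zero exit signals failure to the caller.
	if err := db.Migrate(); err != nil {
		log.Fatalf("migration failed: %v", err)
	}
	log.Println("migrations applied successfully")
}
```

Because the process exits non-zero on any failure, the surrounding automation only has to check the exit code.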
Run schema migrations using ECS
With a reliable migration deployer selected, the next question becomes how to run the migrations themselves. Given the goal of keeping the instance off the internet, dbmate can’t simply be executed from a standard GitHub-hosted runner. There still needs to be a path to connect dbmate to the instance inside the VPC, which raises two design decisions:
- How to run dbmate?
- How to source the migration files for dbmate?
In the past, I have opted to run dbmate as part of a small Go binary on a container via ECS, with the migration files preloaded onto the container. However, using the dbmate CLI directly on the container can work for an initial implementation.
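Building on the sketch above, one way to preload the migration files is to compile them straight into the binary with go:embed, so the container image carries everything it needs. This assumes dbmate exposes FS and MigrationsDir fields as described in its library documentation; verify the exact names against the version you pin.

```go
package main

import (
	"embed"
	"log"
	"net/url"
	"os"

	"github.com/amacneil/dbmate/v2/pkg/dbmate"
	_ "github.com/amacneil/dbmate/v2/pkg/driver/postgres"
)

// The SQL files are baked into the binary at build time, so nothing
// needs to be fetched from S3 or git at runtime.
//go:embed db/migrations/*.sql
var migrations embed.FS

func main() {
	u, err := url.Parse(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("invalid DATABASE_URL: %v", err)
	}

	db := dbmate.New(u)
	db.FS = migrations
	db.MigrationsDir = []string{"db/migrations"}

	if err := db.CreateAndMigrate(); err != nil {
		log.Fatalf("migration failed: %v", err)
	}
	log.Println("schema is up to date")
}
```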
Build container image with schema updates
The aws-actions/amazon-ecr-login and docker/build-push-action Actions make building and shipping the container to an ECR repository trivial. Whether to use dbmate via the CLI or the Go library is a design decision. However, shipping the container with the migration files is a core component of the strategy.
The rationale is simple: schema migrations can significantly impact the integrity and accessibility of the data. While automation simplifies and de-risks deployments in many ways, it also means most, if not all, of the responsibility for that integrity is transferred to the system and the signals that drive it. If the migration files are stored on S3, what safeguards ensure the bucket’s contents are correct and as expected? git can solve the “assurance of content” question, but the container runtime then needs network and identity access to the GitHub repository, which may not be reasonable or even possible. Either strategy also adds more machinery and access requirements to the deployment process, which means more overhead and more things to break.
By shipping the migrations in the container image, the contents of a git commit can be associated with a container SHA, and the deployment can be mapped to an ECS task ID through that SHA (along with its related CloudTrail event and CloudWatch logs).
Use ECS run-task from a GitHub Action
With the container built and shipped to ECR, it can now be launched as an ECS run-task operation with all the related networking and database connection requirements. There are no official Actions from AWS for this operation, and it is generally hard to recommend unofficial ones; however, forking one into your own supply chain could be a viable option.
That being said, the AWS CLI provides sufficient capabilities, requiring three main executions (a Go sketch of the same flow follows this list):
- Execute run-task with the network requirements and extract the task ARN from the result.
  - The network requirements cannot be embedded in the ECS task definition, so they must be included with the run-task step.
  - Database configuration is a second consideration. dbmate uses standard drivers under the hood, which means it can read the database configuration from the environment. These values can reasonably be set on the ECS task definition; for sensitive values like the password, ECS supports sourcing them from Secrets Manager to avoid co-locating them in plain text with the task definition.
- Run wait tasks-stopped with the task ARN to observe the container’s exit.
- Once the task has stopped, use describe-tasks with the task ARN and extract the exit code.
  - The JMESPath here is tasks[0].containers[0].exitCode
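The post leans on the AWS CLI for these steps, but the same three calls map one-to-one onto the AWS SDK for Go v2 if you prefer to fold them into the Go tooling. This is a minimal sketch; the cluster, task definition, subnet, and security group names are placeholders.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ecs"
	"github.com/aws/aws-sdk-go-v2/service/ecs/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatalf("load AWS config: %v", err)
	}
	client := ecs.NewFromConfig(cfg)

	// 1. run-task, supplying the network configuration the task definition cannot carry.
	runOut, err := client.RunTask(ctx, &ecs.RunTaskInput{
		Cluster:        aws.String("app-cluster"),     // placeholder
		TaskDefinition: aws.String("schema-migrator"), // placeholder
		LaunchType:     types.LaunchTypeFargate,
		NetworkConfiguration: &types.NetworkConfiguration{
			AwsvpcConfiguration: &types.AwsVpcConfiguration{
				Subnets:        []string{"subnet-0123456789abcdef0"}, // private subnet(s)
				SecurityGroups: []string{"sg-0123456789abcdef0"},     // allows egress to RDS
				AssignPublicIp: types.AssignPublicIpDisabled,
			},
		},
	})
	if err != nil || len(runOut.Tasks) == 0 {
		log.Fatalf("run-task failed: %v", err)
	}
	taskArn := aws.ToString(runOut.Tasks[0].TaskArn)

	// 2. wait tasks-stopped: block until the migration container exits.
	describeIn := &ecs.DescribeTasksInput{
		Cluster: aws.String("app-cluster"),
		Tasks:   []string{taskArn},
	}
	waiter := ecs.NewTasksStoppedWaiter(client)
	if err := waiter.Wait(ctx, describeIn, 15*time.Minute); err != nil {
		log.Fatalf("waiting for task to stop: %v", err)
	}

	// 3. describe-tasks: read the container exit code (the CLI equivalent of
	//    --query 'tasks[0].containers[0].exitCode').
	descOut, err := client.DescribeTasks(ctx, describeIn)
	if err != nil {
		log.Fatalf("describe-tasks failed: %v", err)
	}
	if exitCode := aws.ToInt32(descOut.Tasks[0].Containers[0].ExitCode); exitCode != 0 {
		log.Fatalf("migration task exited with code %d", exitCode)
	}
	log.Println("migration task completed successfully")
}
```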
Bonus point – use RDS IAM auth via a Go binary
One final suggestion that further enhances the value of the Go approach is using AWS IAM authentication instead of static credentials. As of this writing, I am unaware of a reliable way to vend the temporary database password through this IAM pattern and surface it in a form the dbmate CLI could consume. It is, however, a small addition to a Go module to run this as a pre-flight step. Given the sensitivity of this type of worker, leveraging temporary credentials wherever possible is a net improvement.
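As a rough illustration of that pre-flight step, the AWS SDK for Go v2 can mint the short-lived IAM auth token before handing a connection URL to dbmate. The endpoint, region, and database user below are placeholders, and the dbmate call mirrors the earlier sketches.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/url"

	"github.com/amacneil/dbmate/v2/pkg/dbmate"
	_ "github.com/amacneil/dbmate/v2/pkg/driver/postgres"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/rds/auth"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatalf("load AWS config: %v", err)
	}

	const (
		endpoint = "mydb.cluster-abc123.us-east-1.rds.amazonaws.com:5432" // placeholder
		region   = "us-east-1"                                           // placeholder
		dbUser   = "migrator"                                            // IAM-enabled DB user
		dbName   = "app"
	)

	// Pre-flight: mint a temporary IAM auth token to use as the password.
	token, err := auth.BuildAuthToken(ctx, endpoint, region, dbUser, cfg.Credentials)
	if err != nil {
		log.Fatalf("build IAM auth token: %v", err)
	}

	// IAM auth requires TLS, hence sslmode=require.
	dsn := fmt.Sprintf("postgres://%s:%s@%s/%s?sslmode=require",
		dbUser, url.QueryEscape(token), endpoint, dbName)
	u, err := url.Parse(dsn)
	if err != nil {
		log.Fatalf("parse connection URL: %v", err)
	}

	if err := dbmate.New(u).Migrate(); err != nil {
		log.Fatalf("migration failed: %v", err)
	}
	log.Println("migrations applied with temporary credentials")
}
```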