Self-hosted GitHub Actions on ECS
GitHub Actions is awesome, right? It's not perfect, but what tool is? Some third-party CI/CD tools are probably better, but they would have to be significantly better to overcome the native-versus-third-party barrier. After thoroughly reviewing the features and the competition, I don't think any third party is significantly better, and the native integration is first class. For better functionality you would have to migrate completely (inc. VCS) to a different platform (e.g. GitLab).
In order to use GitHub Actions at any scale, you need to deploy a self-hosted runner.
I’m an AWS guy, so let’s work out how to do this.
Looking at the docs, the only really difficult requirement is Docker. This is a bit of a bummer, as my default choice would be to run the runner on Kubernetes, but Kubernetes has recently deprecated the Docker runtime. To run the runner in Docker you need "Docker in Docker", which is achieved by running a privileged container and mounting /var/run/docker.sock. In a managed AWS world this basically leaves ECS on EC2 as the only option. I would have preferred Fargate, but unfortunately you cannot run privileged containers there. ECS on EC2 it is, then.
Before we proceed we need to know some basics about the runners.
How do Self-hosted GitHub Runners work?
A self-hosted GitHub runner is a service that long-polls GitHub for work, resetting the connection every 50 seconds. Before starting, runners need to be registered and configured via the API. They don't take incoming connections and only need to make outbound connections to GitHub (and any other dependencies). This is great, as the runners don't need to be internet facing.
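The registration step above can be sketched as follows. This builds the GitHub REST API call that mints a short-lived registration token for a repo-level runner; the owner, repo, and PAT values are placeholders of my own, and `build_registration_request` is just a helper name I made up.

```python
def build_registration_request(owner: str, repo: str, pat: str):
    """Build the API request that mints a short-lived runner registration token."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/runners/registration-token"
    headers = {
        "Authorization": f"token {pat}",
        "Accept": "application/vnd.github.v3+json",
    }
    return url, headers

# POSTing to this URL (e.g. requests.post(url, headers=headers)) returns JSON
# containing a `token`, which the runner then uses to register itself.
url, headers = build_registration_request("my-org", "my-repo", "ghp_example")
```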
The first step is to "dockerise" the GitHub runner; luckily @myoung34 has already done this with myoung34/docker-github-actions-runner. His implementation takes the environment variables below, then registers and starts the runner when the container is started.
A subset of @myoung34’s environment variables:
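To make the variables concrete, here is a sketch of starting the container locally with a few of them set. The variable names (`RUNNER_NAME`, `REPO_URL`, `ACCESS_TOKEN`, `LABELS`) are from the image's README as I understand it, and the repo URL and token are placeholders; check myoung34/docker-github-actions-runner for the full, current list.

```python
import shlex

# Example values — hypothetical repo and PAT; the image uses the PAT to mint
# its own runner registration tokens.
env = {
    "RUNNER_NAME": "ecs-runner-1",
    "REPO_URL": "https://github.com/my-org/my-repo",
    "ACCESS_TOKEN": "ghp_example",
    "LABELS": "ecs,linux",
}

cmd = ["docker", "run", "-d", "--restart", "always"]
for key, value in env.items():
    cmd += ["-e", f"{key}={value}"]
# "Docker in Docker": mount the host's docker socket into the container.
cmd += ["-v", "/var/run/docker.sock:/var/run/docker.sock",
        "myoung34/github-runner:latest"]

print(shlex.join(cmd))
```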
Deploying to ECS
Deploying to ECS had a couple of gotchas/requirements:
- You have to put your Docker Hub credentials in AWS Secrets Manager to get around the rate-limit funsies.
- You have to put your GitHub PAT in SSM Parameter Store to make it securely available to the task.
- You need to spin up/deploy to an ECS cluster that can access GitHub and the outside world.
- You need to consider/set up your ECS Instance Role, Execution Role and Task Role in IAM.
This makes for a very straightforward ECS Task Definition:
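A sketch of what that task definition could look like, expressed as the payload for boto3's `register_task_definition`. All names and ARNs here are placeholders of my own invention (the PAT parameter in SSM, the Docker Hub secret, the IAM roles); the key parts are `privileged` for Docker in Docker, `repositoryCredentials` for the Docker Hub pull, and `secrets` for the PAT.

```python
# In real life: boto3.client("ecs").register_task_definition(**task_def)
task_def = {
    "family": "github-runner",
    "requiresCompatibilities": ["EC2"],
    "executionRoleArn": "arn:aws:iam::123456789012:role/runnerExecutionRole",
    "taskRoleArn": "arn:aws:iam::123456789012:role/runnerTaskRole",
    "containerDefinitions": [
        {
            "name": "runner",
            "image": "myoung34/github-runner:latest",
            "memory": 2048,
            "privileged": True,  # required for Docker in Docker
            # Docker Hub credentials from Secrets Manager, to dodge the
            # anonymous pull rate limits.
            "repositoryCredentials": {
                "credentialsParameter": "arn:aws:secretsmanager:eu-west-1:123456789012:secret:dockerhub"
            },
            "environment": [
                {"name": "REPO_URL", "value": "https://github.com/my-org/my-repo"},
                {"name": "RUNNER_NAME", "value": "ecs-runner"},
            ],
            # The PAT is injected securely from SSM Parameter Store.
            "secrets": [
                {"name": "ACCESS_TOKEN",
                 "valueFrom": "arn:aws:ssm:eu-west-1:123456789012:parameter/github/runner-pat"}
            ],
            "mountPoints": [
                {"sourceVolume": "docker-sock", "containerPath": "/var/run/docker.sock"}
            ],
        }
    ],
    "volumes": [
        {"name": "docker-sock", "host": {"sourcePath": "/var/run/docker.sock"}}
    ],
}
```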
GitHub Actions Settings
After you deploy the task definition as an ECS service with a desired count of five, you get:
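For completeness, a sketch of the service-creation call; the cluster and service names are placeholders, and `desiredCount` is what gives us the five runner tasks.

```python
# In real life: boto3.client("ecs").create_service(**service)
service = {
    "cluster": "github-runners",
    "serviceName": "github-runner",
    "taskDefinition": "github-runner",  # family registered above
    "desiredCount": 5,                  # five runner tasks
    "launchType": "EC2",
}
```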
Testing with a workflow
It works! 🎉
It successfully triggers the workflow and picks up the task role:
But I have a gut feeling the “docker in docker” container might not go to plan:
Boohoo 😭 It's picked up the ecsInstanceRole, not the Task Role. This makes total sense, as the inner container is created by Docker, not ECS, but it's annoying. How can we get around this?
I logged into the box and had a poke around with docker inspect [container name/id], which showed me some useful environment variables provided by ECS:
As we can see from the self-describing environment variables, AWS_CONTAINER_CREDENTIALS_RELATIVE_URI looks like our saviour 🎉:
We can easily parse the response and pass it into the “docker in docker” containers to give them the correct IAM Role. There is/will be a GitHub Action for this…
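The parse-and-pass step can be sketched like this. The endpoint host 169.254.170.2 and the response fields (AccessKeyId, SecretAccessKey, Token) come from the ECS credential provider; the sample response below is made up, and `creds_to_env` is a helper name of my own.

```python
import json

def creds_to_env(creds: dict) -> dict:
    """Map the ECS credentials JSON onto the env vars the AWS SDKs expect."""
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["Token"],
    }

# Inside the task you'd fetch the real response with something like:
#   uri = os.environ["AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"]
#   creds = json.loads(urllib.request.urlopen(f"http://169.254.170.2{uri}").read())
sample = json.loads('{"AccessKeyId": "ASIAEXAMPLE", "SecretAccessKey": "secret", '
                    '"Token": "session-token", "Expiration": "2021-01-01T00:00:00Z"}')

# Turn the credentials into `-e` flags for the docker-in-docker `docker run`.
env = creds_to_env(sample)
docker_args = [arg for k, v in env.items() for arg in ("-e", f"{k}={v}")]
```

Passing these as `-e` flags means the inner containers pick up the Task Role's temporary credentials via the standard AWS environment variables, though they will expire and need refreshing for long jobs.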
Considerations and wrap up
Overall I'm pretty happy; it's a shame the roles don't magically work, but that's somewhat expected. I will proceed to a production implementation of this. There are still a few unknowns to be discovered: how to recycle runners (they persist between jobs; a fresh runner per job would be better, but might hurt performance), scaling strategies for the instance/service count, how GitHub Actions runner labels/groups work in practice, and whether any maintenance scripts need to be written. I also need to refine the Docker security options; I don't think the container needs full "Privileged" access, and setting some SELinux or AppArmor configuration would be more refined.