How to Design Runners That Scale Using Gitlab-CI (Part 1)

This article focuses on how to design your Gitlab Runners to achieve optimum CI efficiency.

Tim Osborne

Software Engineering Manager


Designing Scalable Gitlab-CI Runners 

Article Co-Written by Mmadu Manasseh and Ayodele Mia Seriki

CI/CD (Continuous Integration/Continuous Delivery) forms an important part of software development and release processes by introducing automation to help with code integration, testing, and deployments. There are several CI/CD platforms, each with a different setup. Whatever platform and architecture you use for running CI/CD jobs, it is important that job efficiency isn't impacted even when the number of jobs spikes. This article focuses on how to design your GitLab Runners to achieve optimum CI efficiency.


On the GitLab platform, Runners are used to execute jobs. These runners listen for jobs on the GitLab instance and execute those jobs on configured instances: VMs, shells, or Docker containers. Although GitLab provides shared runners (on GitLab SaaS), you can additionally configure your own runners to execute your jobs. Bringing your own runners gives you the flexibility of configuring them to your needs.

Creating and Configuring a simple Runner

Before biting deep into this big GitLab Runner design pizza, let's take a quick peek at how to create a basic runner. Creating a self-hosted runner requires the following steps:

  • Installing the runner: depending on how you wish to install your runner, there are different steps for this. Runners can be installed on Linux VMs, Windows VMs, Docker containers, Kubernetes clusters, etc. Go to the GitLab docs to install a runner of your choice.
  • Registering the runner: the process of registering a runner configures it to listen for jobs on a certain GitLab instance. For that, you'll need a registration token from the GitLab instance. There are different types of tokens with different scopes. The token could make the runner a shared runner (available to all projects on the GitLab instance), a group runner (accessible within a GitLab group), or a project runner (accessible only within a specific project). For more information, see the GitLab documentation on the scope of runners. With the runner installed, you can now start up the runner instance/container and pass the required information. Upon successful registration, the runner should be visible on your GitLab instance.
  • (Optional) Configuring the runner: sometimes, the defaults of the runner are just not good enough, and you want to make some mods to better suit your needs. GitLab Runners read configuration from a config.toml file. The contents of this file can be modified, and the runner restarted to have the changes take effect. To see the full set of options available in the file, see the GitLab documentation on advanced configuration.
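As an illustration, a non-interactive registration of a Docker-based runner might look like the following sketch (the URL, token, and description are placeholders, not real values):

```shell
# Register a runner against a GitLab instance (all values are placeholders).
# --non-interactive reads every setting from flags instead of prompting.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com/" \
  --registration-token "REDACTED_TOKEN" \
  --executor "docker" \
  --docker-image "docker:19.03" \
  --description "example-docker-runner"
```

After a successful registration, the runner writes its entry into config.toml and appears on the GitLab instance's runners page.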

Gitlab Runner Architecture

Understanding how GitLab Runners interact with the GitLab server and run jobs is paramount to making them efficient.

Image: GitLab Runner architecture

The GitLab Runner listens for jobs from the GitLab server. Once a job is available, it schedules it on a configured executor. An executor is the platform and method that executes the received job: Docker (which runs jobs in a Docker container), VirtualBox (which runs jobs in VirtualBox VMs), shell (which runs jobs in the local shell), or Kubernetes (which runs jobs in Kubernetes pods), among others. The runner monitors the jobs and reports their logs and status to the GitLab server. Several executors can be configured for a given instance, and several instances of the same executor can also be deployed; for instance, you can have two runners both configured to use the Docker executor.

Below is a sample GitLab Runner configuration with three executors: Kubernetes, Docker, and shell.

listen_address = ":9252"
concurrent = 10
check_interval = 30
log_level = "info"

[session_server]
  session_timeout = 1800

[[runners]]
  name = "k8s-runner"
  request_concurrency = 1
  url = ""
  token = "SomeToken"
  executor = "kubernetes"
  cache_dir = "/tmp/gitlab/cache"
  [runners.kubernetes]
    host = "someK8s.apiserver.endpoint"
    bearer_token_overwrite_allowed = false
    image = "docker:19"
    namespace = "gitlab-runner"
    namespace_overwrite_allowed = ""
    privileged = false
    pull_policy = [""]
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""

[[runners]]
  name = "docker-runner"
  url = ""
  token = "SomeToken"
  executor = "docker"
  [runners.docker]
    tls_verify = false
    image = "docker"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0

[[runners]]
  name = "shell"
  url = ""
  token = "SomeToken"
  executor = "shell"

Understanding Bottlenecks and Problems

Having understood the architecture of the Gitlab Runner, let us look at some areas where a GitLab runner setup might fall short in terms of efficiency.

The executors

There are different types of executors, as listed earlier (Docker, Kubernetes pods, shell, VMs, etc.). Depending on your use case, your choice of executors might differ. However, internally, we've found that running jobs in Docker containers (Docker, Kubernetes pods) is enough to cater to all CI needs. For executors to run, they need a host VM. However, if no jobs are actively running, that VM keeps incurring costs without being used. Additionally, when there is a spike in active jobs, the execution of one job shouldn't affect another or drain the resources (CPU and memory) required by the other jobs. If that happens, jobs get slower and developers get frustrated. Hence the first problem: we need to be able to scale the machines running the jobs up to accommodate all jobs, while ensuring each job's execution does not affect another's, and also scale down when there are no active jobs.

The Runner

With more and more jobs getting triggered on the GitLab instance, the runner schedules more jobs. However, there is a limit to the number of jobs each runner can manage. This is defined by the concurrent setting in the config.toml file. Why is this a problem?

If a runner has a concurrency of 10, and there are 20 jobs currently triggered on the GitLab instance, only 10 of those jobs will be executed at once. The remaining jobs queue up and wait for an existing job to complete before being scheduled in a freed-up slot. This increases overall CI time and negatively impacts developer experience.

But, hey, can we not configure the runner to run a high number of jobs, say 50 or even 100? Yes, we can, but this requires more static resources (memory and CPU). And what if only five jobs are executed? Or on public holidays, when no jobs run at all? The large allocated resources go to waste. The second task, then, is to be able to scale the runner itself up and down depending on the number of jobs currently triggered on the GitLab instance.
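For reference, this limit is a single top-level setting in config.toml; the value below is only illustrative:

```toml
# Maximum number of jobs this runner process will run at once across
# all configured [[runners]] sections (illustrative value).
concurrent = 50
```

Raising it is easy; the hard part, as discussed above, is that the resources backing that concurrency are statically allocated.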

In this article, we will be discussing how to architect a solution for the first problem. In the follow-up article, we will then cover the second problem.

PROBLEM 1: Scaling the Executors

For this problem, our choice of executors affects our scalability. Having tried most of the executors, we have found the Kubernetes and Docker+machine executors to be the best and most easily scaled solutions.

Using and Scaling the Kubernetes Executor


One of the many features Kubernetes offers is the ability to scale your cluster up and down according to its load via the cluster autoscaler. The cluster autoscaler scales your cluster up by provisioning more nodes as required, up to a configured limit, and scales them back down when no pods are scheduled on them. Hence, having the cluster autoscaler installed and configured in the cluster is the first step to properly architecting this solution.

Some managed Kubernetes services, like GKE, provide options to configure this autoscaling when creating your cluster or node pools, which removes the burden of configuring it yourself. With the cluster autoscaler configured, we can proceed.

When using the Kubernetes Executor, ensure that the GitLab jobs do not interfere with the normal execution of other services in the cluster. This is achieved by ensuring that the jobs run on dedicated nodes (which will be scaled up and down) and that appropriate network policies are applied where necessary. 

The remaining part of this section uses GKE as a reference, but the steps should be the same on other managed Kubernetes clusters. 

Note that the runner that schedules jobs on the Kubernetes executor doesn't necessarily need to be deployed in the cluster. The runner can be deployed on a VM and still be configured to use the Kubernetes executor.

To ensure we can scale our executor, we need to configure the cluster as follows: 

  • Create a new node pool (node group) with node autoscaling enabled via the cluster autoscaler and the minimum number of nodes set to 0: this allows the nodes to scale down to zero when no jobs are being executed, such as during weekends or after the close of business.

Image: node pool details

  • Configure node taints on the runner nodes: this ensures no pods are scheduled on these nodes aside from pods that tolerate the taints (the CI job pods).

Image: node taint configuration

  • Deploy the GitLab Runner (e.g., using the Helm chart), configured so that the pods that run the jobs tolerate the taints on the runner nodes. Also configure a nodeSelector or node affinity so that the job pods are always scheduled on the nodes created above. Setting the tolerations without the nodeSelector or node affinity allows the job pods to tolerate the taints, but they can still be scheduled on other nodes, which is not really what we want. Setting the nodeSelector without the tolerations forces the pods to be scheduled on the runner nodes, but since they don't tolerate the taints, they won't be executed there.
  • Configure resource requests and limits on the pods so that every job has enough resources to run and doesn't use more than it's supposed to. This prevents one job from exhausting all the available memory/CPU on a node. Jobs that require more resources than usual can have these redefined as part of the variables in the .gitlab-ci.yml file.
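For instance, a resource-hungry job could request more than the runner's defaults via the Kubernetes executor's overwrite variables. The job name and resource values below are illustrative, and the overwrites only take effect if the runner's config.toml allows them:

```yaml
# Hypothetical job needing more resources than the runner defaults.
# Requires the corresponding *_overwrite_max_allowed options to be set
# in the runner's [runners.kubernetes] section.
build-heavy:
  stage: build
  variables:
    KUBERNETES_CPU_REQUEST: "1"
    KUBERNETES_CPU_LIMIT: "2"
    KUBERNETES_MEMORY_REQUEST: "1Gi"
    KUBERNETES_MEMORY_LIMIT: "2Gi"
  script:
    - make build
```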

If the nodes are deployed with a taint gitlab-ci-dedicated=true and the label kind=ci, a sample GitLab Runner configuration could look like this:

  [runners.kubernetes]
    image = "docker:19.03"
    cpu_request = "200m"
    cpu_limit = "500m"
    memory_request = "200Mi"
    memory_limit = "500Mi"
    [runners.kubernetes.node_selector]
      kind = "ci"
    [runners.kubernetes.node_tolerations]
      "gitlab-ci-dedicated=true" = "NoSchedule"

With this configured, we have a setup that scales up to a limit, which we define and can scale down to zero when no jobs are running/scheduled.
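On GKE, such a node pool might be created with a command along these lines (the pool name, cluster, zone, and sizing values are placeholders):

```shell
# Create an autoscaled, tainted, labeled node pool dedicated to CI jobs.
# Autoscaling down to 0 nodes means the pool costs nothing when idle.
gcloud container node-pools create ci-runners \
  --cluster my-cluster \
  --zone europe-west2-a \
  --machine-type n1-standard-2 \
  --enable-autoscaling --min-nodes 0 --max-nodes 5 \
  --node-taints gitlab-ci-dedicated=true:NoSchedule \
  --node-labels kind=ci
```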

For an easy way to configure this on your cluster via Infrastructure as Code (Terraform), check out the module and examples in our gitlab-runner Terraform repo.

Using the Docker+machine Executor

The Docker+machine executor uses docker-machine, a tool that spins up new VMs/instances on cloud platforms and runs Docker on them. When a new job is received, the Docker+machine executor spins up a new VM to run the job. That means the number of running VMs is equivalent to the number of jobs. Also, each VM runs a single job (Docker container), so there is no contention for CPU and memory resources.

Image: the Docker+machine executor

The docker-machine repository is deprecated; however, the GitLab team maintains an active fork to which it adds new features. Configuring the Docker+machine executor depends on the cloud platform you use. A sample GitLab Runner config for the GCP Docker+machine executor is shown below:

  [runners.machine]
    IdleCount = 0
    IdleTime = 600
    MaxGrowthRate = 0
    MachineDriver = "google"
    MachineName = "runner-%s"
    MachineOptions = [
      "google-project=gcp-project",
      "google-machine-type=n1-standard-1",
      "google-network=default",
      "google-zone=europe-west2-a",
      "google-scopes=https://www.googleapis.com/auth/cloud-platform",
      "google-disk-type=pd-standard",
      "google-disk-size=20",
    ]

Since each job is executed in an isolated VM, choosing the ideal size for your executor VMs becomes very important. Your choice of VM type should depend on the jobs executed and the cost of the VM. For GCP, you could start with n1-standard-1 and scale up when required. You can have different runners, with some tagged to run on heavy VMs and others on lightweight VMs, and then configure jobs to run on runners with specific tags.
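Jobs pick a runner by tag in .gitlab-ci.yml. A sketch of this routing, with illustrative tag and job names:

```yaml
# Route jobs to differently sized runners via tags (names are examples).
unit-tests:
  tags: [light]     # picked up by runners tagged "light" (small VMs)
  script:
    - make test

integration-tests:
  tags: [heavy]     # picked up by runners tagged "heavy" (large VMs)
  script:
    - make integration-test
```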

For setting up docker-machine runners on GCP, check out our Terraform module. For AWS, check out this AWS GitLab Runner Terraform module.

Certain factors can influence the choice between the Docker+machine executor and the Kubernetes executor. However, as opposed to the other executors, these two ensure proper scalability and isolation of jobs, making them the ideal choice for anyone looking to set up organization-wide GitLab Runners.

The End of the First Part of Designing GitLab Runners that scale

As stated earlier, this is the first part of designing GitLab Runners that scale. For some use cases, this setup is enough to improve CI performance and conserve cost. But there are cases where job spikes overwhelm the capacity of your runners, leaving lots of CI jobs queued and waiting for running jobs to finish. For such cases, as discussed earlier, we'll need to scale the runners themselves so that more runners can handle more jobs concurrently, improving the overall developer experience. For that, watch out for the next part of the series.


