How GitOps Meant Fewer Application Crashes and Failures for an Online Bank


Mettle from NatWest falls squarely into the category of an online banking startup. With no legacy infrastructure to manage, it started out one step ahead of its competitors: established banks and financial service providers whose data archives often sit on server hardware that is decades old. The bank’s DevOps team was already able to reap the benefits of cloud-native infrastructure to build apps and deliver services to customers online faster, in ways that many established financial institutions still struggle to achieve today.

For a company that was fundamentally born in the cloud, one would assume that there wasn’t much room to improve the productivity of its service delivery.

Not so.

Soon after it began operations, the company started to grow, and so did its development and operations teams. However, instead of delivering new applications and application updates at an ever-increasing cadence, the DevOps team began to suffer from a demonstrable lag in CI/CD productivity. One of the major pain points was that developers continued to build and commit code at an adequate rate, but there were unacceptable and growing delays between when a developer committed code and when that code was deployed to production.

Much of the CI-related delay was due to testing. A developer would write code and then run preliminary tests on their laptop or workstation. Once the code ran as expected locally, it was committed. Load and other performance tests were then run in another environment, out of the developer’s hands.

Code validation, testing, and application deployment were largely manual processes. While Mettle’s operations team carried out manual testing, the developer’s work would stall, waiting for the code to be deployed or returned for fixes, a back-and-forth process that added even more latency to CI. When applications didn’t perform as they should in production and were sent back to the developer, the development and deployment cycle grew longer still. These round trips don’t matter much when only a few developers are involved, but they can lead to an exponential drop in productivity as more developers join the team to meet demand for a higher volume of updates and application versions.

Lacking velocity

It can be said that Mettle suffered largely from a velocity problem. Research by DevOps Research and Assessment (DORA), a group acquired by Google, shows that companies that develop and deploy software more efficiently are twice as likely to achieve their business and organizational goals. DORA’s metrics cover:

  • Deployment frequency: How many deployments are performed per month or per year.
  • Lead time for changes: How quickly code goes from being committed to running successfully in production.
  • Change failure rate: The percentage of deployments that fail and need to be rolled back.
  • Time to restore service: The mean time to restore (MTTR) service after an application failure in production.
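To make these four metrics concrete, here is a minimal sketch that computes them from a list of deployment records. The `Deployment` record and `dora_metrics` function are hypothetical, not part of any DORA tooling, and the sketch assumes MTTR is measured from the failed deployment to the moment service was restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed to Git
    deployed_at: datetime                    # when it was running in production
    failed: bool = False                     # True if the deployment had to be rolled back
    restored_at: Optional[datetime] = None   # when service was restored after a failure


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute the four DORA metrics for one reporting period (hypothetical data model)."""
    if not deployments:
        raise ValueError("need at least one deployment")
    n = len(deployments)
    # Lead time for changes: commit -> running in production, in hours.
    lead_times = [(d.deployed_at - d.committed_at) / timedelta(hours=1) for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Time to restore: assumed here to run from the failed deployment to recovery.
    restore_times = [
        (d.restored_at - d.deployed_at) / timedelta(hours=1)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": n / period_days,
        "lead_time_hours": mean(lead_times),
        "change_failure_rate": len(failures) / n,
        "mttr_hours": mean(restore_times) if restore_times else 0.0,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    history = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(t0, t0 + timedelta(hours=4), failed=True,
                   restored_at=t0 + timedelta(hours=5)),
    ]
    print(dora_metrics(history, period_days=1))
    # {'deployments_per_day': 2.0, 'lead_time_hours': 3.0, 'change_failure_rate': 0.5, 'mttr_hours': 1.0}
```

In practice these timestamps would come from the Git history and the deployment tooling rather than being entered by hand.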

GitOps put to the test

Mettle’s answer to its CI/CD lag problem, in the hope of seeing quantifiable speed improvements as measured by DORA metrics, was to embrace GitOps. It began relying on GitOps to standardize workflows and to deploy, configure, monitor, update, and manage applications in production.

One of the more recent developments in GitOps is that it can be used not only for applications but also for setting up and managing Kubernetes clusters, now across multiple clusters. This capability follows from Git being the single source of truth, where the desired configuration is declared. A GitOps agent running in Kubernetes constantly compares the actual state of the cluster with the desired state stored in Git. Any new changes merged into the monitored branch in Git are automatically applied to Kubernetes. Conversely, any manual changes applied directly to Kubernetes are automatically reverted to the desired state declared in Git. Configuration drift is eliminated.
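The compare-and-converge behavior described above can be sketched as a single reconciliation pass. This is a simplified model, not how any particular agent such as Flux is implemented: the `reconcile` function and the dict-based `State` type are hypothetical, and a real agent would repeat this pass on an interval against the Kubernetes API.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resources keyed by name, each described by a dict of fields.
State = Dict[str, dict]


def reconcile(desired: State, actual: State) -> Tuple[State, List[str]]:
    """Run one reconciliation pass: make `actual` (the cluster) match `desired` (Git)."""
    actions: List[str] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(f"create {name}")
        elif actual[name] != desired[name]:
            # Covers both a new commit in Git and a manual edit in the cluster:
            # either way, the version declared in Git wins.
            actions.append(f"update {name}")
    for name in sorted(actual.keys() - desired.keys()):
        # Not declared in Git, so it is removed (real agents make pruning configurable).
        actions.append(f"prune {name}")
    new_state = {name: dict(spec) for name, spec in desired.items()}
    return new_state, actions


if __name__ == "__main__":
    git = {"payments": {"image": "payments:1.4", "replicas": 3}}
    cluster = {
        "payments": {"image": "payments:1.4", "replicas": 5},  # scaled by hand
        "debug-shell": {"image": "busybox"},                   # created outside Git
    }
    cluster, actions = reconcile(git, cluster)
    print(actions)                     # ['update payments', 'prune debug-shell']
    print(reconcile(git, cluster)[1])  # [] (no drift left)
```

Because the loop always converges on what Git declares, a manual `replicas: 5` edit is undone on the next pass, which is what "configuration drift is eliminated" means in practice.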

With GitOps, Mettle began to see quantifiable improvements as measured by DORA metrics. Testing became automated, so that a commit not only started the build process, but could then also be used to update the deployment manifest in Git. Although a developer can test a single container or application alone, testing that developer’s application or container in concert with the other containers and microservices gives a more realistic assessment of how the code will run in production.
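The step where a commit updates the deployment manifest can be sketched as follows. This assumes images are tagged with the short commit SHA, a common convention but not something the article specifies; `bump_image` and the registry name are hypothetical. A CI job would then commit the updated manifest back to Git, where the GitOps agent picks it up.

```python
import copy


def bump_image(manifest: dict, container: str, repo: str, commit_sha: str) -> dict:
    """Return a copy of a Deployment manifest whose `container` runs the image built from `commit_sha`."""
    updated = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    for c in updated["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            # Assumed convention: the image tag is the first 7 characters of the commit SHA.
            c["image"] = f"{repo}:{commit_sha[:7]}"
            return updated
    raise KeyError(f"container {container!r} not found in manifest")


if __name__ == "__main__":
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "payments"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "payments", "image": "registry.example.com/payments:0a1b2c3"},
        ]}}},
    }
    new = bump_image(deployment, "payments", "registry.example.com/payments", "9f8e7d6c5b4a3f21")
    print(new["spec"]["template"]["spec"]["containers"][0]["image"])
    # registry.example.com/payments:9f8e7d6
```

Tying the image tag to the commit keeps a one-to-one link between what is in Git and what runs in the cluster.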

With the entire environment declared with GitOps, starting or maintaining an integration environment is simple. This environment can be “long-lived”, meaning it is accessible 24/7 (which is also useful for distributed developer teams working in different time zones). An integration test environment can also be provisioned on demand when needed.

Developers can also create the integration test environment themselves, rather than entrusting this task to a dedicated Ops team. Because they don’t have to wait for someone else to create a test environment for them, developers can act faster.
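One way a developer might provision an on-demand environment without an Ops ticket is to derive it from the manifests already declared in Git. The sketch below is hypothetical: the `integration_env` helper and the `int-` namespace prefix are illustrative assumptions. Committing its output to a path the GitOps agent watches would create the environment, and deleting it (with pruning enabled) would tear it down.

```python
import copy
import re


def integration_env(branch: str, services: list[dict]) -> list[dict]:
    """Derive an isolated, per-branch integration environment from declared manifests."""
    # Kubernetes namespace names must be lowercase DNS labels ([a-z0-9-], max 63 chars).
    slug = re.sub(r"[^a-z0-9-]+", "-", branch.lower())[:50].strip("-")
    if not slug:
        raise ValueError(f"cannot derive a namespace from branch {branch!r}")
    ns = f"int-{slug}"  # hypothetical naming convention
    resources = [{"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": ns}}]
    for svc in services:
        res = copy.deepcopy(svc)  # keep the shared declarations unchanged
        res.setdefault("metadata", {})["namespace"] = ns
        resources.append(res)
    return resources


if __name__ == "__main__":
    declared = [
        {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "payments"}},
        {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "ledger"}},
    ]
    for r in integration_env("feature/Payments-API", declared):
        print(r["kind"], r["metadata"])
```

Every service in the environment comes from the same declarations used elsewhere, so the integration test exercises the containers together rather than one at a time.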

With applications, containers, and the cluster itself all declared with GitOps, integration test management is simplified and accelerated. Developers or DevOps teams can handle full integration testing directly, reducing the time it takes to provision the environment.

In a major software release that may involve around 20 development teams, containers from the different teams are tested, validated in Git, and tagged. The tag captures the versions of all the containers as well as the versions of all the source code.
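A release tag that pins many teams' containers can be modeled as a small manifest committed to Git and then tagged. This is a sketch under assumptions: `release_manifest` and its fields are hypothetical, and the SHA-256 digest is an illustrative addition for later verification, not something the article describes.

```python
import hashlib
import json


def release_manifest(release: str, source_commit: str, validated: dict[str, str]) -> dict:
    """Pin every team's validated container version under a single release."""
    if not validated:
        raise ValueError("a release needs at least one validated container")
    body = {
        "release": release,
        "source_commit": source_commit,
        "containers": dict(sorted(validated.items())),  # stable order for diffs and hashing
    }
    # Content digest: lets anyone check later that the release still pins the same versions.
    body["digest"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body


if __name__ == "__main__":
    validated = {
        "payments": "registry.example.com/payments:9f8e7d6",
        "ledger": "registry.example.com/ledger:4c3b2a1",
    }
    print(json.dumps(release_manifest("release-2024.06", "9f8e7d6", validated), indent=2))
```

Once this file is committed, an annotated tag (`git tag -a`) on that commit gives one name for the exact set of container and source versions that were validated together.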

The validated and tested code is then automatically deployed to production from Git. The deployed code also remains accessible and auditable in Git, which reflects the same commit that is running in the cluster. As a result, deployments that previously could take hours or even days can be completed in minutes. In an online, cloud-native world, this productivity gain means that features are completed and made available to end users more quickly, and measurable productivity improvements across CI/CD are achieved for both development and operations teams.

Faster deployments and improved DORA metrics also come without added security risk. Indeed, GitOps provides an essential framework for DevSecOps, supporting security checks that span the entire CI/CD pipeline as well as the post-deployment stages of managing applications on Kubernetes clusters. With the tag, a full audit trail is available and accessible: source code, build, deployment, and test results can all be traced through the tag. As a result, Mettle was able to test, secure, and deploy code across its entire environment much, much faster, and that is where the biggest productivity gains came from.
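The audit trail described above amounts to being able to answer, for any tag, "which source, build, test, and deployment records belong to it?" The following is a minimal sketch over a hypothetical in-memory event log; `AuditEvent`, `audit_trail`, and the stage names are illustrative assumptions, whereas a real setup would read these records from Git, the CI system, and the cluster.

```python
from dataclasses import dataclass

REQUIRED_STAGES = ("source", "build", "test", "deploy")


@dataclass(frozen=True)
class AuditEvent:
    tag: str     # release tag the event belongs to
    stage: str   # one of REQUIRED_STAGES
    detail: str  # e.g. a commit SHA, image reference, test report, or cluster name


def audit_trail(events: list[AuditEvent], tag: str) -> dict[str, list[str]]:
    """Collect every recorded event for one release tag, grouped by pipeline stage."""
    trail: dict[str, list[str]] = {}
    for e in events:
        if e.tag == tag:
            trail.setdefault(e.stage, []).append(e.detail)
    missing = [s for s in REQUIRED_STAGES if s not in trail]
    if missing:
        # An incomplete trail is itself an audit finding.
        raise LookupError(f"{tag}: no record for {', '.join(missing)}")
    return trail


if __name__ == "__main__":
    log = [
        AuditEvent("release-2024.06", "source", "commit 9f8e7d6"),
        AuditEvent("release-2024.06", "build", "payments:9f8e7d6"),
        AuditEvent("release-2024.06", "test", "integration suite passed"),
        AuditEvent("release-2024.06", "deploy", "prod-cluster-1"),
    ]
    print(audit_trail(log, "release-2024.06"))
```

Failing loudly when a stage is missing is a deliberate choice in this sketch: for an auditor, a release with no recorded test run is exactly what needs flagging.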

This whole process can be handled with Weave GitOps Enterprise, which became the first GitOps platform to automate continuous application delivery and operational control for Kubernetes at scale.

Good numbers

In total, Mettle reported substantial improvements across all DORA metrics. Developer productivity alone improved by 65% in time saved, thanks to the DevOps team’s ability to develop, build, test, and deploy much faster and ultimately deliver improved services to bank customers more quickly.

This improvement in developer productivity is particularly important for cost savings, given the high salaries that Kubernetes application developers typically command (the more productive each developer is, the fewer developers the organization needs to employ), which is good news for the CTO. It’s a win-win: measured in productivity, the developer does better, and the CIO can easily show how developer costs are cut in half. Mettle is also not the only company seeing large improvements in deployment frequency and developer productivity, along with reductions in MTTR, faster deployments, and more, thanks to GitOps.