Project Blueprint: DevOps Engineer

Published: 08.05.2024

In our previous entries in the Project Blueprint series, we have taken a look at how a Team Lead and a QA professional view their responsibilities and work throughout a single project. Today I’d like to provide a view into the mind of a DevOps Architect / Engineer in a regular Zure software engineering project from sale to sundown.

Sales

It all starts with estimation. While not all DevOps personnel at Zure participate in sales efforts, most do help estimate the amount of work a project requires. We usually already know what type of solution we're going to build, albeit at a high level, but the finer details are often unclear at this stage. Do we need to secure everything with network isolation? Is there a centralised firewall that all traffic must go through? What kind of security scanning is required to meet the expectations of the customer's CISO?

To make things worse, these seemingly minor details tend to produce a disproportionate amount of work once you add the communication overhead of working out the specs with the customer, or with the party operating their Azure hub-and-spoke network. Thankfully, Zure has built tooling to estimate that part of the offer somewhat accurately, but even then it's possible to miss the ballpark.

So what does the DevOps expert need to focus on? Most importantly, I'd say, how their own implementation speed and knowledge of the technology compare to those of the average Zure DevOps person. This allows them to make a reasonable guess of how long the implementation would take them, and then scale it to the average. Our standardised way of starting a project helps here: all of our DevOps personnel are already familiar with the steps needed to get our templates running and ready for modification.
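To make the scaling step concrete, here is a minimal Python sketch. The function name and the idea of expressing your own pace as a single "speed factor" relative to the average engineer are illustrative assumptions, not a description of Zure's estimation tooling.

```python
def scale_estimate(own_estimate_days: float, speed_factor: float) -> float:
    """Scale a personal estimate to the pace of an average DevOps engineer.

    speed_factor is a hypothetical self-assessment: 1.25 means you finish
    the same work 1.25 times as fast as the average engineer, so the
    average engineer needs 1.25 times your estimate.
    """
    if own_estimate_days < 0:
        raise ValueError("estimate must be non-negative")
    if speed_factor <= 0:
        raise ValueError("speed factor must be positive")
    # Faster than average (> 1.0) scales the estimate up; slower scales it down.
    return own_estimate_days * speed_factor


# An engineer who is 25% faster than average and estimates 8 days
# should put 10 days into the offer.
print(scale_estimate(8, 1.25))  # 10.0
```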

An engineer forgetting to triple his original estimate before sending the offer. The image is AI-generated.

Project kick-off


A guiding principle for Zureans is to provide visible, business-relevant value to the customer from the project's very first sprint. This could mean, for example, a clickable prototype that the customer can play around with, helping everyone better understand how a feature is best implemented.

DevOps work, however, is often very technical and focused on enabling others to work more efficiently. Following the principle therefore requires looking at business value from a different perspective. While end users and customer stakeholders may never interact with DevOps outputs such as CI/CD pipelines, infrastructure automation, or code quality and security scanning tooling, these components are crucial to the team's agility and efficiency. To align a DevOps Engineer's early work with the principle of delivering immediate value, the focus should first be on unblocking the team, and then on demonstrating how these technical enhancements let us showcase our work from production-like environments, speed up development cycles, and ensure a quality implementation right from the start.

Thankfully, our standardised tooling for new projects lets us deliver a general-purpose infrastructure setup, complete with fully fledged CI/CD flows and guidelines, blazingly fast. This lets the rest of the team avoid (or at the very least postpone) spending their time on work with less customer visibility.

Unfortunately, there is no one-size-fits-all solution in software development, and everything is a tradeoff. The DevOps Engineer's responsibility is to agree with the team and the customer on which infrastructure tradeoffs we should make, and to shape our starting-point infrastructure towards those goals.

Questions we often need to answer right from the start include:

What naming conventions do we follow? Are there other governance-related restrictions or guidelines?
Does our team have the required permissions to the Azure subscriptions and the DevOps platform?
Do the subscriptions have the required resource providers registered?
Does the Azure environment already have the automation needed to support private endpoints, if they are required?
Are there agent pools ready to run our CI/CD flows? If not, are there restrictions on how to host them?
What do these (sometimes surprising) requirements actually mean in terms of extra work for the team? How do we present these findings so that the customer understands them too?
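Some of these checks can be scripted. The Python sketch below assumes you have saved the output of `az provider list --output json` and want to know which of the providers your templates need are not yet registered; it also validates storage account names against Azure's rule of 3 to 24 lowercase letters and digits. The function names and the sample provider list are hypothetical.

```python
import json
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def missing_providers(provider_list_json: str, required: set[str]) -> set[str]:
    """Return the required provider namespaces that are not in 'Registered' state.

    provider_list_json is assumed to be the output of `az provider list --output json`.
    Namespace comparison is case-insensitive.
    """
    providers = json.loads(provider_list_json)
    registered = {
        p["namespace"].lower()
        for p in providers
        if p.get("registrationState") == "Registered"
    }
    return {ns for ns in required if ns.lower() not in registered}


def invalid_storage_names(names: list[str]) -> list[str]:
    """Return the names that break the storage account naming rule."""
    return [n for n in names if not STORAGE_ACCOUNT_NAME.fullmatch(n)]


# Hypothetical, trimmed-down sample of the CLI output.
sample = json.dumps([
    {"namespace": "Microsoft.Web", "registrationState": "Registered"},
    {"namespace": "Microsoft.App", "registrationState": "NotRegistered"},
])
print(missing_providers(sample, {"Microsoft.Web", "Microsoft.App", "Microsoft.KeyVault"}))
print(invalid_storage_names(["stbillingdevweu", "st-billing-dev"]))
```

Anything reported as missing can then be registered with `az provider register --namespace <namespace>`, provided the team has the permissions to do so, which is itself one of the questions above.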

Implementation Phase

The Implementation Phase is a crucial period in any project, where concepts and preliminary designs meet the real world. This is where our DevOps Engineers turn the initial baseline into fit-for-purpose, efficient, and secure infrastructure.

It is, however, important to keep in mind that DevOps personnel are scarce relative to the number of developers, so they often help multiple teams across many projects. In practice, this might mean working on a project in its implementation phase for a couple of months and then hopping into another project that is just starting or is in one of the later phases. When the time is right, the engineer might then hop back into the original project.

A somewhat unfortunate side effect of this practice is that some non-functional requirements that arguably should have been handled during implementation end up being addressed only in later phases.

Let's take a closer look at the tasks the implementation phase most commonly consists of.

Infrastructure as Code (IaC)

From the outset, leveraging Infrastructure as Code (IaC) practices is fundamental. IaC not only speeds up the setup process but also ensures consistency and compliance across development, staging, and production environments. Throughout this phase, our DevOps team continuously refines the implementation, addressing requirements such as network configuration and connectivity issues. This iterative improvement helps manage the complexity that often arises from changing project requirements.
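One way to keep environments consistent is to declare a single baseline plus only the intended per-environment differences. In a real project this would live in Bicep or Terraform parameter files; the Python sketch below only illustrates the principle, and every setting name and value in it is hypothetical.

```python
from copy import deepcopy

# Hypothetical baseline shared by every environment.
BASELINE = {
    "app_service_sku": "P1v3",
    "min_instances": 2,
    "private_endpoints": True,
    "log_retention_days": 30,
}

# Only the intended differences are declared per environment.
OVERRIDES = {
    "dev": {"app_service_sku": "B1", "min_instances": 1},
    "staging": {},
    "prod": {"min_instances": 3, "log_retention_days": 90},
}


def environment_parameters(env: str) -> dict:
    """Merge the baseline with one environment's overrides."""
    overrides = OVERRIDES[env]
    # Reject overrides for settings the baseline does not know about,
    # so a typo cannot silently create an environment-specific setting.
    unknown = set(overrides) - set(BASELINE)
    if unknown:
        raise KeyError(f"{env} overrides unknown settings: {sorted(unknown)}")
    params = deepcopy(BASELINE)
    params.update(overrides)
    return params


for env in OVERRIDES:
    print(env, environment_parameters(env))
```

Keeping the overrides small makes it easy to review exactly how production differs from the environments we test in.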

CI/CD Pipeline Optimization

A significant chunk of implementation effort goes into enhancing and optimising the Continuous Integration and Continuous Deployment (CI/CD) pipelines, as well as the agents that run them. The aim is to reduce runtime, enable parallel execution, and eliminate unnecessary operations, thus accelerating delivery cycles. We also integrate further security practices, such as GitHub Advanced Security for Azure DevOps, and establish Pull Request (PR) policies that trigger modular CI runs depending on the nature of the submitted changes.
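Azure DevOps build validation policies can be scoped with path filters, which is one way to get these modular runs. The same decision can also be made in code, for example in a pipeline step that receives the PR's changed files from `git diff --name-only`. The Python sketch below shows that logic; the stage names, path patterns, and fall-back rule are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from repository paths to CI stages.
STAGE_RULES = {
    "infra": ["infra/*", "pipelines/*"],
    "backend": ["src/api/*", "src/shared/*"],
    "frontend": ["src/web/*", "src/shared/*"],
}

# Changes that never need a CI run.
IGNORED = ["docs/*", "*.md"]


def stages_for_changes(changed_files: list[str]) -> set[str]:
    """Pick the CI stages a pull request needs based on its changed files."""
    stages: set[str] = set()
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in IGNORED):
            continue
        matched = {
            stage
            for stage, patterns in STAGE_RULES.items()
            if any(fnmatchcase(path, pattern) for pattern in patterns)
        }
        # A path no rule knows about runs every stage: the safe default.
        stages |= matched or set(STAGE_RULES)
    return stages


print(sorted(stages_for_changes(["src/shared/models.py", "README.md"])))
# prints ['backend', 'frontend']
```

Falling back to a full run for unknown paths trades some speed for safety: a new folder nobody has mapped yet can never slip through without CI.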

Disaster Recovery and Operational Resilience

Implementing robust disaster recovery strategies and backups is non-negotiable: it ensures that the system can recover swiftly and with minimal data loss in case of failure. Training the team on these processes, together with setting up comprehensive logging, initial monitoring, and dashboards, forms the backbone of operational resilience. Instrumenting the code with tools like OpenTelemetry, alongside the developers, helps trace and diagnose issues across distributed systems more effectively. The primary goal is to find the ways our system can fail, prevent as many of them as we can, and prepare the actions to take when a failure eventually does happen.
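A common way to make "minimal data loss" measurable is a recovery point objective (RPO): the maximum amount of data, measured in time, we can afford to lose. The Python sketch below checks a backup history against an RPO by finding the longest gap in which a failure would have lost data. The function names and sample times are hypothetical, and services with continuous point-in-time restore would need a different check.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times: list[datetime], now: datetime) -> timedelta:
    """Longest window between consecutive backups (or since the latest one).

    A failure just before the next backup loses everything written
    since the previous one, so the largest gap is the worst case.
    """
    if not backup_times:
        raise ValueError("no backups: potential data loss is unbounded")
    times = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(times, times[1:]))


def meets_rpo(backup_times: list[datetime], now: datetime, rpo: timedelta) -> bool:
    return worst_case_data_loss(backup_times, now) <= rpo


backups = [datetime(2024, 5, 8, h) for h in (0, 4, 12)]
now = datetime(2024, 5, 8, 14)
print(worst_case_data_loss(backups, now))  # 8:00:00, between 04:00 and 12:00
print(meets_rpo(backups, now, rpo=timedelta(hours=4)))  # False
```

In practice the backup timestamps would come from the backup tooling, and the check is only as trustworthy as the restores the team has actually rehearsed.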

Continuous Feedback and Iteration

Finally, the implementation phase is not just about building but also about learning and adapting. Regular feedback loops with the project team and stakeholders help identify potential improvements. These insights drive further iterations on the infrastructure setup, ensuring that the project not only meets but exceeds the expectations of quality, performance, and reliability.

By focusing on these key areas, the DevOps team at Zure ensures that the infrastructure not only supports but actively enhances the capabilities of the entire development team, paving the way for a successful transition to the subsequent phases of the project lifecycle.

Support ramp-up

Zure offers support services for the software we create, and the offering makes sense for most customers. The project teams and our Continuous Improvement Services team work together to make sure the project's applications meet the criteria for adequate monitoring, alerting, development, and troubleshooting tooling before they are brought into the support service. Most often, the DevOps person's job here is to help the team answer the questions “When do we really know that the application is broken?”, “How do we track that signal?” and “When should we alert for help?”. Depending on whether the project is already running in production, these questions might not have received enough thought during implementation.

The aforementioned questions can be answered in multiple layers of complexity, but in any type of support where someone has to context switch or even wake up during the night to resolve your alerts, it’s very important to get the answers as close to reality as possible. Alerting on a signal like “a single 500 error during the last minute” just won’t do.
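A more robust signal looks at the error rate over a window, ignores windows with too little traffic to be meaningful, and requires the breach to persist before anyone is paged. The Python sketch below shows that logic; the thresholds are hypothetical starting points rather than recommendations, and in Azure the same idea would typically be expressed as an alert rule on your telemetry rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts for one evaluation window, e.g. five minutes."""
    requests: int
    server_errors: int  # HTTP 5xx responses


def should_alert(
    windows: list[Window],
    max_error_rate: float = 0.05,
    min_requests: int = 50,
    consecutive: int = 3,
) -> bool:
    """Alert only if each of the latest `consecutive` windows breaches the threshold."""
    if len(windows) < consecutive:
        return False
    for window in windows[-consecutive:]:
        if window.requests < min_requests:
            return False  # too little traffic for the rate to mean anything
        if window.server_errors / window.requests <= max_error_rate:
            return False
    return True


print(should_alert([Window(1000, 1)] * 3))    # False: a stray 500 is not an incident
print(should_alert([Window(1000, 120)] * 3))  # True: 12% errors for three windows
```

Note that the low-traffic rule also hides a total outage in which no requests arrive at all, so a check like this should be paired with something like an availability test.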

A surefire way to get better alerts is to make their creators own the response to the incidents. It’s super effective past midnight! The image is AI-generated.

My personal favourite tools for answering these questions are Service Level Indicators (SLIs) and Service Level Objectives (SLOs), which I have written about, along with the surrounding themes, in a previous post. In a nutshell, the approach moves the thought process from technical metrics to business metrics, and focuses alerting on when those show red. The unfortunate reality is that the SLO approach delivers the most value when implemented as early as possible, yet in some projects the budget simply doesn't stretch that far until the app has been running in production for quite some time.
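As an example of alerting on an SLO instead of a raw technical metric, the sketch below implements the multiwindow burn-rate alert described in Google's SRE Workbook. The SLI is the share of good events, for example checkouts that completed successfully, and the burn rate tells how many times faster than allowed the error budget is being spent. The 14.4 threshold is the Workbook's example for paging when 2% of a 30-day budget is spent within one hour; the function names and the 99.9% target are our own assumptions.

```python
def burn_rate(good: int, total: int, slo: float) -> float:
    """How many times faster than allowed the error budget is being consumed.

    A burn rate of 1.0 spends exactly the whole budget over the SLO period.
    """
    if total == 0:
        return 0.0  # no events, nothing burned
    error_rate = 1 - good / total
    error_budget = 1 - slo
    return error_rate / error_budget


def should_page(
    long_window: tuple[int, int],
    short_window: tuple[int, int],
    slo: float = 0.999,
    threshold: float = 14.4,
) -> bool:
    """Page only if both windows burn faster than the threshold.

    Each window is a (good_events, total_events) tuple. The long window
    (e.g. 1 hour) shows the problem is significant; the short window
    (e.g. 5 minutes) shows it is still happening.
    """
    long_rate = burn_rate(*long_window, slo)
    short_rate = burn_rate(*short_window, slo)
    return long_rate > threshold and short_rate > threshold


# 2% of checkouts failed in the last hour and 10% in the last five minutes.
print(should_page(long_window=(9800, 10000), short_window=(450, 500)))  # True
```

Requiring both windows keeps the alert from firing on a brief spike and lets it reset quickly once the problem is fixed.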

Maturity in Production

Not all projects get this far with the whole project team intact. From the DevOps Engineer's perspective, the maintenance phase of a project is mostly handled by the support team, consisting of developers who have previously worked on the project. This is especially often the case in projects where there is no active development and the project can be considered “completed”.

However, in a few projects the resulting application is at the very core of the customer's business and development is constantly ongoing. In this phase the team can focus more on the non-functional aspects of the application, such as performance, making telemetry easier to read and monitor, fine-tuning the scaling and cost of components, and further optimising deployment frequency and speed. This is also often where the aforementioned SLO practices get customers interested in better understanding how the users of the application feel about it.

In these types of cases, the DevOps Engineer is often called in for special assistance or work that they can accomplish in a fraction of the time it would take the developers in the team to implement. These projects are also great learning environments for people who have not previously been in the team, as the time pressure is often not as heavy as in previous phases.

Sundown

In this phase, the application no longer has a future and is being shut down. It has served its purpose, but there is still work to be done.

While this is not solely the DevOps Engineer's responsibility, the team needs to make sure that everyone who has been using the application has had enough time to move elsewhere. When it's time, the Azure resources get cleaned up, databases get wiped, and backups are dealt with according to the customer's policy. We also cannot forget the DNS names in use, which might need to be redirected elsewhere, or disabling the alerting and monitoring tooling.

There may also be security concerns during the cleanup, such as permission changes and making sure the data is permanently deleted. While Zure does have personnel specialising in security, security is generally viewed as something that all roles need to take into account.

These cases happen pretty rarely, but it’s the best feeling to close out that chapter of the team’s work for good.

Closing Thoughts

As we wrap up our journey through the lifecycle of a project from the DevOps perspective at Zure, it’s evident how integral this role is in not only setting up and maintaining the technical infrastructure but also in ensuring the project’s success through every phase. From initial estimates to the final decommissioning of services, DevOps Engineers are crucial in optimizing, securing, and future-proofing the systems that underpin all project activities. Their work, though often behind the scenes, lays the foundation for operational excellence and customer satisfaction.

At Zure, our aim is not just to meet but to exceed the expectations set at the start of each project, a goal made achievable only by the combination of seasoned experts in varying roles that make up the project team.


Pasi Huuhka

DevOps Architect

Pasi has over 10 years of experience working with Microsoft products and has a broad knowledge of Azure services ranging from IaaS all the way to serverless offerings. During his time in the industry, he has worked as a cloud architect, developer, systems administrator, and project manager. Pasi is currently focusing on DevOps, cloud automation, and application development on Azure.

pasi.huuhka@zure.com