Increasingly, companies are looking for SRE engineers, and their requirements are similar to those for system administrators: system maintenance, infrastructure administration, and knowledge of OS tools. Is it just a fashionable renaming of one profession into another, or are there differences?
We will explain what an SRE engineer does, what he should be able to do, how he differs from a system administrator, and what changes in a company when such a specialist joins it.
Who Is A System Administrator, What Does He Do, And What Is He Responsible For?
Broadly speaking, this specialist makes sure the infrastructure runs stably and quickly. More specifically, the system administrator:
- installs programs and updates;
- configures the operating system;
- makes sure that everything works stably;
- monitors system resources so that there is enough disk space and RAM;
- sets up monitoring and knows where to look in case of problems;
- creates backups;
- monitors the network and knows how to configure it;
- automates his own work, for example, so that backups are created automatically.
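As a small illustration of the "enough disk space" and "automate your own work" items above, here is a minimal Python sketch of a disk check that could be run on a schedule. The 90% threshold and the function names are assumptions made for this example, not a recommendation from any particular tool.

```python
import shutil


def percent_used(used_bytes: int, total_bytes: int) -> float:
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def disk_alert(path: str = ".", threshold_percent: float = 90.0) -> bool:
    """Return True when the filesystem holding `path` is fuller than the threshold.

    The 90% default is an arbitrary value chosen for this sketch.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return percent_used(usage.used, usage.total) > threshold_percent


if __name__ == "__main__":
    if disk_alert("."):
        print("WARNING: disk usage is above the threshold")
    else:
        print("disk usage is OK")
```

Run from a scheduler such as cron, a script like this replaces a manual check with an automatic one.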
The system administrator knows how operating systems work, works with system utilities, and understands the network structure. He is not a programmer and is not expected to write code. But quite often, problems arise whose solution lies at the junction of two areas: development and administration.
In traditional approaches to development, system administrators and programmers are isolated from each other, often located in different departments and pursuing different goals. A programmer’s job is to write code that solves business problems. The system administrator’s task is to ensure that all systems work stably.
This is where problems can arise. Sometimes a program works correctly during development and testing, but errors appear once it is installed on production servers.
Solving such a common problem often takes a long time because each team lacks the other’s skills and knowledge. The SRE engineer bridges this gap between administration and development.
Who Is An SRE Engineer, And How Does He Differ From A System Administrator?
The task of the SRE engineer is to make the system reliable, stable, and efficient. In general, this is similar to the functions of a system administrator but is achieved in different ways. Like a system administrator, he must understand the structure of the infrastructure, know what programs are running there and what load the servers can withstand, and work with the operating system utilities.
Four key areas distinguish the SRE engineer from the system administrator.
SRE Engineer Is Primarily A Programmer
An SRE engineer is a programmer with administration skills. However, he writes code not to implement business logic but to improve the stability and performance of the system. Writing code is not the main task of an SRE engineer but one of the ways to achieve his goals.
Programming skills help an SRE engineer track down a bug in a program running in production: he can read the code himself, find the cause, and write a fix right away.
The SRE engineer also writes utilities that help monitor the system. For example, if the existing utilities for tracing or collecting logs do not meet his needs, he can write his own or modify the existing ones.
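To give a feel for the kind of small utility meant here, below is a hedged Python sketch that counts log lines per severity level. The log format ("date time LEVEL message"), the sample lines, and the function name are all made up for the example; a real utility would follow whatever format the application actually writes.

```python
import re
from collections import Counter

# Assumed line format for this sketch: "<date> <time> <LEVEL> <message>"
LOG_LINE = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) ")


def count_levels(lines):
    """Count log lines per severity level, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts


# Invented sample data, only to show how the function is called.
sample = [
    "2024-05-01 10:00:00 INFO service started",
    "2024-05-01 10:00:05 ERROR database timeout",
    "2024-05-01 10:00:09 ERROR database timeout",
    "not a log line",
]

if __name__ == "__main__":
    for level, count in count_levels(sample).most_common():
        print(level, count)
```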
SRE Engineer Uses Infrastructure As Code (IaC)
He tries not to change anything in the system manually but writes scripts and configuration files for everything. This reduces the likelihood of human error and makes it easy to replicate the settings on other servers and determine precisely how the system is configured.
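The idea can be shown with a toy Python sketch: the desired settings live in a file-like description, and a script computes and applies only the differences. Real IaC tools are far richer than this; the setting names and the `plan`/`apply` function names are hypothetical and exist only in this example.

```python
_MISSING = object()  # sentinel: distinguishes "key absent" from "value is None"


def plan(current: dict, desired: dict) -> dict:
    """Return only the settings whose current value differs from the desired one."""
    return {
        key: value
        for key, value in desired.items()
        if current.get(key, _MISSING) != value
    }


def apply(current: dict, desired: dict) -> dict:
    """Return a new state with the planned changes applied.

    Settings not mentioned in `desired` are left untouched.
    """
    new_state = dict(current)
    new_state.update(plan(current, desired))
    return new_state


# Hypothetical settings, described once and reusable for every server.
desired_state = {"max_open_files": 65536, "timezone": "UTC"}
server_state = {"max_open_files": 1024, "timezone": "UTC", "hostname": "web-1"}

if __name__ == "__main__":
    print("changes needed:", plan(server_state, desired_state))
    updated = apply(server_state, desired_state)
    print("changes after apply:", plan(updated, desired_state))
```

Because the script only applies differences, running it a second time changes nothing, and the description itself documents how the system is configured.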
SRE Engineer Participates In Architecture Design
He knows precisely what servers the company uses, how much capacity they have, how they are configured, and what technical limitations they have. With this knowledge, he can anticipate potential problems early and warn the programmers about them.
Also, at the start of development, an SRE engineer can define the criteria that the application must meet. For example, an application must write logs to a specific location or integrate with a general monitoring system. If these requirements are not met, the SRE engineer may refuse to accept the application for support.
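Such criteria can be written down as an automated check rather than a verbal agreement. The Python sketch below assumes a hypothetical application manifest with `log_path` and `metrics_endpoint` fields and a hypothetical required log directory; none of these names come from a real tool.

```python
REQUIRED_LOG_DIR = "/var/log/apps/"  # hypothetical company-wide convention


def unmet_requirements(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means 'accepted'."""
    problems = []
    log_path = manifest.get("log_path", "")
    if not log_path.startswith(REQUIRED_LOG_DIR):
        problems.append("logs must be written under " + REQUIRED_LOG_DIR)
    if not manifest.get("metrics_endpoint"):
        problems.append("application must expose a metrics endpoint")
    return problems


if __name__ == "__main__":
    candidate = {"name": "billing", "log_path": "/tmp/billing.log"}
    for problem in unmet_requirements(candidate):
        print("REJECTED:", problem)
```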
SRE Engineer Surrounds The System With Metrics
He assesses the system’s stability not by his feelings but by specific indicators. Teams rely on two main concepts internally:
- SLO – Service Level Objectives. This is an agreement on metrics and their allowed values, that is, thresholds that must not be exceeded. For example, total system downtime should be no more than 20 hours per year, and the average service response time should be no more than one second.
- SLI – Service Level Indicators. These are the metrics themselves, which are measured over time. For example, the system’s downtime for a year is 18 hours, and the average service response time is 0.8 seconds.
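The downtime figures in these examples are easier to compare once converted to availability percentages. A short Python sketch of the arithmetic, assuming a 365-day year (8,760 hours); the function names are illustrative only:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def availability_percent(downtime_hours: float) -> float:
    """Convert yearly downtime in hours to availability in percent."""
    return 100.0 * (1 - downtime_hours / HOURS_PER_YEAR)


def meets_slo(sli_value: float, slo_limit: float) -> bool:
    """For 'lower is better' metrics such as downtime or response time."""
    return sli_value <= slo_limit


if __name__ == "__main__":
    print("SLO availability: %.2f%%" % availability_percent(20))  # about 99.77%
    print("SLI availability: %.2f%%" % availability_percent(18))  # about 99.79%
    print("downtime within SLO:", meets_slo(18, 20))
    print("response time within SLO:", meets_slo(0.8, 1.0))
```

With the numbers from the examples, both indicators are inside their objectives: 18 hours of downtime is below the 20-hour limit, and 0.8 seconds is below one second.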
Based on these metrics, the SRE engineer evaluates the stability and performance of the application. If an indicator goes beyond its approved threshold, he can veto the development of new features and ask the developers to focus on fixing the problems.
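This veto can be expressed as a simple rule. The Python sketch below compares measured indicators with their objectives and reports which ones are breached. The metric names are invented for the example, and treating a missing measurement as a breach is an assumption of this sketch, not an established rule.

```python
def breached(indicators: dict, objectives: dict) -> list:
    """Return the names of indicators whose measured value exceeds its limit.

    An indicator with no measurement counts as breached (an assumption).
    """
    return sorted(
        name
        for name, limit in objectives.items()
        if indicators.get(name, float("inf")) > limit
    )


def release_decision(indicators: dict, objectives: dict) -> str:
    """Say whether feature work may continue or fixes must come first."""
    failing = breached(indicators, objectives)
    if failing:
        return "freeze features, fix first: " + ", ".join(failing)
    return "ok to ship features"


# Objectives taken from the examples in the text; metric names are hypothetical.
objectives = {"downtime_hours_per_year": 20, "avg_response_seconds": 1.0}

if __name__ == "__main__":
    healthy = {"downtime_hours_per_year": 18, "avg_response_seconds": 0.8}
    degraded = {"downtime_hours_per_year": 25, "avg_response_seconds": 0.8}
    print(release_decision(healthy, objectives))
    print(release_decision(degraded, objectives))
```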
Are Traditional Administrators A Thing Of The Past And Replaced By SRE Engineers?
Quite possibly, yes. This is not a passing fashion but a natural evolution of technologies and development processes:
- Code is now often deployed to servers very quickly, sometimes hundreds of times a day. A system administrator alone is not enough in this case, because problems in the code must be fixed promptly.
- Cloud technologies are now widespread, and companies are moving away from on-premise infrastructure. In the cloud, much of what a traditional administrator used to do falls on the shoulders of the cloud provider: setting up hardware, updating the OS and software, keeping the network healthy, creating backups, and so on.
We are not saying that traditional sysadmins have disappeared. They are still in demand, for example, in data centers, where they do what they do best: monitor the equipment and update the software. But in companies that develop their own software, no matter how small, SRE engineers are replacing traditional administrators.
Who Does A Company Need: A System Administrator Or An SRE Engineer?
Let’s summarize to help decide which specialist a company needs: a system administrator or an SRE engineer.
- A system administrator is a specialist who monitors the system’s stability, knows it thoroughly, and can solve infrastructure problems but not problems in the code.
- An SRE engineer is a programmer with administration skills. His main goal is for the system to work stably; to achieve it, he can modify the developers’ code and write his own utilities.