Every company developing software needs developers running support shifts. It is a necessity, but it is also incredibly disruptive. I will focus here on office-hours support and discuss out-of-office support at another time.
When a company develops software, bugs are inevitable, and developers must be in the loop to change the software. Developers never work in the first line, and, typically, low-skilled teams cover the first line, trying to solve the easiest problems or collecting information in support tickets for the developers. Sometimes, there are intermediate teams with skilled engineers able to address more advanced problems between the first line and the developers. In an ideal world, a problem should reach the developers only as a last resort when a code change is required.
How a problem lands on a development team impacts the team’s productivity. There are essentially two strategies: “drop everything” and “planning”.
But first, one step back.
The job of a developer consists of picking a problem, getting “in the zone”, and remaining there until the problem is solved. People who do not have that kind of job do not understand that approaching those problems requires sorting ideas and that every distraction collapses the whole thing like a house of cards. When distractions happen, the ideas are still there but messed up.
As a strategy, “drop everything” is a distraction that kills the developers’ focus, and it is particularly costly because it requires abandoning the job in progress and starting work on something new. Developers typically take turns doing support because they hate context switching and consider being on support even more annoying than meetings. We all know how much some developers groan when they have to join a meeting!
“Planning” is a strategy consisting of putting a ticket on a to-do list without interrupting anyone. It protects productivity and focus, but the team may pick up the issue after several days, delaying the fix. It is not a strategy suitable for emergencies.
Choosing between “drop everything” and “planning” requires triaging, and the first line is typically too unskilled and understaffed to do it effectively. Besides that, most team leaders and engineering managers would not trust the first line’s judgement anyway. Effectively, “drop everything” is the default everywhere.
Another problem is the pressure from the customers. People always want their problems solved ASAP, and constantly ping the support channel with “Any news?” or “Can I have an update?”. Managing expectations and communication is of fundamental importance, but the first line is unable to do it since it cannot determine the priority or the timeline of a fix. The first line can only apply SLAs and chase the teams accordingly. As a result, the team has to solve the issue, and, in parallel, manage all the noise from the customers, the first line, and sometimes from account managers and incident managers too. Needless to say, being on support is a source of frustration and burnout for developers.
For managers, it is essential to monitor the rate at which tickets come in and get closed, and how long they stay open. If tickets come in faster than they are closed, there is a problem with the team’s capacity or skills. If the pace is fine, but tickets stay open for a long time, the release process may be slow, or the software may be hard to maintain. These issues never go away by themselves, and they explode when the company scales up.
If you can measure only one additional thing, monitor the number of support tickets rejected by the team. Each rejected ticket is an engineer who broke focus for no reason. Another important metric is how many tickets did not result in code changes. Developers are expensive resources and should not be involved in tickets that do not require coding.
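To make these measurements concrete, here is a minimal sketch of how they could be computed from a ticket export. The `Ticket` fields and the `support_metrics` function are hypothetical names chosen for illustration; a real ticketing system will have its own schema, and the definitions (for example, what counts as "rejected") are assumptions you should adapt.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical fields; map them to whatever your ticketing tool exports.
    opened: date
    closed: Optional[date] = None   # None while the ticket is still open
    rejected: bool = False          # the team sent it back without working on it
    code_changed: bool = False      # the fix required a code change


def support_metrics(tickets, start, end):
    """Summarise support pressure on a team for the period [start, end]."""
    opened = [t for t in tickets if start <= t.opened <= end]
    closed = [
        t for t in tickets
        if t.closed is not None and start <= t.closed <= end
    ]
    days_open = [(t.closed - t.opened).days for t in closed]
    return {
        "opened": len(opened),
        "closed": len(closed),
        # Positive means tickets arrive faster than the team closes them.
        "backlog_change": len(opened) - len(closed),
        "avg_days_open": sum(days_open) / len(days_open) if days_open else None,
        "rejected": sum(1 for t in closed if t.rejected),
        # Accepted tickets that were closed without touching the code.
        "closed_without_code_change": sum(
            1 for t in closed if not t.rejected and not t.code_changed
        ),
    }
```

Calling `support_metrics(tickets, start, end)` once per week or per sprint and watching the trend is probably more useful than any single reading: a persistently positive `backlog_change` points at capacity or skills, while a growing `avg_days_open` with a stable backlog points at the release process or maintainability.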
To summarise: measure the pressure that support puts on teams. When it is high, consider the risk of burnout and add layers of medium-skilled engineers between the first line and the developers.