Getting the basics right
One recent experience helped me realize something important about how software engineering teams function well. There is significant complexity in these teams, at both the human and the technical level. To rein in this complexity we can use a wide array of approaches. Among them, there is a set of activities that I would call the basics. If the basics are not covered, you can put all the 10x engineers you want on the team, and it will still look as if the team is barely treading water in terms of productivity. When the basics are covered, it makes a world of difference.
Let's take any sport or competition we like: basketball, karate, rowing. Take a high-performance athlete or team in that sport. If we observe their training, we will invariably notice that they start with a warm-up and with practicing the basics. Some elite performers are absolutely obsessed with them. Kobe Bryant is credited with saying "I never ever get bored with the basics." Anecdotes have it that watching him train was in fact boring, because of how long he would practice the most basic routines.
The same applies to software engineering teams. For some reason, however, the importance of the basics was not obvious to me.
An example
This is an experience I had with a team over the last 16 months.
First, the setup
We assigned the most senior engineer on the team to the most important, bleeding-edge feature. Let's call this feature A. The team had two other mid-level engineers, who focused on feature B and feature C, respectively. The remaining engineers acted in supporting roles, spread evenly across threads A, B, and C.
In this setup, the team had difficulty shipping, and it was not at all obvious why. The hypotheses we considered were the following: Are there too many threads of work open? Is the team missing processes and rituals to protect its attention and focus? Or is the codebase, full of landmines and technical debt, slowing everyone down?
Second, the diagnosis
None of the above hypotheses proved to be the root cause. We ran several rounds of experiments with the team setup, and neither these nor other common problems turned out to be the main thing slowing us down. Something else popped up as an interesting observation. We noticed that the senior engineer could not easily find time to mentor, coach, or unblock others. Moreover, this engineer was rarely involved in the basics, and instead stayed with the priority of shipping feature A.
This is a good moment to describe more specifically what I mean by basics. Continuing the story through the lens of the senior engineer, there are actually two buckets of basics:
- Bucket (1): the kind of situations that interrupt the senior engineer. These can make the engineer context switch away from the critical path of delivering feature A and towards handling the interruption.
- Bucket (2): the kind that other engineers should (and can) handle.
The basics can mean a wide array of tasks or activities, depending on the team and project. But they almost always include: test framework fixes or additions, nightly test failures, triaging, handling incoming asks from users or colleagues, releases, security incidents, management of critical tooling or libraries, and write-ups (such as pre- or post-mortems). With the exception of security incidents, these basics are part of bucket (2): the kinds of activities that fall on the shoulders of other engineers.
I'll add a couple more bits of detail for the diagnosis to highlight the difficulty of achieving a productive steady state in this setup.
- The other engineers are juggling multiple responsibilities, trying to prioritize feature delivery themselves, and are not senior. This can mean situations from bucket (2) will "leak," partly due to missing context or knowledge and partly due to insufficient capacity. The leaks will interrupt the senior engineer, diverting their attention from feature A. In other words, what would normally be bucket (2) becomes bucket (1).
- This situation of frequent escalation will not only impact delivery of feature A, but can also affect B and C and overall team morale. The fact that bucket (2) leaks will sap the motivation and confidence of the engineers who are by default trying to tackle the tasks in this bucket. Underlying this is the frequent context switching that most engineers on the team have to do.
It may seem like the key insight here is, "engineers are juggling multiple responsibilities." But this itself is just a symptom of something else. It's not lack of focus. It's the unrealistic expectation that a team can merely focus on the features and ignore the basics.
If Kobe Bryant was obsessed with the basics, I suspect the reason was that he wanted to build extreme confidence and bring the foundation of his practice as close as he could to perfection. Then he could allow himself to indulge and "focus on the features."
Third, the basics
Even though the activities in bucket (2) seem like "keeping the lights on," and therefore not a big deal, this is often not the case if the team is not deliberate about them. The problem is more frequent or compounded if at least one of the following is true: the project is a production system with heterogeneous user classes (as can sometimes be seen in platform teams); the project has significant technical debt (which one does not?); the project is post-launch (in the difficult teenage years); and/or the project benefits from public attention that results in external contributions with many pull requests, improvement suggestions, or bug reports that require careful handling (review, detailed testing, backporting).
To reach a steady state that is productive, I learned that it is important for the team to acknowledge and include in planning not only the existence of buckets (1) and (2), but also the training (the slow start phases) needed until sufficiently many engineers can juggle the tasks in these buckets. In an ideal state, there are no leaks, no escalations, and few context switches. Surprises such as security incidents cannot be eliminated in most projects, so the time they will take cannot be estimated or planned in advance; but the bus factor associated with handling these surprises can be improved.
To end, knowing the above is only half the battle. The other half remains: to identify the basics and stick with them over time, again and again, however menial they may seem.