Channel: work – Surfing Complexity

The carefulness knob

A play in one act

Dramatis personae

  • EM, an engineering manager
  • TL, the tech lead for the team
  • X, an engineering manager from a different team

Scene 1: A meeting room in an office. The walls are adorned with whiteboards with boxes and arrows.

EM: So, do you think the team will be able to finish all of these features by the end of Q2?

TL: Well, it might be a bit tight, but I think it should be possible, depending on where we set the carefulness knob.

EM: What’s the carefulness knob?

TL: You know, the carefulness knob! This thing.

TL leans over and picks a small box off of the floor and places it on the table. The box has a knob on it with numerical markings.

EM: I’ve never seen that before. I have no idea what it is.

TL: As the team does development, we have to make decisions about how much effort to spend on testing, how closely to hew to explicitly documented processes, that sort of thing.

EM: Wait, aren’t you, like, careful all of the time? You’re responsible professionals, aren’t you?

TL: Well, we try our best to allocate our effort based on what we estimate the risk to be. I mean, we’re a lot more careful when we do a database migration than we are when we fix a typo in the readme file!

EM: So… um… how good are you at actually estimating risk? Wasn’t that incident that happened a few weeks ago related to a change that was considered a low risk at the time?

TL: I mean, we’re pretty good. But we’re definitely not perfect. It certainly happens that we misjudge the risk sometimes. In some sense, isn’t every incident a misjudgment of risk? How many times do we really say, “Hoo boy, this thing I’m doing is really risky, we’re probably going to have an incident!” Not many.

EM: OK, so let’s turn that carefulness knob up to the max, to make sure that the team is as careful as possible. I don’t want any incidents!

TL: Sounds good to me! Of course, this means that we almost certainly won’t have these features done by the end of Q2, but I’m sure that the team will be happy to hear…

EM: What, why???

TL picks up a marker off of the table and walks up to the whiteboard. She draws an x-axis and a y-axis. She labels the x-axis “carefulness” and the y-axis “estimated completion time”.

TL: Here’s our starting point: the carefulness knob is currently set at 5, and we can probably hit the end of Q2 if we keep it at this setting.

EM: What happens if we turn up the knob?

TL draws an exponential curve.

EM: Whoa! That’s no good. Wait, if we turn the carefulness knob down, does that mean that we can go even faster?

TL: If we did that, we’d just be YOLO’ing our changes, not doing validation. Which means we’d increase the probability of incidents significantly, which end up taking a lot of time to deal with. I don’t think we’d actually end up delivering any faster if we chose to be less careful than we normally are.

EM: But won’t we also have more incidents at a carefulness setting of 5 than at higher carefulness settings?

TL: Yes, there’s definitely more of a risk that a change that we incorrectly assess as low risk ends up biting us at our default carefulness level. It’s a tradeoff we have to make.

EM: OK, let’s just leave the carefulness knob at the default setting.


Scene 2: An incident review meeting, two and a half months later.

X: We need to be more careful when we make these sorts of changes in the future!

Fin


Coda

It’s easy to forget that there is a fundamental tradeoff between how careful we can be and how much time it will take us to perform a task. This is known as the efficiency-thoroughness trade-off, or ETTO principle.
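To make the shape of that tradeoff concrete, here is a toy model of the whiteboard curve from the play. It is only a sketch under loud assumptions: the function names and every constant are hypothetical and invented for illustration, not measured from any real team. The constants were picked so that the lowest total lands at the default setting of 5, which mirrors TL's claim rather than proving it.

```python
import math

# Every constant here is a made-up illustration value, not a measurement.
BASELINE_SETTING = 5          # the knob's default setting in the play
BASELINE_WEEKS = 10.0         # hypothetical: feature work at the default setting
GROWTH_PER_STEP = 0.4         # hypothetical: how fast careful work gets slower
INCIDENT_WEEKS_AT_ZERO = 270.0  # hypothetical: expected incident cleanup at setting 0
INCIDENT_DECAY = 0.8          # hypothetical: how fast incident time shrinks per step


def feature_weeks(setting):
    """Time spent building and validating; grows exponentially with carefulness."""
    return BASELINE_WEEKS * math.exp(GROWTH_PER_STEP * (setting - BASELINE_SETTING))


def incident_weeks(setting):
    """Expected time lost to incidents; shrinks as carefulness goes up."""
    return INCIDENT_WEEKS_AT_ZERO * math.exp(-INCIDENT_DECAY * setting)


def total_weeks(setting):
    """Estimated completion time: careful work plus expected incident cleanup."""
    return feature_weeks(setting) + incident_weeks(setting)


if __name__ == "__main__":
    for setting in range(0, 11):
        print(f"knob={setting:2d}  total={total_weeks(setting):7.1f} weeks")
```

With these invented numbers, turning the knob up past 5 makes the schedule blow up, and turning it down makes incident time dominate. Note that the expected incident time at 5 is still above zero, which is the part that shows up in Scene 2.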

You’ve probably hit a situation where it’s particularly difficult to automate the test for something, and the manual testing is time-intensive. You develop the feature and test it, but then there’s a small issue that you need to resolve. Do you go through all of the manual testing again? We make these sorts of time tradeoffs in the small. They’re individual decisions, but they add up, and we’re always under schedule pressure to deliver.

As a result, we try our best to adapt to the perceived level of risk in our work. The Human and Organizational Performance folks are fond of the visual image of the black line versus the blue line to depict the difference between how the work is supposed to be done and how workers adapt to get their work done.

But sometimes these adaptations fail. And when this happens, inevitably someone says “we need to be more careful”. But imagine if you explicitly asked that person at the beginning of a project where they wanted to set that carefulness knob, and they had to accept that increasing the setting would increase the schedule significantly. If an incident happened, you could then say to them, “well, clearly you set the carefulness knob too low at the beginning of this project”. Nobody wants to explicitly make the tradeoff between being less careful and having a time estimate that’s seen as excessive. And so the tradeoff gets made implicitly. We adapt as best we can to the risk. And we do a pretty good job at that… most of the time.

