And now let’s decode:
@solarce is Brandon Burton, who is at Travis-CI.
The first link is from @allspaw, John Allspaw, Etsy’s CTO, who talks a lot about devops and culture. The article is from 2012, and although what it says on the tin is about being a “senior” engineer, he quickly switches to talking about what it means to be a “mature” engineer. It’s really long and, although geared toward programmer-type engineers, it is relevant to all technical folks. Here’s a sample.
They know that they work within a spectrum of ideal and non-ideal, and are OK with that. They are comfortable with it because they strive to make the ideal and non-ideal in a design explicit. Later on in the lifecycle of a design, when the original design is not scaling anymore or needs to be replaced or rewritten, they can look back not with a perspective of how short-sighted those earlier decisions were, but instead say “yep, we made it this far with it and knew we’d have to extend or change it at some point. Looks like that time is now, let’s get to work!” instead of responding with a cranky-pants, passive-aggressive Hindsight Bias-filled remark with counterfactuals (e.g. “those idiots didn’t do it right the first time!”, “they cut corners!”, “I TOLD them this wouldn’t work!”)
The second link is a 2013 post from Jeff Hodges of Twitter. About two-thirds of his advice about distributed systems applies to systems and infrastructure in general.
Use percentiles, not averages. Percentiles (50th, 99th, 99.9th, 99.99th) are more accurate and informative than averages in the vast majority of distributed systems. Using a mean assumes that the metric under evaluation follows a bell curve but, in practice, this describes very few metrics an engineer cares about. “Average latency” is a commonly reported metric, but I’ve never once seen a distributed system whose latency followed a bell curve. If the metric doesn’t follow a bell curve, the average is meaningless and leads to incorrect decisions and understanding. Avoid the trap by talking in percentiles. Default to percentiles, and you’ll better understand how users really see your system.
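To make Hodges’ point concrete, here is a minimal Python sketch that compares the mean with a few percentiles on a skewed set of latencies. The latency numbers are made up for illustration (they are not from his post), and the `percentile` helper uses the nearest-rank definition, which is only one of several common percentile definitions; many monitoring tools interpolate between samples instead.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests, 2 very slow ones.
latencies_ms = [10] * 98 + [2000, 3000]

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean: {mean:.1f} ms")
for p in (50, 99, 99.9):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```

With this data the mean works out to 59.8 ms, a latency that no request actually experienced, while p50 is 10 ms (what a typical user sees) and p99 is 2000 ms (what the unlucky tail sees). The mean hides both facts at once.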
And the third link is from Camille Fournier, former CTO of Rent the Runway and an Apache ZooKeeper contributor. Her post, “Notes on Startup Engineering Management for Young Bloods,” was inspired by the previous article. Although it is ostensibly about startup engineering, most of it applies to all organizations.
“The team is moving too slow” is the hardest problem you’ll ever debug. When your CEO asks you why nothing is getting done, why we can’t do everything on their laundry list, why the project they expected would launch next quarter is still two quarters out, accurately answering that question is incredibly difficult. Once you are a level or two removed from the actual people doing the work, your previous debugging process of “going to every meeting, watching the work being committed, understanding every detail of the project” does not scale. You have to figure this out from a distance.