a reckoning for tech by the humans that work with it

Advice from engineers that’s good for all of us


For this post, let’s just take the reading list from this tweet:
[Screenshot of the tweet, November 4, 2015]

And now let’s decode:

@solarce is Brandon Burton, who works at Travis CI.

The first link is from @allspaw, John Allspaw, the CTO of Etsy, who talks a lot about devops and culture. The article is from 2012, and although what it says on the tin is that it's about being a “senior” engineer, he quickly switches to talking about what it means to be a “mature” engineer. It's long, and although it's aimed at programmers, it's relevant to all technical folks. Here's a sample.

On Being A Senior Engineer:

They know that they work within a spectrum of ideal and non-ideal, and are OK with that. They are comfortable with it because they strive to make the ideal and non-ideal in a design explicit. Later on in the lifecycle of a design, when the original design is not scaling anymore or needs to be replaced or rewritten, they can look back not with a perspective of how short-sighted those earlier decisions were, but instead say “yep, we made it this far with it and knew we’d have to extend or change it at some point. Looks like that time is now, let’s get to work!” instead of responding with a cranky-pants, passive-aggressive Hindsight Bias-filled remark with counterfactuals (e.g. “those idiots didn’t do it right the first time!”, “they cut corners!”, “I TOLD them this wouldn’t work!”)

The second link is a 2013 post by Twitter's Jeff Hodges. About two-thirds of his advice about distributed systems holds true for systems and infrastructure in general.

Notes on Distributed Systems for Young Bloods

Use percentiles, not averages. Percentiles (50th, 99th, 99.9th, 99.99th) are more accurate and informative than averages in the vast majority of distributed systems. Using a mean assumes that the metric under evaluation follows a bell curve but, in practice, this describes very few metrics an engineer cares about. “Average latency” is a commonly reported metric, but I’ve never once seen a distributed system whose latency followed a bell curve. If the metric doesn’t follow a bell curve, the average is meaningless and leads to incorrect decisions and understanding. Avoid the trap by talking in percentiles. Default to percentiles, and you’ll better understand how users really see your system.
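To make Hodges' point concrete, here is a minimal sketch in Python (our choice; neither the post nor the article includes code) comparing the mean with percentiles on a made-up, skewed latency distribution. The `percentile` helper and the sample numbers are hypothetical. It uses the nearest-rank definition, which is only one of several percentile definitions; real monitoring systems usually estimate percentiles from histograms or sketches rather than sorting every sample.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that at
    least p% of the samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into sorted data
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds for 1,000 requests: most are fast,
# some are slower, and a handful hit a very slow path (say, a retry).
latencies_ms = [10] * 950 + [50] * 40 + [1000] * 10

mean_ms = sum(latencies_ms) / len(latencies_ms)
report = {p: percentile(latencies_ms, p) for p in (50, 99, 99.9, 99.99)}

print(f"mean: {mean_ms} ms")
for p, value in report.items():
    print(f"p{p}: {value} ms")
```

With these numbers the mean comes out to 21.5 ms, a latency that no request actually had: 95% of requests took 10 ms, and the slowest 1% took a full second, which only the p99.9 and p99.99 figures reveal.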

And the third link is from Camille Fournier, former CTO of Rent the Runway and an Apache ZooKeeper contributor. Her post, Notes on Startup Engineering Management for Young Bloods, was inspired by the previous article. Although it's ostensibly about startup engineering, most of it applies to any organization.

“The team is moving too slow” is the hardest problem you’ll ever debug. When your CEO asks you why nothing is getting done, why we can’t do everything on their laundry list, why the project they expected would launch next quarter is still two quarters out, accurately answering that question is incredibly difficult. Once you are a level or two removed from the actual people doing the work, your previous debugging process of “going to every meeting, watching the work being committed, understanding every detail of the project” does not scale. You have to figure this out from a distance.



  1. Dave Therrien on November 5, 2015 at 9:27 am

    Hey John, I just heard about this site on Greg K’s Tech podcast. Sounds interesting. I am the founder and CTO of ExaGrid. We are in the purpose-built backup appliance market.

  2. John Troyer on November 5, 2015 at 8:52 pm

    Hey Dave – Welcome! Feel free to stick around and subscribe to the newsletter.

  3. Kevin Keeney on November 11, 2015 at 5:48 pm

    I find that providing a vision for each “sprint” is key: this is the lofty place we want to be at the end of this effort. Then celebrate, build up, and encourage people individually or as a team as they achieve milestones. I love having a developer explain to me the problem the team is stuck on; I am quickly humbled and proud to be working with them.