This week’s musings on software are mostly about not building & running software. That’s probably a result of me working at a large-ish company, where we have the economic freedom to build & run plenty of systems, most of which are really not worth it. As a technology leader it’s more important to steer clear of the distractions and focus the team on what matters - the business.
Hit the subscribe button. Good stuff will follow. Trust me.
Anticipating Conway’s Law
Conway says that software systems resemble the structure of the organization that created them. Another way to look at it is: the organizational structure shapes the system’s structure. As a corollary, most of your architectural & technical decisions are not that important - microservices architecture, Kubernetes, the enterprise message bus, what have you. Organizational structure will overrule your design.
Yet another perspective is to think in terms of incentives. If team Alice uses a service provided by team Bob and team Bob has no incentive to provide a quality service, in terms of budget, headcount, bonuses or otherwise, then team Alice will have a hard time. No matter how much Alice complains or tries to fix the situation, team Bob will not waste their resources. Why should they? They have no incentive to do so. So far, so obvious. I’ve seen the above situation time and time again, often as a result of this conversation:
Dev #1: I built this cool thing for my team, look!
Dev #2: Wow that’s amazing. We should run this as a service, other teams will love it!
Dev #1: Yeah, you’re right, let’s do it! Mr. Manager! I wanna run this as a service
Mr. Manager: Oh that’s cool, let’s do it!
(thinking: I can show this off to other managers / at performance review!)
Ok, ok, projecting evil intentions onto Mr. Manager is maybe a tad unfair. Regardless, the result is the same: a solution that is good enough for a single team internally doesn’t cut it as a generalized service. Other teams start using it. Complaints start coming in. At first Dev #1 is enthusiastic about the interest and improves the service. But soon it gets annoying - the same questions over and over again, no time to write documentation (“it’s obvious/self-explanatory”), stress builds up. This is stylized, but based on experience. That “service” either dies quickly or the organization tries to salvage it by throwing resources at an ill-conceived project. Again the software structure reflects the organizational structure. Hello Mr. Conway, I didn’t see you there at first, glad to have you back.
I try to anticipate such misaligned-incentives situations. Recently I worked on a monolithic project with 2 teams, each delivering a different vertical slice of functionality. Those 2 parts of the application have a few well defined points of integration, mostly the left vertical (team A) integrating with the right vertical (team B).
The teams work in different programming languages and have different levels of process maturity, communication skills and development culture. Which is ok while everyone stays on their own turf. But for integrated features they have to collaborate. Worlds collide. We see the typical problems: confusion, misunderstandings, complaints, rework, etc. - all the things that you expect to hear when you ask someone to work outside of their comfort zone. But it kinda works - it’s slow, people complain - but eventually it works.
Now someone in team A has a brilliant idea:
Dev #1: “Team B should provide an SDK! All the behavior we need for integration should be exposed as an SDK!”
Dev #2: “Right… and they can re-use the SDK on their end as well!”
Technically that’s a decent solution. Everything we associate with SDKs makes integration easier: a stable public API, hidden implementation details, a versioning contract, well defined releases, (existing) documentation. If we had all of these life would be much easier! We found a technical solution for a human/social problem! Isn’t it great?!
Well…
Would team B do that? Both dev teams have their own set of business requirements and team B’s requirements don’t include “provide a software library to team A”. If team B puts in all the work of creating an SDK they don’t deliver business value. That is the opposite of what they are paid for. Team B has no incentive to provide that service to team A. So they won’t. Simple as that.
Conway’s law manifests again: software is inevitably shaped by the organizational structure and its incentives. There are no technical solutions to human/social problems. In this scenario anticipating Conway’s law means: if we want to have an SDK that supports both team A and team B then we need a new team that only delivers that service. We have to design the organization anticipating the software structure it will create.
Build & run less Software
The hubris of software developers is probably only paralleled by that of venture capitalists. I don’t know how many times I’ve (over-)heard this conversation:
Dev: We want to use [SaaS for some table stakes feature]
Someone else: How much is it? … WTF we have to pay [fraction of a developer’s salary] for this software?!
Dev: …we could just build it ourselves… or, look there’s an open source project, we can just run that ourselves.
Someone else: Let’s do it!
(Often that someone else is a manager, but not always).
What are “table stakes” features? Stuff that your system needs to run successfully, but nobody comes to you for that reason, stuff that nobody notices until it goes wrong. For example: Login. Nobody will start using your product because of your awesome login. But for many applications a login is a necessity. So you use a SaaS (Amazon Cognito, Azure Active Directory, Auth0, …) to get best in class user accounts, security, role based access control, continuous updates, etc. You pay a hefty price for the capability to externalize all these concerns to the SaaS vendor. Essentially to forget about it. Sweet relief.
Or you try to be smart about it. Run your own identity provider, with Keycloak or Hydra. We could make a rough estimation of the costs of running that - but once we include the human cost of developers keeping the software up to date and the quality assurance effort to confidently update an identity provider and the operations team running it, maintaining automation scripts, handling pager duty… those 1138 $ / month for Auth0 (“developer” plan for up to 50k active users) suddenly look like a bargain. That’s not even considering the opportunity cost - all the business opportunity we could have pursued if our team had built a differentiating feature for our product instead.
But that’s not the worst part of it. The biggest problem with an “oh we can just run it ourselves” approach is that your organization loses focus. Let’s say you run an email newsletter that generates revenue through advertisement. Your core business is creating & delivering quality content, acquiring new readers and advertisers. If you execute those effectively and cost-efficiently you probably have a good business. Running an identity provider adds nothing to your business. However it takes away focus on what matters for your business - if your team was 100% focused on your core business before, now maybe only 90% remain, while 10% are wasted on running an identity provider. Imagine what happens if your culture allows for the same phenomenon to happen again - your own analytics system? your own application performance monitoring system? your own logging system? your own email service? your own payment infrastructure? heck why not run your own private cloud? It sounds ridiculous - but it happens.
Building and/or running software systems that don’t differentiate your business or increase your competitive advantage is a waste of time. Or in capitalist terms: it’s a misallocation of capital. Only build software that contributes to your top line revenue or that protects your business.
Ok, enough of the high level business talk, let’s “get real” for a moment. Because this applies at a lower level all the same. Let’s say you write client side software that communicates with your backend via HTTP and you want caching. Don’t implement custom caching mechanisms. Learn how HTTP caching works, understand the Cache-Control directives and use them. The HTTP library you’re using most likely already implements it and the caching semantics are well thought out (well, more so than your ideas after thinking about it for half an hour). Use the decades of work that went into those standards and implementations. Stand on the shoulders of giants. Or you can try to be smarter than a whole generation of programmers. But you’re probably not. HTTP isn’t Subversion and you’re not Linus Torvalds (if you were you wouldn’t implement caching in some app).
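To make that a bit more concrete, here is a minimal sketch of what "use the standard" can look like on the backend side: the server labels its responses with Cache-Control and an ETag and answers conditional requests with 304 Not Modified, so that any standards-compliant client cache can do the rest. The function names (`make_etag`, `respond`), the hashing scheme and the 5-minute max-age are assumptions made up for illustration, not something from a particular framework.

```python
import hashlib

# Assumption for illustration: clients may reuse a response for 5 minutes.
CACHE_CONTROL = "public, max-age=300"


def make_etag(body: bytes) -> str:
    """Derive a strong validator from the response body (hypothetical scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict[str, str], body: bytes) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"Cache-Control": CACHE_CONTROL, "ETag": etag}

    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    candidates = [
        # If-None-Match uses weak comparison: ignore a leading W/ prefix.
        tag.strip().removeprefix("W/")
        for tag in lowered.get("if-none-match", "").split(",")
        if tag.strip()
    ]

    if "*" in candidates or etag in candidates:
        # The client's cached copy is still valid: send no body.
        return 304, headers, b""
    return 200, headers, body
```

A client that cached the first response sends the ETag back in If-None-Match once the 5 minutes are over, and gets an empty 304 as long as the content hasn't changed. No custom cache protocol, no custom invalidation logic on the client.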
So whenever I’m in that conversation I say:
Dev: We want to use Auth0 for login.
Someone else: How much is it? … WTF we have to pay 1138 $ per month?!
Me: Yes, but that’s ok. It’s cheaper than running it ourselves. But more importantly: we’re not in the business of running an identity provider - we are in the business of sending newsletters. Let’s not get distracted from our money maker.
Someone else: Hmmm… you know what, you’re right! Let’s pay for Auth0.
Dev: Great, thanks! I’ll get on with it.
Focus your organization on what matters to your business.
Build and run less software.
Hyperlinks
Kyle Simpson on the Frontmatter podcast, speaking about engineering excellence.
Engineering culture is not something that happens accidentally within a team. And a culture of excellence is not something that will happen simply because you tell everybody, 'We want you to be good, and we want you to constantly learn, and we want you to do these things.' Saying it from the top down isn't what works.
He’s damn right about that!
We exchange this knowledge in a very ephemeral way. [We exchange solutions through conversation] and nobody has any idea why that worked - and more importantly, there's no preservation of that conversation. That just became this ephemeral cult knowledge that one person transferred to another - and then it's just lost into the ether. [..]
The core problem in engineering culture is that we focus too much on technology, and not enough on humans and the exchange that occurs between humans.
He’s making some strong points, the conversation is worth a listen. I’m in favor of long form writing (see #4), whereas he also suggests efficient ways of capturing conversations.
Lenny in his newsletter issue 37 on product management templates. A timely bundle of inspiration for my upcoming week, when I will dissect my teams’ current product management documents and practices.
From engineering to product management, and back again - great read & great design.
I've also found my perspective is very different than before. While previously I would have seen my role as writing great code, now I see my role as leveraging technology to provide value to users. What's the difference? In the former, I'm doing a task while in the latter I'm driving outcomes. It feels much more empowering as a mental model for what I should do as an engineer. It's also helped me shed any hint of perfectionism I had before since writing "perfect" code or building the "perfect" system usually isn't relevant to providing value to our users.
That’s it for this week. Hit the subscribe button. Good stuff will follow. 👋