Why Is Cloud Infrastructure So Complicated?

Most modern software teams have no choice but to spend a lot of effort on their infrastructure, often spinning up entire teams to do so (and sometimes placing their best engineers on those teams).

How did cloud infrastructure get so complicated? Why is AWS such a hot mess of insane configurability and complexity? I ponder this a lot. As a manager of software teams, any minute not spent on building an awesome product kills me a little.

I have a few thoughts.

For a minute, back in 2008-2010, things looked different. Heroku had just been launched, and with a few clicks in a GUI and a git push you had your code deployed and accessible.

Fast forward a decade, and now, we’ve got containers, Kubernetes, so-called “serverless” lambdas. Heroku is still around, but most people outgrow it pretty quickly.

Instead, we end up in the complex world of Amazon Web Services, Google Cloud Platform, Azure, etc with their hundreds of services, each with a tremendous amount of configuration and complexity.

Now, maybe you think I’m exaggerating, because you’ve already spent the time necessary to learn/navigate one of those platforms and/or you enjoy that type of work. But any effort spent not directly delivering value to users is wasted on some level. Infrastructure should be simple; it should just work. But it doesn’t. Even simple tasks like estimating what your infrastructure costs might be are a chore on some of these platforms. There are literal venture-backed startups whose business model is to use reinforcement learning to tweak your infrastructure configuration.

Instead of getting simpler, infrastructure got more complicated. Why?

I have a few theories. The first is creator’s bias. AWS, GCP, etc were all born at FAANG-ish companies, who are outliers in terms of the size of their company, their data, their code base. They build tools they would use themselves. But why would 99% of the world’s engineers need the same complex tooling that the biggest software companies in the world need?

It actually gets worse. Because those companies are viewed as (and often are) technologically ahead, the open source community models its software after them (or they contribute tools back to the open source community). Borg inspires Kubernetes (actually born at Google), Mesos, and Docker. MapReduce inspires the Hadoop ecosystem. And so on.

The second big reason is competitive dynamics. The big cloud providers are locked in an existential competitive battle for “the cloud”. This creates a dynamic where more is better. You’d think that competition in markets would lead to better products, but for enterprise or business-to-business products, often it just leads to an arms race for more features and more complexity. More services, more configurability, and ultimately, more complexity. This is how you stay ahead of the competition and how you win the big contracts. Meanwhile, the average developer is just gasping for air.

The third is Heroku’s decline. Heroku could have been so much more, but instead, they were acquired by Salesforce. Instantly, they lost their ability to innovate and became that classic story of “large company buys startup, startup product stagnates / only gets incremental improvements”. And in this case, Salesforce isn’t exactly a beacon of simplicity when it comes to product (it’s a great product, it’s just clunky and enterprisey… just less so than the competitors it displaced). There are a few bright spots like https://render.com/ which I’m keeping an eye on that could pick up where Heroku left off.

I hope this changes. I respect each of the big cloud platforms as businesses, and know that they have some of the smartest engineers working both on them directly, and building software on top of them. But this is all just tremendously inefficient.

Why All Engineers Must Understand Management: The View from Both Ladders

Something interesting has been happening as I’ve been trying to write more about engineering management.

When I wrote advice about micromanaging for managers, a few friends asked me about how to deal with their (micro)manager, so I wrote about how to handle your manager. The latter piece seemed to be a lot more useful. I also wrote about how managers should avoid cognitive biases, and most questions I got were about engineers who felt their managers were victims of those biases. You see the pattern. I write for managers, I get interest from ICs.

It could be that I’m reaching the wrong audience, but I think there’s something deeper going on.

Dual Ladders

If you work in tech, you’ve heard of the “dual career ladder”. As engineers grow, they can choose to climb the “technical ladder”, or they can climb the “management ladder”. In fact, companies you interview with will often brag about it so much, you’d think they were the ones who invented it! In reality, like most things modern Silicon Valley thinks it invented, the dual ladder has been around for decades. I found mentions that imply it was first made explicit by post-World War II DuPont (which, if true, would put the concept alongside other notable DuPont inventions like nylon, Kevlar, and the concept of ROI — more on the nylon part, in particular, later).

What’s the purpose of the dual ladder?

  • Avoiding the Peter Principle. Engineering and management require different skill-sets, so if a technology company always “promotes” its best engineers to managers, it will end up converting great engineers to bad managers.
  • Attracting, retaining and motivating engineers. If engineers feel like their only path to growth is management, and aren’t interested in it, they might not join, or they might leave, or just stay and not be that motivated.

It’s commonly said that managers of technical teams should understand technology so that they can be effective. It helps them build credibility, coach their staff, and make better technical decisions. I’d like to propose a broader claim: Regardless of which ladder you choose, you should understand the skills needed to ascend the other. Just like managers should understand technical issues, engineers should understand management issues to be effective. In the rest of this piece, I’ll mostly make a case for why this is important, but in future posts, I’ll go deeper on a few selected topics. But before that, let’s do a quick jump into the past.


Remember when we said we’d get back to DuPont and nylon? It’s hard to imagine how the world would be today if DuPont had not been able to offer a brilliant young scientist an opportunity to spend his time doing inventive research on polymers.

Wallace Carothers was teaching Chemistry at Harvard when DuPont approached him and offered him a job as a lab director. He initially declined. It wasn’t that Carothers, who was described as both brilliant and melancholy, enjoyed his teaching job. He actually seemed to dislike teaching. But he just wanted to do research, and had been dealing with bouts of paralyzing depression. He was worried about the industrial pressures of working at DuPont. Nevertheless, DuPont persisted, offering double his salary at Harvard, and explaining that at DuPont, while he would direct a lab, he could spend most of his time inventing. He would write of his days at DuPont:

“Nobody asks any questions as to how I am spending my time or what my plans are. Apparently it is all up to me.”

Carothers’ work led to the invention of both neoprene and nylon, which revolutionized plastics and were used in everything from clothing to military equipment. In fact, nylon became known as the “fiber that won the war” due to its contribution to the Allied victory in World War II.

Despite his success in the lab, Carothers continued to suffer from depression, and tragically took his own life at the age of 41, years before his inventions really took off. But his research was world-changing. It doesn’t seem like DuPont had formalized the “dual ladder” system yet in the 1930’s, and technically he was “director” of his lab, but if DuPont hadn’t offered him money, prestige, and a chance to spend his time doing research and solving technical problems, the world might be a very different place.


Let’s take another example. Meet Arthur Fry, credited with invention of the Post-It Note.

Rumor has it he also invented witty scientist photos

In 1974, Fry was a researcher at 3M, on the technical side of their dual ladder. Frustrated with trying to have bookmarks in his hymnal stay put, he used a unique adhesive developed by Spencer Silver, a fellow chemist at 3M, to stick, unstick, and re-stick small pieces of paper, inventing the Post-It Note.

He would go on to get promoted to “corporate researcher”, the highest rung on 3M’s technical ladder, and stay with the company for many more decades. If it wasn’t for the dual ladder, he later said, he’d “have left the company and gone into business for [himself] as an inventor, or joined some small company where they give you a piece of the action.” Instead, he got to spend his time at 3M “staring at the wall or getting [himself] educated” while still “making as much money as a vice president”.


In other words, companies have been finding ways to attract and retain technical talent by providing them with opportunities that don’t force them to become managers or administrators. Of course, the “binary” dual-ladder is a little restrictive, so a lot of companies have more of a “wide ladder” (where there’s a spectrum of management vs. technical contribution) or a “jungle gym” (where mobility between the different types of roles is allowed or encouraged).

Most companies have realized that managers need some understanding of the technical side of things, lest you end up with pointy-haired Dilbert-style managers.

On the other hand, many individual contributors (ICs) don’t necessarily feel the need to understand things on the management side. ICs might feel like management is irrelevant to them, or they might have a cynical view on management. Here’s why I think that’s a big mistake.

Understand management because you need those skills

The software companies of today are a far cry from the industrial R&D labs of DuPont and 3M. Most current software development is more of an engineering discipline than it is a science. Sure, managers and ICs have very different day-to-day activities: managers spend more of their time in meetings and talking to people, and ICs spend more focus-time on technical issues. But, most ICs are not working in isolation in some lab in search of a moment of brilliance that will change the world. Modern software development is a heavily collaborative, cross-functional discipline. The brilliance comes in increments and iterations.

As you progress in your career as an IC, you will need skills like the ability to communicate with people on your team and outside your team, the ability to influence others, the ability to coach and mentor others, and so on. These are all skills that are required of managers, but are still very useful for ICs. A manager without these skills will probably fail — an IC without them might just have a stunted career. I’m not saying you need to spend all your time in meetings, or that you need to learn to play corporate politics. But I’ve seen many ICs who are so turned off from anything management-related that they throw the baby out with the bathwater. So don’t shy away from skills that might seem “managerial” just because you’ve decided to grow on the technical ladder — rather, invest in them!

Understand management because it shapes your system

By system, I mean the people processes in your company or team. You’re likely to be on the receiving end of processes designed primarily by managers. Recruiting and hiring? You’ll go through that process. Performance management? You’ll go through that process, too. You’ll witness good management and bad management. You’ll have to manage up. You should understand those processes, why they are designed the way they are, their strengths and weaknesses. And every process has weaknesses.

Let’s zoom in a little on a topic like performance management. Early on in a start-up’s life, the primary goals are execution and survival. Many start-ups don’t have the time, luxury, or experience to think deeply about performance management. As an engineer, you work on whatever needs to get done, and if you don’t know how to do it, you try to learn it, and through that, you become a better, more knowledgeable engineer. If you’re not doing what the company needs you to do, it’s pretty obvious to you and everyone around you. If you continue to not do what the company needs you to do, you’ll probably be a victim of the “fire fast” mantra. Time horizons are short and feedback is swift but chaotic.

As the company grows and matures, things start to change. The company hires (or transitions) managers and “People Ops” folk, who have the experience (or at least the time and mandate) to think about performance management of people on their teams. The company’s time horizon stretches, which makes planning for the future easier. Someone might propose some structure: a 360-degree performance review process run at some regular cadence (once or twice a year). But how does the company evaluate people? Someone proposes a rubric (or “ladder”, maybe even a dual one). How do you reward people? Someone proposes a compensation policy.

Manager A notices that Manager B is a little more “generous” when evaluating members on his team. She proposes a calibration process, to try and maintain standards and fairness across different teams. After every performance review cycle, managers hole up in conference rooms to discuss their proposed evaluations and try to maintain consistency.

These processes differ from company to company, depending on the company’s size, maturity, culture, and overall philosophy. But as an engineer, you will be on the receiving end of one of these systems, with its pros and cons. Even if your primary goal isn’t to get promoted, it’s still useful to understand why your company designed (or stumbled into) that particular process. What dimensions are being evaluated and why? Do managers have autonomy to make their own decisions, or are decisions made by impartial committees?

So how can you learn more about all this?

One way to do this is to just ask your manager. Why do we do “calibrations” as part of performance management? Who decides who gets promoted? What are the dimensions you evaluate people against? Sometimes those processes are well-designed and work as intended, but sometimes they’re not. In either case, you should try to understand them.

I actually love it when someone on my team shows interest in these things, and value the feedback I get when I explain the reasoning behind some of the processes we’ve built.

I’m going to be writing more about each of these issues, but I’m going to try to do it a little differently. Firstly, I’m going to try to gather multiple perspectives from others, since different companies approach these issues differently and have learned various things about what works and what doesn’t. But more importantly, when I write, from now on, I’m going to try to write with a perspective from both ladders.

“VC Brain”—Dissecting Investor Behavior

Why do so many Venture Capitalists act like assholes? They don’t respond to your email, even though they promised to. When they do, it’s incredibly terse and lacking in punctuation and capitalization (turns out, they only like one type of “capitalization”, the one that sits on tables). Sometimes they ghost you. If you pitch them, they might be very rude. I’ve seen an investor put his feet up on a table and start browsing his phone, straight out of Silicon Valley. Sometimes they don’t even show up for scheduled meetings.

In reality, some VCs are assholes. But many are not. I’ve been on both sides of the fundraising table, as both an angel investor and as an entrepreneur (usually I’m the entrepreneur, and so I identify more as such). I have many friends who are VCs of all shapes and sizes. Most of them are really nice, down-to-earth people. For a select few, that carries over into their work. But for the majority, interacting with them as VCs exposes you to their “VC Brain”.

Why do reasonably nice, down-to-earth people act like assholes, even if they are not? I think there are a few reasons. It’s not just that they are “busy” (that is definitely part of it, but when a VC and an entrepreneur interact, I’d argue the entrepreneur is probably just as busy as the VC). The very life and social interactions a VC is exposed to trigger some sort of change to their brain.

Balance of Power

VCs have entrepreneurs vying to get in front of them, to pitch them. Then the pitch itself is an attempt to get the VC to give them money. And if they do invest, they become a shareholder and possibly a board member, advising (but also supervising) the business. So VCs are usually in a favorable position when it comes to balance of power.

Of course, every VC occasionally gets exposed to a deal that’s so hot that the balance of power is inverted, and they are the ones that have to jump through hoops to get into the deal. And of course, VCs go through a period where they are fundraising themselves from Limited Partners. But that happens every few years, and the switch turns off again pretty quickly and the favorable position is restored.

Power affects your brain (some studies have indicated that power can even cause brain damage). So constant micro-interactions where you have the upper hand cause you to lose empathy, and to normalize the state where you have the upper hand and can act with perceived impunity.

Context Switching

Another thing that can alter your brain’s biology is frequent context switching. In fact, some studies have shown that the effect can be worse than that of marijuana.

A VC’s typical day might include the following:

  • A constant barrage of emails, texts, and tweets.
  • Listening to pitches from entrepreneurs.
  • Doing diligence / conducting reference calls.
  • Corresponding with lawyers.
  • Talking to portfolio companies to get updates and/or help with whatever issue they’re dealing with.
  • Buying sweater vests.

This goes beyond just being busy. We’re all busy. Entrepreneurs are busy as hell, and honestly, we context switch too. But even when we context switch, we usually have a singular goal—making our business successful. VCs are switching between very different tasks.

Transactional Social Interactions

Venture Capital is a relationship business. VCs build relationships with other investors, entrepreneurs, and talent. Many of these relationships can be very deep and long-term.

But day-to-day, VCs also have a lot of more shallow, transactional interactions (see context switching above). This “social context switching” also takes a toll, because it normalizes these types of interactions. And so these interactions might seem shallow or abrupt to you, but they might be a lot more normal for a VC.

Procrastination

We all procrastinate, but we tend to do it more when a task makes us uncomfortable. When I’m on the investing side of things, I dread having to send rejection emails to entrepreneurs, because I know how it feels to get those. So it’s all too easy to put it off. You come up with narratives to justify it, of course. You tell yourself “I want to write a thoughtful email, but that will take time, so I’ll find time later”. The end result is procrastination.

More Empathy

You might read this and think that I’m a VC apologist. Or you might read it and feel like I’m over-generalizing VCs and being critical of them. I don’t know.

But I think a little empathy goes a long way, for both investors and entrepreneurs. Entrepreneurs can do better when interacting with VCs if they are more aware of the world VCs live in. In general, entrepreneurs (myself included) can also be very defensive and insecure about our “babies”, and we can get overly sensitive to any behavior that seems dismissive or disrespectful. We can build thicker skin by understanding that it’s not personal, and not always coming from a place of disrespect.

And, of course, VCs can also do right by entrepreneurs by understanding the types of behaviors they are prone to that might come across negatively. No matter what your day-to-day looks like, a little thoughtfulness and empathy will go a long way. There’s a bare minimum here: respond to people in a timely manner, even if you’re not interested. Show up for meetings when you schedule them. Be aware that people have poured their blood, sweat, and tears into their businesses. Remind yourself, periodically, of what it’s like to be on the other side of the table.

Disagree and Commit And Prove Yourself Wrong

One management principle I’ve found really powerful is “disagree and commit”, but I’ve often found that it can be easily misapplied.

Let’s first define what the disagree and commit principle is. Here’s Jeff Bezos in Amazon’s 2016 Letter to Shareholders describing the idea:

Use the phrase “disagree and commit.” This phrase will save a lot of time. If you have conviction on a particular direction even though there’s no consensus, it’s helpful to say, “Look, I know we disagree on this but will you gamble with me on it? Disagree and commit?” By the time you’re at this point, no one can know the answer for sure, and you’ll probably get a quick yes. This isn’t one way. If you’re the boss, you should do this too. I disagree and commit all the time.

In short, if you work with passionate, intelligent people, you will be bound to disagree. Hopefully, you and your team are aligned on high-level things like strategy, values and vision, and so the disagreements are more on the tactical side.

In any case, the principle has a few benefits:

  • It surfaces disagreements explicitly, and surfaces them early. This helps your team be more thoughtful about decision-making.
  • It increases the speed of decision-making, because you have a mechanism to move forward without having to get to full consensus.
  • It avoids the type of “design by committee/consensus” traps that often result in low risk-taking.
  • It results in better execution. Once a decision is made, everyone is committed. At least, in theory.

In reality, I’ve found that last point (the “commit” piece) difficult for teams. When teams try to simply sprinkle a little “disagree and commit” dust on their existing culture, they often find that it backfires. I call this “naive disagree and commit”.

The problem stems from the obvious fact that most people don’t like being wrong. Of course, the degree of “don’t like being wrong” varies from person to person and situation to situation. But from a social psychology perspective, that characteristic is biologically ingrained in each of us, and it’s called self-justification. When we’re confronted by evidence that we are wrong, our natural reaction isn’t to admit we’re wrong. It’s to double-down and engage in whatever mental gymnastics are necessary to avoid admitting we are wrong to ourselves and others.

It turns out that certain things can kick self-justification into overdrive. For instance, the more publicly and explicitly you hold an opinion, the more likely you are to cling to it. And this is where naive disagree and commit can cause problems.

Let’s say you and your team are debating a potential decision. “I don’t think we should denormalize this data model,” you say, “it might cause consistency issues in the future”. Your team-mates disagree, they think denormalizing will have huge performance improvements and consistency won’t be a big problem. “OK,” you say, “I disagree, but I’m happy to disagree and commit. Let’s denormalize.”

And now, because you’ve made your opinion public and explicit, you’ve just given birth to a tiny little self-justification demon, and that demon is burrowed into your brain—the part of your brain that likes to think you are a smart person who is generally right about things. He’s just sitting there, waiting for opportunities to say, “see, we were right all along”. Depending on how self-aware you are, you may or may not be aware of his existence at all.

People who have generally been successful and right start to ingrain that “being right” into their core identity. The very possibility of being wrong is a self-existential threat. This gets even worse on high-performing teams. For instance, another of Amazon’s leadership values is that leaders “Are Right, A Lot”. And so, if your naive formula is:

  • Hire smart people.
  • Ask them to be “right, a lot”.
  • Ask them to surface when they disagree with decisions.
  • Ask them to then commit to move forward with the decision they disagreed with.
  • Assume that things will just work.

... your team won’t be set up for success.

The Solution

Does this mean that disagree and commit doesn’t work? Not quite. We don’t want to throw the baby out with the bathwater. It’s naive disagree and commit that’s dangerous. To successfully implement disagree and commit, you need to go a step further.

First, you need to foster psychological safety. Research has found that psychological safety (how safe people feel in taking risks and expressing themselves) is correlated with high-performance on teams. People should feel safe expressing disagreement, without worrying too hard about whether they are wrong.

Second, prioritize getting to the right answer over being right. Getting to the right answer means healthy, rigorous debate. It means knowing when to disagree and commit, and when to draw a line in the sand. It means correcting your assumptions or opinions when they’re wrong.

Third, turn any disagreement energy into risk mitigation energy. When people disagree, it’s because they’re worried about some risk. So channel their (or your) energy into trying to mitigate that risk. “You don’t think denormalizing this data model is the right decision because you’re worried about consistency issues? Well, let’s figure out how we can denormalize this model while minimizing the risk of consistency issues.”

Fourth, make it a cultural norm that people aren’t just expected to say they disagree and commit, they are expected to actually fully commit. You don’t want people to say they disagree and commit but still be dealing with their little “I can’t admit I was wrong” demons. You also don’t want people to say they disagree and commit, but then disengage and not put in 100%. What you really want is people who disagree, commit and try to prove themselves wrong.
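To make the denormalization example a little more concrete, here is a minimal Python sketch of what "turning disagreement into risk mitigation" might look like in practice. Everything in it is hypothetical and for illustration only: the `Store`, `Customer`, and `Order` names, the in-memory dictionaries standing in for a database, and the two mitigations chosen (a single write path and a drift check). A real team would pick mitigations that fit its own stack.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Customer:
    id: int
    name: str


@dataclass
class Order:
    id: int
    customer_id: int
    customer_name: str  # denormalized copy of Customer.name, kept for fast reads


class Store:
    """Hypothetical in-memory store standing in for a real database."""

    def __init__(self) -> None:
        self.customers: Dict[int, Customer] = {}
        self.orders: Dict[int, Order] = {}

    def add_customer(self, customer_id: int, name: str) -> None:
        self.customers[customer_id] = Customer(customer_id, name)

    def add_order(self, order_id: int, customer_id: int) -> None:
        # Copy the name at write time so reads never need a join.
        name = self.customers[customer_id].name
        self.orders[order_id] = Order(order_id, customer_id, name)

    def rename_customer(self, customer_id: int, new_name: str) -> None:
        # Mitigation 1: one write path updates the source of truth
        # and every denormalized copy together.
        self.customers[customer_id].name = new_name
        for order in self.orders.values():
            if order.customer_id == customer_id:
                order.customer_name = new_name

    def find_drift(self) -> List[int]:
        # Mitigation 2: a check that reports orders whose copy
        # no longer matches the source of truth.
        return [
            order.id
            for order in self.orders.values()
            if order.customer_name != self.customers[order.customer_id].name
        ]


store = Store()
store.add_customer(1, "Ada")
store.add_order(100, 1)
store.rename_customer(1, "Ada L.")
print(store.find_drift())  # empty: the single write path kept copies in sync

# Simulate a write that bypasses rename_customer.
store.customers[1].name = "Ada Lovelace"
print(store.find_drift())  # the check now flags order 100
```

The point is not the code itself. The person who was worried about consistency is often the best person to design the write path and the drift check, because they have already thought hardest about how things could go wrong.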

Product/Culture Duality

At many startups, culture happens organically. It’s just built around the personalities and values of the founders and early team.

But anyone who has built a company before learns a pretty vital lesson: culture is important, and when something is that important you have to be intentional about it.

We wanted to build a company that would endure. We started noticing [these types of] companies have something in common. We started to realize that we needed to have intention, culture needs to be designed.

Brian Chesky, Co-founder + CEO Airbnb

Another way of thinking about this is that, as a company, you are not just building a product. You’re building an engine, a machine that builds a product. That machine is composed of several pieces:

  • The team you hire.
  • The culture you put in place (deliberately or accidentally). This includes things like what is rewarded/punished, what incentives are in place, etc.
  • The processes you put in place (again, deliberately or accidentally). Formal and informal communication. How decisions are made.

I’d like to make three arguments here:

  • It’s helpful to think of both the “actual” product and the machine as products. Going forward, I’ll call the former the user product and the latter the company product.
  • Your user product and your company product both sprout out of your values.
  • There is a feedback loop between those two products. Your company product (aka your team/culture/process) and your user product both impact each other. They are intertwined.

Two Products

Let’s first discuss the dualism between your user product and your company product. Most people are familiar with the idea of finding product-market fit for your users (or, more generally, finding product-market-channel-model fit for your users). This will involve thinking through things like:

  • Identifying who your core customer is.
  • Understanding their core problems.
  • Offering a solution to their problems.
  • Positioning that solution as being differentiated, via some narrative/brand.
  • Communicating that narrative to your core customers.

In our tech-industry-lean-startup world, you do that iteratively as you learn and grow, but it’s basically the same core loop.

In reality, every company is actually running two of those loops whether they realize it or not. The obvious one is for their user product, but they’re running one for their company product too:

  • Identifying what they need to accomplish, and who will help them be successful.
  • Figuring out what they can offer them as an employer.
  • Positioning their company as a differentiated employer, via an employer brand.
  • Communicating that narrative to potential employees.

Many companies get this second loop wrong. For example, I commonly see early-stage startups trying to mimic the recruiting practices of larger companies. If you’re competing for the same candidates in the same way—against companies that can pay more and have a much more recognized brand—without some unique value proposition, you’re setting yourself up for failure. It’s basic supply and demand. So you have to find a way to differentiate yourself as an employer. My friend/co-author Aline Lerner has written about that here if this is something you’re interested in learning more about.

Your Values Define Your Two Products

OK, so if you’re building a company, you’re building two products and running a loop for each of those products. What shapes those two loops? Especially in your early days, your company’s user product and company product both grow out of your values: your beliefs about how the world is or how it should be.

Values are rarely right or wrong, but they can be substantially different. Let’s take some opposing values and see what effect they might have on a company’s culture or user product.

Let’s start with a company’s view on how decisions should be made:

Data is crucial. Everything can and should be measured, and decisions should be made based on that.

vs.

Not everything can be measured, sometimes you have to use your judgment/instincts when things are ambiguous.

You can kind of imagine what product and culture companies with each of those values might build.

The duality isn’t perfect, though, and it can break down a little. Let’s take a company’s view on relationships/transactions:

The world is generally a zero-sum game. When two parties interact, when one wins, the other loses.

vs.

The world is not always zero-sum. If you think hard enough, you can come up with ways to align incentives so everyone wins.

Things can potentially diverge here. A company might have a different set of values for “in-group vs. out-group”. For example, a company might treat its employees with a lot of trust, but treat users/customers in a highly adversarial manner. This is actually a natural sociological outcome (as humans, we evolved in tribes/clans), but it might be hard to maintain as a company grows (tribes/clans tend to break down at scale).

Some companies are more interesting in that they seem to have the opposite view-point: employees may not be treated very collaboratively or trusted, but when it comes to their users or customers, they are obsessive. I haven’t worked at either Amazon or Apple, but from the outside, they generally seem to be that type of company. It’s like an anti-tribe. Or maybe a cult, I don’t know.

The Two Loops Are Intertwined

OK, so you have two loops, and they sprout out of your values. But they’re also intertwined and interdependent.

I think in the early days, your company product dominates and it influences the user product you build. You start without a product, without users/customers, and without revenue, so the product you build is shaped by your values, and those values are set by the early team.

But as your company grows, the type of people you attract and the value proposition you offer them is somewhat determined by your user product. Depending on what you build and how you build it, different types of people will want to come work with you. Ultimately, and I’ve written about this before, the revenue model, which is part of your user product, will dominate.

So the lessons are:

  • Be intentional about both the product you build for your users and the “product” you build for your team, especially in the early days.
  • Pick a revenue model with care, since that will probably be the dominant term over time.

Organizational Psychologist vs. Organizational Mechanic

When a team or company is not functioning as it should, two types of problem-solvers often emerge. The organizational psychologist tries to debug the culture. The organizational mechanic tries to debug the process.

The mechanic asks what meetings or what documentation is missing. Organizational mechanics love “reviews” (meetings that force decisions to be made). When it comes to communication, they look at the mechanics of what is said and when (how it is said is less relevant). Mechanics look at the structure and the connections. When all else fails, mechanics become surgeons. They “operate”. They pursue “reorgs” or just plain old lay-offs / firings.

Organizational psychologists are more about the human part of the equation. What incentives has the organization set up? How can those incentives be changed? What is rewarded and what is punished? When it comes to communication, they look at the how and the why.

Really good psychologists can dig in even deeper. They can particularly understand how the psychology of an organization’s leaders amplifies and impacts the rest of the organization. Is the CEO a micromanager? Is the Head of HR/People generally a cynic who doesn’t trust people to do what’s right? Is the Head of Product uncomfortable with ambiguity or with quantitative analysis? What effects does that have down the chain-of-command?

Great leaders are able to put both the mechanics and the psychology together. They understand that teams are complex systems of humans. They understand that debugging is cyclic: the mechanics and the process affect the culture and the psychology, but the mechanics and the process are an output of an organization’s culture and psychology. You need to look at both sides to solve most problems.

Using Systems Thinking to Understand Personal Finance

I always struggled with double-entry accounting, even after I got an MBA. I could do it and mostly memorized my way through a lot of the jargon, but it wasn’t until I took a systems view to accounting that I really understood the mechanics of how things worked. I figured I’d share some thoughts about that.

Stocks vs. Flows

A common theme in systems thinking is the idea of stock vs. flow. In accounting, everything looks like numbers, but in reality, some of those numbers are stocks, and some are flows, and understanding that distinction is critical.

So—more generally—what are stocks and flows? The classic illustrative example is a bathtub. The stock is the amount of water in the tub. Flows are water going into, or out of, the tub (via the drain and via the faucet). There are a few important relationships between stocks and flows:

  • Time: A stock is measured as an amount at a fixed point in time (10 gallons at 10:00AM). A flow has a different unit: it is either a rate (i.e. 1 gallon / minute) or a fixed amount over a period (5 gallons between 10:00AM and 11:00AM).
  • Space: A stock is measured for a system with a boundary (e.g. a bathtub). Anything within the boundary is part of that stock. Flow happens between system boundaries.

There is a simple law that holds true for all systems, which is that the change in stock of a system between two time periods is equal to the sum of the flow that happened within that period. This is obvious but stating it is important. If the tub had 10 gallons at 10:00AM and 15 gallons at 10:15AM, this means the total flow was +5 gallons in that period. It could be that the faucet added 10 gallons, 8 went down the drain, 1 evaporated, and you poured in 4 using a bucket. That sum has to be 5 (15 – 10).
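As a quick sanity check, the bathtub numbers above can be written out directly. This is only a sketch of the arithmetic in the example; the dictionary keys are just labels for the four flows mentioned.

```python
# Bathtub example from the text: the change in stock over a period
# must equal the sum of all flows in that period.
stock_start = 10  # gallons at 10:00AM
stock_end = 15    # gallons at 10:15AM

# Signed flows in gallons: positive is into the tub, negative is out of it.
flows = {
    "faucet": +10,
    "drain": -8,
    "evaporation": -1,
    "bucket": +4,
}

net_flow = sum(flows.values())

# The "simple law": change in stock == sum of flows.
assert stock_end - stock_start == net_flow
print(net_flow)  # 5
```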

How does this relate to personal finance? Well, the first and most obvious mistake is mixing up stock and flow. Your cash balance, in a checking account, is a stock. An expense is a flow. They both look like numbers with a dollar sign, and they are related (like any stock and flow might be), but they are really different things.

Confusing stock and flow can often lead to simple mistakes or underspecified statements. Like “I have a $10K balance in my account in October”. That is an underspecified statement. Does it mean you maintained a minimum balance of $10K in October? Does it mean you had $10K at midnight of October 1st?

System Boundaries

One of the interesting pieces of systems thinking is that you can draw arbitrary boundaries around pieces of a system, and when you do, events can look very different. You can draw smaller boundaries to observe things more carefully, or larger boundaries for a broader view.

Let’s take this “system” with two cash accounts.

The green boundary encompasses only one account. That account has a balance, and it has some flows (cash flow). For the purposes of the green system, it doesn’t matter if a flow is an expense, a transfer, etc, all that matters is how big it was, whether it was in or out, and when it happened. You can call it a credit or a debit—that’s just convention.

A bank’s statement tells you the stocks and flows. For a period of time, the green “system” will have a starting balance, some flows (in and out), and an ending balance. That should all sum up.

In fact, if you know (or can predict) all the flows, you can derive the balance at any point in time. An account is a series of financial flows (magnitude, direction, datetime). The balance is always directly derivable from the cash flows. Simplistically, having that series of flows fully specifies the account. And, for the purposes of a single account, it doesn’t really matter where the money is flowing to (or from), just that it is flowing in (or out) of that single account.
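Here is a minimal sketch of that idea, assuming an account is nothing more than a list of dated flows. The `Flow` class and `balance_at` function are hypothetical names for illustration, and magnitude and direction are folded into a single signed amount.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Flow:
    amount: float   # signed: positive is an inflow, negative is an outflow
    when: datetime


def balance_at(flows: list[Flow], at: datetime, opening: float = 0.0) -> float:
    """Balance of a single account at `at`, derived only from its flows."""
    return opening + sum(f.amount for f in flows if f.when <= at)


# Made-up flows for one account; where the money goes does not matter here.
flows = [
    Flow(+2000.0, datetime(2021, 10, 1, 9, 0)),
    Flow(-1200.0, datetime(2021, 10, 3, 12, 0)),
    Flow(-150.0, datetime(2021, 10, 10, 18, 30)),
]

print(balance_at(flows, datetime(2021, 10, 5)))   # 800.0
print(balance_at(flows, datetime(2021, 10, 31)))  # 650.0
```

Note that the function never looks at where a flow came from or went to, which is the point made above about a single account.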

Now instead, if you take the (broader) yellow system, the story is different. A transfer between two accounts isn’t an inflow or outflow, it’s neutral, and you could, if you didn’t care about individual account balances, simplify the system like this:

Let’s do a slightly more complicated example. At Monarch, we sometimes see users treat credit card payments as an expense, while others treat them as a transfer. This discrepancy is pretty easy to explain using system boundaries.

If you’re using the yellow boundary, that payment is a neutral transfer. That’s also the proper accounting solution, of course. Technically, that payment doesn’t affect your net worth, because it reduces an asset and a liability by the same amount. It is two opposing flows.

But if you’re looking at the cash account only, because you’re cash-oriented (maybe because you’re worried that if you run out of cash you won’t be able to pay your rent or buy food), you’d be looking at the green boundary. In which case, that credit card payment is an outflow, and it looks more like an expense. So, in a sense, while the yellow boundary gives you the right solution technically, it’s hard to tell someone looking at the green boundary that their explanation is wrong.
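The two readings of the same credit card payment can be sketched in a few lines. This is an illustrative model, not how any particular product works: `Flow`, `net_flow`, and the account names are assumptions, and a boundary is simply a set of account names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str    # account the money leaves
    target: str    # account the money enters
    amount: float  # always positive; direction is source -> target


def net_flow(flows: list[Flow], boundary: set[str]) -> float:
    """Net flow into the accounts inside `boundary`.

    Flows with both ends inside (or both ends outside) the boundary are neutral.
    """
    total = 0.0
    for f in flows:
        if f.target in boundary and f.source not in boundary:
            total += f.amount
        elif f.source in boundary and f.target not in boundary:
            total -= f.amount
    return total


payment = [Flow(source="checking", target="credit_card", amount=500.0)]

green = {"checking"}                  # the cash account only
yellow = {"checking", "credit_card"}  # cash plus the card liability

print(net_flow(payment, green))   # -500.0: looks like an outflow
print(net_flow(payment, yellow))  # 0.0: a neutral transfer
```

Nothing about the payment itself changes between the two calls; only the boundary does.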

And in fact, the allocation of money within accounts does impact future flows, in ways that people underestimate, because things like interest create a feedback loop: future flows depend on the balance, and interest compounds in ways that people tend to underestimate but that are extremely powerful.

Let’s do one more, where you make a payment towards your auto loan.

If you take the green boundary, then a payment to your car loan looks like an outflow of cash from your account, and it seems like it is decreasing your net worth (i.e. an expense). However, if you take a broader system view, then you are actually paying down a liability, so your cash account is losing cash, but part of that cash is going toward decreasing your liability (i.e. a transfer), and that part is neutral to your net worth. Then there’s interest involved too, but that interest is actually an expense. And payments may have tax implications as well (e.g. mortgage). And of course, I didn’t even factor in that your car is depreciating. Anyway, the point is, a flow or event can look very different depending on how you draw your boundaries. Not only can it get more complicated, it also just looks like a completely different event.
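To make the principal/interest split concrete, here is a rough sketch with made-up numbers. It assumes simple monthly interest on the outstanding balance and deliberately ignores taxes and depreciation; `split_loan_payment` is a hypothetical helper.

```python
def split_loan_payment(balance: float, annual_rate: float, payment: float) -> tuple[float, float]:
    """Split one monthly payment into (interest, principal).

    Assumes simple monthly interest of annual_rate / 12 on the outstanding balance.
    """
    interest = balance * annual_rate / 12
    principal = payment - interest
    return interest, principal


# Hypothetical loan: $12,000 outstanding at 6% a year, $400 monthly payment.
interest, principal = split_loan_payment(balance=12_000.0, annual_rate=0.06, payment=400.0)

cash_change = -400.0           # green boundary: the whole payment leaves cash
liability_change = -principal  # the loan balance shrinks by the principal part
net_worth_change = cash_change - liability_change  # assets minus liabilities

print(round(interest, 2), round(principal, 2), round(net_worth_change, 2))
```

Under these assumptions, the cash account drops by the full payment, but net worth only drops by the interest portion.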

Double-Entry Accounting

Double-entry accounting is the main paradigm in use in modern accounting. It sounds scary; this is where you run into words like “debits” and “credits”. You can memorize what each of those is, but I find it a lot easier to first intuitively understand what’s going on. And the stock and flow system model can help with that.

Double-entry means that every transaction to/from an account has to have a corresponding, opposing entry to/from a different account. It’s not entirely intuitive to most people, but we can easily map it to our model by just creating the right system of accounts and drawing the right boundaries. Basically, because a transaction is a flow, it has to flow from somewhere (one entry) to somewhere else (the other, opposing entry).

For example, a transfer between two bank accounts is easy, because the opposing transactions are clear. -$100 out of your checking account, $100 into your savings.

What about an expense? Expenses look something like this:

When you buy groceries for $10, you can think of it as being a flow of -$10 out of your cash accounts, and a flow of +$10 into a “groceries account”. Except the groceries account isn’t a “real account”, it’s just created because it helps us “balance the books”. And we shouldn’t include it inside our system boundary for net worth (or cash or anything else). But if you want, you could create a system boundary that encompasses all expenses, and it would look like this:

Businesses all use double-entry accounting. When they buy something, if that thing is an asset, they’d decrease their cash account and create an account for the new asset (or put it into an existing asset account). They would then depreciate the asset over time (by, you guessed it, creating a “depreciation account”). The asset account is included in the business’s “net worth” (its balance sheet), so it’s inside that system boundary. But depreciation is outside the boundary, so it isn’t included, and it becomes an expense. (Of course, if they buy something that doesn’t become an asset, it would be an immediate expense.)

You can really create accounts for anything you want, and if you wanted a complete system, you would do exactly that.
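A minimal sketch of that “complete system”, assuming every flow is recorded as one record with two sides. The “income” account is an extra bookkeeping account added here for illustration, alongside the “groceries” one from the example above.

```python
from collections import defaultdict

# Each entry is (from_account, to_account, amount): one flow, two sides.
entries = [
    ("income", "checking", 3000),   # paycheck, from a bookkeeping account
    ("checking", "savings", 100),   # transfer between two real accounts
    ("checking", "groceries", 10),  # expense, into a bookkeeping account
]

balances = defaultdict(int)
for source, target, amount in entries:
    balances[source] -= amount  # one entry ...
    balances[target] += amount  # ... and its opposing entry

# Because every flow has two opposing sides, the complete system sums to zero.
assert sum(balances.values()) == 0

# Net worth only counts the accounts inside that boundary.
net_worth_boundary = {"checking", "savings"}
net_worth = sum(balances[name] for name in net_worth_boundary)
print(net_worth)  # 2990
```

The bookkeeping accounts exist only so the books balance; they sit outside the net worth boundary.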

Anyway, stocks and flows and double-entry accounting are the same thing, assuming you’re labeling things correctly, and all flows are opposing. The only difference is rather than using diagrams like we use in stocks and flows, a double-entry might look like:

I find stocks and flows more intuitive than double-entry accounting. For one, anyone I know who has studied accounting initially struggles to understand a debit vs a credit. Also, it’s less visual, and makes it less clear what the “system boundaries” are.

Going Beyond Simple Systems

Since we’re building a personal finance platform, we spend a lot of time thinking about these things.

Let’s assume you’re a person or household, and you’re trying to understand your financial picture. If you look at what a lot of current budgeting apps do, they provide:

  • Historical views (spending): you can break things down by account, category, etc because you have the account-level data from banks. You can tell someone what they’ve spent money on. Sure, there might be some confusion around whether paying down a credit card is an expense or a transfer, but those are easily resolvable.
  • Current views (balances): again, you can break things down pretty well if you have the account-level data. You can track balances and things like net worth over time, assuming you have the data and are tracking all accounts.
  • Forward view (budgeting): this is where things start to fall apart, because this is where system boundaries start to matter a lot more. Most budgeting views take a really broad system boundary view, where expenses are outflows, income is inflow, and transfers are neutral (since they are within the system’s boundaries). Accounts are mostly lumped in together.

You can’t really do accurate long-term forecasting or give people solid financial advice based on that. You can’t model. Sure, it lets you build a simple product, and that can work for some people, but most people we talk to end up in spreadsheets because they want to model. Let’s look at some of the information you lose when you draw such a large system boundary for a forward-looking view of finances.

Balance-Dependent Flows (e.g. Interest)

In our simple two-cash-account example above, without any interest, the allocation of cash between accounts didn’t really matter. In reality, interest complicates things because it results in flows that depend on balances. Cash in a high-interest-rate savings account increases due to interest (and vice versa for a credit card balance incurring interest). So the allocation of balances within the broader boundary does impact the actual flows, and you can’t just lump everything together without losing some information that is crucial for forecasting (compounding interest is a huge factor in the longer term). Allocation does matter.
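A small sketch of why allocation matters once flows depend on balances. The rates, amounts, and the `project` helper are all made up for illustration, and it assumes monthly compounding with no other flows.

```python
def project(balance: float, annual_rate: float, months: int) -> float:
    """Balance after `months` of monthly compounding, with no other flows."""
    for _ in range(months):
        balance += balance * annual_rate / 12  # the flow depends on the stock
    return balance


total = 10_000.0
months = 12 * 10

# Same total today, two different allocations between hypothetical accounts.
all_checking = project(total, 0.00, months)
mostly_savings = project(2_000.0, 0.00, months) + project(8_000.0, 0.04, months)

print(round(all_checking, 2), round(mostly_savings, 2))
```

Both allocations sum to the same stock today, but a forecast that lumps the accounts together cannot tell them apart.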

Gains / Losses

Interest actually isn’t terribly complicated because it is largely predictable. Gains and losses are a lot more volatile. A stock portfolio could appreciate or depreciate, which means traditional (physical) systems analogies break down. The water in your tub never appreciates. You can’t mark its value up or down. Water has to flow in or out, whether it’s through a faucet, drain, splashing, evaporation, or your toddler deciding to pee in it while you’re giving him a bath.

Now, for many types of assets, you can’t accurately forecast gains or losses. You can make assumptions based on history or your forecast (i.e. a certain growth rate for a portfolio of stocks), but it’s just a forecast. And that forecast will depend on allocation (are you investing in high-risk assets? Index funds? A mix?).

Liquidity / Hard Fungibility

Things that are “fungible” are things that can replace each other. Due to liquidity, not all dollars you own are “fungible”, and not all balances are interchangeable.

A dollar in cash is not the same as a dollar in retirement savings in your 401K which is not the same as a dollar of value in a home you own. Various balances might have restrictions about when they can be liquidated (like desperately selling a house or car), might result in losses / expenses if liquidated at the wrong time (eg, paying taxes or penalties for early withdrawal from a retirement account), or might have transaction fees associated with them. A lot of this information is important for the purposes of forecasting and planning, but the broad system view just sums these up to your net worth. It’s nice to see how your net worth is trending, but having $10K in your retirement account today doesn’t mean that money is spendable right now.

Soft Fungibility

In addition to “hard fungibility”, there is also “soft fungibility” (I made this term up, I’m sure there’s a more formal one). Soft fungibility is mentally-imposed—it’s some emotional value you as a person assign to an allocation of money. Some people call that “earmarking”.

Sometimes, money with the same liquidity is treated differently, usually because its owner has earmarked that money for some purpose. For instance, let’s say someone (a parent) has two checking accounts: one out of which they spend regularly, and the other where they are keeping money to buy Christmas gifts for their children. Now let’s say there’s a shortfall in their spending account, will they dip into the Christmas gift account to cover it? Many people won’t. They might, instead, do something seemingly irrational, like borrowing at high interest to cover their spending, just to avoid touching this “earmarked” money.

In fact, there is a lot of research into why some people have savings earning low interest while carrying debt with high interest, when the rational thing to do would be to use the savings to pay down the debt. The reasons are complicated, but usually it’s because they have earmarked their savings for some purpose, and a dollar that has been “earmarked” for something seems to be worth more than its intrinsic value.

Putting It Together

So based on the previous few points, you might assume that to really master finances, you need to worry about really granular, detailed data. You need a full system, with internal and external boundaries.

And to some extent, that is true. But that introduces complexity. And while you can use software and good product design to hide some of that complexity, at some point, it eventually bleeds through—and the system stops being purely mechanical, because there are humans in the mix.

Humans with goals and aspirations for the future, but also concerns and worries. Humans with baggage and a complex relationship with money. Humans that might throw their hands up and avoid problems that seem complex or stressful. So there’s a balance between creating a system that’s complete, but complicated, and one that’s tractable, but misses big parts of the picture.

And there are ways to navigate that tension, using good product design, basic psychology, and—spoiler alert—systems thinking. But that’s a topic for another post.


(After writing this, I came across this excellent piece by the awesome Martin Kleppmann, which is probably a lot more eloquent than anything I’d ever write. But I still figured it was worth sharing the systems model, since it is slightly different)

A Small Difference in Developer Productivity Can Amplify Over Time

Small differences in the productivity of software developers on a team can easily magnify themselves over time.

On many software teams, one engineer seems significantly faster than the others. Now, in some cases, it’s because that engineer is cutting corners left and right. They get stuff done, but they cause damage. They’re a tactical tornado.

Let’s assume your fastest engineer isn’t that type. They put out good code. It’s possible that they’re not actually as fast as you think. It turns out that even a marginal advantage for one engineer can translate into significant speed differences.

Let’s imagine a simple team of two engineers, Amy and Bob. All things equal, Amy would be 10% faster than Bob. And by all things equal, I mean they’d produce the same quality of code, Amy would just produce 10% more. This slight speed advantage could be because she’s naturally faster, or because she joined the team earlier and hence has more context on the code.

That 10% difference can actually translate to a pretty large difference in output. Initially, Amy produces 10% more code. Now Bob has to increase the time he spends doing code reviews of Amy’s code, and just generally keeping up with Amy’s changes so he has context and can make the changes he needs to make. Which means he has less time to write code, which frees up Amy to be even more effective, which increases her output yet again.

Over time, this is amplified. Amy is spending most of her time writing code. Bob is spending most of his reviewing and keeping up. Amy’s slight advantage turns into a much larger one. Other colleagues start to view her as the go-to person, and wonder why Bob is falling behind.
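The feedback loop can be illustrated with a toy model. This is purely a sketch under strong assumptions that are not from any real data: each engineer has one unit of time per round, and the time spent reviewing and keeping up is proportional to the other person’s output (`review_cost` is a made-up parameter).

```python
def simulate(speed_a: float, speed_b: float, review_cost: float, rounds: int = 200) -> tuple[float, float]:
    """Toy model: each engineer has 1 unit of time per round.

    Time spent reviewing/keeping up is review_cost * (the other's output);
    the rest goes to writing code at the engineer's own speed.
    """
    out_a, out_b = speed_a, speed_b
    for _ in range(rounds):
        time_a = max(0.0, 1.0 - review_cost * out_b)
        time_b = max(0.0, 1.0 - review_cost * out_a)
        out_a, out_b = speed_a * time_a, speed_b * time_b
    return out_a, out_b


amy, bob = simulate(speed_a=1.1, speed_b=1.0, review_cost=0.5)
print(round(amy / bob, 2))  # 1.22
```

With these made-up parameters, a 10% difference in raw speed settles into roughly a 22% difference in output, and a higher `review_cost` widens the gap further.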

A good engineering manager or senior engineer can detect when that’s happening and try to correct the balance. But often the team kind of settles into a mode where Amy is assumed to be better and more productive and everything is funneled to her.

Mission Tactics: When Under-Specifying is Good

On my product engineering teams, I under-specify product requirements by design. That is, the work that engineers are asked to do is always left a little ambiguous.


I used to have a very naive view of how militaries made decisions. You had a formal chain-of-command, and detailed instructions were passed down that chain and implemented, no-questions-asked. If you questioned those instructions (or god forbid, decided to deviate from them), you would be reprimanded: yelled at by your superior officer or court-martialed (whatever that means) or something, idk.

It turns out modern militaries don’t operate that way (or at least, they try not to). In fact, over a century ago, the Germans developed a style of military tactic called Auftragstaktik, or “mission-type tactics”. Here’s how one German officer, Von Schell, described it in a 1933 military book called Battle Leadership that is popular to this day (it’s recommended on the US Marine Corps Commandant’s reading list):

In the German army we use what we term “mission tactics”; orders are not written out in the minutest detail, a mission is merely given the commander. How it shall be carried out is his problem.

The Germans (or Prussians at the time, again idk) apparently developed this tactic of avoiding detailed orders in response to being beaten by Napoleon. Napoleon’s troops couldn’t be superior to theirs, they concluded, so he had to have just managed his troops better. Their detailed orders led to rigid tactics, and in what was (at the time) modern warfare, there was no room for detailed, rigid commands. So, to give officers on the ground (who had the best knowledge of reality there) the ability to adapt, they were given less detailed orders.

It turns out there is another benefit to less-detailed orders—a psychological one. Here’s our Von Schell again: “There is also a strong psychological reason for these ‘mission tactics’. The commander… feels that he is responsible for what he does. Consequently, he will accomplish more because he will act in accordance with his own psychological individuality”.

What Von Schell is describing, essentially, is what we could today call empowerment, though that term doesn’t come into existence until several decades later (and then it proceeds to get mis-used to death in the corporate world).

Big Waterfalls, Small Waterfalls

Now, in the type of software I’m involved in, we don’t have to defeat Napoleon and luckily, if we screw up, no one dies. But I’ve found that the idea of under-specifying works really well with my software teams.

In a traditional software development process, a product manager (or at many startups, the CEO who is the de-facto product manager) sets a high-level vision for what needs to get implemented. That product manager then works with designer(s) to translate that vision into more granular artifacts, like a product requirements document (PRD) and/or some visuals (mockups, etc). There might be user stories involved. Eventually, these get translated into “requirements” that are then given to the engineering team to build, maybe in the form of tasks in a tool like JIRA, Asana, etc.

You might recognize this as the “waterfall” method of software development, and it is the equivalent of my naive view of how militaries operate. It is rigid and instructions flow in one direction. The software industry recognized this a couple of decades ago, and movements like Agile were born. The spirit of Agile was to break the rigidity of the process and make things more light-weight and, well, agile.

But when not implemented thoughtfully, all Agile does is break the process down into smaller waterfalls. This is a definite improvement—smaller cycles and feedback loops are better than larger ones. But it still leaves a lot of room for improvement.

This is where under-specification comes in.

A Chance to Exercise Judgment

To summarize, “mission tactics” had two benefits. The first is a tactical one: the people making smaller, on-the-ground decisions can make them faster and better. The second is a psychological one: mission tactics create a sense of ownership, which makes people more engaged and invested in the outcome.

This carries over into the software world. No matter how hard you try to specify everything, there will almost always be uncertainty. There will be edge cases you didn’t anticipate. Sometimes, it will become clear that an interaction or feature won’t work as designed only as it’s being built. And finally, things may end up being harder or easier to build than anticipated, which changes the calculus about which things are even worth building in the first place.

Any software developer working on a product needs to be constantly making micro-decisions around what they’re building. When something is unclear or doesn’t make sense, do they:

  • build it as designed?
  • halt, and flag it to someone on the product/design team but wait to get an answer?
  • improvise with something that makes more sense?
  • do some combination of the above?

These micro-decisions require an understanding of what they are building and why. They require an understanding of the users who will use the product, and the problem space. And, they require an understanding of the scope and likelihood of possible future changes. They require thinking holistically and strategically. But most of all, they require good judgment.

Good judgment is never engaged when detailed instructions are given. Good judgment is engaged and improved when there is room for it to grow.

Does this mean that specifications should be entirely ambiguous? Of course not. Without enough direction, it’s hard to build anything at all. A good overview of what needs to be built and why, along with some user stories and some visuals can help an engineer understand the “intent” of what needs to be built. Good visuals are especially important, because they remove the burden of thinking about how something will look, and let the focus be on how it will behave and how it will be built.

Does this work perfectly? Of course not, either. Mistakes will be made, details will be missed. But, it will be the details that are missed, not the big picture. And over time, as the team builds judgment and understanding, those missed details will tend to shrink.

Over-specifying

Unfortunately, I often see teams go the other direction: over-specifying. This can be an especially vicious loop to get stuck in, because it gets worse over time. Engineering tasks are heavily specified, so engineers don’t engage their judgment. They turn into literal “code monkeys”. They make obvious mistakes. The response to that? More specification the next time around to try and remove any opportunity for error. Which leads to less judgment, and more mindless coding. You get the point.

Does this always work?

It also turns out that many modern militaries adopted a variation of “mission tactics”, known as Mission Command. Mission Command swings the pendulum back a little from mission tactics. Instead of just communicating intent without details, superior officers exercise judgment in terms of deciding when to use more detailed instructions and control, and when to delegate. Officers are told that Mission Command requires “shared understanding, mutual trust, and high competence”. The literal chart given to officers to help decide how much detail to use in control looks like this:

https://fas.org/irp/doddir/army/adp6_0.pdf

You can map that to software development as well. For under-specification to work well:

  • There needs to be a lot of ambiguity and lack of predictability.
  • The team is competent and experienced.
  • There is a high-level of trust and shared purpose.

These conditions are a lot easier to achieve at earlier-stage start-ups, and these factors get harder to maintain as a team grows. Companies try to eliminate ambiguity and unpredictability as they grow. It gets harder and harder to maintain the same bar for talent (assuming that bar was there to begin with). And, of course, trust / intimacy starts to break down. And that typically tends to be when teams start to over-specify again.

Good Architecture: Data Model vs. Data Flow

Most architectural mistakes I’ve seen in software stem from a mistake either in the domain model or the data flow. Understanding what each of those two things is, how to do them both well, and how to balance the tensions between them is an essential skill every developer should invest in.

Let’s use an example to expand on this.

Financial Transactions

Let’s imagine we’re building a personal finance product. A user has a set of financial transactions (Transaction). Each transaction has a dollar amount, happens on a date, in a financial account (Account) and is labeled with a category (Category).

Further, we know a few other things:

  • The balance of an account at any point in time is always the sum of all transactions up to and including that time.
  • Users may want to add, remove or edit transactions at any point.
  • Users will want to see the balance of their accounts at any point in time, and how the balances change over time.
  • Users will want to slice and dice their cash flow, too. They will want to see the sum of their transaction amounts between certain dates, for certain categories, and for certain accounts, and they may want to group that data too (for instance, a user might want to see how much they’ve spent by category, each month over the past 12 months).
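As a first pass, the description above could be sketched like this. It is a deliberately naive model: accounts and categories are plain strings standing in for the Account and Category entities, and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    amount: Decimal  # signed: negative is money leaving the account
    on: date
    account: str     # stands in for an Account entity
    category: str    # stands in for a Category entity


def balance(txns: list[Transaction], account: str, as_of: date) -> Decimal:
    """Sum of an account's transactions up to and including `as_of`."""
    return sum((t.amount for t in txns if t.account == account and t.on <= as_of), Decimal("0"))


def totals_by_month_and_category(txns: list[Transaction]) -> dict[tuple[str, str], Decimal]:
    """Group transaction amounts by (YYYY-MM, category)."""
    totals: dict[tuple[str, str], Decimal] = defaultdict(Decimal)
    for t in txns:
        totals[(t.on.strftime("%Y-%m"), t.category)] += t.amount
    return dict(totals)


txns = [
    Transaction(Decimal("2000.00"), date(2021, 1, 1), "checking", "income"),
    Transaction(Decimal("-45.10"), date(2021, 1, 5), "checking", "groceries"),
    Transaction(Decimal("-30.00"), date(2021, 1, 20), "checking", "groceries"),
    Transaction(Decimal("-60.00"), date(2021, 2, 2), "checking", "groceries"),
]

print(balance(txns, "checking", date(2021, 1, 31)))             # 1924.90
print(totals_by_month_and_category(txns)[("2021-01", "groceries")])  # -75.10
```

`Decimal` is used instead of floats so that dollar amounts add up exactly.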

Sounds pretty straightforward so far. But let’s dig in.

Domain-Driven Design

When it comes to modeling your domain, the seminal idea is Domain-Driven Design (DDD). The fundamental idea behind DDD is to map entities in your software to entities in your “business domain”. Parts of this process are pretty natural. For instance, we’ve already started doing that above (entities for a Transaction, an Account, and a Category all naturally fell out of just describing what users want to do).

But domain-driven design doesn’t stop there. It requires technical experts and “domain experts” to constantly iterate on that model, refining their shared model and then updating the software representation of that model. This can happen naturally as you evolve your product and use-cases, but often, it’s a good idea to trigger it up front through in-depth discussion and questioning of how the model could accommodate future use-cases.

For example, here are some questions that might help us refine our model, and some possible answers.

For starters, here’s one: what if an account has a starting balance? How do we represent that? Does that violate our initial assumption that an account’s balance is the sum of all its transactions? The answer depends on how you model your domain.

For some products, it might make sense to add a starting_balance field to your Account entity. A more “pure” approach might be to keep the initial invariant (that an account’s balance is the sum of all transactions), but refine things so that starting balances are actually a special type of Transaction (with some invariants around that: for instance, an Account can only have one starting balance Transaction, and it must be on the date the Account is opened). But this is good, we’re domain-modeling now! We’re rethinking some of our assumptions, and that’s pushing us to think more deeply about our understanding of the model.
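Here is one way the “pure” approach might look, as a sketch only. The `is_starting_balance` flag and the `add` method are hypothetical choices; the two invariants are the ones described above.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    amount: Decimal
    on: date
    is_starting_balance: bool = False


@dataclass
class Account:
    opened_on: date
    transactions: list[Transaction] = field(default_factory=list)

    def add(self, txn: Transaction) -> None:
        if txn.is_starting_balance:
            if any(t.is_starting_balance for t in self.transactions):
                raise ValueError("an account can have only one starting balance")
            if txn.on != self.opened_on:
                raise ValueError("starting balance must be dated on the open date")
        self.transactions.append(txn)

    def balance(self) -> Decimal:
        # The original invariant still holds: balance is the sum of all transactions.
        return sum((t.amount for t in self.transactions), Decimal("0"))


account = Account(opened_on=date(2021, 1, 1))
account.add(Transaction(Decimal("500.00"), date(2021, 1, 1), is_starting_balance=True))
account.add(Transaction(Decimal("-20.00"), date(2021, 1, 3)))
print(account.balance())  # 480.00
```

The trade-off is that the invariants now live in code that has to be enforced on every write, rather than in a single field on the Account.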

Here’s a trickier one: what if a transaction occurs between two accounts? In our current model, we’d actually have two transactions (one leaving the first account, and one entering the second one). That might be fine in many applications, but if you’re an accounting product, you might realize that this model can introduce some inconsistencies. What if one transaction is missing? In the real world, money flows from some place to another. Maybe every transaction requires two accounts (from_account and to_account). A domain expert on your team would now point out that you’re brushing up against double-entry accounting. We don’t need to go down that route, but you can see how a question prompted us to revisit our understanding of the model.
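As a rough sketch of the from_account / to_account idea (again in Python, with hypothetical names, and using plain strings as account identifiers), a single record can describe both sides of the movement. Because balances are derived from that one record, the “missing half” problem cannot occur.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transfer:
    on: date
    amount: Decimal      # always positive; direction comes from the two accounts
    from_account: str
    to_account: str

    def __post_init__(self) -> None:
        if self.amount <= 0:
            raise ValueError("amount must be positive")
        if self.from_account == self.to_account:
            raise ValueError("from_account and to_account must differ")


def balance(account: str, transfers: list[Transfer]) -> Decimal:
    """Derive an account's balance from the transfers that touch it."""
    total = Decimal("0")
    for t in transfers:
        if t.to_account == account:
            total += t.amount
        if t.from_account == account:
            total -= t.amount
    return total
```

One consequence of this sketch is that money entering the system has to come from somewhere, so things like an employer or a store are modeled as accounts too, and the balances across all accounts always sum to zero.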

This is just an overview of domain-driven design. You can read a lot more about it on Wikipedia, or by reading Eric Evans’ classic book, but at a high level, in domain-driven design you create a “bounded context” for your domain model, iterate on your understanding of the domain model, come up with a “ubiquitous language” to describe that model, and constantly keep your software entities in sync with that domain model and language.

Data Flow Design

Data flow design takes a bit of a different approach. Instead of focusing on the entities, you focus on the “data”. Now, you might argue that data and entities are the same, or should be the same, and in an ideal world they would be, but software has real-world limitations set by the technology that enables it. Things like locality, speed, and consistency start to rear their heads.

Let’s apply that to our example above. Again, we had already naturally started doing some data flow design in defining our original problem: all of the “users will want to…” statements are about data flow. For example, let’s consider the balances question: “users will want to see the balance of their accounts at any point in time, and how the balances change over time.”

Our model dictates that balances are derived from transactions. How do we respond to a query like “what was the balance every day over the past year for a user’s account?” The simplest way could be to always derive, on-the-fly, the balances of an account by walking through all its transactions. That way, if anything in the underlying transactions changes, the balances are always consistent. But this is where technical limitations start to hit us. Can we do that calculation fast enough when we get the query? What if the query is something like “out of the 10 million accounts in the system, show me all accounts for which the balance exceeded $10,000 on any day in the past 5 years”?
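Here is roughly what “derive it on the fly” could look like, as a sketch in Python. The function name and the (date, amount) tuple shape are assumptions for illustration. Note that every call scans every transaction for the account, which is exactly the cost that starts to hurt once you multiply it by millions of accounts.

```python
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal


def daily_balances(transactions, start, end):
    """Return {day: end-of-day balance} for each day in [start, end].

    `transactions` is an iterable of (date, Decimal amount) pairs.
    """
    per_day = defaultdict(Decimal)   # Decimal() is zero
    running = Decimal("0")
    for on, amount in transactions:
        if on < start:
            running += amount        # history before the window sets the opening balance
        elif on <= end:
            per_day[on] += amount

    balances = {}
    day = start
    while day <= end:
        running += per_day.get(day, Decimal("0"))
        balances[day] = running
        day += timedelta(days=1)
    return balances
```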

You probably already have solutions simmering in your head. Caching for faster queries. Updating balances whenever transactions change. Some additional data store that makes it easy/fast to index and execute queries like that. But you’re no longer just thinking about the domain model. You’re thinking about the data.

To do data flow design well, you need to think through a few dimensions. The first is read vs. write data paths. Clearly, when transactions are changed, balances need to change to reflect that. Should that happen on write, when a transaction is updated? Or should it happen on read, lazily doing the work only when we know we need it? Or should we do it asynchronously, somewhere in between, so that we get fast reads and fast writes while sacrificing some consistency?
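The three options can be sketched side by side. This is a toy, in-memory illustration in Python with class names I made up; a real system would involve a database and, for the asynchronous case, some kind of background worker or queue.

```python
from collections import deque
from decimal import Decimal


class ComputeOnRead:
    """Cheap writes; every read re-derives the balance."""

    def __init__(self):
        self.amounts = []

    def add_transaction(self, amount):
        self.amounts.append(amount)              # fast write

    def balance(self):
        return sum(self.amounts, Decimal("0"))   # work grows with history


class ComputeOnWrite:
    """Cheap reads; every write also updates a stored balance."""

    def __init__(self):
        self.amounts = []
        self.stored_balance = Decimal("0")

    def add_transaction(self, amount):
        self.amounts.append(amount)
        self.stored_balance += amount            # extra work on the write path

    def balance(self):
        return self.stored_balance               # fast read


class ComputeAsync:
    """Fast writes and fast reads, but reads can be stale until refresh() runs."""

    def __init__(self):
        self.amounts = []
        self.pending = deque()
        self.stored_balance = Decimal("0")

    def add_transaction(self, amount):
        self.amounts.append(amount)
        self.pending.append(amount)              # defer the balance update

    def balance(self):
        return self.stored_balance               # may lag behind recent writes

    def refresh(self):
        # Stand-in for a background job that catches the balance up.
        while self.pending:
            self.stored_balance += self.pending.popleft()
```

The third class is where the consistency trade-off becomes visible: between add_transaction and refresh, balance returns an out-of-date answer.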

Next, you need to think through read vs. write patterns. How frequent are writes? How frequent are reads? Are they varied or skewed? Depending on the answer, you might be OK doing more work on write, or you might be OK doing more work on read. Or you might introduce something like caching if a lot of reads are similar. Or, you might go full-on Command Query Responsibility Segregation (CQRS), where reads and writes are served by separate models.

You’ll also need to think through your consistency requirements. We’ve already hinted at that above, but maybe you can offload some work if you’re OK with data you read being a little out of sync with the data you write. You can use asynchronous or batching models.

Finally, there’s a question around where invariants should live. In modeling the domain, you usually end up with some “invariants”: things that should always be true. These invariants work like constraints, giving you assumptions you can trust throughout the life cycle of any entity or the data representing it (like, the balance of an account is the sum of all its transactions, or an account can only have one starting balance transaction). But when thinking about data flow, you need to worry about how to check and enforce those constraints. Should that happen in the application layer? In the data storage layer?
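As one illustration of pushing an invariant into the data storage layer, here is a sketch using Python’s built-in sqlite3 module and a partial unique index. The table and column names are assumptions for the example, and other databases have their own syntax for this. The “one starting balance transaction per account” rule is enforced by the database itself, no matter which piece of application code does the insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        amount_cents INTEGER NOT NULL,
        is_starting_balance INTEGER NOT NULL DEFAULT 0
            CHECK (is_starting_balance IN (0, 1))
    );
    -- Invariant: at most one starting balance row per account.
    CREATE UNIQUE INDEX one_starting_balance_per_account
        ON transactions (account_id)
        WHERE is_starting_balance = 1;
""")


def add_transaction(account_id, amount_cents, is_starting_balance=False):
    """Return True if the row was stored, False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO transactions"
                " (account_id, amount_cents, is_starting_balance)"
                " VALUES (?, ?, ?)",
                (account_id, amount_cents, int(is_starting_balance)),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The trade-off is that the rule now lives in the schema rather than next to the rest of the domain logic, so it is harder to bypass but also easier to forget about when the model changes.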

A full exploration of what this means in practice is beyond our scope here, but the main point is that in addition to our nice, clean domain model, we also have all this extra logic that is not part of our domain. It’s just a function of technological limitations. That’s the tension.

(The best resource I’ve found on thinking about data flow, especially at scale, is Martin Kleppmann’s Designing Data-Intensive Applications.)

The Balance

I’ve found that most software engineers start their careers with a bias either towards data model or data flow. As two extremes, consider:

  • The data model purist: Spends an exorbitant amount of time thinking through and modeling the domain before writing a line of code. Draws a lot of diagrams, possibly of database schemas. Gets really frustrated at implementation time because the data flow reality sets in and they realize they will need to “corrupt” their model.
  • The data pragmatist: Thinks through the end-to-end data flow really well, quickly writes code and spins up multiple data services. Was big on “polyglot persistence” when that was a word. Has figured out how to parallelize / partition things before figuring out what those things are.

Many people start off as one of those two, overlooking the other side of the equation, then learn through experience that you have to think about both from the get-go.

I find that to strike a good balance, it’s best to do design in an iterative fashion. First, of course, you need a really solid understanding of the underlying problem you’re trying to solve and why it needs to be solved. Then, you take turns thinking through the domain model, and the data flow.

  • Write or sketch out a quick data model.
  • Map it to the problem space: does it represent the domain well? Does it support what the product needs to do now and later? Fiddle with the requirements a little bit. Does the model hold up?
  • Now map the data flow. Look at the UI and what data needs to be shown. Think about the interactions that need to happen and what data needs to be changed. Now think about how that would work at a much larger scale.

Rinse and repeat. Pull in some colleagues, get feedback, and go around again. And even when you start writing code, you keep iterating.

You should start with a slight bias towards getting the data model right, and worry more about data flow as you gain confidence in your data model, and as you start to hit the performance problems that only show up once you have enough scale and once your product is complex enough. But you always keep both concepts (the data model, and the data flow design) top of mind as you’re working.