Most modern software teams have no choice but to spend a lot of effort on their infrastructure, often spinning up entire teams to do so (and sometimes placing their best engineers on those teams).
How did cloud infrastructure get so complicated? Why is AWS such a hot mess of insane configurability and complexity? I ponder this a lot. As a manager of software teams, any minute not spent on building an awesome product kills me a little.
I have a few thoughts.
For a minute, back in 2008-2010, things looked different. Heroku had just been launched, and with a few clicks in a GUI and a git push you had your code deployed and accessible.
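For anyone who never experienced it, that workflow was roughly this (a sketch using the Heroku CLI; the app name is made up, and the exact commands varied a bit over the years):

```shell
# The classic Heroku loop: create an app, push code, and it's live.
heroku create my-app      # provisions the app and adds a "heroku" git remote
git push heroku master    # the push itself triggers build and release
heroku open               # opens the running app in your browser
```

No Dockerfiles, no YAML, no instance types to choose. That near-total absence of configuration is exactly what made it feel different.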
Fast forward a decade, and now we’ve got containers, Kubernetes, and so-called “serverless” lambdas. Heroku is still around, but most people outgrow it pretty quickly.
Instead, we end up in the complex world of Amazon Web Services, Google Cloud Platform, Azure, etc., with their hundreds of services, each with a tremendous amount of configuration and complexity.
Now, maybe you think I’m exaggerating, because you’ve already spent the time necessary to learn/navigate one of those platforms and/or you enjoy that type of work. But any effort spent not directly delivering value to users is wasted on some level. Infrastructure should be simple; it should just work. But it doesn’t. Even a simple task like estimating what your infrastructure will cost is a chore on some of these platforms. There are literal venture-backed startups whose business model is to use reinforcement learning to tweak your infrastructure configuration.
Instead of getting simpler, infrastructure got more complicated. Why?
I have a few theories. The first is creator’s bias. AWS, GCP, etc. were all born at FAANG-ish companies, which are outliers in the size of their companies, their data, and their code bases. They build tools they would use themselves. But why would 99% of the world’s engineers need the same complex tooling that the biggest software companies in the world need?
It actually gets worse. Because those companies are viewed as (and often are) technologically ahead, the open source community models its software after them (or they contribute tools back to the open source community). Borg inspires Kubernetes (which was itself born at Google), Mesos, and Docker. MapReduce inspires the Hadoop ecosystem. And so on.
The second big reason is competitive dynamics. The big cloud providers are locked in an existential competitive battle for “the cloud”. This creates a dynamic where more is better. You’d think that competition in markets would lead to better products, but for enterprise or business-to-business products, often it just leads to an arms race for more features and more complexity. More services, more configurability, and ultimately, more complexity. This is how you stay ahead of the competition and how you win the big contracts. Meanwhile, the average developer is just gasping for air.
The third is Heroku’s decline. Heroku could have been so much more, but instead, they were acquired by Salesforce. Instantly, they lost their ability to innovate and became that classic story of “large company buys startup, startup product stagnates / only gets incremental improvements”. And in this case, Salesforce isn’t exactly a beacon of simplicity when it comes to product (it’s a great product, just clunky and enterprisey… though less so than the competitors it displaced). There are a few bright spots, like https://render.com/, which I’m keeping an eye on as a product that could pick up where Heroku left off.
I hope this changes. I respect each of the big cloud platforms as businesses, and I know that some of the smartest engineers are working both on them directly and on software built on top of them. But this is all just tremendously inefficient.