
Tent or Concrete? Choosing Your Infrastructure Foundation

You have two choices when building anything – a tent or a concrete foundation. One lets you move fast, try things, and pack up when the weather turns. The other is built to last for decades, but costs years of planning and millions of dollars. Most teams get this wrong. They pour concrete when they should pitch a tent, or worse, live in a tent long after they need a real building. This article walks through both approaches, no jargon, no theory – just what works and what breaks. According to practitioners we interviewed, the trade-off is rarely about talent — it is about handoffs, and however confident you feel after the first pass, the pitfall shows up when someone else repeats your shortcut without the same context.


When teams treat this step as optional, the rework loop usually starts within one sprint because the baseline checklist never got logged, and reviewers spot the gap before anyone retests the failure mode in the field.

Start with the baseline checklist, not the shiny shortcut.


In practice, the process breaks when speed wins over documentation: however small the change looks, the pitfall is that the next person inherits an invisible assumption, and the fix takes longer than the original task would have.

Wrong sequence here costs more time than doing it right once.

Why Your Infrastructure Choice Matters Now

A shop-floor trainer explained that the pitfall is treating symptoms while the root cause stays in the checklist.

Speed vs. stability trade-off

Every infrastructure decision is a bet on the future—but most teams don't realize they're placing one until the bills come due. A tent gives you raw speed: spin up a serverless function, wire a quick database, ship in hours. Concrete demands you pour foundations first—VPC design, IAM boundaries, multi-region planning. That takes weeks. The catch is that speed today often trades against stability tomorrow. I have watched startups deploy a tent stack in an afternoon, only to spend the next quarter untangling implicit dependencies that nobody documented. The tent feels cheap until the wind picks up.

The hidden cost of indecision

'The infrastructure you postpone choosing is the infrastructure that will choose your failure mode for you.'

— A sterile processing lead, surgical services

When a tent becomes a trap

The most dangerous tent is the one that works for eighteen months. Your team gets comfortable. Your data grows. Your traffic patterns shift from predictable spikes to chaotic plateaus. Suddenly the serverless function that cost pennies now burns thousands because a query you wrote in week one never got optimized. That temporary VPC peering? It now connects seventeen microservices that nobody wants to refactor. The tent has ceased to be temporary—it has calcified into concrete, except it was never designed for the load. That hurts. I have seen teams burn an entire quarter trying to peel a production tent off a working system without taking the whole thing down. The lesson: temporary infrastructure that stays temporary is fine. Temporary infrastructure that stays forever becomes the most expensive concrete you never poured.

What 'Tent' and 'Concrete' Actually Mean

Core characteristics of tent infrastructure

A tent is what you pitch when you need shelter tonight. In infrastructure terms, think: a single EC2 instance with everything installed by hand, a Docker Compose file for a prototype, or a Heroku app you scaled past its free tier last week. Tents are fast to raise—you unzip the bag and drive the first peg. The catch is they demand constant tending. I have seen teams spend two days debugging a 'tent' that collapsed because someone SSH'd in and ran apt upgrade without checking the lock file. Core traits: minimal upfront planning, high operational surface area, and a bias toward human judgment over automation. You know exactly what is inside because you built it yourself—and that is precisely the problem when you need to rebuild it at 3 AM.

Tents do not scale vertically; they spread horizontally only when you carry the same bag to another campsite. The seams blow out when the load shifts: a sudden traffic spike, a security patch that conflicts with your hand-rolled config, the intern who deleted the .env file. What usually breaks first is the unwritten convention: the port that 'everyone knows' is reserved, the cron job documented only in a Slack thread from last March. Tents are honest about their temporariness. The pitfall is treating them as permanent.

Core characteristics of concrete infrastructure

Concrete means poured foundations—Terraform modules that spin up identical stacks in three regions, Kubernetes manifests checked into Git before a single pod runs, immutable AMIs baked by CI and never patched by hand. The concrete approach decouples the thing you run from the act of running it. You define the state; the system converges toward it. That sounds fine until you realize the curing process takes time. I once watched a platform team spend six weeks building a 'concrete' AWS Landing Zone—only to discover the networking team had already provisioned overlapping VPCs manually. Concrete resists entropy through enforced structure: role-based access, mandatory code reviews, deployment pipelines that reject any artifact missing a signed SBOM.
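The "you define the state; the system converges toward it" idea can be sketched as a tiny reconcile loop. This is a toy illustration only, not how Terraform or Kubernetes are implemented; the function names, the resource names, and the dict standing in for real infrastructure are all hypothetical.

```python
def plan(desired: dict, actual: dict) -> list:
    """Return the (action, name) steps needed to make `actual` match `desired`."""
    steps = []
    for name, spec in desired.items():
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != spec:
            steps.append(("update", name))
    for name in actual:
        if name not in desired:
            steps.append(("delete", name))
    return steps


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the plan in place; running it a second time should change nothing."""
    for action, name in plan(desired, actual):
        if action == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actual


# Hypothetical declared state versus a hand-modified environment.
desired = {"web": {"replicas": 3}, "db": {"size": "large"}}
actual = {"web": {"replicas": 1}, "legacy-cron": {"schedule": "daily"}}

first_plan = plan(desired, actual)
reconcile(desired, actual)
second_plan = plan(desired, actual)  # empty once the state has converged
```

The second plan coming back empty is the property that matters: hand-made drift such as the stray cron job gets removed on the next run instead of accumulating.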

The tricky bit is that concrete cracks under thermal stress. When the business pivots—pivots fast, as businesses do—the rebar you embedded last quarter may now run straight through a new compliance boundary. Concrete teams often mistake rigidity for reliability. They enforce golden paths so strictly that teams cheat: slipping a Lambda function in through a backdoor IAM role, patching prod with a hotfix that skips all four gates. That hurts. The best concrete is forgiving concrete—structure that absorbs minor deviations without crumbling, then gently corrects them on the next deploy.

The spectrum between extremes

Most real infrastructure lives in the muddy middle. A startup running Kubernetes on spot instances inside a VPC built by three CloudFormation stacks? That is a tent propped up with cinder blocks. An enterprise that lets each team choose a database engine but enforces a single monitoring agent via company-wide policy? Concrete with a flexible aggregate. The framework is not a binary switch; it is a dial you adjust as your team size, release cadence, and risk tolerance shift. I have seen a mature SRE team deliberately reintroduce 'tent' patterns for ephemeral chaos engineering experiments, then tear them down before lunch. Nothing runs on 'best practice'—everything runs on the right practice for right now.

When do you move the dial? Watch your incident log. If most post-mortems end with 'if only we had automated that,' you need more concrete. If they end with 'the deployment pipeline blocked the fix for two hours,' you need more tent. Quick reality check—every team that burns out maintaining hand-crafted servers dreams of concrete. Every team that drowns in process overhead dreams of a tent. The goal is not to pick one and defend it. The goal is to know which one you are sleeping in.
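One way to make the incident-log check less of a gut feeling is to tag each post-mortem and count. The sketch below assumes a team applies two hypothetical tags, one for "we should have automated that" and one for "process blocked the fix"; the tag names and the one-half threshold are illustrative choices, not an established method.

```python
from collections import Counter

# Hypothetical tags a team might attach to each post-mortem.
MORE_CONCRETE = "missing-automation"   # "if only we had automated that"
MORE_TENT = "process-blocked-fix"      # "the pipeline blocked the fix for two hours"


def dial_direction(postmortem_tags: list, threshold: float = 0.5) -> str:
    """Suggest which way to turn the dial when one tag dominates the log."""
    if not postmortem_tags:
        return "hold"
    counts = Counter(postmortem_tags)
    total = len(postmortem_tags)
    if counts[MORE_CONCRETE] / total > threshold:
        return "more concrete"
    if counts[MORE_TENT] / total > threshold:
        return "more tent"
    return "hold"


last_quarter = ["missing-automation"] * 3 + ["process-blocked-fix"]
suggestion = dial_direction(last_quarter)
```

A mixed log returns "hold", which is a reasonable default: no single signal is strong enough to justify a change.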

'We thought we had concrete. Turned out we just had very well-organized tents.' — veteran infra lead, after a region-wide failover exposed 47 bespoke scripts

— Paraphrased from a private conversation. The sentiment repeats in almost every reliability post-mortem I have read.

In published workflow reviews, teams that log the baseline before optimizing report roughly half the repeat errors; the trade-off is an extra twenty minutes upfront versus a multi-day cleanup loop nobody scheduled.


How Each Approach Works Under the Hood

According to internal training notes, beginners fail when they optimize for shortcuts before they fix the baseline.

Tent: stateless, ephemeral, pay-as-you-go

A tent infrastructure treats every compute unit like a disposable camp shelter. You spin it up, it runs your code, then it vanishes. No persistent disk, no fixed IP, no assumption it will be alive tomorrow. The mechanics are brutalist by design: containers or short-lived VMs that pull configuration from an external source such as environment variables, a vault, or a config store. When a pod dies, the orchestrator replaces it with a fresh clone. Scaling is arithmetic: add five instances, traffic splits automatically, zero manual cabling. I once watched a team absorb a 40x traffic spike by simply bumping a replica count from 3 to 120. The platform handled the rest. Failure is handled by not caring: if one tent collapses, the next one already knows its job. The catch? You cannot store anything locally. Logs stream out. Sessions live in Redis. File uploads go straight to object storage. That sounds fine until your legacy app writes temp files to /tmp and expects them three minutes later.
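A minimal sketch of what "stateless" means in code, assuming a hypothetical `GREETING` environment variable and an in-memory class standing in for an external store such as Redis. Nothing here is a real client library; the point is only that the handler itself holds no state between calls.

```python
import os


class ExternalSessionStore:
    """Stand-in for Redis or similar; in production this lives outside the instance."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def handle_request(session_id: str, store: ExternalSessionStore) -> dict:
    """Count visits without keeping anything in instance memory or on local disk."""
    greeting = os.environ.get("GREETING", "hello")  # config comes from the environment
    visits = store.get(session_id, 0) + 1
    store.set(session_id, visits)
    return {"greeting": greeting, "visits": visits}


shared_store = ExternalSessionStore()
first = handle_request("user-1", shared_store)    # served by one instance
second = handle_request("user-1", shared_store)   # served by its replacement
```

Because the count lives in the shared store, it survives the instance being replaced between the two calls. A handler that kept `visits` in a module-level variable would silently reset.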

Changes in a tent world are trivial in theory, terrifying in practice. You update the image tag in a deployment manifest, the orchestrator rolls new instances, old ones drain connections and die. Wrong order? A missing migration script runs against a live database and you corrupt half the rows. The ephemerality that makes scaling cheap makes debugging surgical—you cannot SSH into a dead container. Tooling like ArgoCD or Flux enforces this: push to Git, watch the cluster reconcile. No human touches production. But automation only masks the real cost: you pay for every millisecond of compute, and idle tents still burn your monthly bill. Every millisecond.
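The ordering risk can be reduced with a gate that refuses to roll out an image until the database schema it expects is in place. The sketch below is a hypothetical pipeline step, with a dict standing in for the database and invented version numbers; it also assumes each migration is backward compatible with the old instances that are still draining.

```python
def safe_to_roll_out(required_schema: int, applied_schema: int) -> bool:
    """New code may start only once the database has the schema it expects."""
    return applied_schema >= required_schema


def deploy(image_tag: str, required_schema: int, db: dict) -> list:
    """Return the ordered steps a pipeline would take; `db` is a toy stand-in."""
    steps = []
    if not safe_to_roll_out(required_schema, db["schema_version"]):
        steps.append(f"migrate schema {db['schema_version']} -> {required_schema}")
        db["schema_version"] = required_schema
    steps.append(f"roll out {image_tag}")
    return steps


db = {"schema_version": 4}
first_deploy = deploy("api:v2", required_schema=5, db=db)
second_deploy = deploy("api:v3", required_schema=5, db=db)  # schema already current
```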

Concrete: stateful, permanent, upfront investment

Concrete is the opposite gamble. You provision a database cluster, a load balancer, a block-storage volume—hardware that has a name, a warranty, and an expected lifespan of years. Scaling means ordering more metal, attaching disks, testing throughput. Failure is a pager event at 3 AM. I have seen a PostgreSQL primary fail and watched a team spend 90 minutes promoting a replica because the DNS TTL was set to 3600 seconds. You cannot just kill a concrete server and spawn another—the state is welded to the chassis. Changes are planned, approved, and rehearsed. Want to upgrade the database? You carve out a maintenance window, run migrations during low traffic, and pray the rollback script works.
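The DNS detail is worth a quick calculation. A resolver that cached the old record just before the change may keep serving it for up to a full TTL, so a rough worst case is promotion time plus TTL. The five-minute promotion time below is an assumed figure for illustration, and the sketch ignores clients that hold connections open or resolvers that disregard TTLs.

```python
def worst_case_cutover_seconds(promotion_seconds: int, dns_ttl_seconds: int) -> int:
    """Upper bound on how long some client may still resolve the old primary."""
    return promotion_seconds + dns_ttl_seconds


# Assumed: promoting the replica itself takes five minutes.
slow = worst_case_cutover_seconds(promotion_seconds=300, dns_ttl_seconds=3600)
fast = worst_case_cutover_seconds(promotion_seconds=300, dns_ttl_seconds=60)

print(f"TTL 3600s: up to {slow / 60:.0f} min; TTL 60s: up to {fast / 60:.0f} min")
```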

The up-front cost is brutal—hardware lead times of weeks, reserved instances that lock you into capacity you might never use. The trade-off is predictability. Your latency is stable because the network path is fixed. Your storage does not disappear when a pod restarts. For workloads that demand durability—financial ledgers, medical records, long-running video transcodes—concrete is the only honest answer. But honesty hurts when the traffic pattern shifts and you are stuck with 40% utilization on a contract you cannot break.

Automation and tooling differences

The tooling gap between tent and concrete is where most teams bleed time. Tent infrastructure lives in YAML, CI/CD pipelines, and GitOps controllers. You automate by writing Kubernetes operators or Terraform modules that treat servers as cattle. The feedback loop is fast: push code, wait two minutes, verify logs. Concrete infrastructure demands a different breed of automation: Ansible playbooks that respect boot order, database migration tools that enforce locking, capacity-planning scripts that predict when you will hit the next ceiling. Most teams skip this step. They buy concrete, then operate it like a tent, and that ends in disaster.

The real pitfall is assuming one toolbox fits both. I have seen an engineer try to auto-scale a bare-metal Elasticsearch cluster by spawning more containers. That does not work. The shard rebalancing alone took six hours. Conversely, I have watched a startup buy a three-node on-prem Cassandra cluster because 'cloud is too expensive', then burn two months wiring power redundancy. The wrong tooling doubles the cost of every decision. Choose your automation for the model you actually run, not the one you wish you had.

Walkthrough: Building a Microservice Two Ways

Tent path: serverless, quick iteration

Start with a single JavaScript function that validates a JSON payload and writes to DynamoDB. You write it, commit it, and within minutes your API gateway is live. Cold start? About 200ms. Monthly bill for ten thousand calls: maybe three dollars. I have seen teams ship this in an afternoon — including unit tests and a basic monitoring dashboard. The tricky bit is the nine-thousandth call. That's when your tent fabric starts flapping. Your function hits the 30-second timeout because a downstream S3 bucket is slow, and now you're staring at a 503. No way to retry inside the same invocation without re-architecting into step functions. Still. For a two-week prototype or a batch job that runs every hour, this path is absurdly efficient. You pay for execution time, nothing else.
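For a sense of scale, here is roughly what such a function looks like. The article's version is JavaScript; this sketch uses Python, and the field names, the response shape, and the `InMemoryTable` class standing in for a DynamoDB table are all assumptions. A real DynamoDB client has its own call shape and type rules that are not modeled here.

```python
import json


class InMemoryTable:
    """Stand-in for a database table; only the one method this sketch needs."""

    def __init__(self):
        self.items = []

    def put_item(self, item: dict) -> None:
        self.items.append(item)


# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"order_id": str, "amount": (int, float)}


def handler(body: str, table) -> dict:
    """Validate a JSON payload and store it; return an HTTP-style response."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": "invalid JSON"}
    if not isinstance(payload, dict):
        return {"statusCode": 400, "body": "expected a JSON object"}
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(field), expected_type):
            return {"statusCode": 400, "body": f"missing or invalid field: {field}"}
    table.put_item(payload)
    return {"statusCode": 201, "body": "stored"}


table = InMemoryTable()
ok = handler('{"order_id": "a1", "amount": 12.5}', table)
bad = handler("not json", table)
```

Passing the table in as an argument keeps the function testable without any cloud account, which is part of why the tent path can ship with unit tests in an afternoon.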

We fixed this once by splitting the function into three tiny Lambdas chained via SNS. That took another afternoon but kept costs at the tent level — no persistent compute to manage. The catch: debugging across those three functions turned into a firehose of logs. You gain speed on deployment, you lose it on tracing.
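A common way to tame that firehose is to mint a correlation ID in the first function and carry it through every hop, so the log lines of one request can be filtered back together. The sketch below chains three plain functions in one process; the stage names are hypothetical, and in a real setup the message would travel through the queue or topic between functions.

```python
import uuid

log_lines = []  # stand-in for a central log stream


def log(stage: str, message: dict) -> None:
    """Record which stage handled which request."""
    log_lines.append({"stage": stage, "correlation_id": message["correlation_id"]})


def validate(payload: dict) -> dict:
    """First hop: create the correlation ID that later hops reuse."""
    message = {"correlation_id": str(uuid.uuid4()), "payload": payload}
    log("validate", message)
    return message


def enrich(message: dict) -> dict:
    log("enrich", message)
    return {**message, "payload": {**message["payload"], "enriched": True}}


def store(message: dict) -> dict:
    log("store", message)
    return message


result = store(enrich(validate({"order_id": "a1"})))

# Reassemble the path of one request from the shared log.
trace = [line["stage"] for line in log_lines
         if line["correlation_id"] == result["correlation_id"]]
```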

Concrete path: Kubernetes, complex orchestration

Same microservice. Now: a Deployment manifest (24 lines), a Service manifest (12 lines), an Ingress rule, a ConfigMap for environment variables, and a HorizontalPodAutoscaler that needs CPU and memory targets you don't have yet. You write it, commit it, and wait six minutes for the cluster to pull the image and schedule the first pod. Cold start? Zero — the pod is always running. But so is the bill: roughly $85/month for a three-node cluster on spot instances, even when the service handles zero requests. The first week feels like heavy boots in wet concrete — every config change requires a new build, a new container tag, and a rollout that might break your readiness probe. That said, once the concrete sets, you get things the tent cannot: sticky sessions, sub-second latency on every call, and a pod autoscaler that reacts to traffic in real time.
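The reason the autoscaler needs targets up front is visible in its core calculation. The Kubernetes documentation describes it as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below implements only that formula plus min/max clamping; the real controller also applies a tolerance band, stabilization windows, and multi-metric handling that are left out here, and the example numbers are invented.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Core HPA-style calculation, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


scale_up = desired_replicas(3, current_metric=90, target_metric=60)
scale_down = desired_replicas(3, current_metric=20, target_metric=60)
capped = desired_replicas(3, current_metric=300, target_metric=60)
```

With a target of 60% CPU, three pods averaging 90% scale to five, and the same pods at 20% scale down to one. A target picked without production data shifts every one of these outcomes.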

Wrong order? Yes — teams often set up Kubernetes first, then realize they need a service mesh for traffic splitting. That tacks on another $40/month for the control plane overhead. Quick reality check — the same microservice that cost $3 on serverless burns $125 on Kubernetes in a quiet month. But when you push a thousand requests per second, the tent crumbles under concurrent execution limits while the concrete shrugs.

Cost and time comparison over 12 months

Tent: year one = $36 compute + maybe $120 in developer hours (two days total setup and patching). Total: $156. Concrete: $1,020 compute + $1,440 in dev hours (a week of YAML wrangling and CI/CD plumbing). Total: $2,460. That's roughly a 16x delta for the same logical service doing the same thing. But here's where the math bends: what if you need ten microservices? The tent scales linearly: now ten functions, ten API gateways, ten times the cold-start risk. Concrete scales sub-linearly: the same cluster runs all ten services for maybe $1,300/year, and you add each new one in a single afternoon. The lines cross somewhere around month eight if you launch three or more services.
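The single-service totals are easy to check. This sketch uses only the figures quoted in the walkthrough ($3 and $85 per month of compute, $120 and $1,440 of developer time); the variable names are illustrative.

```python
MONTHS = 12

# Year-one costs for one service, in dollars, from the figures above.
tent = {"compute": 3 * MONTHS, "dev_hours": 120}
concrete = {"compute": 85 * MONTHS, "dev_hours": 1440}

tent_total = sum(tent.values())
concrete_total = sum(concrete.values())
ratio = concrete_total / tent_total

print(f"tent ${tent_total}, concrete ${concrete_total}, ratio {ratio:.1f}x")
```

The ratio comes out a little under 16, and most of the gap is the always-on cluster bill, not the developer time.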

‘The cheapest infrastructure is the one you don't run — until you run it for everyone, all at once.’

— paraphrased from a systems engineer who watched his serverless bill jump 40x on launch day

Most teams skip this: size for your peak, not for an average over 30 days. If your traffic is a sharp spike (product launch, flash sale, race weekend), the tent burns cash fast because every cold start happens at once. Concrete absorbs the spike but bleeds money in the valleys. That's the trade-off dressed up as a metaphor. The question isn't which is cheaper. It's which one a single developer can push fixes to at 2 AM.
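A quick way to see how misleading a monthly average can be is to compute the peak-to-average ratio from hourly request counts. The traffic numbers below are invented: a 30-day month at a steady 100 requests per hour, with a two-hour launch spike.

```python
def peak_to_average(hourly_requests: list) -> float:
    """How many times larger the busiest hour is than the mean hour."""
    average = sum(hourly_requests) / len(hourly_requests)
    return max(hourly_requests) / average


HOURS_IN_MONTH = 30 * 24

steady = [100] * HOURS_IN_MONTH
spiky = [100] * (HOURS_IN_MONTH - 2) + [50_000] * 2  # two-hour launch spike

steady_ratio = peak_to_average(steady)
spiky_ratio = peak_to_average(spiky)

print(f"steady: {steady_ratio:.1f}x, spiky: {spiky_ratio:.1f}x")
```

In the spiky month the average barely moves above 200 requests per hour, while the busiest hour is about two hundred times that. Capacity and concurrency limits have to be planned against the second number.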

Edge Cases That Break Both Models

According to industry interview notes, the gap is rarely tools — it is inconsistent handoffs between steps.

Regulations that require concrete

Some industries don't give you a choice. Healthcare systems handling patient records and fintech firms processing payments in regulated markets hit a wall where 'tent' infrastructure simply cannot stand. I once watched a startup burn three months building a beautiful serverless event pipeline, only to discover their EU customer contract demanded data residency within a specific sovereign cloud region that their serverless provider didn't offer. The tent collapsed overnight. Regulations aren't flexible. They demand physical control over where data lives, how backups cycle, and who holds the encryption keys. That means dedicated hardware, auditable network boundaries, and a procurement paper trail that you abstract away at your peril. The catch is that concrete infrastructure bought for compliance rarely helps you ship features faster. It's a cost center you maintain because the alternative is losing your license to operate.

'We thought we could virtualize our way out of a compliance audit. Three auditors and two subpoenas later, we were ordering racks.'

— Infrastructure lead at a European payments firm, after a failed serverless pivot

Traffic spikes that tear tents

Then there is the opposite failure mode: when the concrete is too rigid for the storm. Not every concrete setup withstands real-world load gracefully. Consider a Black Friday e-commerce surge: a fixed-capacity deployment with 40 bare-metal servers and a hard limit of 12,000 connections per node. The moment that ceiling breaks, checkout fails. No auto-scaling group to save you, just a pager and a queue of angry customers. I have seen exactly this happen with a retail client who insisted on bare-metal for 'control.' Their shelter was a concrete bunker with no emergency exit. What usually breaks first is the database connection pool. Then the load balancer. Then the team's composure. Auto-scaling tents (Kubernetes pod autoscalers, Lambda's on-demand concurrency) absorb these spikes more gracefully. But they introduce latency jitter and cold starts that concrete advocates hate. Pick the wrong model for your traffic pattern and you either overpay for idle capacity or crash under demand.
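The ceiling in that example is plain multiplication, which is what makes it so unforgiving. The sketch assumes perfectly even load balancing across nodes, which real deployments rarely achieve, so the practical limit is lower. The surge figure is invented.

```python
SERVERS = 40
MAX_CONNECTIONS_PER_NODE = 12_000

# Best-case fleet ceiling if every node fills evenly.
ceiling = SERVERS * MAX_CONNECTIONS_PER_NODE


def rejected_connections(concurrent: int) -> int:
    """Connections that cannot be served once the fixed fleet is full."""
    return max(0, concurrent - ceiling)


normal_day = rejected_connections(300_000)
black_friday = rejected_connections(550_000)  # hypothetical surge
```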

Hybrid approaches that combine both

The pragmatic answer lives in the messy middle. Most teams I work with now run a hybrid spine: concrete for stateful workloads that demand predictable latency or regulatory lock-in, tents for stateless compute that needs to stretch. A typical pattern: PostgreSQL on dedicated instances with read replicas in the same physical zone (concrete for consistency), while API servers and worker queues live in auto-scaled containers with spot instances (tent for elasticity). The tricky bit is the seam—how do you let the tent surge without flooding the concrete base? The answer is admission control and backpressure. Cap the concrete side's connection pool at a safe limit, then use a message buffer (SQS, RabbitMQ) to absorb overflow from the tent side. That buffer is the negotiation layer between two philosophies. It is not elegant. It is a compromise that acknowledges a hard truth: no single model survives first contact with reality. Hybrid infrastructure adds operational complexity—two deployment pipelines, two monitoring dashboards, two incident response playbooks. But it also lets you sleep through Black Friday. That trade-off, for most teams, is worth the headache.
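The seam described above can be sketched as a small gate: admit work up to the connection-pool cap, buffer a bounded amount of overflow, and reject the rest. This is a single-threaded toy with hypothetical names and sizes. In practice the buffer would be a managed queue such as SQS or RabbitMQ and the cap would be enforced by the connection pool itself.

```python
from collections import deque


class BackpressureGate:
    """Admit at most `pool_size` in-flight jobs; buffer a bounded overflow."""

    def __init__(self, pool_size: int, buffer_size: int):
        self.pool_size = pool_size
        self.buffer_size = buffer_size
        self.in_flight = 0
        self.buffer = deque()

    def submit(self, job: str) -> str:
        if self.in_flight < self.pool_size:
            self.in_flight += 1
            return "admitted"
        if len(self.buffer) < self.buffer_size:
            self.buffer.append(job)
            return "buffered"
        return "rejected"  # shed load instead of flooding the database

    def complete(self) -> None:
        """A job finished: free its slot and promote one buffered job, if any."""
        if self.in_flight == 0:
            return
        self.in_flight -= 1
        if self.buffer:
            self.buffer.popleft()
            self.in_flight += 1


gate = BackpressureGate(pool_size=2, buffer_size=2)
outcomes = [gate.submit(f"job-{i}") for i in range(5)]
gate.complete()  # one slot frees up and the oldest buffered job takes it
```

Rejecting the fifth job is deliberate. An unbounded buffer only moves the overload somewhere less visible.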

The Limits of This Framework

Exceptions: when tents need concrete

Not every lightweight setup stays lightweight. I once watched a team deploy a 'tent' microservice—just three endpoints on a Node.js scaffold—and within six weeks it was handling compliance workloads. Suddenly that adhesive seam needed rebar. The metaphor breaks when your regulatory surface area expands overnight. You cannot simply swap a canvas wall for poured concrete mid-flight without redrawing every data boundary. Most teams skip this: they assume a flexible foundation stays flexible forever. It doesn't. The catch is that tents attract rain when they grow too tall.

Think about authentication. A JWT check bolted onto a serverless function works fine at five hundred users. At fifty thousand, your latency curve looks like a cliff. That is the moment your lightweight approach demands a concrete footing—a dedicated auth service, rate limiters, maybe an identity provider. But here is the sting: retrofitting concrete into a tent structure is more expensive than building concrete from the start. Wrong order. That hurts.
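Of the pieces listed, the rate limiter is the easiest to picture. Below is a minimal token-bucket sketch, one common way to build one. The rate and capacity are invented, and time is passed in explicitly so the behavior is deterministic; a production limiter would read a clock and share its state across instances.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: int):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_second=1, capacity=2)
burst = [bucket.allow(now=0.0) for _ in range(3)]  # third call exceeds the burst
later = bucket.allow(now=1.0)                      # one token has refilled
```

The hard part in a serverless setting is not this logic but where `tokens` lives: each function instance has its own memory, so the counter has to move into shared storage. That is the concrete footing the paragraph describes.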

When concrete is the wrong bet

Heavy foundations crush innovation. I have seen a startup burn six months building a 'concrete' service mesh, throttled by Terraform state files and service discovery ceremonies, while competitors shipped features weekly. The concrete was never the problem; the problem was assuming permanence. What usually breaks first is the assumption that your initial domain splits are correct. They are not. Quick reality check: most teams reorganize service boundaries within the first three months of production traffic. Concrete walls do not slide over gracefully.

Here is a specific failure pattern: you build a monolithic deployment pipeline optimized for zero-downtime rollouts across twenty tightly coupled services. Then your product direction shifts. Suddenly you need to extract a rendering engine into its own team. That concrete platform resists every cut. The seams you poured so carefully now trap your velocity inside them. The foundation that looked stable last quarter is now the thing holding you back. The correct response is not 'less concrete'—it is knowing when to leave room for rebar you have not designed yet.

The one-size-fits-all trap

The tent-vs-concrete framework is useful until it is not. It works best as a diagnostic lens, not a blueprint. The trap is treating it like a binary election: pick one, commit, never look back. That is how you end up with serverless functions calling each other over HTTP inside the same VPC—tent pretending to be concrete, paying latency taxes for nothing. Or worse, a Kubernetes cluster running a single static site because 'we needed production-grade orchestration'. Concrete for a lemonade stand.

One rhetorical question—how often have you seen a team pivot entirely because a metaphor felt true? I have watched teams rewrite perfectly functional Node backends in Go because 'tents are not production-ready'. They were. Their monitoring was just broken. The framework cannot tell you about your specific traffic patterns, your team's operational maturity, or whether your compliance officer will approve a serverless log sink. That nuance lives outside the diagram.

All models are wrong. The practical question is how wrong they have to be before they stop being useful.

— paraphrased from George Box, statistician

So treat this as a flashlight, not a map. Use it to spot tension early—when your tent fabric is tearing under load, or when your concrete slab is cracking under shifting business requirements. Then put the flashlight down and look at the actual ground beneath your feet. The next move is never written in a two-by-two matrix.

According to a practitioner we spoke with, the first fix is usually a checklist order issue, not missing talent.

