Beyond the Console: Structuring Terraform Modules for Multi-Environment Enterprises

July 15, 2021

Terraform 1.0 dropped last month, and that's a good prompt to revisit how we organize infrastructure code. The shops still copy-pasting main.tf between dev/ and prod/ folders are going to feel a lot of pain over the next year as their environments drift apart. This is what's been working for me on multi-environment GCP projects.

The copy-paste trap

The pattern I see most often: a team builds out dev/ first, things work, and when prod is needed they duplicate the folder and edit a handful of values. Six months later the two environments share a name and not much else. A change tested in dev fails in prod because some resource was renamed, some IAM binding was added by hand, or a provider version drifted.

This is the monolithic-folder trap. It scales fine until you have a second environment, and then it doesn't.

Modularize the resource definitions

The fix is to treat infrastructure like software and pull resource definitions into reusable modules. A vpc module doesn't know whether it's in dev or prod; it knows how to build a VPC given some inputs. Each environment is a thin caller that passes in its own values.

The diagram I keep drawing for this:

[ Architecture: module flow ]

       +------------------+
       |   Source code    |  <-- Generic blueprints
       |   (modules/vpc)  |
       +--------+---------+
                |
                v
    +-----------------------+
    |   Live environments   |
    +-----------------------+
    |                       |
    |  +-----------------+  |
    |  |  env/dev        |  |  <-- Same module, smaller inputs
    |  +-----------------+  |
    |                       |
    |  +-----------------+  |
    |  |  env/prod       |  |  <-- Same module, HA inputs
    |  +-----------------+  |
    +-----------------------+

Two things fall out of this:

  • The VPC code is identical across environments, so behavior is consistent.
  • Module changes get applied to dev first, so a bad change has a chance to fail there before anyone runs it against prod.
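
To make "same module, different inputs" concrete, here is a minimal sketch of what a networking module could contain. The input and output names (project_id, region, subnet_cidr, network_id) match the module call shown later in this post; the resource names and arguments are my assumptions for illustration, not a complete VPC setup.

```hcl
# modules/networking/variables.tf (sketch)
variable "project_id" {
  type        = string
  description = "Project to create the network in"
}

variable "region" {
  type        = string
  description = "Region for the subnet"
}

variable "subnet_cidr" {
  type        = string
  description = "Primary IPv4 range for the subnet"
}

# modules/networking/main.tf (sketch)
resource "google_compute_network" "this" {
  project                 = var.project_id
  name                    = "main" # hypothetical name
  auto_create_subnetworks = false  # subnets are declared explicitly below
}

resource "google_compute_subnetwork" "this" {
  project       = var.project_id
  name          = "main-${var.region}" # hypothetical naming scheme
  region        = var.region
  network       = google_compute_network.this.id
  ip_cidr_range = var.subnet_cidr
}

# modules/networking/outputs.tf (sketch)
output "network_id" {
  description = "ID of the network, for other modules to attach to"
  value       = google_compute_network.this.id
}
```

Nothing in there mentions dev or prod. Everything that differs between environments arrives through a variable, so the module cannot quietly grow an environment-specific branch.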

A directory layout that holds up

Here's the shape I recommend. The point is to separate the definition of resources from the instantiation of them.

├── modules/                      # Reusable code
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── gke-cluster/
│   └── cloud-sql/
│
└── environments/                 # Where state lives
    ├── dev/
    │   ├── main.tf               # Calls modules with dev vars
    │   ├── backend.tf            # Remote state config (GCS bucket)
    │   └── terraform.tfvars      # node_count = 1
    │
    └── prod/
        ├── main.tf               # Calls the same modules with prod vars
        ├── backend.tf
        └── terraform.tfvars      # node_count = 5, high_mem

environments/prod/main.tf ends up as nothing but module calls, with no resource blocks of its own:

# Network first; its outputs feed the cluster below
module "vpc" {
  source      = "../../modules/networking"
  project_id  = var.project_id
  region      = "us-central1"
  subnet_cidr = "10.0.0.0/16"
}

# Referencing module.vpc.network_id orders the cluster after the network
module "gke" {
  source       = "../../modules/gke-cluster"
  vpc_id       = module.vpc.network_id
  machine_type = "e2-standard-4"
  min_nodes    = 3
}
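
For comparison, a dev caller might differ only in its inputs. This is a sketch: the node_count variable follows the terraform.tfvars comment in the layout above, while the CIDR range and machine type are made-up values, not recommendations.

```hcl
# environments/dev/variables.tf (sketch)
variable "project_id" {
  type        = string
  description = "GCP project that hosts the dev environment"
}

variable "node_count" {
  type        = number
  description = "Node count for this environment, set in terraform.tfvars"
  default     = 1
}

# environments/dev/main.tf (sketch)
module "vpc" {
  source      = "../../modules/networking"
  project_id  = var.project_id
  region      = "us-central1"
  subnet_cidr = "10.10.0.0/16" # hypothetical dev range
}

module "gke" {
  source       = "../../modules/gke-cluster"
  vpc_id       = module.vpc.network_id
  machine_type = "e2-medium" # hypothetical smaller machine type for dev
  min_nodes    = var.node_count
}
```

Diffing this file against prod's should show a few changed values and nothing structural. If the diff starts showing extra resources, the environments are drifting again.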

Remote state, not local state

The single most common mistake on small teams: leaving terraform.tfstate on a laptop. Move it to a remote backend before you do anything else. On GCP that means a GCS bucket, which gives you state locking out of the box. If two people try to apply at the same time, the second one gets a clear error instead of a silently corrupted state file.
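
A backend.tf for this can be very small. In the sketch below the bucket name and prefix are placeholders I made up; the bucket has to exist before you run terraform init.

```hcl
# environments/prod/backend.tf (sketch)
terraform {
  backend "gcs" {
    bucket = "example-org-tfstate" # hypothetical bucket name
    prefix = "environments/prod"   # keeps each environment's state under its own path
  }
}
```

Backend blocks can't reference variables, which is one reason each environment folder carries its own backend.tf instead of sharing a parameterized one.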

Watch for circular dependencies between modules too. If something in module A depends on an output from B, and the resource behind that output depends in turn on an output from A, Terraform has no valid order to create them in and fails the plan with a cycle error. The usual fix is to break the modules apart further or use a data source to look up the resource after it's created.

Pin provider versions

Terraform 1.0 gives stability promises for the core, but providers are still on their own cadence. Pin them explicitly:

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3.70"
    }
  }
  required_version = ">= 1.0.0"
}

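The ~> 3.70 constraint accepts 3.70 and later 3.x releases but stops short of 4.0. Without a constraint, and without a committed .terraform.lock.hcl, the next terraform init can pull a google provider release that changes a resource schema and breaks your pipeline overnight, with no code change on your side to point at.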
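
One common convention is to keep the tight pin in the environment folders and have each reusable module declare only the minimum version it needs, so that a module never fights the environment that calls it. A sketch of that, assuming a versions.tf file inside the module (the file name is a convention, not something Terraform requires):

```hcl
# modules/networking/versions.tf (sketch)
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 3.70" # minimum only; the calling environment sets the upper bound
    }
  }
}
```

Terraform combines the constraints from the root configuration and every module it calls, then picks one provider version that satisfies all of them.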

Wrap up

Splitting modules from environments is the boring infrastructure decision that pays back the most. The first refactor takes a couple of days. After that, spinning up a new region for DR is a new folder pointing at the same modules.