Event-Driven Architecture: Decoupling Services with Pub/Sub and Eventarc

November 7, 2021

Hey folks! Today we need to talk about relationships. Specifically, the toxic, co-dependent relationship your microservices are probably in right now.

You know the drill. You spent months breaking that dusty monolith into shiny new microservices. You felt like a genius. But now, Service A calls Service B via HTTP, which calls Service C, which needs to query a database... and if Service C has a hiccup? The whole chain collapses like a house of cards in a wind tunnel.

I've been there. I once had a checkout service that synchronously waited for an email confirmation service to finish. The email provider had an outage, and suddenly, nobody could buy anything. Our checkout latency went from 200ms to "Time Out." Disaster.

The fix? Event-Driven Architecture (EDA). Or, as I like to call it: "The Art of Ghosting." Service A sends a message and leaves. It doesn't care if Service B is awake, asleep, or on vacation. It just drops the info and moves on.

Today, we're looking at the two heavy hitters on GCP for this: the reliable veteran Cloud Pub/Sub and the rising star Eventarc.


The HTTP Trap vs. The Event Freedom

When you couple services with direct HTTP calls (REST/gRPC), you are creating a runtime dependency.

  • Synchronous: The caller waits.
  • Brittle: If the receiver is down, the caller fails (unless you have fancy circuit breakers, but even those are complex).
  • Rigid: Adding a new service that needs that data means changing the sender's code.

Event-Driven flips the script:

  1. Producer fires an event ("Order Placed").
  2. Router/Broker catches it.
  3. Consumers (Subscribers) react to it whenever they can.

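If the "Email Service" is down, the message sits in the subscription's backlog. When the service comes back up, it works through that backlog. As long as it recovers within the retention window (seven days by default for unacknowledged Pub/Sub messages), that means zero lost orders and zero checkout latency spikes.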
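
To make that flow concrete, here is a toy, in-memory sketch of the producer/broker/consumer idea. It is not Pub/Sub and not how Pub/Sub is implemented; the InMemoryBroker class, the topic name, and the subscription names are all made up for illustration.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy broker: one backlog per subscription, so consumers drain at their own pace."""

    def __init__(self):
        # topic -> {subscription name: backlog of undelivered events}
        self._subscriptions = defaultdict(dict)

    def subscribe(self, topic, subscription):
        self._subscriptions[topic].setdefault(subscription, deque())

    def publish(self, topic, event):
        # The producer returns immediately; it never waits on a consumer.
        for backlog in self._subscriptions[topic].values():
            backlog.append(event)

    def pull(self, topic, subscription, max_messages=10):
        backlog = self._subscriptions[topic][subscription]
        batch = []
        while backlog and len(batch) < max_messages:
            batch.append(backlog.popleft())
        return batch


broker = InMemoryBroker()
broker.subscribe("orders", "email-service")
broker.subscribe("orders", "inventory-service")

# Checkout publishes while the email service is "down".
broker.publish("orders", {"type": "order.placed", "order_id": "12345"})
broker.publish("orders", {"type": "order.placed", "order_id": "12346"})

# Inventory keeps up in real time...
inventory_batch = broker.pull("orders", "inventory-service")
# ...and the email service drains its own backlog once it is back.
email_batch = broker.pull("orders", "email-service")
```

Notice that publish() never touches the consumers directly. Adding a third subscriber is one more subscribe() call, with no change to the producer.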


The Workhorse: Cloud Pub/Sub

If you've been doing cloud for more than five minutes, you know Cloud Pub/Sub. It's the backbone of GCP asynchronous messaging.

What it is: A global, asynchronous messaging service that decouples services that produce events from services that process events.

When to use it:

  • High Throughput: You are streaming millions of events (clickstreams, IoT data, heavy transaction logs).
  • Fan-Out: One message needs to go to 10 different analytics systems.
  • You control the message body: You are defining the JSON schema for your app-to-app communication.

Rad's Tip: Use Pull Subscriptions if you have legacy workers on VMs that need to control the flow rate. Use Push Subscriptions if you are triggering Cloud Run or Cloud Functions and want that sweet, sweet serverless scale.
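
If you go the push route, your endpoint receives a JSON envelope with the payload base64-encoded inside it. Here is a minimal, standard-library sketch of unpacking one. The decode_push_envelope helper, the sample order payload, and the subscription name are my own assumptions for illustration, and the sketch assumes the publisher sent JSON.

```python
import base64
import json


def decode_push_envelope(body: bytes) -> dict:
    """Extract the payload and attributes from a Pub/Sub push request body."""
    envelope = json.loads(body)
    message = envelope["message"]
    # Pub/Sub base64-encodes the message data inside the JSON envelope.
    raw = base64.b64decode(message.get("data", ""))
    return {
        "message_id": message.get("messageId"),
        "attributes": message.get("attributes", {}),
        "payload": json.loads(raw) if raw else None,
    }


# Hypothetical request body, shaped like a push delivery.
sample_body = json.dumps({
    "message": {
        "data": base64.b64encode(b'{"order_id": "12345"}').decode("ascii"),
        "attributes": {"event_type": "order.placed"},
        "messageId": "1",
    },
    "subscription": "projects/my-project/subscriptions/email-service-push",
}).encode("utf-8")

decoded = decode_push_envelope(sample_body)
```

In a real handler you would return a success status (such as 200 or 204) to acknowledge the message. Anything else tells Pub/Sub to retry the delivery.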

The New Kid: Eventarc

Now, this is where things get spicy. Eventarc (which hit General Availability earlier this year) is Google's managed service specifically designed to let you build event-driven architectures with Cloud Run and Cloud Functions (2nd Gen).

Why is it different from Pub/Sub? While Pub/Sub is the transport layer, Eventarc is the governance and routing layer built on top of it (and arguably simpler to use for specific use cases).

The Killer Feature: Audit Logs

Eventarc can listen to GCP infrastructure events. This is huge. Want to trigger a Cloud Run container every time a file is uploaded to a specific Cloud Storage bucket?

  • Old way: Set up a Cloud Function trigger on the bucket.
  • Eventarc way: Create a trigger listening to google.cloud.storage.object.v1.finalized.

Want to trigger a Slack alert when a specific IAM permission changes?

  • Old way: Log sinks into Pub/Sub into a Function.
  • Eventarc way: Listen for the Cloud Audit Log event directly.

Plus, Eventarc uses the CloudEvents standard (CNCF). This means your events have a standard structure/header format regardless of source. No more parsing custom JSON blobs wondering where the timestamp field is hiding.
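
To show what that standard structure buys you, here is a small sketch that pulls the CloudEvents context attributes out of HTTP headers, assuming the event arrives in binary content mode (attributes as ce-* headers). The parse_cloudevent_headers helper and the sample header values, including the bucket name, are hypothetical. In production you would more likely reach for an official CloudEvents SDK.

```python
# The four attributes every CloudEvents 1.0 event must carry.
REQUIRED_ATTRIBUTES = ("id", "source", "specversion", "type")


def parse_cloudevent_headers(headers: dict) -> dict:
    """Collect CloudEvents context attributes from binary-mode HTTP headers."""
    attributes = {
        name.lower()[len("ce-"):]: value
        for name, value in headers.items()
        if name.lower().startswith("ce-")
    }
    missing = [key for key in REQUIRED_ATTRIBUTES if key not in attributes]
    if missing:
        raise ValueError(f"not a CloudEvent, missing: {missing}")
    return attributes


# Hypothetical headers for a Cloud Storage "finalized" event.
sample_headers = {
    "Content-Type": "application/json",
    "ce-id": "1234567890",
    "ce-source": "//storage.googleapis.com/projects/_/buckets/uploads-input",
    "ce-specversion": "1.0",
    "ce-type": "google.cloud.storage.object.v1.finalized",
    "ce-subject": "objects/cat.jpg",
    "ce-time": "2021-11-07T10:00:00Z",
}

event = parse_cloudevent_headers(sample_headers)
```

Whatever the source, the timestamp is always under "time" and the event kind is always under "type". That is the whole point.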


The Architecture: Putting it Together

Here is a simple diagram of how we are rolling this out for a client right now. We use Pub/Sub for high-volume app data, and Eventarc for state changes.

[Diagram: Event-Driven Architecture - Pub/Sub and Eventarc]

Why this works:

  1. Decoupling: The Client App doesn't know Inventory exists.
  2. Standardization: The Thumbnail Generator receives a standard CloudEvent. If we switch storage providers or event sources later, the format remains largely consistent.
  3. Scale: Both Pub/Sub and Eventarc scale to zero. If no orders come in at 3 AM, we pay (almost) nothing.

Rad's "Gotchas" Corner ⚠️

It wouldn't be a blog post if I didn't share a scar I earned the hard way.

The Infinite Loop of Doom

I was playing around with Eventarc to capture "Audit Logs" for Cloud Storage to track when files were created.

  1. I set up an Eventarc trigger on storage.objects.create.
  2. My Cloud Run service processes the file and... writes a processed version back to the same bucket.
  3. The write triggers a new event.
  4. The service runs again.
  5. Infinite loop. My billing alert woke me up before my alarm clock did.

Fix: Always separate your "Input" buckets from your "Output" buckets, or strictly check the object name and metadata at the start of your function!
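
Here is roughly what that guard could look like at the top of a handler. The bucket names and the "processed/" prefix are placeholders I made up; swap in whatever naming convention you actually use.

```python
INPUT_BUCKET = "uploads-input"        # hypothetical bucket names
OUTPUT_BUCKET = "uploads-processed"
PROCESSED_PREFIX = "processed/"       # hypothetical naming convention


def should_process(bucket: str, object_name: str) -> bool:
    """Return True only for fresh uploads, never for our own output."""
    if bucket != INPUT_BUCKET:
        return False
    # Second line of defense in case output ever lands in the input bucket.
    return not object_name.startswith(PROCESSED_PREFIX)


def output_name(object_name: str) -> str:
    """Name for the processed copy, written to OUTPUT_BUCKET."""
    return PROCESSED_PREFIX + object_name
```

Call should_process() first and bail out early when it returns False. Either check alone would have saved me that billing alert; both together are cheap insurance.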

The "At-Least-Once" Reality

Both Pub/Sub and Eventarc guarantee "at-least-once" delivery. This means that, on rare occasions (network blips, retries), your service might receive the same message twice.

Fix: Make your consumers idempotent. If you process Order #12345, check whether you've already processed it before charging the credit card again. Your database needs a unique constraint on that Transaction ID. Don't skip this!
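
A minimal sketch of that idea, using an in-memory SQLite database so it runs anywhere. The table name, the handle_order function, and the charges list (a stand-in for a real payment call) are all assumptions for illustration, not a production design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY acts as the unique constraint on the transaction ID.
conn.execute("CREATE TABLE processed_orders (transaction_id TEXT PRIMARY KEY)")

charges = []  # stand-in for calls to a payment provider


def handle_order(transaction_id: str, amount_cents: int) -> bool:
    """Charge once per transaction_id; return False for duplicate deliveries."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO processed_orders (transaction_id) VALUES (?)",
                (transaction_id,),
            )
            charges.append((transaction_id, amount_cents))
    except sqlite3.IntegrityError:
        return False  # already processed, so acknowledge and move on
    return True


first = handle_order("12345", 4999)
duplicate = handle_order("12345", 4999)  # simulated redelivery
```

The insert is the check: the database rejects the second attempt, so there is no race between "look it up" and "write it down". With a real payment provider the charge is not part of your database transaction, so you would likely also want to pass the transaction ID along as an idempotency key if the provider supports one.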


Verdict: Which one to pick?

  • Building a Data Pipeline? Ingesting clickstreams? App-to-App chatter where you own the schema? Go Pub/Sub.
  • Reacting to GCP state (file uploads, VM creations, BigQuery job completions)? Or want strict CloudEvents compliance? Go Eventarc.

Decoupling isn't just an architectural choice; it's a sanity choice. It lets your teams move fast without stepping on each other's toes.

Next time, we'll wrap up the year with a look at how we're transforming our "Ops" team into true "SREs." Until then, keep your services loose and your architectures decoupled!

Cheers, Rad