The east-west security blind spot in GPU cloud builds (and why you can’t “add it later”)

I’ve been the infrastructure manager on the receiving end of a “completed” platform handoff: the kit is racked, the fabric is flying, the benchmarks look great, and everyone wants to move on.

In GPU-first data centre builds, that moment is happening at scale. Fabrics are tuned for raw throughput (RDMA/RoCE, lossless Ethernet, PFC/ECN), validated, and handed over to a neocloud operator.

Then, a week later, someone asks the question that wasn’t in the build checklist:

“How do you isolate tenants and prove east-west security?”

And the honest answer is often:

“VLANs… and we’ll be careful.”

That’s not a posture. It’s a future incident report.

The handoff gap: performance is not isolation

GPU platforms are being built as shared fabrics. Even when a neocloud’s commercial model starts with “single tenant,” it rarely stays that way. The first time an enterprise wants a carve-out, or a partner wants a slice, or a sovereign deployment needs provable boundaries, multi-tenancy shows up.

If your east-west story is “we put people in different VLANs,” you’ve already accepted three problems:

  • Isolation is configuration, not policy. It’s easy to be “mostly right” until one port, one trunk, one change window breaks the model.
  • Change has no audit narrative. Who changed what? Why? Was it reviewed? Can you prove what was enforced last Tuesday?
  • Blast radius is huge. A compromised workload on a flat or semi-flat fabric can enumerate and probe far more than it should.

The uncomfortable reality: if you don’t design east-west controls up front, you end up retrofitting them under production pressure.

The first questions you’ll get (and who asks them)

In my experience, these don’t come from abstract “security process.” They come from practical, high-stakes moments where someone needs evidence and a clear answer:

  • A customer security team: “Show me that my training data can’t be reached from another tenant.”
  • A regulator / auditor: “How is policy enforced and tracked? Where is the evidence?”
  • A sovereign/defence programme: “Prove data doesn’t cross boundaries—operationally, not conceptually.”
  • An SRE during an incident: “What changed? What was the intended state?”

When those questions arrive after handoff, the answer can’t be “we’ll bolt something on.” Not in a GPU fabric.

Why the usual fixes don’t fit GPU fabrics

1) Inline firewalls are the wrong shape

Traditional firewalls sitting in the data path are great when you can tolerate inspection overhead, hairpinning, and capacity planning around a choke point.

GPU east-west traffic is the opposite: you’re optimising for latency and predictable throughput. Putting an inline box in the middle is how you turn a high-performance fabric into an expensive science project.

2) Agent-based microsegmentation can’t cover the network

Server-agent approaches can be powerful, but they assume:

  • you control the OS image and can deploy/maintain agents,
  • your enforcement point is the host,
  • and your “network truth” can be reconstructed from what hosts report.

In neocloud environments—especially with mixed tenancy and rapid provisioning—you need enforcement that doesn’t depend on perfect host cooperation.

3) VLANs and ACL sprawl don’t scale operationally

VLANs, VRFs, and ACLs are building blocks. They’re not a lifecycle.

The failure mode isn’t “we don’t have knobs.” The failure mode is drift:

  • a manual exception during an outage,
  • a copied structure that never gets cleaned up,
  • a shortcut that becomes permanent,
  • and no single place to say “this is the intended policy.”
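Drift of this kind is mechanical to detect once there is an intended state to compare against. A toy sketch, with invented flow tuples standing in for rules parsed from policy and from a device's running config:

```shell
#!/bin/sh
set -e
dir=$(mktemp -d); cd "$dir"

# Flows the policy intends to allow (invented data: src, dst, port)
printf 'tenant-a tenant-b 443\n' | sort > intended.txt

# Flows the device actually permits -- note someone's outage shortcut on port 22
printf 'tenant-a tenant-b 443\ntenant-a tenant-b 22\n' | sort > actual.txt

# Lines present on the device but absent from policy = drift
comm -13 intended.txt actual.txt
```

The point of the sketch is the shape of the check, not the tooling: without a single declared "intended" side, there is nothing to `comm` against, and the shortcut stays invisible until an incident finds it.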

What “good” looks like (from an operator’s perspective)

If you want a fabric that can survive enterprise scrutiny, you need three things:

1) Declarative policy: Define isolation intent once (“tenant A cannot talk to tenant B except via these approved services”).

2) Enforcement at the switch level: Put the enforcement where the traffic actually flows, without forcing it through an inline appliance.

3) GitOps auditability: Every change is reviewed, tracked, and attributable. You can answer “what was enforced at time X?” with evidence.
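As an illustration of point 1, declared isolation intent might look like the following. The schema is hypothetical, invented for this example, and not any specific product's format:

```yaml
# Hypothetical declarative policy: isolation intent, not device config.
# All names and fields below are illustrative only.
tenants:
  - name: tenant-a
  - name: tenant-b

policies:
  - name: default-isolation
    from: tenant-a
    to: tenant-b
    action: deny
  - name: shared-inference-gateway
    from: tenant-a
    to: shared-services/inference-gateway   # approved exception
    ports: [443]
    action: allow
```

Whatever the concrete schema, the property that matters is that this file, not the per-device config it compiles to, is the single place that says what the intended policy is.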

This is the part that most teams miss: the goal isn’t just to block bad traffic. The goal is to make the operating model defensible.
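Point 3 is more concrete than it sounds: once policy lives in Git, "what was enforced at time X?" becomes a repository query rather than archaeology. A minimal sketch, using a throwaway repo and an invented policy file purely for illustration:

```shell
#!/bin/sh
set -e
# Throwaway repo standing in for a GitOps policy repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'tenant-a:\n  deny: [tenant-b]\n' > tenant-a.yaml
git add tenant-a.yaml
git -c user.email=ops@example.com -c user.name=ops \
    commit -qm "isolate tenant-a from tenant-b"

# "What was enforced at time X?" = the last reviewed commit at or before X.
commit=$(git rev-list -1 --before="now" HEAD)

# Reproduce the exact policy that was in force at that point:
git show "$commit:tenant-a.yaml"

# And the who/when/why for the audit narrative:
git log -1 --format='%h %an %ad %s' "$commit"
```

Swap `"now"` for the timestamp an auditor asks about and the same two commands produce the evidence, which is the difference between an audit answer and a reconstruction exercise.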

A practical checklist before you sign off a GPU fabric

If you’re building or accepting a GPU platform, ask these before go-live:

  • How is tenant isolation expressed? (As policy, or as a pile of device config?)
  • Where is enforcement? (Host-only, inline chokepoints, or at the fabric?)
  • What’s the change control story? (Can we diff intended state? Can we roll back?)
  • What’s the evidence story? (Can we produce proof for auditors without archaeology?)
  • How do customers self-serve safely? (Or do we create a ticket queue that becomes the bottleneck?)

If those answers aren’t crisp, you’re not “done.” You’re just early.

Why we built NetOrca

We kept seeing the same pattern: incredible engineering to deliver lossless performance, followed by uncomfortable silence when asked about east-west policy.

NetOrca exists to close that gap:

  • security policy defined once in YAML,
  • enforced directly on the fabric switches,
  • tracked through GitOps with a clean audit trail,
  • without inserting an inline firewall into the GPU data path.