Beyond Fat-Tree: Why AWS’s Flat Network Design Matters

Blog ENG - AWS - Post 5 2026

Some topics deserve more than a quick reaction, and this is one of them.

I plan to study this work in more depth over the coming weeks, because there is clearly a great deal of architectural substance behind it. But even on a first reading, this feels like one of those moments when it is worth pausing and thinking carefully about what is actually changing.

What I find most interesting is not simply that AWS has introduced a new data center network design. It is that AWS appears to have brought into large-scale production a concept that has existed in research for years, but that remained, until now, largely confined to theory and simulation: replacing the familiar hierarchical fat-tree with a flatter fabric built on semi-random graphs with expander-like properties.

That is a serious architectural shift.

In a fat-tree, the network depends on a small number of structurally important layers. Traffic moves up and down the hierarchy, and the design works because it is orderly, symmetrical, and relatively easy to reason about. In a flatter expander-style fabric, the goal is different. The network is no longer organized around a few special aggregation points. Instead, connectivity is spread much more evenly across the fabric.

The value of that change is not just that there are more possible paths. The real value is that the paths are less correlated, less constrained by rigid hierarchy, and potentially more useful under real traffic conditions.

The real issue: making capacity more usable

One of the most interesting ideas here is capacity fungibility.

In simple terms, it means the network’s capacity is easier to use where it is needed, rather than being stranded in one part of the topology while another part becomes congested. That has always been one of the quiet weaknesses of fat-tree designs. They can perform very well when traffic is evenly distributed and the network is generously provisioned. But at very large scale, even small inefficiencies in how capacity is distributed can turn into very large penalties in cost, power, and operational complexity.

This is why the reported results matter. If a fabric can deliver significantly more throughput while using far fewer network devices and less power, that is not a routine optimization. It suggests a different way of thinking about how aggregation fabrics should be built in the first place.

More than a topology story

It would be easy to describe this as “AWS used a random topology”, but that would miss the point.

The breakthrough, at least from an architectural perspective, is not just the topology. It is the combination of topology, routing, and cabling.

Expander-style graphs are compelling because they offer high connectivity and a large amount of path diversity without requiring an excessive number of links. But elegant mathematics alone does not solve a production networking problem. A real design has to work on practical forwarding hardware. It has to keep state under control. It has to avoid turning the control plane into something fragile or unmanageable. And it has to be physically deployable in a real data hall.

That is where this work becomes genuinely interesting.
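To get a feel for what "expander-like" means in practice, here is a minimal Python sketch. It is not AWS's construction: it builds a generic random regular graph with the textbook pairing model, and the function names and the sizes (40 nodes, 4 links each) are assumptions chosen only for illustration. The point is simply that a small, fixed number of links per node can still keep every node within a few hops of every other.

```python
import random
from collections import deque


def hop_counts(adj, src):
    """BFS hop counts from src; the value len(adj) marks an unreachable node."""
    n = len(adj)
    dist = [n] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == n:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def random_regular_graph(n, d, seed=0):
    """Return adjacency sets of a simple, connected d-regular graph.

    Pairing model with rejection: give every node d "stubs", shuffle them,
    pair them up, and retry on a self-loop, duplicate link, or disconnection.
    """
    if (n * d) % 2 or d >= n:
        raise ValueError("need n*d even and d < n")
    rng = random.Random(seed)
    while True:
        stubs = [v for v in range(n) for _ in range(d)]
        rng.shuffle(stubs)
        adj = [set() for _ in range(n)]
        simple = True
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a == b or b in adj[a]:
                simple = False
                break
            adj[a].add(b)
            adj[b].add(a)
        if simple and max(hop_counts(adj, 0)) < n:
            return adj


def diameter(adj):
    """Longest shortest path, in hops, over all node pairs."""
    return max(max(hop_counts(adj, s)) for s in range(len(adj)))


fabric = random_regular_graph(n=40, d=4, seed=1)
links = sum(len(nbrs) for nbrs in fabric) // 2
print("links:", links)  # 40 nodes * 4 ports / 2 = 80 links
print("diameter in hops:", diameter(fabric))
```

Rejection sampling is only practical for small degrees like this one; it is used here because it is short and easy to check, not because it is how a production fabric would be generated.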

Why routing is the hard part

In a fabric built like this, the problem is not the lack of alternative paths. There are many. The problem is how to make those paths usable without overwhelming the network with complexity.

A fat-tree has a simple grammar. Paths follow a predictable hierarchy. A semi-random expander-style fabric does not. If you try to treat it as a giant end-to-end path computation problem, the design quickly becomes impractical. You would have to calculate, store, and program far too many possible paths between source and destination pairs, and that runs straight into real limits in control-plane scale, update behavior, forwarding state, and hardware resources.

What makes the routing approach so interesting is that it seems to avoid that trap. Rather than trying to explicitly represent every useful end-to-end path, it makes path diversity emerge from simpler building blocks. Traffic can be spread initially, steered through intermediate points, and then brought toward the destination in a way that exposes the richness of the fabric without having to describe every possible route in exhaustive detail.

That is an important shift in thinking. The question is no longer “How many end-to-end paths can I compute?”; it becomes “How much path diversity can I actually make available to the hardware in a scalable way?”

That distinction matters. It is the difference between theoretical richness and practical usability.

And in a fabric like this, practical usability is everything.
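The idea of getting path diversity from simple building blocks can be sketched in a few lines of Python. This is an illustration in the spirit of Valiant-style load balancing, not a description of the routing AWS actually uses. The stand-in topology (a ring plus random chords), the function names, and the node counts are all assumptions. Each node stores only per-destination next hops, yet sending traffic through a randomly chosen intermediate node exposes many more distinct end-to-end paths than shortest-path forwarding alone.

```python
import random
from collections import deque


def ring_with_random_chords(n, seed=0):
    """Stand-in topology (assumption): a ring for connectivity plus random chords."""
    rng = random.Random(seed)
    adj = [{(v - 1) % n, (v + 1) % n} for v in range(n)]
    nodes = list(range(n))
    rng.shuffle(nodes)
    for a, b in zip(nodes[::2], nodes[1::2]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def build_next_hops(adj):
    """table[u][t] lists the neighbours of u that lie on a shortest path to t."""
    n = len(adj)
    table = [dict() for _ in range(n)]
    for t in range(n):
        dist = {t: 0}
        queue = deque([t])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for u in range(n):
            if u != t:
                table[u][t] = [v for v in sorted(adj[u]) if dist[v] == dist[u] - 1]
    return table


def walk(table, src, dst, rng):
    """Hop-by-hop forwarding: each node picks one of its next hops at random."""
    path = [src]
    while path[-1] != dst:
        path.append(rng.choice(table[path[-1]][dst]))
    return path


def route_via(table, src, dst, via, rng):
    """Two stages: first reach the intermediate node, then head for the destination."""
    return walk(table, src, via, rng) + walk(table, via, dst, rng)[1:]


rng = random.Random(7)
adj = ring_with_random_chords(32, seed=3)
table = build_next_hops(adj)
src, dst = 0, 16
direct = {tuple(walk(table, src, dst, rng)) for _ in range(200)}
spread = {tuple(route_via(table, src, dst, rng.randrange(32), rng)) for _ in range(200)}
print("destination entries per node:", len(table[0]))  # 31, one per other node
print("distinct shortest paths seen:", len(direct))
print("distinct two-stage paths seen:", len(spread))
```

The forwarding state stays at one entry per destination no matter how many end-to-end paths the two-stage scheme ends up using. The trade-off in this toy version is that detours are longer than shortest paths and can revisit a node; a real design would have to bound that cost.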

Performance does not come from randomness alone

It is worth stating this clearly: performance does not improve simply because the topology looks more random.

A fabric with a large number of theoretical paths is not automatically a better network. If the forwarding model cannot use those paths effectively, or if the control plane cannot expose them without becoming unmanageable, the potential benefit remains mostly theoretical.

The advantage seems to come from the combination of two things:

First, the topology creates much more potential path diversity than a rigid hierarchy typically can.
Second, the routing model appears able to turn that potential into something the network can actually use at scale.

That combination is what makes the idea architecturally meaningful.

Cabling is not a side detail

Another part of the story that deserves real attention is the physical layer.

Historically, one of the main objections to random or semi-random fabrics has always been cabling. Many of these designs look attractive in papers but become far less attractive when you imagine building and operating them in a live data center. If the price of a more efficient fabric is a wiring plan that is chaotic, fragile, and hard to validate, the theoretical gain can disappear very quickly in day-to-day operations.

That is why the optical shuffling concept is so important.

At first glance, it may seem like an implementation detail. In reality, it is fundamental. If part of the randomness can be absorbed inside a passive optical component, then the external cabling can remain much more orderly and operationally manageable. That changes the practical equation. It means the design is not only efficient on paper, but also buildable, repeatable, verifiable, and maintainable.

That is exactly the kind of problem hyperscale operators must solve. Efficiency alone is never enough. The architecture also has to survive contact with deployment reality.

A useful comparison: topology versus transport

I also find it interesting to compare this line of thinking with what is happening elsewhere in high-performance networking, especially in AI and RDMA-focused environments.

The comparison is not about declaring one approach better than another. It is about recognizing that similar architectural instincts can appear at different layers of the stack.

A fabric design like this works at the topology level. It tries to create and expose more path diversity in the network itself. Other designs, especially in the AI transport space, work at a different layer: they try to spread traffic across multiple paths at the transport level, react faster to congestion or failures, and keep expensive compute resources fully utilized.

The common idea is simple but important: performance is not just about having more nominal bandwidth. It is about making the available paths genuinely usable, and about avoiding situations where traffic becomes trapped by static choices or by load-balancing methods that are blind to what is happening inside the fabric.

Resilience may matter even more than peak throughput

The resilience angle may be just as important as the throughput story.

In hierarchical topologies, upper layers naturally concentrate traffic and failure domains. Losing a critical node at the top of the structure can affect a large number of endpoint pairs at the same time. In a flatter fabric with expander-like properties, there are fewer structurally dominant nodes. Capacity loss is more likely to be spread out, more gradual, and more predictable.

That matters a great deal in multi-tenant and hyperscale environments.

At that scale, continuity is not about pretending failures will not happen. Failures are inevitable. The real question is whether the infrastructure degrades in a controlled way when they do. Graceful degradation is often more valuable than theoretical perfection.
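The claim about structurally dominant nodes can be made concrete with a toy measurement. The Python sketch below is an assumption-laden illustration, not a model of any real deployment: it compares a tiny two-spine hierarchy with a flat ring-plus-random-chords graph, and for each node it computes the fraction of endpoint pairs that have that node on at least one shortest path. The topologies, sizes, and the `transit_share` metric are all hypothetical choices made for this example.

```python
import random
from collections import deque
from itertools import combinations


def leaf_spine(leaves, spines):
    """Toy hierarchy: every leaf connects to every spine (spines get the highest ids)."""
    spine_ids = set(range(leaves, leaves + spines))
    adj = [set(spine_ids) for _ in range(leaves)]
    adj += [set(range(leaves)) for _ in range(spines)]
    return adj


def ring_with_random_chords(n, seed=0):
    """Toy flat fabric (assumption): a ring for connectivity plus random chords."""
    rng = random.Random(seed)
    adj = [{(v - 1) % n, (v + 1) % n} for v in range(n)]
    nodes = list(range(n))
    rng.shuffle(nodes)
    for a, b in zip(nodes[::2], nodes[1::2]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def distances(adj, src):
    """BFS hop counts from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def transit_share(adj, endpoints):
    """For each node x: fraction of endpoint pairs with x on some shortest path."""
    dist = {v: distances(adj, v) for v in range(len(adj))}
    share = {}
    for x in range(len(adj)):
        pairs = [(s, t) for s, t in combinations(endpoints, 2) if x not in (s, t)]
        hit = sum(dist[s][x] + dist[x][t] == dist[s][t] for s, t in pairs)
        share[x] = hit / len(pairs)
    return share


tree = leaf_spine(leaves=8, spines=2)
flat = ring_with_random_chords(32, seed=3)
print(max(transit_share(tree, range(8)).values()))  # 1.0: a spine touches every leaf pair
print(max(transit_share(flat, range(32)).values()))
```

This measures exposure, not capacity: losing one of two spines does not disconnect the leaves, but it does touch every pair at once, which is the concentration effect described above. In the flat graph, the same number shows how much of the traffic matrix the single most central node can affect.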

The economics are structural, not cosmetic

The cost story should also be read carefully.

It is tempting to read the device reduction as nothing more than “fewer boxes were purchased.” But the deeper point is that if the fabric makes better use of the capacity already present in the network, then less supporting infrastructure is required to reach the same or better outcomes.

That has obvious implications: fewer devices, lower power draw, less cooling overhead, less physical complexity, and lower environmental cost per unit of traffic carried.

What I like here is that sustainability does not appear as a separate overlay or a moral add-on. It emerges naturally from better architectural efficiency. When a network wastes less capacity, it also wastes less silicon, less optics, less power, and less operational effort.

That is a much more interesting story than simply calling something greener.

What remains to be tested

As promising as this looks, there are still important questions.

How portable is this architecture outside an environment like AWS, where the operator controls the design end to end? How difficult is troubleshooting in a fabric that is flatter and less intuitively structured than a traditional hierarchy? How does it behave under correlated failures, planned maintenance events, or highly imbalanced traffic patterns? And how much of the observed benefit depends on having not only a new fabric design, but also the operational tooling and deployment discipline to support it properly?

Those are not minor questions. They will determine how broadly this approach can influence the rest of the industry.

Final thoughts

What makes this work compelling is not just that it promises more throughput or lower cost. Plenty of technologies make those claims.

What makes it interesting is that it revisits several layers of the problem at once: topology, routing, cabling, resilience, and operational practicality. That is rare. And it is usually where the most meaningful infrastructure progress comes from.

Fat-tree architectures have served the industry extremely well because they brought order, predictability, and a clean operational model. But at hyperscale, the same rigidity that once made them attractive can become a limitation.

A flatter fabric with richer path diversity offers a different path forward. If it can combine better throughput, more graceful failure behavior, lower cost, and lower energy use without becoming operationally unmanageable, then this is not just an interesting technical experiment. It is a serious architectural development.

And from a network architect’s point of view, that is exactly why it deserves attention.
