I read the first half of BGP in the Data Center, so here are notes on what caught my attention. The book gives practical guidance on designing Clos networks and operating BGP. This is not an area I usually work in, so I may have misunderstood some points.
1. Introduction to Data Center Networks
- Communication between servers in a local network is called East-West traffic, and communication between the local network and external networks is called North-South traffic.
- Because a huge number of cables has to be managed, operators of large networks each develop their own cabling management techniques. There’s also an open source tool, Prescriptive Topology Manager (PTM), for checking that the cabling matches the intended topology.
- If you build a huge network from identical switches, a failed switch can be recovered simply by replacing it. It’s important to think about day-to-day operating costs.
- To connect the Clos network to the outside, place Border Pods or Border Leaves. The routing protocol used inside the data center should be kept separate from the one used toward the outside. If Border Pods or Border Leaves cannot be deployed, connect every Spine to the outside (to preserve Spine symmetry).
- Compared to iBGP, eBGP is easier to understand and deploy, so it’s better to adopt eBGP. From a historical perspective, there are more mature implementations of eBGP.
2. How BGP Has Been Adapted to the Data Center
- It’s better not to use public ASNs. They can confuse operators and, in the unlikely event that routes leak externally, can cause BGP hijacking.
- Extending ASNs from 2 bytes to 4 bytes makes approximately 95,000,000 private ASNs available (4200000000 to 4294967294), compared with 1,023 in the 2-byte space (64512 to 65534).
- If you simply assign a unique ASN to every BGP speaker, you’ll suffer from Path Hunting.
  - When a node with reachability to a prefix goes down, that information takes time to converge. This follows from how BGP’s best path selection works.
  - A speaker cannot tell whether the prefix is truly unreachable or still reachable via another path, so it tries the remaining paths one after another.
  - In densely connected Clos topologies, this becomes a significant problem.
- The following ASN allocation model addresses the Path Hunting problem. Because looping paths are caught by loop detection, unnecessary messages don’t need to be sent. Note, however, that Route Summarization cannot be done.
  - Every ToR has a unique ASN.
  - Leaves share one ASN within a Pod, and each Pod uses a different ASN.
  - All Spines share a common ASN.
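
To check my own understanding of the allocation model above, here is a minimal Python sketch. It is my illustration, not code from the book: the node names, the function names `allocate_asns` and `accepts`, and the choice to number from the start of the 4-byte private range are all assumptions. `accepts` models only the basic eBGP rule that a speaker rejects a route whose AS_PATH already contains its own ASN.

```python
# 4-byte private ASN range (RFC 6996).
PRIVATE_ASN_START = 4_200_000_000
PRIVATE_ASN_END = 4_294_967_294


def allocate_asns(num_pods, leaves_per_pod, tors_per_pod, num_spines):
    """Return {node_name: asn} following the three rules above.

    Node names such as "pod0-leaf1" are hypothetical.
    """
    next_asn = PRIVATE_ASN_START
    asns = {}

    # Spines: one common ASN for all of them.
    spine_asn = next_asn
    next_asn += 1
    for spine in range(num_spines):
        asns[f"spine{spine}"] = spine_asn

    for pod in range(num_pods):
        # Leaves: one ASN per pod, shared by every leaf in that pod.
        leaf_asn = next_asn
        next_asn += 1
        for leaf in range(leaves_per_pod):
            asns[f"pod{pod}-leaf{leaf}"] = leaf_asn

        # ToRs: every ToR gets its own ASN.
        for tor in range(tors_per_pod):
            asns[f"pod{pod}-tor{tor}"] = next_asn
            next_asn += 1

    if next_asn - 1 > PRIVATE_ASN_END:
        raise ValueError("ran out of private ASNs")
    return asns


def accepts(receiver_asn, as_path):
    """eBGP loop detection: reject a route whose AS_PATH contains our own ASN."""
    return receiver_asn not in as_path


asns = allocate_asns(num_pods=2, leaves_per_pod=2, tors_per_pod=2, num_spines=2)

# A route from pod0-tor0 that went tor0 -> leaf0 -> tor1 and is now offered to leaf1.
detour = [asns["pod0-tor1"], asns["pod0-leaf0"], asns["pod0-tor0"]]
print(accepts(asns["pod0-leaf1"], detour))  # False: leaf1 shares leaf0's ASN
```

Because the two leaves in a pod share an ASN, the detour through another ToR is dropped immediately instead of being explored as an alternative path, which is how I understand this model to cut down Path Hunting.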

- OSPF and IS-IS use a single metric for best path selection. BGP, by contrast, has 8.
  - Mnemonic: Wise Lip Lovers Apply Oral Medication Every Night.
  - Weight, LOCAL PREFERENCE, Locally originated, AS_PATH, ORIGIN, MED, eBGP over iBGP, Next hop IGP cost
  - If all 8 are equal, the paths are considered Equal Cost and multipath selection is possible.
  - For the AS_PATH comparison, paths count as Equal Cost only if both the length and every element match.
  - For example, when the same prefix arrives from different ASes, it does not become multipath.
  - However, by configuring `bestpath as-path multipath-relax` (`bgp bestpath as-path multipath-relax` in FRRouting), you can have only the AS_PATH length compared (ignoring the contents).
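
The difference between the strict AS_PATH comparison and the relaxed one can be sketched in Python as follows. This is a simplified model of my own: the `Path` class, its default values, and the `equal_cost` function are hypothetical, the ASNs are made up, and a real implementation has more conditions (for example around MED) than a plain equality check.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Path:
    """Hypothetical container for the eight attributes compared above."""
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: Tuple[int, ...] = ()
    origin: int = 0          # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int = 0
    is_ebgp: bool = True
    igp_cost: int = 0


def equal_cost(a: Path, b: Path, multipath_relax: bool = False) -> bool:
    """Return True if a and b tie on all eight criteria."""
    if multipath_relax:
        # Relaxed: only the AS_PATH length has to match.
        as_path_ties = len(a.as_path) == len(b.as_path)
    else:
        # Strict: length and every element have to match.
        as_path_ties = a.as_path == b.as_path
    return as_path_ties and (
        a.weight, a.local_pref, a.locally_originated,
        a.origin, a.med, a.is_ebgp, a.igp_cost,
    ) == (
        b.weight, b.local_pref, b.locally_originated,
        b.origin, b.med, b.is_ebgp, b.igp_cost,
    )


# The same prefix learned via two neighbors in different ASes.
via_leaf1 = Path(as_path=(65101, 65001))
via_leaf2 = Path(as_path=(65102, 65001))

print(equal_cost(via_leaf1, via_leaf2))                        # False
print(equal_cost(via_leaf1, via_leaf2, multipath_relax=True))  # True
```

With unique ASNs per device, two paths of the same length almost never have identical AS_PATH contents, which is why the relaxed comparison matters for ECMP in this kind of design.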
- In Clos topologies, convergence has to be sped up by changing timer settings. The defaults are usually chosen with stability as the top priority, on the assumption of service-provider use, so adjust them appropriately.
  - Advertisement Interval: Updates generated within this interval are batched and sent together. For eBGP the default is 30 seconds, but it should be set to 0 seconds.
  - Keepalive and Hold Timers: Peers exchange keepalive messages at a fixed period. If no keepalive is received within the hold time, the peer is judged to be down. Even if BFD (Bidirectional Forwarding Detection) is introduced to detect cable failures, these timers still need tuning to detect failures of the BGP process itself. The recommendation is a keepalive of 3 seconds and a hold timer of 9 seconds.
  - Connect Timer: How long to wait before trying to reconnect after a peer connection is lost.
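
To get a feel for what the hold timer means for failure detection, here is a small Python sketch of my own. The function `hold_timer_expiry` and its parameters are hypothetical, and it models only the basic rule that each received keepalive restarts the hold timer. The 180-second value in the second call is just an example of a conservative setting.

```python
from typing import Iterable, Optional


def hold_timer_expiry(arrivals: Iterable[float], hold_time: float,
                      until: float, start: float = 0.0) -> Optional[float]:
    """Return when the peer is declared down, or None if the hold timer
    never expires before `until`.

    arrivals: times (in seconds) at which a keepalive from the peer arrived.
    """
    deadline = start + hold_time
    for t in sorted(arrivals):
        if t >= deadline or t > until:
            break  # timer already expired, or we are past the observed window
        deadline = t + hold_time  # each keepalive restarts the hold timer
    return deadline if deadline <= until else None


# Peer sends keepalives every 3 s, then its BGP process dies right after t=12.
print(hold_timer_expiry([3, 6, 9, 12], hold_time=9, until=60))   # 21

# Same kind of failure right after t=60 with a 180 s hold timer.
print(hold_timer_expiry([60], hold_time=180, until=600))         # 240
```

With 3/9 the failure is noticed 9 seconds after the last keepalive, and a single lost keepalive does not tear the session down because the hold time spans three keepalive periods.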
3. Building an Automatable BGP Configuration
- It’s good to set `bgp router-id` to an IP address.
- Rather than configuring the prefixes to advertise statically, it’s better to inherit routes from sources such as connected, kernel, ospf, bgp, and rip.
  - However, since there’s a possibility of mistakenly advertising addresses that should not be advertised, routing policy configuration is necessary.
  - `prefix belongs` is easier to manage than `prefix equal`.
  - Also, routing policy is usually configured with `route-map`.
# Routing policy example
ACCEPT_DC_LOCAL(prefix)
{
    if prefix belongs to 10.1.0.0/16 then accept
    else if (10.0.254.0/24 contains prefix and subnet equals 32) then accept
    else reject
}
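
The pseudocode policy above can be tried out with Python’s standard `ipaddress` module. This is only a sketch of the matching logic, not how a router evaluates a `route-map`. The function name `accept_dc_local` and the constant names are my own, and rejecting non-IPv4 prefixes is an assumption I added because the policy only mentions IPv4 ranges.

```python
import ipaddress

# The two ranges named in the pseudocode policy.
DC_LOCAL = ipaddress.ip_network("10.1.0.0/16")
LOOPBACKS = ipaddress.ip_network("10.0.254.0/24")


def accept_dc_local(prefix: str) -> bool:
    """Return True if the prefix may be advertised under ACCEPT_DC_LOCAL."""
    net = ipaddress.ip_network(prefix)

    # Assumption: anything that is not IPv4 is rejected. This also avoids the
    # TypeError that subnet_of() raises when IP versions differ.
    if net.version != 4:
        return False

    # "prefix belongs to 10.1.0.0/16": any subnet of the range is accepted.
    if net.subnet_of(DC_LOCAL):
        return True

    # "10.0.254.0/24 contains prefix and subnet equals 32": host routes only.
    if net.subnet_of(LOOPBACKS) and net.prefixlen == 32:
        return True

    return False


for candidate in ["10.1.5.0/24", "10.0.254.7/32", "10.0.254.0/25", "192.168.0.0/24"]:
    print(candidate, accept_dc_local(candidate))
```

The first two candidates are accepted and the last two are rejected. This also shows why a "belongs" style match is easier to manage: new subnets inside 10.1.0.0/16 are covered without touching the policy.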