Close to the wire: How route analytics can help prevent BGP-caused outages
by Brian Boyko, Contributor - September 16, 2014
At around 3:00 a.m. Eastern Daylight Time on August 13th, Internet users began reporting slow connectivity and intermittent outages. The disruption affected many large networks and hosting providers, including eBay, Comcast, and Time-Warner.
The problem was that some older Cisco routers have a default limit of 512k Border Gateway Protocol (BGP) routing entries in their TCAM (ternary content-addressable memory). Global routing tables typically hold around 500k entries, which left only a small buffer. Then BGP prefix aggregation on a major service provider's systems temporarily failed. The provider quickly fixed the problem on its end, but not before 15,000 new prefixes had been announced into the global routing table, pushing it past the 512k limit.
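Using the article's round numbers, the arithmetic behind the outage is short. This is an illustrative sketch only: it assumes "512k" means 512,000 entries and treats "around 500k" as exactly 500,000. Real platforms may size their tables in powers of two and reserve entries for other route types, so actual headroom differed.

```python
# Illustrative figures from the article's round numbers (assumptions, not
# measurements): "512k" is read as 512,000 entries, "around 500k" as 500,000.
TCAM_LIMIT = 512_000
BASELINE_PREFIXES = 500_000
LEAKED_PREFIXES = 15_000


def headroom(prefix_count: int, limit: int) -> int:
    """Free table entries left before the hardware table is full (negative means overflow)."""
    return limit - prefix_count


before = headroom(BASELINE_PREFIXES, TCAM_LIMIT)
after = headroom(BASELINE_PREFIXES + LEAKED_PREFIXES, TCAM_LIMIT)

print(f"headroom before the extra prefixes: {before:,} entries "
      f"({before / TCAM_LIMIT:.1%} of capacity)")
print(f"headroom after the extra prefixes:  {after:,} entries")
print("overflow" if after < 0 else "fits")
```

Under these assumptions the table starts with only about 2% of its capacity free, so a single burst of 15,000 prefixes is enough to overrun it.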
There is a workaround that raises the maximum routing-table size on these routers, but one has to wonder why they were running so close to their maximum in the first place. Clearly, networks need a larger margin of error.
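Route analytics can turn that margin of error into something operators actually watch. The sketch below is a minimal, hypothetical monitor, not any vendor's product: the function `check_fib_headroom`, the 90% utilization threshold, and the 5,000-prefix jump threshold are illustrative assumptions. The samples are invented to mirror the article's round numbers, with the limit read as 512,000 entries.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    sample_index: int
    reason: str


def check_fib_headroom(samples, limit, max_utilization=0.90, max_jump=5_000):
    """Scan successive prefix-count samples and flag a table running too close to its limit.

    max_utilization and max_jump are hypothetical thresholds chosen for illustration.
    """
    alerts = []
    previous = None
    for i, count in enumerate(samples):
        utilization = count / limit
        if count > limit:
            # The table no longer fits in the hardware limit.
            alerts.append(Alert(i, f"overflow: {count:,} prefixes > limit {limit:,}"))
        elif utilization > max_utilization:
            # Still fits, but the safety margin is gone.
            alerts.append(Alert(i, f"low headroom: {utilization:.1%} of capacity in use"))
        if previous is not None and count - previous > max_jump:
            # A sudden burst, such as failed aggregation, regardless of absolute size.
            alerts.append(Alert(i, f"sudden growth: +{count - previous:,} prefixes"))
        previous = count
    return alerts


# Hypothetical samples: a table hovering near 500k, then a burst of 15,000 prefixes.
samples = [496_000, 498_000, 500_000, 515_000]
alerts = check_fib_headroom(samples, limit=512_000)
for alert in alerts:
    print(alert.sample_index, alert.reason)
```

With these samples, the monitor flags low headroom on every reading before the burst, which is the article's point about running close to the maximum. The separate jump check catches the sudden growth itself, so a burst is reported even on a router that still has room to spare.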
The August 13th event highlights one of the reasons that route analytics are more important than ever. With the visibility