Author Archives: Russ

snaproute Go BGP Code Dive (11): Moving to Open Confirm

In the last post in this series, we began considering the BGP code that handles the open message, which begins moving a new peer to the open confirm state. This is the particular bit of code of interest:

case BGPEventBGPOpen:
  st.fsm.StopConnectRetryTimer()
  bgpMsg := data.(*packet.BGPMessage)
  if st.fsm.ProcessOpenMessage(bgpMsg) {
    st.fsm.sendKeepAliveMessage()
    st.fsm.StartHoldTimer()
    st.fsm.ChangeState(NewOpenConfirmState(st.fsm))
  }

We looked at how this code assigns the contents of the received packet to bgpMsg; now we need to look at how this information is actually processed. bgpMsg is passed to st.fsm.ProcessOpenMessage() in the next line. The call is prefixed with st.fsm, which means this function is a method on the FSM, so it should be found in fsm.go. Indeed, func (fsm *FSM) ProcessOpenMessage... is around line 1172 in fsm.go:

func (fsm *FSM) ProcessOpenMessage(pkt *packet.BGPMessage) bool {
 body := pkt.Body.(*packet.BGPOpen)

 if uint32(body.HoldTime) < fsm.holdTime {
  fsm.SetHoldTime(uint32(body.HoldTime), uint32(body.HoldTime/3))
 }

 if body.MyAS == fsm.Manager.gConf.AS {
  fsm.peerType = config.PeerTypeInternal
 } else {
  fsm.peerType = config.PeerTypeExternal
 }

 afiSafiMap := packet.GetProtocolFromOpenMsg(body)
 for protoFamily, _ := range afiSafiMap {
  if fsm. Continue reading
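
The two checks visible in the excerpt above (taking the smaller hold time, then classifying the peer by comparing AS numbers) can be tried out in isolation. What follows is only a minimal sketch, not the snaproute code: sketchFSM, openMsg, processOpen, and the peerInternal/peerExternal constants are hypothetical stand-ins, and the field types are assumptions made so the example compiles on its own.

```go
package main

import "fmt"

// peerType is a hypothetical stand-in for config.PeerTypeInternal/External.
type peerType int

const (
	peerInternal peerType = iota
	peerExternal
)

// openMsg holds only the two open message fields used in this sketch.
type openMsg struct {
	MyAS     uint32 // assumption: the real field type may differ
	HoldTime uint16 // seconds, as advertised by the peer
}

// sketchFSM is a hypothetical, stripped-down FSM; it is not the snaproute type.
type sketchFSM struct {
	localAS       uint32
	holdTime      uint32 // seconds
	keepAliveTime uint32 // seconds
	peer          peerType
}

// processOpen mirrors the two checks shown in the excerpt.
func (f *sketchFSM) processOpen(m openMsg) {
	// Adopt the peer's hold time only if it is smaller than ours,
	// and derive the keepalive interval as one third of it.
	if uint32(m.HoldTime) < f.holdTime {
		f.holdTime = uint32(m.HoldTime)
		f.keepAliveTime = uint32(m.HoldTime / 3)
	}
	// The same AS number on both sides means iBGP; otherwise eBGP.
	if m.MyAS == f.localAS {
		f.peer = peerInternal
	} else {
		f.peer = peerExternal
	}
}

func main() {
	f := &sketchFSM{localAS: 65000, holdTime: 180, keepAliveTime: 60}
	f.processOpen(openMsg{MyAS: 65001, HoldTime: 90})
	fmt.Println(f.holdTime, f.keepAliveTime, f.peer == peerExternal) // 90 30 true
}
```

Because processOpen has the receiver (f *sketchFSM), it is called as f.processOpen(...), which is the same pattern that makes st.fsm.ProcessOpenMessage() a method on the FSM type.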

Can I2RS Keep Up? (I2RS Performance)

What about I2RS performance?

The first post in this series provides a basic overview of I2RS; there I used a simple diagram to illustrate how I2RS interacts with the RIB—

[Figure: rib-fib-remote-proxy]

One question that comes to mind when looking at a data flow like this (or rather should come to mind!) is what kind of performance this setup will provide. Before diving into the answer to this question, though, perhaps it’s important to ask a different question—what kind of performance do you really need? There are (at least) two distinct performance profiles in routing—the time it takes to initially start up a routing peer, and the time it takes to converge on a single topology and/or route change. In reality, this second profile can be further broken down into multiple profiles (with or without an equal cost path, with or without a loop free alternate, etc.), but for our purposes I’ll just deal with the two broad categories here.

If your first instinct is to say that initial convergence time doesn’t matter, go back and review the recent Delta Airlines outage carefully. If you are still not convinced initial convergence time matters, go back and reread what you can Continue reading

Reaction: Devops and Dumpster Fires

Networking is often a “best effort” type of configuration. We monkey around with something until it works, then roll it into production and hope it holds. As we keep building more patches on top of patches or try to implement new features that require something to be disabled or bypassed, that creates a house of cards that is only as strong as the first stiff wind. It’s far too easy to cause a network to fall over because of a change in a routing table or a series of bad decisions that aren’t enough to cause chaos unless done together. (Networking Nerd)

Precisely.

But what are we to do about it? Tom’s Take is that we need to push back on applications. This, also, I completely agree with. But this only brings us to another problem: how do we make the case that applications need to be rewritten to work on a simpler network? The simple answer is to teach coders how networks really work, so they can figure out how to better code to the environment in which their applications live. Let me be helpful here: I’ve been working on networks since somewhere around 1986, and on computers and electronics since Continue reading