Russ

Author Archives: Russ

Elephant Flows, Fabrics, and I2RS

The last post in this series on I2RS argues that this interface is designed to augment, rather than replace, the normal, distributed routing protocol. What sort of use case could we construct that would use I2RS in this way? What about elephant flows in data center fabrics? An earlier post considers how to solve the elephant flow using segment routing (SR); can elephant flows also be guided using I2RS? The network below will be used to consider this question.

benes-segment

Assume that A hashes a long lived elephant flow representing some 50% of the total bandwidth available on any single link in the fabric towards F. At the same time, A will hash other flows, represented by the red flow lines, onto each of the three links towards the core of the fabric in pretty much equal proportion. Smaller flows that are hashed onto the A->F link will likely suffer, while flows hashed onto the other two links will not.

This is a particularly bad problem in applications that have been decomposed into microservices, as the various components of the application tend to rely on fairly fixed delay and jitter budgets over the network to keep everything synchronized and running quickly. Continue reading

Being an Effective Interviewer

One challenging aspect of being an engineer is interviewing other engineers. The interview process is rife with various problems, including the discomfort of interviewing someone who you perceive as being a better engineer than you are, or figuring out how to draw out actual engineering skill versus simply finding out how much someone has memorized. Does that CCIE or degree on their resume really mean anything? What does an effective network engineering interview look like?

There are, of course, several different theories “out there.” For instance, some companies focus on giving candidates “real world” problems to solve, and checking with them several days later to see how they’ve done. This is potentially useful, but quite often I find my best work is done with a team, rather than by myself. Such systems seem to tend towards pitting the candidate against well known or well established problem sets, which can easily revert back to memorization skills, or towards difficult/obtuse problems. Either way, this doesn’t test the candidate’s ability to work in a team, or interact with others in solving difficult problems.

What about tossing other sorts of puzzles towards the candidate to see how they do? This also seems problematic Continue reading

snaproute Go BGP Code Dive (11): Moving to Open Confirm

In the last post in this series, we began considering the bgp code that handles the open message that begins moving a new peer to open confirmed state. This is the particular bit of code of interest—

case BGPEventBGPOpen:
  st.fsm.StopConnectRetryTimer()
  bgpMsg := data.(*packet.BGPMessage)
  if st.fsm.ProcessOpenMessage(bgpMsg) {
    st.fsm.sendKeepAliveMessage()
    st.fsm.StartHoldTimer()
    st.fsm.ChangeState(NewOpenConfirmState(st.fsm))
  }

We looked at how this code assigns the contents of the received packet to bgpMsg; now we need to look at how this information is actually processed. bgpMsg is passed to st.fsm.ProcessOpenMessage() in the next line. This call is preceded by the st.fsm, which means this function is going to be found in the FSM, which means fsm.go. Indeed, func (fsm *FSM) ProcessOpenMessage... is around line 1172 in fsm.go—

func (fsm *FSM) ProcessOpenMessage(pkt *packet.BGPMessage) bool {
 body := pkt.Body.(*packet.BGPOpen)

 if uint32(body.HoldTime) < fsm.holdTime {
  fsm.SetHoldTime(uint32(body.HoldTime), uint32(body.HoldTime/3))
 }

 if body.MyAS == fsm.Manager.gConf.AS {
  fsm.peerType = config.PeerTypeInternal—
 } else {
  fsm.peerType = config.PeerTypeExternal
 }

 afiSafiMap := packet.GetProtocolFromOpenMsg(body)
 for protoFamily, _ := range afiSafiMap {
  if fsm. Continue reading