Category Archives for "The Stupid Engineer"

The Two Principles Of Troubleshooting

  1. Never trust someone else’s configuration.
  2. Don’t trust your own configuration.

But in all seriousness, if you’re migrating configuration, this would be a good place to start:

  • Check all your IP addresses are consistent.
  • Check your masks are consistent.
  • Check your interfaces are correct.
  • If you’re working with peers, check that the IP addresses for your peers are correct. I mean all 4 octets. Not just the last one, or two, or three. ALL FOUR. If it’s v6, then FML. Bite the bullet and write a script.
  • Is there a naming convention to follow? There’s a temptation when migrating to stick with the old name, but new devices may require that a different convention is adhered to. Reasons for this range from the whimsical to the valid.
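If you do bite the bullet, the following is a minimal Python sketch of that script. It assumes you have already pulled the old and new peer addresses into dictionaries keyed by a peer name. The dictionaries and the `compare_peers` name are hypothetical. Each address is parsed with the standard library’s `ipaddress` module, so the whole address is compared (all four IPv4 octets, or the full IPv6 address however it happens to be written).

```python
from __future__ import annotations

import ipaddress

def compare_peers(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare peer addresses keyed by a peer name; return any mismatches.

    Parsing with ipaddress means the whole address is compared: all four
    IPv4 octets, or the full IPv6 address regardless of how it's written.
    """
    problems = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            problems.append(f"{name}: missing from new config")
        elif name not in old:
            problems.append(f"{name}: only in new config")
        else:
            before = ipaddress.ip_address(old[name])
            after = ipaddress.ip_address(new[name])
            if before != after:
                problems.append(f"{name}: {before} != {after}")
    return problems

if __name__ == "__main__":
    # Hypothetical peer lists pulled from the old and new configs
    old_peers = {"isp-a": "192.0.2.1", "isp-b": "198.51.100.1", "core": "2001:db8::1"}
    new_peers = {"isp-a": "192.0.2.1", "isp-b": "198.51.101.1", "core": "2001:0db8:0:0::1"}
    for line in compare_peers(old_peers, new_peers):
        print(line)  # only isp-b is reported; both IPv6 spellings are the same address
```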

If you’re coming up with something new that involves addressing new interfaces, start with this:

  • First, check that your IP allocations are correct. By this, I mean check whether you have any hierarchy or ordering. For example, do you reserve addresses by site, geographic location or application? If you do, then make sure these are consistent with what you’ve planned.
  • Is your addressing valid? i.e. are the subnets and host addresses you’ve assigned correct?
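Two common slips are checked by the sketch below: a host given its subnet’s network or broadcast address, and planned subnets that overlap. It uses Python’s standard `ipaddress` module. The function names are hypothetical. It also assumes /31 and /32 IPv4 subnets are intentional point-to-point or host routes, so they are not flagged.

```python
from __future__ import annotations

import ipaddress

def check_host(interface_cidr: str) -> list[str]:
    """Flag an IPv4 interface address (e.g. '10.1.1.1/24') that is its subnet's network or broadcast address."""
    issues = []
    iface = ipaddress.ip_interface(interface_cidr)
    net = iface.network
    # /31 (RFC 3021 point-to-point) and /32 have no separate network/broadcast to avoid
    if iface.version == 4 and net.prefixlen < 31:
        if iface.ip == net.network_address:
            issues.append(f"{interface_cidr} is the network address of {net}")
        elif iface.ip == net.broadcast_address:
            issues.append(f"{interface_cidr} is the broadcast address of {net}")
    return issues

def find_overlaps(subnets: list[str]) -> list[tuple[str, str]]:
    """Return every pair of planned subnets that overlap."""
    # Strict parsing: '10.1.1.1/24' (host bits set) raises ValueError
    nets = [ipaddress.ip_network(s) for s in subnets]
    overlaps = []
    for i, a in enumerate(nets):
        for b in nets[i + 1:]:
            if a.version == b.version and a.overlaps(b):
                overlaps.append((str(a), str(b)))
    return overlaps

if __name__ == "__main__":
    print(check_host("10.20.30.255/24"))
    print(find_overlaps(["10.20.0.0/16", "10.20.30.0/24", "10.21.0.0/24"]))
```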

Saving Backup/Rescue Config on Juniper

I often find myself having to back up a config on a Juniper before I start work. Usually, I want a quick point I can restore to if I need to roll back. So enter rescue configurations to the, errr, rescue?

request system configuration rescue save

This saves the currently active (committed) configuration as a rescue configuration. You can easily roll back to it from configuration mode with the command below, followed by a commit:

#rollback rescue

You can also save the current configuration to a file using:

>file copy /config/juniper.conf.gz /var/tmp/temp_backup.cfg

/config/juniper.conf.gz is the currently active (committed) configuration.

You could stash several backups in /var/tmp/ this way. To restore one, load it from configuration mode and then commit:

#load replace /var/tmp/temp_backup.cfg

View your stashed files with:

>file list /var/tmp
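If you keep more than one stash, a fixed name like temp_backup.cfg gets overwritten each time. The minimal Python sketch below only builds the command strings used above around a timestamped filename. The label-plus-timestamp naming and the `backup_commands` function are my own assumptions, not a Junos convention.

```python
from datetime import datetime
from typing import Dict, Optional

ACTIVE_CONFIG = "/config/juniper.conf.gz"  # active config file, as used above
STASH_DIR = "/var/tmp"

def backup_commands(label: str, when: Optional[datetime] = None) -> Dict[str, str]:
    """Build backup/restore/list commands around a timestamped stash file.

    The '<label>_<YYYYmmdd-HHMMSS>.cfg' naming is a hypothetical convention.
    """
    when = when or datetime.now()
    stash = f"{STASH_DIR}/{label}_{when:%Y%m%d-%H%M%S}.cfg"
    return {
        "backup": f"file copy {ACTIVE_CONFIG} {stash}",  # operational mode (>)
        "restore": f"load replace {stash}",              # configuration mode (#), then commit
        "list": f"file list {STASH_DIR}",                # operational mode (>)
    }

if __name__ == "__main__":
    for step, command in backup_commands("pre-change").items():
        print(f"{step:>8}: {command}")
```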

Why Troubleshooting Is Overrated

This post is the result of a thought I had after someone asked me to describe an interesting problem I’d faced. I think they meant troubleshooting, because that’s how I answered it.

Speak to most network engineers about what they love about the job, and troubleshooting will crop up quite frequently. I’ve got to admit, delving into a complex problem in a high-pressure situation, with the clock against you, more often than not does give me a rush of sorts. The CLI-fu rolls off your fingers (if you’ve been on point with your studies, or you’re an experienced engineer), you methodically tick off what the problem could be, and there’s a “Eureka” moment where you triumphantly declare the root cause.

But then what?

I don’t mean what’s the solution to the problem. That’s usually obvious. In most cases, the root cause is one of these culprits:
– Poor design. E.g: a 1Gb link in a 10Gb path, or designing for ECMP and then realising that the default route is learnt via BGP without your metrics being influenced, so anything outside your little IGP’s domain is going to be deterministically routed.
– A fault. E.g: Link down

Working with JunOs and Optics

Found myself troubleshooting a pesky fibre connection that wouldn’t come up. I was looking for a command that would show me whether any light was being received on the interface and found these beauties:

show interfaces diagnostics optics xe-4/1/0

show chassis pic fpc-slot 4 pic-slot 1

The first shows information on light levels on the relevant optic. The second will help you figure out what type of cabling you need to be using. Handy when you don’t know if it should be single or multi mode.
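Light levels are usually quoted in both mW and dBm, related by dBm = 10·log10(power in mW). The Python sketch below converts between the two and classifies a receive reading against a window. The -14.4 to 0.5 dBm window in the example is a placeholder, not a real optic specification. Take real limits from the optic’s data sheet or from any alarm/warning thresholds the diagnostics output lists.

```python
import math

def mw_to_dbm(mw: float) -> float:
    """Convert optical power from milliwatts to dBm (decibels relative to 1 mW)."""
    if mw <= 0:
        return float("-inf")  # no light: log10(0) is undefined, so report -inf dBm
    return 10 * math.log10(mw)

def dbm_to_mw(dbm: float) -> float:
    """Convert optical power from dBm back to milliwatts."""
    return 10 ** (dbm / 10)

def rx_status(rx_dbm: float, low_dbm: float, high_dbm: float) -> str:
    """Classify a receive level against a window; take the limits from the optic's spec."""
    if math.isinf(rx_dbm) and rx_dbm < 0:
        return "no light"
    if rx_dbm < low_dbm:
        return "too low"
    if rx_dbm > high_dbm:
        return "too high"
    return "ok"

if __name__ == "__main__":
    # -14.4 / 0.5 dBm is a placeholder window, not a real optic specification
    for reading_mw in (0.0, 0.02, 0.5):
        dbm = mw_to_dbm(reading_mw)
        print(f"{reading_mw} mW = {dbm:.2f} dBm -> {rx_status(dbm, -14.4, 0.5)}")
```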


When adding a VLAN doesn’t add a VLAN

Vendor: Cisco
Software version: 12.2(33)SXI7
Hardware: 6509-E

So this is a typical stupid question. How do you add VLANs to a trunk?

Assuming you started with a port with default configuration on it, it would be:

 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan <vlan-list>
 switchport trunk native vlan <vlan-id>

Now, I was interrupted while doing this by someone stating categorically that

 switchport trunk allowed vlan

Should be:

 switchport trunk allowed vlan add

Not really the way I would do it on a new switchport but, not wanting to hurt feelings, I proceeded and saw this:

 TEST(config-if)#switchport trunk allowed vlan add 10,20,30
 TEST(config-if)#do show run int gi9/14
 Building configuration...
Current configuration : 279 bytes
 interface GigabitEthernet9/14
 description TEST
 switchport trunk encapsulation dot1q
 switchport mode trunk
 storm-control broadcast level 0.50
 storm-control multicast level 0.50
 no cdp enable
 no lldp transmit
 no lldp receive

To cut a long story short, the switch takes the configuration, but doesn’t apply it. It led to a lot of head scratching, because you’d think it should work. Switchport state when doing:

 show interface gi9/14 trunk

Shows a state…
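One plausible explanation assumes the port still had the default allowed list of all VLANs (1-4094). “add” appends to the current list, so adding to “all” still gives “all”, and the default isn’t printed in the running config. The Python sketch below is a hypothetical model of that set arithmetic, not Cisco code, and its rendering is simplified (IOS would collapse consecutive VLANs into ranges).

```python
from __future__ import annotations

ALL_VLANS = frozenset(range(1, 4095))  # "all" = VLANs 1-4094, the default allowed list

def parse_vlans(spec: str) -> frozenset[int]:
    """Parse an IOS-style VLAN list such as '10,20,30' or '100-105,200'."""
    vlans: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            vlans.update(range(start, end + 1))
        elif part:
            vlans.add(int(part))
    return frozenset(vlans)

def apply(allowed: frozenset[int], action: str, spec: str = "") -> frozenset[int]:
    """Model 'switchport trunk allowed vlan [add|remove] <list>' as set operations."""
    vlans = parse_vlans(spec)
    if action == "add":
        return allowed | vlans
    if action == "remove":
        return allowed - vlans
    if action == "set":  # plain 'allowed vlan <list>' replaces the whole list
        return vlans
    raise ValueError(f"unknown action: {action}")

def render(allowed: frozenset[int]) -> str | None:
    """Running-config line for the allowed list; None when it's the default (all)."""
    if allowed == ALL_VLANS:
        return None  # the default isn't shown in 'show run'
    if not allowed:
        return "switchport trunk allowed vlan none"
    # Simplified: IOS would collapse consecutive VLANs into ranges
    return "switchport trunk allowed vlan " + ",".join(str(v) for v in sorted(allowed))

if __name__ == "__main__":
    print(render(apply(ALL_VLANS, "add", "10,20,30")))  # None: still all VLANs
    print(render(apply(ALL_VLANS, "set", "10,20,30")))  # switchport trunk allowed vlan 10,20,30
```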

Getting Paramiko To Work

I’ve struggled a lot to get Paramiko working, and today I’ve finally managed it.
Here’s my setup:

-bash-3.2$ cat /etc/redhat-release
 Red Hat Enterprise Linux Server release 7.1 (Maipo)

This is fairly important.

pip install paramiko

Didn’t work for me. Some Googling led me to believe I needed the python-dev package installed. So I tried:

yum install python-dev

That didn’t work either, so I searched for the package using:

yum search python-dev

The above is my new favourite command. It turned up:

$ yum search python-dev
 Loaded plugins: product-id, rhnplugin, subscription-manager
 This system is receiving updates from RHN Classic or Red Hat Satellite.
 ==================================================================================================== N/S matched: python-dev =====================================================================================================
 python-devel.x86_64 : The libraries and header files needed for Python development

On RHEL the package is called python-devel, so after installing that (yum install python-devel), I did a:

pip install paramiko

And I was done!
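To confirm that pip installed Paramiko for the interpreter you’ll actually be running (easy to get wrong on a box with more than one Python), here is a small standard-library-only check. The `paramiko_available` helper is hypothetical.

```python
import importlib.util
import sys

def paramiko_available() -> bool:
    """Return True if the paramiko package can be found by this interpreter."""
    return importlib.util.find_spec("paramiko") is not None

if __name__ == "__main__":
    if paramiko_available():
        import paramiko
        print(f"paramiko {paramiko.__version__} is installed for {sys.executable}")
    else:
        print(f"paramiko not found for {sys.executable}; check which python pip installed into")
```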

BGP RIB Failure

An infrequent yet interesting issue is BGP RIB failure. Usually, it takes the form of a prefix that you’d expect a router to learn via eBGP being installed in its RIB from another source with a lower (i.e. better) administrative distance.

To understand this problem, we first need to realise that “RIB failure” in “show ip bgp” output means that a route offered to the RIB by BGP has not been installed. This is not a cause for concern if you have a static or connected route to that network on the router, but if you’re expecting it to be via eBGP, then you can infer that something is misconfigured in your routing.

This can also be simplified to “BGP does not care about administrative distance when selecting a path”.
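To make the mechanism concrete: BGP picks its best path without looking at administrative distance and offers it to the RIB, where the source with the lowest AD wins. The Python sketch below uses Cisco IOS default distances (connected 0, static 1, eBGP 20, OSPF 110, iBGP 200). The `rib_install` function and the example candidates are hypothetical.

```python
from __future__ import annotations

# Cisco IOS default administrative distances (lower is preferred)
DEFAULT_AD = {"connected": 0, "static": 1, "ebgp": 20, "ospf": 110, "ibgp": 200}
BGP_SOURCES = ("ebgp", "ibgp")

def rib_install(sources: list[str]) -> tuple[str, bool]:
    """Return (winning source, bgp_rib_failure) for one prefix.

    `sources` lists every source offering this prefix to the RIB. BGP offers
    only its best path, which it chose without looking at AD at all.
    """
    winner = min(sources, key=lambda src: DEFAULT_AD[src])
    bgp_offered = any(src in BGP_SOURCES for src in sources)
    return winner, bgp_offered and winner not in BGP_SOURCES

if __name__ == "__main__":
    print(rib_install(["static", "ebgp"]))  # ('static', True): the BGP path shows RIB-failure
    print(rib_install(["ospf", "ebgp"]))    # ('ebgp', False): eBGP's 20 beats OSPF's 110
```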

For reference, the path selection algorithm goes:

  1. Next hop reachability. Paths with an unreachable next hop are not considered.
  2. Weight (Cisco proprietary). Bigger is better.
  3. Local preference. Higher is better.
  4. Locally originated route. Preferred over a learnt one.
  5. AS path length. Shorter is better.
  6. Origin code. IGP>EGP>Incomplete
  7. Multi-Exit Discriminator. Lower is better.
  8. Neighbour type. eBGP better than iBGP.
  9. IGP metric to next hop. Lower is better.
  10. Router ID. Lowest router ID wins.
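The order above can be expressed as a sort key, where each tuple element encodes one step and a smaller tuple means a better path. The Python sketch below covers only that simplified list. It skips steps such as oldest eBGP path, and it compares MED across all paths, whereas Cisco by default only compares MED between paths from the same neighbouring AS. The `Path` fields are my own naming.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # IGP > EGP > Incomplete

@dataclass
class Path:
    name: str
    next_hop_reachable: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path_len: int = 0
    origin: str = "igp"
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0
    router_id: str = "0.0.0.0"

def path_key(p: Path) -> tuple:
    """Smaller key = better path; one element per remaining step of the list above."""
    return (
        -p.weight,                          # bigger weight wins
        -p.local_pref,                      # higher local preference wins
        not p.locally_originated,           # locally originated first (False < True)
        p.as_path_len,                      # shorter AS path wins
        ORIGIN_RANK[p.origin],              # IGP, then EGP, then Incomplete
        p.med,                              # lower MED wins (compared across all paths here)
        not p.ebgp,                         # eBGP preferred over iBGP
        p.igp_metric,                       # lower IGP metric to the next hop wins
        ipaddress.ip_address(p.router_id),  # lowest router ID breaks the tie
    )

def best_path(paths: list[Path]) -> Path:
    """Discard paths with an unreachable next hop, then pick the lowest key."""
    usable = [p for p in paths if p.next_hop_reachable]
    if not usable:
        raise ValueError("no path has a reachable next hop")
    return min(usable, key=path_key)

if __name__ == "__main__":
    a = Path("a", local_pref=200, as_path_len=5)
    b = Path("b", as_path_len=1, ebgp=True)
    print(best_path([a, b]).name)  # a: local preference is checked before AS path length
```

Parsing the router ID with `ipaddress` matters for the final tie-break. As plain strings, "10.0.0.10" would sort before "10.0.0.2".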

Verifying SSL Certificate Chains

Found this link very useful doing this:

Some useful commands:
Display a certificate:
openssl x509 -in test-cert-top.pem -noout -text

Display a certificate's issuer:
openssl x509 -in test-cert-top.pem -noout -issuer

Display a certificate's subject:
openssl x509 -in test-cert-top.pem -noout -subject

Verify a certificate:
openssl verify test-cert-top.pem

Verify a certificate chain with 3 certificates:
openssl verify -CAfile test-cert-bottom.pem -untrusted test-cert-middle.pem test-cert-top.pem
The -CAfile option specifies the certificate to trust as the root, while -untrusted supplies the intermediate certificate used to build the chain (it is not itself treated as a trust anchor).

Verify a certificate chain with 2 certificates:
openssl verify -CAfile test-cert-bottom.pem test-cert-middle.pem
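For chains longer than three certificates, -untrusted accepts a single file containing several intermediates concatenated together. The Python sketch below writes that bundle and returns the matching openssl verify argument list. The function name and the default bundle filename are hypothetical, and the sketch only assembles the command without running it.

```python
from __future__ import annotations

from pathlib import Path

def build_verify_command(root: str, intermediates: list[str], leaf: str,
                         bundle: str = "intermediates-bundle.pem") -> list[str]:
    """Write the intermediates into one PEM bundle and return an openssl verify argv.

    The bundle filename is a hypothetical default; nothing is executed here.
    """
    cmd = ["openssl", "verify", "-CAfile", root]
    if intermediates:
        with open(bundle, "w", encoding="ascii") as out:
            for pem in intermediates:
                text = Path(pem).read_text(encoding="ascii")
                # Keep each certificate's END line separate from the next BEGIN line
                out.write(text if text.endswith("\n") else text + "\n")
        cmd += ["-untrusted", bundle]
    cmd.append(leaf)
    return cmd
```

For the three-certificate example above, you could pass the result of build_verify_command("test-cert-bottom.pem", ["test-cert-middle.pem"], "test-cert-top.pem") to subprocess.run(). openssl prints "test-cert-top.pem: OK" when the chain verifies.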

A10 Health Monitors

This post is an equivalence check of A10 vs ACE probes/health monitors.


ACE-A# show probe

probe : tcp-3121-probe-1
type : TCP
state : ACTIVE
port : 3121 address : addr type : -
interval : 10 pass intvl : 30 pass count : 2
fail count: 2 recv timeout: 5

--------------------- probe results --------------------
probe association probed-address probes failed passed health
------------------- ---------------+----------+----------+----------+-------
serverfarm : vip-
real : ip-[3121] 1286028 1104 1284924 SUCCESS

interval – how often health checks are sent to a healthy server
pass intvl – how often health checks are sent to a server marked “DOWN”
pass count – the number of successful probes required to mark a server as “UP”
fail count – the number of unsuccessful probes required to mark a server as “DOWN”
recv timeout – how long to wait for a response before a probe counts as failed

a10-1[test-1]#show health monitor
Idle = Not used by any server In use = Used by server
Attrs = Attributes G = GSLB
Monitor Name Interval Retries Timeout Up-Retries Method Status Attrs
tcp-443-monitor-1 30 2 5 2 TCP In use

Interval – the time period…
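Going by the field names in the two outputs above, the equivalence appears to be: interval maps to Interval, fail count to Retries, recv timeout to Timeout, and pass count to Up-Retries. The Python sketch below encodes that reading. The mapping is my inference from the outputs rather than vendor documentation. The sketch doesn’t map pass intvl at all, and the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class AceProbe:
    interval: int      # seconds between probes to a healthy server
    pass_intvl: int    # seconds between probes to a server marked DOWN
    pass_count: int    # successful probes needed to mark a server UP
    fail_count: int    # failed probes needed to mark a server DOWN
    recv_timeout: int  # seconds before an unanswered probe counts as failed

def to_a10(probe: AceProbe) -> dict[str, int]:
    """Map ACE probe settings onto the A10 'show health monitor' columns.

    The mapping is inferred from field names, not vendor docs; pass_intvl is
    left out because this sketch has no A10 column to put it in.
    """
    return {
        "Interval": probe.interval,
        "Retries": probe.fail_count,
        "Timeout": probe.recv_timeout,
        "Up-Retries": probe.pass_count,
    }

if __name__ == "__main__":
    # Values from tcp-3121-probe-1 in the ACE output above
    ace = AceProbe(interval=10, pass_intvl=30, pass_count=2, fail_count=2, recv_timeout=5)
    print(to_a10(ace))
```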

Checking Faulty Cables

I recently had to work with a third party to diagnose a link between our devices and came across this handy command. The link in question was a pretty hefty (75m-ish) UTP cable run between a Cisco switch and an HP switch. I have visibility of the Cisco switch, the structured cabling into the patch panel, and the third party’s cable. Unfortunately, the DC Operations tech didn’t have access to a Fluke (or the ability to interpret a Fluke’s output), but they did have a laptop with a 100Mbps NIC (this becomes important later on).

So I started by running the diagnostic on the production connection. It wasn’t working, so I didn’t have to worry about taking anything down. This gave me the following:

test cable-diagnostics tdr interface gi7/21
TDR test started on interface Gi7/21
A TDR test can take a few seconds to run on an interface
Use 'show cable-diagnostics tdr' to read the TDR results.

switchA#show cable-diagnostics tdr interface gi7/21

TDR test last run on: July 09 10:30:20
Interface Speed Pair Cable length Distance to fault Channel Pair status
--------- ----- ---- ------------------- ------------------- ------- ------------
Gi7/21 auto 1-2 77 +/- 6 m N/A Invalid
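The TDR result gives a length with a tolerance, so it’s worth checking the worst case against the 100 m channel limit for twisted-pair Ethernet. The Python sketch below assumes the exact “77 +/- 6 m” format printed above. The function names are hypothetical.

```python
import re

MAX_CHANNEL_M = 100  # twisted-pair Ethernet channel limit (90 m permanent link + 10 m of cords)

def parse_length(text: str) -> tuple:
    """Parse a TDR length such as '77 +/- 6 m' into (length, tolerance) in metres."""
    m = re.search(r"(\d+)\s*\+/-\s*(\d+)\s*m", text)
    if not m:
        raise ValueError(f"unrecognised length: {text!r}")
    return int(m.group(1)), int(m.group(2))

def within_spec(text: str, limit: int = MAX_CHANNEL_M) -> bool:
    """True if even the longest reading (length + tolerance) is within the limit."""
    length, tolerance = parse_length(text)
    return length + tolerance <= limit

if __name__ == "__main__":
    row = "Gi7/21 auto 1-2 77 +/- 6 m N/A Invalid"
    print(parse_length(row), within_spec(row))
```

For the Gi7/21 reading, 77 + 6 = 83 m, so on this reading the run length by itself is inside the limit.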

Check 10Gb Interfaces On An ASA

I recently had to deploy an ASA pair. One of the prerequisites is to make sure there’s an optic in the interface we’re going to use. On a switch, you have the following options:

#show int te5/4 transceiver
Transceiver monitoring is disabled for all interfaces.

ITU Channel not available (Wavelength not available),
Transceiver is internally calibrated.
If device is externally calibrated, only calibrated values are printed.
++ : high alarm, + : high warning, - : low warning, -- : low alarm.
NA or N/A: not applicable, Tx: transmit, Rx: receive.
mA: milliamperes, dBm: decibels (milliwatts).

Optical Optical
Temperature Voltage Current Tx Power Rx Power
Port (Celsius) (Volts) (mA) (dBm) (dBm)
---------- ----------- ------- -------- -------- --------
Te5/4 27.0 0.00 7.6 -- -2.2 -2.7


#show int tenGigabitEthernet 5/4 capabilities
Model: VS-S720-10G
Type: 10Gbase-SR
Speed: 10000
Duplex: full
Trunk encap. type: 802.1Q,ISL
Trunk mode: on,off,desirable,nonegotiate
Channel: yes
Broadcast suppression: percentage(0-100)
Flowcontrol: rx-(off,on),tx-(off,on)
Membership: static
Fast Start: yes
QOS scheduling: rx-(8q4t), tx-(1p7q4t)
QOS queueing mode: rx-(cos,dscp), tx-(cos,dscp)
CoS rewrite: yes
ToS rewrite: yes
Inline power: no
Inline power policing: no
SPAN: source/destination
UDLD yes
Link Debounce: yes
Link Debounce Time: yes
Ports-in-ASIC (Sub-port ASIC)

Kill An SSH Connection

Check what’s connected to the switch first:

#show ssh
%No SSHv1 server connections running.
Connection Version Mode Encryption Hmac State Username
0 2.0 IN aes128-cbc hmac-md5 Session started user1
0 2.0 OUT aes128-cbc hmac-md5 Session started user1
1 2.0 IN aes128-cbc hmac-md5 Session started user1
1 2.0 OUT aes128-cbc hmac-md5 Session started user1

Kill the session using the “disconnect” command:

#disconnect ssh ?
The number of the active SSH connection
vty Virtual terminal

#disconnect ssh 0
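On a busy box, it can help to pull a given user’s connection numbers out of show ssh before disconnecting. The Python sketch below parses the table format shown above and emits the disconnect commands. The column layout is assumed from this one output, and the function name is hypothetical.

```python
from __future__ import annotations

def disconnect_commands(show_ssh: str, username: str) -> list[str]:
    """Return 'disconnect ssh N' for each connection number owned by `username`."""
    ids: list[str] = []
    for line in show_ssh.splitlines():
        fields = line.split()
        # Data rows start with a numeric connection id and end with the username
        if fields and fields[0].isdigit() and fields[-1] == username:
            if fields[0] not in ids:  # IN and OUT rows share one connection number
                ids.append(fields[0])
    return [f"disconnect ssh {conn}" for conn in ids]

if __name__ == "__main__":
    sample = """Connection Version Mode Encryption Hmac State Username
0 2.0 IN aes128-cbc hmac-md5 Session started user1
0 2.0 OUT aes128-cbc hmac-md5 Session started user1"""
    print(disconnect_commands(sample, "user1"))
```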

Fun With Subinterfaces

Loving this regex at the moment!
show int description | i 9/[1-2].1..
Te9/1.107 up up XXXX
Te9/1.111 up up XXXX
Te9/2.106 up up XXXX
Te9/2.110 up up XXXX

It helps me see the subinterfaces allocated on transit interfaces fairly simply.
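One thing to watch: the dots in that pattern are regex wildcards, so they match any character, not just a literal full stop. The Python sketch below shows the difference using Python’s `re`, which treats “.” the same way here. The Gi9/1/10 line is a made-up example of a physical port the loose pattern also catches.

```python
import re

LOOSE = re.compile(r"9/[1-2].1..")    # pattern as typed: '.' matches any character
STRICT = re.compile(r"9/[1-2]\.1..")  # escaped '.' only matches a literal dot

lines = [
    "Te9/1.107 up up XXXX",
    "Te9/2.110 up up XXXX",
    "Gi9/1/10 up up XXXX",  # hypothetical physical port, not a subinterface
]

for line in lines:
    print(f"{line:<24} loose={bool(LOOSE.search(line))} strict={bool(STRICT.search(line))}")
```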

What’s Possible

I just read a story on Medium. It’s a great use of social media to achieve something truly useful.

I wish more people would think differently in my line of work. Some days the resounding echoes of “we’ve always done it this way” really give me a headache.

Fun With Route-Maps And BGP

I’ve always been a little bit hazy on the circumstances under which a BGP neighbour needs to be cleared. This extremely informative page from Cisco casts a bit of light on the situation, especially the section on when to clear a BGP neighbourship.

The official line is that any inbound or outbound policy update requires the BGP session to be cleared before it takes effect. Obviously, whether you clear the neighbourship inbound or outbound depends on the direction in which the policy is applied.

So my question is whether a new route-map constitutes a policy update. Now, this may sound like a stupid question (remember the title of the blog, please, dear reader), but someone legitimately asked me whether applying a new policy constituted an update. So let’s find out.

This is my topology:

Test Topology

This is what I’m doing:
– Loopback0 ( is advertised into OSPF on R1 along with the network.
– The network is advertised into OSPF on R2.
– BGP is used to advertise the network using a peer-group TEST.
– R1 and R2 have an iBGP peering in AS 65000 using the physical addresses of…

ASA Prompt Customisation

I occasionally run into Cisco ASAs that don’t identify their status (active/standby) in the prompt. This is rectified by configuring the “prompt” command. The options are:

asa-1-pri(config)# prompt ?

configure mode commands/options:
context Display the context in the session prompt (multimode only)
domain Display the domain in the session prompt
hostname Display the hostname in the session prompt
priority Display the priority in the session prompt
state Display the traffic passing state in the session prompt

“state” will tell you if the device is active or standby.

You can check what’s currently on the ASA with:

asa-1-pri# show run prompt
prompt hostname context


I’ve needed to copy images to an ASA over SCP a few times for some work. It looks like the ASA is a bit picky about how you specify the destination location when you try to do it from a UNIX box.

Enable SSH copy on the ASA

ssh scopy enable

Copy the ASA image from the local directory on your UNIX box to the device.

scp -v asa825-51-k8.bin USERNAME@IP_ADDRESS:disk0:asa825-51-k8.bin

If you don’t use this format, the UNIX box will give you an error message along the lines of “lost connection”, even though the transfer will appear to have completed.

3 Useful Juniper Commands

wildcard delete
Deletes all configuration statements at a given level that match a regular expression pattern.

show system commit
Shows the recent commit history (who committed, when, and how). A comment only appears against a commit if that commit was made with “commit comment”.

clear system commit
Clears any pending commit, such as one scheduled with “commit at”.