lindsay

Author Archives: lindsay

Juniper Default ARP Policer

Juniper devices have a default ARP policer that drops ARP requests and responses over 150kbps. By default, this is an aggregate policer that applies to all interfaces. This can lead to unexpected behavior when high levels of ARP on one interface lead to BGP session drops on another interface. You can’t change the default policer limits, but you can create a new policer, with higher limits.

Problem: IPv4 BGP Session Flaps on PNI

I was investigating a problem reported by one of our Transit providers. Once a day or so, our IPv4 BGP session with them would flap. The interface itself was stable, and the IPv6 session remained up. One particular site was seeing this more than others. The sites used different platforms, but were running the same code version.

The curious thing was the logs - we saw log messages saying that we had a notification message saying NOTIFICATION received from 192.0.2.188 (External AS 64498): code 4 (Hold Timer Expired Error). The syslog included this hold timer 30s, hold timer remain 0s, last sent 2s. So our router thought it was sending regular KEEPALIVE messages, but the remote end thought it had missed too many.

Looking Continue reading

Juniper Branch SRX LACP Weirdness

Juniper SRX 300 Series firewalls may stop forwarding traffic in some situations. The firewall says it is forwarding the traffic, but it doesn’t work. Monitoring traffic looks OK, ARP entries are present, but traffic never gets to the destination, until you clear ARP. Turns out the problem comes from using LACP with fast timers and active mode. Luckily the fix is simple.

Alert: Firewall Offline

Here’s the situation we saw: Our NMS reported a Juniper SRX320 offline. All other devices at the site were still working, but the firewall was unreachable. Traffic from the firewall to the NMS goes via the firewall’s default gateway. Firewall A in this diagram was unreachable, but Firewall B was fine.

network_overview

OK, what’s happening? Why is my firewall unreachable?

Firewall says its fine?

Try to ping Firewall A, no response. From the default gateway, we can see an ARP entry for the firewall, but no response to ping. We can log in to Firewall B, and we see an ARP entry for Firewall A. Crucially: we can ping Firewall A from Firewall B. Hmmm. That’s strange. Why can we ping it from one locally connected device but not another?

From Firewall B, we SSH across Continue reading

Juniper QFX10K IPFIX Gotchas

IPFIX is problematic on the Juniper QFX10K switches. Documentation is sparse, and doesn’t have a complete configuration. Behavior changes between versions in undocumented ways. Here’s a couple of things I noticed when upgrading from Junos 17.3 to 17.4. These also apply if you are running 18.4 code. I hit more problems with 18.4, and ended up rolling back to 17.4.

Big Changes in Reported Throughput

Here’s a graph showing total reported throughput for a QFX10K I upgraded:

ipfix traffic report

There’s a few things going on there. First the reported traffic drops to zero after I upgraded. Then it starts coming up, after I fixed the first problem. But then after that the reported traffic is flat, and lower than it should be. Then it starts coming up again after I made the second fix.

First Problem: Chassis Sample Instance

The first configuration change I needed to add was this: set chassis fpc 0 sampling-instance sample-border, where sample-border is the name of the sampling instance I have configured under forwarding-options. This was not required with 17.3. If you don’t do it with 17.4, you won’t get any data.

Second Problem: DDoS-Protection

Some Juniper platforms implement Continue reading

Juniper MX Upgrades Causing Overheating

Juniper changed the way they do temperature management on MX240 and MX480 chassis devices, somewhere between 15.1 and 17.3. The net result is that your chassis might run hotter after you upgrade, which can lead to the system shutting down some optics. Probably not what you want. Luckily there’s a few hidden commands you can use to change this behavior

“Optics will be disabled…”

Post upgrade, you might see higher temperatures reported by show chassis fpc. This system was reporting temperatures in the low 30s, now it reports 50:

1
2
3
4
5
6
7
8
9
[email protected]> show chassis fpc
                     Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
  0  Empty
  1  Online            50     22          1       22     22     22    2048       38         21
  2  Empty

{master}
[email protected]>

On its own, that’s OK, until you start seeing log messages like this:

1
FPC 1 temperature over 50 degrees C; non-high-temperature tolerant optics will be disabled in 58 seconds if condition persists

Yeah that’s not good, especially when it carries out the threat, and Continue reading

QFX Upgrades – Check Host Version

I came across a situation where a software upgrade failed for some members in a Juniper QFX Virtual Chassis. There is a known issue with upgrades with a certain configuration + version combination, but I thought it didn’t apply to me. Turns out that the key was the host OS version, not the Junos VM version. Your host and guest versions can be out of sync with Juniper QFX 5K devices, and this can lead to confusing behavior, especially in a virtual chassis where host OS versions might vary.

Upgrade Failures - post-install error

When upgrading an old Juniper QFX5100, you might see these messages when running the upgrade:

1
2
Error: jinstall-vjunos fails post-install
Error: jinstall-vjunos-14.1X53-D34-domestic-signed fails post-install

In my case, I saw it for some nodes in a Virtual Chassis. Some worked, some failed. KB31923 says that this error is due to this configuration:

1
2
3
# show system internet-options
tcp-drop-synfin-set;
no-tcp-reset drop-tcp-with-syn-only;

Easy enough to change:

1
2
3
4
5
6
7
8
9
10
# delete system internet-options
{master:0}[edit]
root# show |compare
[edit system]
  internet-options {
      tcp-drop-synfin-set;
     no-tcp-reset drop-tcp-with-syn-only;
 
{master:0}[edit]
root# commit

Continue reading

Junos SNMP via Routing Instance

Juniper routing instances are very useful when you need separate routing tables on the one device, for example to separate customers. Junos lets you configure SNMP polling of routing instances, so customers can poll “their” interfaces using 'instance_name'@'community'. All very useful. But it wasn’t obvious to me how to poll the default table via an interface in a routing instance. The trick is to just use @'community'. Here’s an example.

Network Overview

To demo this I have a simple network. I’m using a Virtual QFX plus Vagrant setup, based on the Vagrantfiles in this repo. I’m running one vqfx10k, connected to one server. The key here is that the server has two connections to the vqfx. One interface is in the default instance, one is in a “Customer” routing instance:

network_overview

Here’s the routing-instance config:

1
2
3
4
5
6
7
8
[email protected]> show configuration routing-instances
Customer {
    instance-type virtual-router;
    interface xe-0/0/1.0;
}

{master:0}
[email protected]>

And here’s my SNMP configuration:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
[email protected]>  Continue reading

New Role with Valve

I have started a new role as a Network Engineer with Valve Corporation. My period of unemployment was short-lived, and I am gainfully employed once more.

Not Another Vendor?

Did I think about going to work for another vendor? Yes, I did. I thought a lot about what I want to do, and what type of company I want to work for. Small/medium/large, vendor/customer, Product Manager vs Engineer, etc.

For now, I decided I want to solve business problems using whichever tools are appropriate, rather than building and selling a single product. I didn’t want to work for a company that just consumes technology though. I want to work somewhere that has interesting problems, and will do whatever is needed to solve those problems - build/buy/cobble together.

Why Valve?

Valve is big enough to offer the right level of challenge, but also small enough that I can make a difference. I’m not lost in the machine, but I am working on a global network.

Valve is also quite a different company. Check out the Employee Handbook to get a sense of Continue reading

Netlify Migration

This blog is now hosted via Netlify, rather than GitHub Pages. It is still built using Jekyll, but I updated the theme to Mediumish.

I looked at switching to Hugo for site generation, but I hit several bugs trying to do the import, and theme setup was a pain. So I stuck with Jekyll, because it’s doing what I need. Using Netlify gives me a few more options around build and deploy, and moves away from Cloudflare.

All URLs and RSS feed should remain the same. Let me know if you see any issues.

CCIE Renewed Once More – Exam 400-101 v5.1

I’ve given in to the Sunk Cost Fallacy once more: I’ve renewed my CCIE. There was a lot of foot dragging this time around, and I only had four months to spare. But it’s done, for another year. Here’s some quick notes on my prep, and thoughts on the exam.

Preparation

I decided to sit the CCIE R&S Written Exam to renew. This was the easiest route for me. I don’t use Cisco products on a day to day basis, so certifying with a different track would be very hard for me.

The version hasn’t changed since the last time I sat it. It’s still 400-100, v5.1. The only difference is that the “Evolving Technologies” section has been tweaked a little. Think Automation toolsets, Cloud concepts, etc.

I used the study guide I purchased last time from “CCIE in 8 Weeks”. I also re-subscribed to their online practice exams. I meant to only subscribe for 3 months, but…I couldn’t get motivated to do this exam. I ended up paying for another 3 months access, before I finally knuckled down and did the study while I was on vacation. I flicked through my old CCIE flashcards a few times too.

Continue reading

CCIE Renewed Once More – Exam 400-101 v5.1

I’ve given in to the Sunk Cost Fallacy once more: I’ve renewed my CCIE. There was a lot of foot dragging this time around, and I only had four months to spare. But it’s done, for another year. Here’s some quick notes on my prep, and thoughts on the exam.

Preparation

I decided to sit the CCIE R&S Written Exam to renew. This was the easiest route for me. I don’t use Cisco products on a day to day basis, so certifying with a different track would be very hard for me.

The version hasn’t changed since the last time I sat it. It’s still 400-100, v5.1. The only difference is that the “Evolving Technologies” section has been tweaked a little. Think Automation toolsets, Cloud concepts, etc.

I used the study guide I purchased last time from “CCIE in 8 Weeks”. I also re-subscribed to their online practice exams. I meant to only subscribe for 3 months, but…I couldn’t get motivated to do this exam. I ended up paying for another 3 months access, before I finally knuckled down and did the study while I was on vacation. I flicked through my old CCIE flashcards a few times too.

Continue reading

CCIE Renewed Once More – Exam 400-101 v5.1

I’ve given in to the Sunk Cost Fallacy once more: I’ve renewed my CCIE. There was a lot of foot dragging this time around, and I only had four months to spare. But it’s done, for another year. Here’s some quick notes on my prep, and thoughts on the exam.

Preparation

I decided to sit the CCIE R&S Written Exam to renew. This was the easiest route for me. I don’t use Cisco products on a day to day basis, so certifying with a different track would be very hard for me.

The version hasn’t changed since the last time I sat it. It’s still 400-100, v5.1. The only difference is that the “Evolving Technologies” section has been tweaked a little. Think Automation toolsets, Cloud concepts, etc.

I used the study guide I purchased last time from “CCIE in 8 Weeks”. I also re-subscribed to their online practice exams. I meant to only subscribe for 3 months, but…I couldn’t get motivated to do this exam. I ended up paying for another 3 months access, before I finally knuckled down and did the study while I was on vacation. I flicked through my old CCIE flashcards a few times too.

Continue reading

CLI Still Sucks for Automation

Using network CLI for automation has always been fragile. But it keeps surprising me with the way it breaks. This time, it was a combination of Ansible, Arista, replace: config and terminal length used as a config command.

The Problem

I often hang out in the NTC Slack channel. A user reported they were having a problem with Ansible and EOS. Basic changes worked, but when they used eos_config with the replace: config option, it just timed out. We knew basic authentication & connectivity was fine, it had to be something else.

But it made no sense, because these modules are widely used. What’s going on?

Background #1: Pagination

Some commands produce more than one screen’s worth of output - for example, show run can be hundreds of lines long. Most screens don’t have hundreds of lines, so pagination is used. The network Continue reading

CLI Still Sucks for Automation

Using network CLI for automation has always been fragile. But it keeps surprising me with the way it breaks. This time, it was a combination of Ansible, Arista, replace: config and terminal length used as a config command.

The Problem

I often hang out in the NTC Slack channel. A user reported they were having a problem with Ansible and EOS. Basic changes worked, but when they used eos_config with the replace: config option, it just timed out. We knew basic authentication & connectivity was fine, it had to be something else.

But it made no sense, because these modules are widely used. What’s going on?

Background #1: Pagination

Some commands produce more than one screen’s worth of output - for example, show run can be hundreds of lines long. Most screens don’t have hundreds of lines, so pagination is used. The network Continue reading

CLI Still Sucks for Automation

Using network CLI for automation has always been fragile. But it keeps surprising me with the way it breaks. This time, it was a combination of Ansible, Arista, replace: config and terminal length used as a config command.

The Problem

I often hang out in the NTC Slack channel. A user reported they were having a problem with Ansible and EOS. Basic changes worked, but when they used eos_config with the replace: config option, it just timed out. We knew basic authentication & connectivity was fine, it had to be something else.

But it made no sense, because these modules are widely used. What’s going on?

Background #1: Pagination

Some commands produce more than one screen’s worth of output - for example, show run can be hundreds of lines long. Most screens don’t have hundreds of lines, so pagination is used. The network Continue reading

Our Green Card Journey

We are now Lawful Permanent Residents of the United States - aka Green Card Holders. It took a few years to get to this point. Here’s our timeline, why we did it, what it means for us, and what next.

Timeline

I first moved to the US on an L-1B visa. This is an intra-company transfer visa, that let me move to the US to continue working for Brocade.

  • May 2015 - Began work for Brocade, based in New Zealand.
  • Jul 2016 - Received L-1B visa, allowing us to move to US.
  • Aug 2016 - Moved from New Zealand to US.
  • Nov 2016 - Broadcom announces intention to acquire Brocade
  • Nov 2016 - Green Card process initiated - Department of Labour certification filed.
  • Jul 2017 - PERM filed.
  • Oct 2017 - Extreme Network acquired my business unit. I remained employee of Broadcom.
  • Nov 2017 - PERM approved.
  • Jan 2018 - Received permission to transfer L-1 visa to Extreme Networks.
  • Feb 2018 - I-140 and I-485 submitted.
  • Sep 2018 - I-140 approved.
  • Feb 2019 - I-485 interview scheduled.
  • Mar 2019 - I-485 interview held. Lots of questions, confirming details & history, but all straightforward.
  • One week later: cards in hand

Total Continue reading

Our Green Card Journey

We are now Lawful Permanent Residents of the United States - aka Green Card Holders. It took a few years to get to this point. Here’s our timeline, why we did it, what it means for us, and what next.

Timeline

I first moved to the US on an L-1B visa. This is an intra-company transfer visa, that let me move to the US to continue working for Brocade.

  • May 2015 - Began work for Brocade, based in New Zealand.
  • Jul 2016 - Received L-1B visa, allowing us to move to US.
  • Aug 2016 - Moved from New Zealand to US.
  • Nov 2016 - Broadcom announces intention to acquire Brocade
  • Nov 2016 - Green Card process initiated - Department of Labour certification filed.
  • Jul 2017 - PERM filed.
  • Oct 2017 - Extreme Network acquired my business unit. I remained employee of Broadcom.
  • Nov 2017 - PERM approved.
  • Jan 2018 - Received permission to transfer L-1 visa to Extreme Networks.
  • Feb 2018 - I-140 and I-485 submitted.
  • Sep 2018 - I-140 approved.
  • Feb 2019 - I-485 interview scheduled.
  • Mar 2019 - I-485 interview held. Lots of questions, confirming details & history, but all straightforward.
  • One week later: cards in hand

Total Continue reading

Our Green Card Journey

We are now Lawful Permanent Residents of the United States - aka Green Card Holders. It took a few years to get to this point. Here’s our timeline, why we did it, what it means for us, and what next.

Timeline

I first moved to the US on an L-1B visa. This is an intra-company transfer visa, that let me move to the US to continue working for Brocade.

  • May 2015 - Began work for Brocade, based in New Zealand.
  • Jul 2016 - Received L-1B visa, allowing us to move to US.
  • Aug 2016 - Moved from New Zealand to US.
  • Nov 2016 - Broadcom announces intention to acquire Brocade
  • Nov 2016 - Green Card process initiated - Department of Labour certification filed.
  • Jul 2017 - PERM filed.
  • Oct 2017 - Extreme Network acquired my business unit. I remained employee of Broadcom.
  • Nov 2017 - PERM approved.
  • Jan 2018 - Received permission to transfer L-1 visa to Extreme Networks.
  • Feb 2018 - I-140 and I-485 submitted.
  • Sep 2018 - I-140 approved.
  • Feb 2019 - I-485 interview scheduled.
  • Mar 2019 - I-485 interview held. Lots of questions, confirming details & history, but all straightforward.
  • One week later: cards in hand

Total Continue reading

Replacement Strips for Screen Privacy Filter

I use a Privacy Filter on my laptop screen when traveling. I’m doing a bit of time on planes these days, and it makes a big difference. Most of my code is Open Source, but other content is proprietary. High chance of competitors being on the same plane as me, so better to make it harder for others to see.

The only problem with these screens is that if you frequently take it off like I do, the adhesive strips collect dust, and stop sticking after a while. Recently someone asked me how to get them replaced.

3M does not sell replacement strips…but they do something even better: they give them away for free. Pretty cool ah?

Just go here, fill in the details, and they’ll send you some more. How good is that?

Replacement Strips for Screen Privacy Filter

I use a Privacy Filter on my laptop screen when traveling. I’m doing a bit of time on planes these days, and it makes a big difference. Most of my code is Open Source, but other content is proprietary. High chance of competitors being on the same plane as me, so better to make it harder for others to see.

The only problem with these screens is that if you frequently take it off like I do, the adhesive strips collect dust, and stop sticking after a while. Recently someone asked me how to get them replaced.

3M does not sell replacement strips…but they do something even better: they give them away for free. Pretty cool ah?

Just go here, fill in the details, and they’ll send you some more. How good is that?

Replacement Strips for Screen Privacy Filter

I use a Privacy Filter on my laptop screen when traveling. I’m doing a bit of time on planes these days, and it makes a big difference. Most of my code is Open Source, but other content is proprietary. High chance of competitors being on the same plane as me, so better to make it harder for others to see.

The only problem with these screens is that if you frequently take it off like I do, the adhesive strips collect dust, and stop sticking after a while. Recently someone asked me how to get them replaced.

3M does not sell replacement strips…but they do something even better: they give them away for free. Pretty cool ah?

Just go here, fill in the details, and they’ll send you some more. How good is that?

1 2 3 4