Lindsay Hill

Author Archives: Lindsay Hill

Root Cause Analysis – It’s Not Perfect

Automated Root Cause Analysis promises a lot. High-end network monitoring systems promise that they can automatically isolate network problems, and only tell you about the thing that needs fixing. This sounds very enticing. Who wants a flood of alarms, when we could get just one alarm, telling us what we need to fix? But it’s not perfect, and you do need to pay attention to it.

Consider this contrived network:

[Figure: RCA Example]

What happens if the upstream link from the router fails?

[Figure: RCA Link Down]

From the perspective of the NMS, all systems at that site are unreachable. A simple NMS that is unaware of topology will create 4 alarms – one for each of the router, the switches and the server. A smarter NMS will recognise that it only needs one alarm, for the router WAN link being unreachable (and therefore the whole site is offline). It will know that the switches and server are unreachable, but those alarms will be suppressed by the key incident.
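As a rough illustration of the suppression logic described above, here is a minimal Python sketch. The topology (one router, two switches, one server), the device names, and the `correlate` function are all assumptions made for this example; this is not how any particular NMS implements it. It also assumes a simple tree, where each device has exactly one upstream path from the NMS.

```python
from collections import deque

# Hypothetical topology for the contrived example: the NMS reaches the site
# via the router, then the switches, then the server. Each key lists the
# devices directly downstream of it.
TOPOLOGY = {
    "nms": ["router"],
    "router": ["switch1", "switch2"],
    "switch1": ["server"],
    "switch2": [],
    "server": [],
}


def correlate(topology, root, unreachable):
    """Split unreachable devices into root-cause alarms and suppressed alarms."""
    root_causes, suppressed = [], []
    # Each queue entry is (device, is some upstream device already down?)
    queue = deque([(root, False)])
    seen = {root}
    while queue:
        device, upstream_down = queue.popleft()
        down = device in unreachable
        if down and upstream_down:
            # Explained by a failure closer to the NMS, so hide it.
            suppressed.append(device)
        elif down:
            # Nothing upstream is down, so this is where the problem starts.
            root_causes.append(device)
        for child in topology.get(device, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, upstream_down or down))
    return root_causes, suppressed


roots, hidden = correlate(
    TOPOLOGY, "nms", {"router", "switch1", "switch2", "server"}
)
print("Alarm:", roots)
print("Suppressed:", hidden)
```

Note that the answer is only as good as the `TOPOLOGY` data: the function has no way of knowing about devices or links that are missing from it.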

This all sounds like a good idea. Why wouldn’t you want that?

But what if the NMS view of the network is incomplete? What might happen then?

Consider the same network as above, but this time a new WAN router has been Continue reading

BYOD: Just another money-grab?

BYOD policies sound alluring. No more forced use of a crappy old corporate laptop – “hey look, we’ll let you choose whatever you want!” But I think it is a way to shift the cost burden over to employees. It will be done slowly, over several years, and we’ll welcome it. But it will lead to employees carrying more costs. I guess we should be careful with what we wish for.

In my teens I spent many years working in the produce & butchery departments at a local supermarket. When I started out, the contracts still had the last vestiges of union-dominated times. So we got paid allowances for laundry, extra allowances if we’d passed some school exams, higher rates for overtime, meal allowances, etc. During the years I was there, these were eroded. Each year they gave us pay rises that were nominally higher than inflation, and yet another allowance was ‘incorporated’ into my wages. Sometimes allowances would remain for older employees. When I left, I was being paid significantly more than new employees, in part because I still had several extra allowances.

I think we’ll see the same thing with BYOD programs. I think it will go like this:

  1. Announce BYOD Continue reading

In Praise of Support Lifecycles

If you’re just starting out working with ‘Enterprise’ products, you may not have come across Support Lifecycles. It’s important to know what these are, and how they affect you. They can have both a positive & a negative impact on when and why you choose to upgrade systems.

What Are Support Lifecycles?

Developers would like to only support the latest version. But customers can’t/won’t always run the latest version. They need to know that they can expect a certain level of support for the version they’re running. As a compromise, software vendors will publish a support lifecycle policy. This will outline the levels of support a product gets, from new product introduction, through to being superseded, and finally moved to end of support. Typical phases include:

  • General Support: Product is in General Availability phase, and is fully supported. You can log support cases, search KB articles, and expect both functionality enhancement and bugfix patches. The current product version will always be in this phase, and typically 1-2 major versions behind will also be included.
  • Limited Support: You can log a support case, and we’ll try to help, but we’re not planning any new patches, and you’ll probably get a suggestion Continue reading

Shellshock: One Month On

Shellshock was made public a little over a month ago, to wide predictions of doom & gloom. But somehow the Internet survived, and we lurch on towards the next crisis. I recently gave a talk about Shellshock, the fallout, and some thoughts on wider implications and the future. The talk wasn’t recorded, so here’s a summary of what was discussed.

Background: NZ ISIG: Keeping it Local

The New Zealand Information Security Interest Group (ISIG) runs monthly meetings in Auckland and Wellington. They’re open to all, and are fairly informal affairs. There’s usually a presentation, with a wide-ranging discussion about security topics of the day. No, we don’t normally discuss “picking padlocks, debating whose beard or ponytail is better or which martial art/fitness program is cooler.”

Attend enough meetings, and sooner or later you’ll be called upon to present. I was ‘volunteered’ to speak on Shellshock, about a month after the exploit was made public. I didn’t talk about the technical aspects of the exploit itself – instead I explored some of the wider implications, and industry trends. I felt the talk went well, mainly because it wasn’t just me talking: everyone got involved and contributed to the discussion. It would be a bit Continue reading

Disappointed With Check Point

I have recently started working with Check Point products again, after a 5-year break. This has given me a different perspective on how they are progressing. It has been disappointing to see that they’re still suffering from some of the same old bugs. Some of the core functionality is now showing its age, and is no longer appropriate for modern networks.

When you’re using a product or technology on a regular basis, it can be hard to accurately gauge progress. Maybe it feels like there are only incremental changes, with nothing major happening. But then you come across a 5-year old system, and you realise just how far we’ve come. If you don’t think iOS is changing much, find some videos of the first iPhones.

The opposite is when it feels like there are many regular enhancements…but when you step back you see that core product issues are not dealt with. It can be hard to see this when you’re working at the coal-face. You need to step away, work with other products and systems, then return.

That’s what I’ve done with Check Point recently. Through much of the 2000s, I did a huge amount of work with Check Point firewalls. Continue reading

Don’t Be Afraid of Changing Jobs

Some people are corporate survivors, sticking with one company for decades. Some people move around when it suits, while others would like to move, but are fearful of change. Here are a few things I’ve learnt about adapting to new work environments. It’s not that scary.

Corporate Survivors

We’ve all seen the people who seem to survive in a corporate environment. They seem to know everyone, and almost everything about the business. Return to a company after 10 years, and they’re still there. Somehow they survive, through mergers, acquisitions, and round after round of re-organisation. But often they seem to be doing more or less the same job for years, with little change.

Why Do People Stay?

There are four possible reasons for staying at a job for a long time:

  1. You’re really happy with what you do, and you’re well looked after.
  2. You just don’t care. You come to work to eat your lunch and talk to your friends. You don’t care how you’re treated, or what work you do, as long as you get paid.
  3. This is the only possible job you can get, due to location/skills/whatever.
  4. You’re comfortable where you are, and you’re scared of moving, scared of what Continue reading

HP SDN App Store Launches

HP’s SDN App Store has finally seen the light of day. This is intended to be a common platform for users and developers, to find and distribute real-world, practical SDN applications. Some of the launch apps include:

It’s interesting to look at the price points for applications. They are certainly not $0.99 apps, but they are still cheaper than typical ‘Enterprise’ software. I think it will take us a while to figure out what the right level of ‘value’ is.

HP has done well to put together a platform that developers can use to distribute SDN applications. It’s not an easy task to put together all of the back-end work required for something like this. It’s not simply hosting a website, it’s figuring out all the legal & financial implications, the support mechanisms, etc. There’s a lot of non-technical effort that goes into this.

The only challenge is that currently it is for SDN apps that use the HP VAN SDN Controller, which will limit the size of the market. I’m hoping that in future it will work with OpenDaylight. That will expand Continue reading

Utility-Based Pricing Troubles Me

Utility, or Consumption-Based pricing models offer an interesting way of matching costs to revenues. But if they’re not managed well, customer costs could blow out just trying to keep the lights on. We’ve come to expect rapidly declining hardware prices. Have vendors realised their utility prices need to decline at a similar rate?

I’ve been doing more architecture work over the last twelve months, and this has changed some of my thinking about technology. Previously I was only really interested in speeds & feeds, and technical capabilities. Scaling was only about how to add capacity – not what it would cost. When I looked at costs, it was just to shake my head at the ridiculous prices charged for things like a second power supply.

But now I find myself interested in things like cost curves, and trying to figure out how my costs will change as demand changes. The ideal is for there to be a clear relationship between costs & revenue, hopefully with costs growing at a slower rate than demand (and revenue).

Previously we had high upfront costs to buy hardware and software, and we aimed to amortise it over the life of the service. Our costs Continue reading

Knowing Your Audience…and Showing It

We all know that you’re supposed to “Know Your Audience.” Doing so improves engagement, and avoids faux pas like “Suggested Tweets.” But recently I realised that this doesn’t have to be subtle. Drop hints early on in your presentation that you’ve taken the time to understand the audience – it can really lift the mood.

Suggested Tweets – Just Say No

Companies that obsess about the wrong kind of metrics think that all they need is to get their message repeated many times. So they give employees & partners a list of “suggested tweets.” These are pre-written Tweets that people can send out from their own Twitter accounts, to “generate buzz.” I have seen many companies do this, and it is overwhelmingly lame. It devalues the message, and devalues those who send out these “suggested tweets.”

In the lead-up to the recent Cisco UCS event, many members of the Cisco Champions program sent out the same set of tweets. When I see the same tweet from several people in my stream, it’s obvious what’s going on. If you’re running a marketing Twitter account, then yeah, I expect marketing messages. But if you’re a real person, and I’ve Continue reading

The Chassis Switch is Dead

The Chassis Switch is Dead. For most networks, chassis-based switches are no longer appropriate due to cost, inflexibility and risk. I see this as similar to servers, in that server blade chassis are no longer appropriate for most organisations. The alternatives are already better for cost & flexibility. The real question is what our management model will look like for those alternatives.

Dead Collector: ‘Ere, he says he’s not dead.
Leaf-Spine: Yes he is.
Chassis: I’m not.
Dead Collector: He isn’t.
Leaf-Spine: Well, he will be soon, he’s very ill.
Chassis: I’m getting better.
Leaf-Spine: No you’re not, you’ll be stone dead in a moment.

(With apologies to Monty Python)

Blade Servers…

In the late 1990s, and early 2000s, server buying patterns changed significantly. Previously we had a few “Big Iron” Unix systems, but cheaper Intel-based systems changed the economics dramatically. This led to a rapid sprawl in the number of physical servers.

In the second half of the 2000s, server blades appeared as a seductive answer. They promised simpler management of pools of systems, greater density, better efficiencies, and operational cost savings. Vendors promised long term “investment protection”, assuring us that we could keep the chassis, and upgrade blades Continue reading

No More Single Panes of Glass

The term “Single Pane of Glass” became something of a running joke during Network Field Day 8. The term has become over-used & abused, and it’s time we stopped using it. Time to find better terminology.

According to TechTarget:

A single pane of glass is a phrase used by information technology (IT) marketers to describe a management console that integrates information from multiple components into a unified display

All my information in one place? Sounds good, right? I like Single Panes of Glass. I like them a lot. In fact, I like them so much, I have several. Vendors like them too, so they’ve all got one.

And there’s the rub. The term is over-loaded, with every vendor using the term to describe their management console that can be used for managing all of their systems. The problem is that most vendors only see things from the perspective of their products. They don’t see things from the wider perspective of an organisation that is trying to use many different products to achieve business outcomes.

So the network vendor has a Single Pane of Glass (SPoG) that manages all the network, the MDM vendor has their SPoG for managing mobile Continue reading

ThousandEyes – NOC for the Internet?

ThousandEyes is a network monitoring company that provides application performance visibility across the Internet. They don’t just show how an application is performing, but can identify where across the Internet issues are occurring. Ethan Banks has written up some of the use cases. Recently I realised I could start thinking of them as a “NOC for the Internet.”

I was fortunate enough to attend Network Field Day 8, where ThousandEyes was one of the presenters. During their presentation Mohit Lad gave a demonstration of using ThousandEyes to investigate performance issues:

The problem with troubleshooting issues across the Internet is that it’s hard to get the complete visibility you need to track down where issues are happening. ThousandEyes helps, by giving you more viewpoints, but there are still limits. Most of us can’t afford to run tests from hundreds of different public & private locations.

Interpreting data is also a challenge. ThousandEyes has done their best to make the data usable, but you might not have the networking resources to be able to fully understand what’s going on. You need both wider visibility, and the experience to fully interpret it.

That’s why I was very pleased to hear the exchange starting Continue reading

Let People Choose Their Own Tools

Why is it that people will pay a lot of money for a consultant’s time and expertise, but then hobble them by limiting the tools they can use?

Chris Wahl has written about learning to cope with the default tools and settings:

It’s almost a given that anything I own – personally or via my employer – will not be allowed to touch any piece of software or hardware in the average client environment. It causes too many headaches with compliance rule sets like Sarbanes-Oxley (SOX)…

This means that I’ve come to rely on whatever tools are universally available. Let’s take PowerShell for example. I have an entire library of scripts that I’ve written over the past several years. More often than not I end up using the vSphere Client or ESXi Shell instead because I can’t get to my scripts. If it’s a highly repetitious task I may just re-create a script by hand, but more often than not, it’s not worth the effort.

I’ve posted similar things to IEOC about the use of aliases on network gear:

I’m a consultant, so I work on a variety of different systems, and can’t rely on having a large list of aliases Continue reading

Rant: Just stop it with the TFTP

TFTP was first defined in 1980. That is a very long time ago in IT, and while it’s had a good run, it’s time for network engineers to stop using TFTP. It’s slow, insecure, and there are better options available.

TFTP is an unauthenticated, plain-text file transfer protocol. It is commonly used by network engineers to transfer switch configs, or IOS images. No passwords required, just a straight “Get this file” or “Put this file”. It uses UDP to transfer data. It is designed to be very simple, and light-weight. This is a large part of why it was popular – TFTP servers or clients could be implemented in low-powered devices, such as switches, VoIP phones, etc. Some systems also use it as part of an initial boot, where TFTP is used to retrieve the initial boot environment.
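To show just how simple (and unauthenticated) those requests are, here is a minimal Python sketch that builds a read or write request in the layout given in RFC 1350: a 2-byte opcode, the filename, a zero byte, the transfer mode, and another zero byte. The `build_request` helper and the filename are made up for illustration, and nothing is sent on the wire.

```python
import struct

OPCODE_RRQ = 1  # read request: "Get this file"
OPCODE_WRQ = 2  # write request: "Put this file"


def build_request(opcode, filename, mode="octet"):
    """Build a TFTP request: opcode, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", opcode)      # 2-byte opcode, network byte order
        + filename.encode("ascii")
        + b"\x00"
        + mode.encode("ascii")         # "octet" is the binary transfer mode
        + b"\x00"
    )


# Hypothetical filename, purely for illustration.
packet = build_request(OPCODE_RRQ, "switch1-confg")
print(packet)
```

There is no field anywhere in the request for a username or password. A client would send these bytes in a single UDP datagram to port 69 on the server.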

The main complaints I hear from engineers are “How do I get a TFTP server set up?”, and “Why is this taking so long to transfer?” Server configuration is just a Google exercise, but let’s look at file transfer speed.

Speedy? Not so much

For this test, I have a CentOS 6.x VM running on my laptop. I’m downloading Continue reading

Vocus Acquisition of FX: Good for Customers?

Consolidation is happening in the New Zealand wholesale ISP market, with Vocus acquiring FX. Consolidation can lead to less competition, or it can strengthen it, by making players stronger and more viable. This acquisition should strengthen the market, and hopefully open up new service offerings.

In July Vocus Communications announced its intention to acquire FX Networks. From the press release:

FX owns a unique and high quality fibre optic network consisting of 4,132 kms of modern ducted fibre cable covering both the North and South Islands of New Zealand. The company has 365 customers including 43 of the Top 100 companies in New Zealand.

Vocus will acquire FX for an enterprise value of NZ$115.8m (~A$107.7m). The FX business is expected to deliver NZ$13.5-$14.5m of EBITDA in the first 12 months post acquisition (excluding transaction and integration expenses).

The combination of Vocus and FX strengthens both businesses. Vocus will emerge as the third largest network operator in NZ and the clear leader in trans-Tasman telecommunications and data centres.

Vocus has their own fibre network around Australia, and has a significant international network, with high-level peering. In 2012 they purchased Maxnet, a New Zealand ISP and Data Center Continue reading

CPUG, and The Risk of Single-Admin Communities

CPUG, a Check Point user forum, is near death. The owner has been forced to get rid of it, but rather than doing a graceful handover, it has been shut down pending a possible sale. This is a great shame, and it highlights the risks of contributing to a forum controlled by a single person.

CPUG.org started out as an independent Check Point forum in around 2005. It was seeded with Phoneboy’s original FW-1 FAQs, and quickly became the premier independent source of Check Point information. If you had a Check Point problem, chances were you could get a quick answer there.

I used to do a lot of Check Point work, and so I knew a fair bit about it. I had the time, knowledge, and the desire to help the community, so I got involved with CPUG, and became a top contributor. I put a huge amount of effort into it over the years, and hopefully I helped solve a few people’s problems. I have moved away from contributing recently, for various reasons.

At its best, the forum was a fantastic resource, where many of the smartest people were working to help solve the trickiest issues. It became Continue reading

HP OMW: Still Kicking, But Only Just

A year ago I asked “Has HP Abandoned Operations Manager?” There had been no significant development for a long time, and the signs were that HP was moving away from OM to OMi.

Last week HP made a move that confirms my original thinking: It’s dead (it just doesn’t know it yet). HP released a Customer Letter announcing an extension to the “End of Committed Support” date, from December 31, 2016 to June 30, 2018:

HP is committed to providing the highest level of customer care to you while you determine your future strategy for your HP Operations Manager for Windows 9.0x & HP Operations Manager for Windows Basic Suite 9.1x products.

(emphasis mine)

That’s right, no new version announcement, just extending support for the current version. Implication: no new versions coming any time soon.

Applying a few volts to OMW 9.0

HP has released patches OMW_00185 and OMW_00187 for OMW 9.0. These include the usual bugfixes, and these enhancements:

  • Web console enhancements resulting in feature parity with the MMC console while offering significant performance advantages
  • Management Server platform support extension to Windows Server 2012 and Windows Server 2012 R2
  • MMC Console Continue reading

HP NNMi 10.00 Released

HP NNMi version 10.0 has been released. This is a good release, with many usability enhancements. I’m pleased to see continued development, as the future nirvana of all-powerful software defined networks hasn’t quite arrived yet. For now, we still have to manage our networks the old-fashioned way: SNMP is still alive & kicking.

NNMi – Background

HP NNMi is a spiritual descendant of HP OpenView, one of the first network monitoring tools. Between versions 6 and 7, HP completely re-wrote the NNM code, and now we have NNMi. The core product performs network discovery and fault monitoring. Add-on components (iSPIs) offer performance monitoring, NetFlow analysis, IP SLA monitoring, etc. A sister-product (HP Network Automation) is used for network configuration management. The add-on components were all separately licensed, but HP now bundles products together.

Historically NNMi has focused on underlying network monitoring capabilities, and less on the user interface. This meant that almost anything was technically possible, but the visual experience was underwhelming. The integration between core product and add-on components was limited.

The last major release was 9.20, in June 2012. There have been minor enhancements and fixes since, but the last patch was in September 2013. We’ve been due for Continue reading

Screen Scraping: Still Sucks

I’ve written before about “Why Screen Scraping Sucks.” Well, I can report that nothing has changed. It still sucks. This time I got caught out by the changed behaviour of the “logging host” command.

Compliance Checks

At a customer site I use HP IMC to perform compliance checks across HP and Cisco networking gear. This has a set of rules that get run against the latest device backups. I have various rules that look for specific patterns – making sure they do, or don’t exist, as required.

My systems should all have two log servers defined. The configs should look something like this:

Rack1SW1#sh run | inc ^logg
logging 1.1.1.1
logging 2.2.2.2

So I defined an IMC compliance rule that looked for the existence of “logging 1.1.1.1” and “logging 2.2.2.2”. I’m using the Advanced mode, which uses regex matching, so I need to escape the “.”.

This worked well. It alerted on systems that had the incorrect (or no) destinations defined.
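Here is a small Python sketch of why the escaping matters. It uses Python’s `re` module rather than IMC’s own regex engine, so treat the exact syntax as an assumption; the `has_log_server` helper and the sample configs are invented for this example.

```python
import re


def has_log_server(config, ip):
    """True if the config has a line that is exactly 'logging <ip>'."""
    # re.escape turns each "." into a literal dot; ^ and $ anchor the
    # match to a whole line when re.MULTILINE is set.
    pattern = r"^logging " + re.escape(ip) + r"$"
    return re.search(pattern, config, flags=re.MULTILINE) is not None


good = "logging 1.1.1.1\nlogging 2.2.2.2\n"
bad = "logging 1.1.121.1\nlogging 2.2.2.2\n"  # wrong first log server

# An unescaped "." matches any character, so the wrong address slips through.
unescaped_hit = re.search(r"logging 1.1.1.1", bad) is not None

print(unescaped_hit)
print(has_log_server(bad, "1.1.1.1"))
print(has_log_server(good, "1.1.1.1"))
```

The unescaped pattern wrongly reports a match against the bad config, because “1.1.121” satisfies “1.1.1.1” when each “.” can be any character. The escaped and anchored version only accepts the exact address.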

Wait a minute…I thought you said “logging host”?

Turns out that “logging X.X.X.X” was the original form of this command. At 12.3(14)T, Cisco changed Continue reading

War Stories: Gratuitous ARP and VRRP

Continuing our theme of ARP-related war stories, here’s another ARP/switching behaviour I’ve come across. This particular problem didn’t result in any outages, but the network wasn’t working as well as it should have, and started flooding frames unexpectedly. Here’s what was going on:

The Network

Breaking the network down to its simplest level, it looked like this:

[Figure: VRRP and ARP]

The two routers were a VRRP pair. Router-A was 100.100.100.11, Router-B was 100.100.100.12, and the virtual IP was 100.100.100.1. These acted as a default gateway for the client LAN. PCs connected to the client LAN got their network configuration from DHCP, and set their default gateway to 100.100.100.1. Using this, they were able to get access to resources behind the routers, such as Server-1 at 200.200.200.200. All worked well.

Obviously there was a lot more to the network than what I’ve shown here, but it’s not important.

The Issue

I said it was working well – so what was wrong? One day I was using Wireshark to diagnose a network issue between PC-A and Server-1. I ran Wireshark on PC-A, with a capture filter of “host 200.200.200.200”. The packet flow Continue reading