Archive

Category Archives for "LINDSAY HILL"

The Chassis Switch is Dead

The Chassis Switch is Dead. For most networks, chassis-based switches are no longer appropriate due to cost, inflexibility and risk. I see this as similar to servers, in that server blade chassis are no longer appropriate for most organisations. The alternatives are already better for cost & flexibility. The real question is what our management model will look like for those alternatives.

Dead Collector: ‘Ere, he says he’s not dead.
Leaf-Spine: Yes he is.
Chassis: I’m not.
Dead Collector: He isn’t.
Leaf-Spine: Well, he will be soon, he’s very ill.
Chassis: I’m getting better.
Leaf-Spine: No you’re not, you’ll be stone dead in a moment.

(With apologies to Monty Python)

Blade Servers…

In the late 1990s and early 2000s, server buying patterns changed significantly. Previously we had a few “Big Iron” Unix systems, but cheaper Intel-based systems changed the economics dramatically. This led to a rapid sprawl in the number of physical servers.

In the second half of the 2000s, server blades appeared as a seductive answer. They promised simpler management of pools of systems, greater density, better efficiencies, and operational cost savings. Vendors promised long term “investment protection”, assuring us that we could keep the chassis, and upgrade blades Continue reading

No More Single Panes of Glass

The term “Single Pane of Glass” became something of a running joke during Network Field Day 8. The term has become over-used & abused, and it’s time we stopped using it. Time to find better terminology.

According to TechTarget:

A single pane of glass is a phrase used by information technology (IT) marketers to describe a management console that integrates information from multiple components into a unified display

All my information in one place? Sounds good, right? I like Single Panes of Glass. I like them a lot. In fact, I like them so much, I have several. Vendors like them too, so they’ve all got one.

And there’s the rub. The term is over-loaded, with every vendor using the term to describe their management console that can be used for managing all of their systems. The problem is that most vendors only see things from the perspective of their products. They don’t see things from the wider perspective of an organisation that is trying to use many different products to achieve business outcomes.

So the network vendor has a Single Pane of Glass (SPoG) that manages all the network, the MDM vendor has their SPoG for managing mobile Continue reading

ThousandEyes – NOC for the Internet?

ThousandEyes is a network monitoring company that provides application performance visibility across the Internet. They don’t just show how an application is performing, but can identify where across the Internet issues are occurring. Ethan Banks has written up some of the use cases. Recently I realised I could start thinking of them as a “NOC for the Internet.”

I was fortunate enough to attend Network Field Day 8, where ThousandEyes was one of the presenters. During their presentation Mohit Lad gave a demonstration of using ThousandEyes to investigate performance issues:

The problem with troubleshooting issues across the Internet is that it’s hard to get the complete visibility you need to track down where issues are happening. ThousandEyes helps, by giving you more viewpoints, but there are still limits. Most of us can’t afford to run tests from hundreds of different public & private locations.

Interpreting data is also a challenge. ThousandEyes has done their best to make the data usable, but you might not have the networking resources to be able to fully understand what’s going on. You need both wider visibility, and the experience to fully interpret it.

That’s why I was very pleased to hear the exchange starting Continue reading

Let People Choose Their Own Tools

Why is it that people will pay a lot of money for a consultant’s time and expertise, but then hobble them by limiting the tools they can use?

Chris Wahl has written about learning to cope with the default tools and settings:

It’s almost a given that anything I own – personally or via my employer – will not be allowed to touch any piece of software or hardware in the average client environment. It causes too many headaches with compliance rule sets like Sarbanes-Oxley (SOX)…

This means that I’ve come to rely on whatever tools are universally available. Let’s take PowerShell for example. I have an entire library of scripts that I’ve written over the past several years. More often than not I end up using the vSphere Client or ESXi Shell instead because I can’t get to my scripts. If it’s a highly repetitious task I may just re-create a script by hand, but more often than not, it’s not worth the effort.

I’ve posted similar things to IEOC about the use of aliases on network gear:

I’m a consultant, so I work on a variety of different systems, and can’t rely on having a large list of aliases Continue reading

Rant: Just stop it with the TFTP

TFTP was first defined in 1980. That is a very long time ago in IT, and while it’s had a good run, it’s time for network engineers to stop using TFTP. It’s slow, insecure, and there are better options available.

TFTP is an unauthenticated, plain-text file transfer protocol. It is commonly used by network engineers to transfer switch configs, or IOS images. No passwords required, just a straight “Get this file ” or “Put this file ”. It uses UDP to transfer data. It is designed to be very simple, and light-weight. This is a large part of why it was popular – TFTP servers or clients could be implemented in low-powered devices, such as switches, VoIP phones, etc. Some systems also use it as part of an initial boot, where TFTP is used to retrieve the initial boot environment.
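To make the “no passwords required” point concrete, here is a minimal Python sketch of the request a TFTP client sends, following the packet layout in RFC 1350 (opcode, filename, mode, each string NUL-terminated). The helper name `build_request` and the filename `switch.cfg` are made up for illustration. Note that the packet has no field where a username or password could go.

```python
import struct

# Opcodes from RFC 1350
OP_RRQ = 1  # read request: "get this file"
OP_WRQ = 2  # write request: "put this file"


def build_request(opcode: int, filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP RRQ/WRQ packet: 2-byte opcode | filename | 0 | mode | 0."""
    return (
        struct.pack("!H", opcode)            # opcode, network byte order
        + filename.encode("ascii") + b"\x00"  # requested file, NUL-terminated
        + mode.encode("ascii") + b"\x00"      # transfer mode, NUL-terminated
    )


if __name__ == "__main__":
    # Hypothetical filename, for illustration only
    print(build_request(OP_RRQ, "switch.cfg"))
```

The whole request is a single small UDP datagram sent in the clear, which is why the protocol is so easy to implement on low-powered devices, and also why anyone who can reach the server can ask for a file.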

The main complaints I hear from engineers are “How do I get a TFTP server set up?”, and “Why is this taking so long to transfer?” Server configuration is just a Google exercise, but let’s look at file transfer speed.

Speedy? Not so much

For this test, I have a CentOS 6.x VM running on my laptop. I’m downloading Continue reading

Vocus Acquisition of FX: Good for Customers?

Consolidation is happening in the New Zealand wholesale ISP market, with Vocus acquiring FX. Consolidation can lead to less competition, or it can strengthen it, by making players stronger and more viable. This acquisition should strengthen the market, and hopefully open up new service offerings.

In July Vocus Communications announced its intention to acquire FX Networks. From the press release:

FX owns a unique and high quality fibre optic network consisting of 4,132 kms of modern ducted fibre cable covering both the North and South Islands of New Zealand. The company has 365 customers including 43 of the Top 100 companies in New Zealand.

Vocus will acquire FX for an enterprise value of NZ$115.8m (~A$107.7m). The FX business is expected to deliver NZ$13.5-$14.5m of EBITDA in the first 12 months post acquisition (excluding transaction and integration expenses).

The combination of Vocus and FX strengthens both businesses. Vocus will emerge as the third largest network operator in NZ and the clear leader in trans-Tasman telecommunications and data centres.

Vocus has their own fibre network around Australia, and has a significant international network, with high-level peering. In 2012 they purchased Maxnet, a New Zealand ISP and Data Center Continue reading

CPUG, and The Risk of Single-Admin Communities

CPUG, a Check Point user forum, is near death. The owner has been forced to get rid of it, but rather than doing a graceful handover, it has been shut down pending a possible sale. This is a great shame, and it highlights the risks of contributing to a forum controlled by a single person.

CPUG.org started out as an independent Check Point forum in around 2005. It was seeded with Phoneboy’s original FW-1 FAQs, and quickly became the premier independent source of Check Point information. If you had a Check Point problem, chances were you could get a quick answer there.

I used to do a lot of Check Point work, and so I knew a fair bit about it. I had the time, knowledge, and the desire to help the community, so I got involved with CPUG, and became a top contributor. I put a huge amount of effort into it over the years, and hopefully I helped solve a few people’s problems. I have moved away from contributing recently, for various reasons.

At its best, the forum was a fantastic resource, where many of the smartest people were working to help solve the trickiest issues. It became Continue reading

HP OMW: Still Kicking, But Only Just

A year ago I asked “Has HP Abandoned Operations Manager?” There had been no significant development for a long time, and the signs were that HP was moving away from OM to OMi.

Last week HP made a move that confirms my original thinking: It’s dead (it just doesn’t know it yet). HP released a Customer Letter announcing an extension to the “End of Committed Support” date, from December 31, 2016 to June 30, 2018:

HP is committed to providing the highest level of customer care to you while you determine your future strategy for your HP Operations Manager for Windows 9.0x & HP Operations Manager for Windows Basic Suite 9.1x products.

(emphasis mine)

That’s right, no new version announcement, just extending support for the current version. Implication: no new versions coming any time soon.

Applying a few volts to OMW 9.0

HP has released patches OMW_00185 and OMW_00187 for OMW 9.0. These include the usual bugfixes, and these enhancements:

  • Web console enhancements resulting in feature parity with the MMC console while offering significant performance advantages
  • Management Server platform support extension to Windows Server 2012 and Windows Server 2012 R2
  • MMC Console Continue reading

HP NNMi 10.00 Released

HP NNMi version 10.0 has been released. This is a good release, with many usability enhancements. I’m pleased to see continued development, as the future nirvana of all-powerful software defined networks hasn’t quite arrived yet. For now, we still have to manage our networks the old-fashioned way: SNMP is still alive & kicking.

NNMi – Background

HP NNMi is a spiritual descendant of HP OpenView, one of the first network monitoring tools. Between versions 6 and 7, HP completely re-wrote the NNM code, and now we have NNMi. The core product performs network discovery and fault monitoring. Add-on components (iSPIs) offer performance monitoring, NetFlow analysis, IP SLA monitoring, etc. A sister-product (HP Network Automation) is used for network configuration management. The add-on components were all separately licensed, but HP now bundles products together.

Historically NNMi has focused on underlying network monitoring capabilities, and less on the user interface. This meant that almost anything was technically possible, but the visual experience was underwhelming. The integration between core product and add-on components was limited.

The last major release was 9.20, in June 2012. There have been minor enhancements and fixes since, but the last patch was in September 2013. We’ve been due for Continue reading

Screen Scraping: Still Sucks

I’ve written before about “Why Screen Scraping Sucks.” Well, I can report that nothing has changed. It still sucks. This time I got caught out by the changed behaviour of the “logging host” command.

Compliance Checks

At a customer site I use HP IMC to perform compliance checks across HP and Cisco networking gear. This has a set of rules that get run against the latest device backups. I have various rules that look for specific patterns – making sure they do, or don’t exist, as required.

My systems should all have two log servers defined. The configs should look something like this:

Rack1SW1#sh run | inc ^logg
logging 1.1.1.1
logging 2.2.2.2

So I defined an IMC compliance rule that looked for the existence of “logging 1.1.1.1” and “logging 2.2.2.2”. I’m using the Advanced mode, which uses regex matching, so I need to escape the “.”.

This worked well. It alerted on systems that had the incorrect (or no) destinations defined.
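Why the escaping matters can be sketched with Python’s `re` module. This is only an illustration: IMC’s own regex engine and rule syntax may differ, and the function name `has_log_server` is made up for this example. An unescaped “.” matches any character, so a wrong address can slip through.

```python
import re

# Sample config lines, as in the output above
CONFIG = "logging 1.1.1.1\nlogging 2.2.2.2\n"


def has_log_server(config_text: str, ip: str) -> bool:
    """True if the config contains an exact 'logging <ip>' line."""
    # re.escape turns each "." into a literal dot instead of "any character"
    pattern = r"^logging " + re.escape(ip) + r"$"
    return re.search(pattern, config_text, flags=re.MULTILINE) is not None


def loose_match(config_text: str) -> bool:
    """Same check for 1.1.1.1, but with the dots left unescaped."""
    return re.search(r"^logging 1.1.1.1$", config_text, flags=re.MULTILINE) is not None


if __name__ == "__main__":
    print(has_log_server(CONFIG, "1.1.1.1"))               # exact match on the real config
    print(loose_match("logging 1.1.101\n"))                 # wrong IP accepted by unescaped dots
    print(has_log_server("logging 1.1.101\n", "1.1.1.1"))  # escaped pattern rejects it
```

The strict, anchored pattern is also brittle in the other direction: any change to the exact text of the line means the rule no longer matches.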

Wait a minute…I thought you said “logging host”?

Turns out that “logging X.X.X.X” was the original form of this command. At 12.3(14)T, Cisco changed Continue reading

War Stories: Gratuitous ARP and VRRP

Continuing our theme of ARP-related war stories, here’s another ARP/switching behaviour I’ve come across. This particular problem didn’t result in any outages, but the network wasn’t working as well as it should have, and started flooding frames unexpectedly. Here’s what was going on:

The Network

Breaking the network down to its simplest level, it looked like this:

VRRP and ARP

The two routers were a VRRP pair. Router-A was 100.100.100.11, Router-B was 100.100.100.12, and the virtual IP was 100.100.100.1. These acted as a default gateway for the client LAN. PCs connected to the client LAN got their network configuration from DHCP, and set their default gateway to 100.100.100.1. Using this, they were able to get access to resources behind the routers, such as Server-1 at 200.200.200.200. All worked well.

Obviously there was a lot more to the network than what I’ve shown here, but it’s not important.

The Issue

I said it was working well – so what was wrong? One day I was using Wireshark to diagnose a network issue between PC-A and Server-1. I ran Wireshark on PC-A, with a capture filter of “host 200.200.200.200”. The packet flow Continue reading

What Happens When 20 Programs Poll The Network?

Packetpushers show 198 was a great episode about Network Automation. At one point, Greg asks:

“What happens when you’ve got 20 apps polling one device?”

Well, you might hit the same problem I did:

SECURITY-SSHD-6-INFO_GENERAL : Incoming SSH session rate limit exceeded

I have some Python scripts that poll performance and configuration data from a couple of ASR9Ks, and I was getting some gaps in my data. The scripts run on different polling cycles (some hourly, some every 15 minutes, etc). It wasn’t consistent, but now and then my script would fail to collect any data.

I dug into it, and found that I was hitting the default SSH rate limit of 60 per minute, calculated as 1 per second. Because I couldn’t control the exact scheduling of when my polls ran, I inserted a short random wait timer into some of them. That helped, and I had fewer failures, but it still wasn’t quite right.
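The random wait timer can be sketched roughly like this in Python. It is not the original script: the names `jittered_delay` and `poll_with_jitter`, and the 10-second ceiling, are assumptions chosen for illustration. The idea is simply to spread out pollers that would otherwise all open SSH sessions in the same second.

```python
import random
import time


def jittered_delay(max_wait_s: float = 10.0) -> float:
    """Pick a random delay between 0 and max_wait_s seconds."""
    return random.uniform(0.0, max_wait_s)


def poll_with_jitter(poll, max_wait_s: float = 10.0, sleep=time.sleep):
    """Wait a random short time, then run the poll function.

    `poll` is any callable that opens the SSH session and collects data.
    `sleep` is injectable so the wait can be stubbed out in tests.
    """
    sleep(jittered_delay(max_wait_s))
    return poll()


if __name__ == "__main__":
    # Stand-in poll function; a real one would connect to the device
    print(poll_with_jitter(lambda: "collected", max_wait_s=2.0))
```

Jitter only lowers the odds of a collision. With enough independent pollers, some will still land in the same second, which matches the “fewer failures, but still not quite right” result.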

So I used the command “ssh server rate-limit 120” to allow 2 SSH connections per second. That has helped, and now I’m not getting any failures.

But it won’t be pretty if I do have 20 different apps all trying to poll at once.

(Yes, I know, I should Continue reading

ScienceLogic Global Network Manager

ScienceLogic 7.5 includes many enhancements and new features. One I’m interested in is “Global Manager” which can be used to massively scale out the ScienceLogic architecture. Here’s some more detail on why ScienceLogic introduced this feature, and what it does.

Problem: A Single Database

I’ve talked before about the ScienceLogic architecture, and noted that the Database can be a bottleneck:

You’ll notice that all the variations only ever have one “active” database at any one time. All the processing is done on this system, with the results replicated to the other databases. You can scale out your Collectors or User Interface by adding more servers – but you can’t scale out the core database. Right now you have to scale up the database – ie. allocate more RAM/CPU/IOPS. This gets around the performance bottlenecks, but comes at a cost.

In this diagram, we can see the database is at the heart of everything. We can have HA & DR options for it, but there is only ever one active DB:

Distributed Architecture - click for larger


We can have multiple web interfaces, but they all query the same database.

Solution: More Databases!

The new Global Manager option from Continue reading
