Archive

Category Archives for "LINDSAY HILL"

Cumulus in the Campus?

Recently I’ve been idly speculating about how campus networking could be shaken up, with different cost and management models. A few recent podcasts have inspired some thoughts on how Cumulus Networks might fit into this.

In response to a PacketPushers podcast on HP Network Management, featuring yours truly, Kanat asks:

For me the benchmark of network management so far is Meraki Dashboard – stupid simple and feature rich…
Yes – it’s a niche product that only focuses on Campus scenarios, Yes – it only supports proprietary HW. But it offers pretty much everything network operator needs – detailed visibility, traffic policy engine with L7 capability, MDM and you can hit it and go full speed right away.

How long will it take HP to achieve that level of simplicity/usability?

He’s right about the Meraki dashboard. It’s fantastic. Fast to get set up, easy to use, it’s what others should aspire to. But there’s a catch: It only works with Meraki hardware. Keep paying your monthly bills, and all is well. But what if you’ve got non-Meraki hardware? Or what if you decide you don’t want to pay Meraki any more? What if Meraki goes out of business (unlikely, but still Continue reading

Accurate Dependency Mapping – One Day?

Recently I’ve been thinking about Root Cause Analysis (RCA), and how it’s not perfect, but there may be hope for the future.

The challenge is that Automated RCA needs an accurate, complete picture of how everything connects together to work well. You need to know all the dependencies between networks, storage, servers, applications, etc. If you have a full dependency mapping, you can start to figure out what the underlying cause of a fault is, or you can start doing ‘What If?’ scenario planning.
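As a rough illustration of the 'What If?' idea, here is a minimal sketch that walks a dependency map to find everything affected by a single failure. The component names and edges are hypothetical, and a real dependency map would be far larger and much harder to keep accurate.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges: (component, thing it depends on).
DEPENDS_ON = [
    ("web-app", "app-server"),
    ("app-server", "database"),
    ("app-server", "core-switch"),
    ("database", "san-array"),
    ("database", "core-switch"),
]

def impacted_by(failed, edges):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = defaultdict(set)
    for component, dependency in edges:
        dependents[dependency].add(component)

    impacted, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents[node]:
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

# What if the storage array fails?
print(sorted(impacted_by("san-array", DEPENDS_ON)))
```

The walk itself is trivial; the hard part is the accuracy of the edge list.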

But once your network gets past a moderate size, it’s hard to maintain this sort of dependency mapping. Manual methods break down, and we look for automated means instead – but they have gaps and limitations.

Automated Mapping – Approaches & Limitations

Tools such as HP’s CMS suite attempt to discover all objects and dependencies using a combination of network scanning and agents. They’ll use things like ping, SNMP, WMI, nmap to identify systems and running services. Agents can then report more data about installed applications, configurations, etc.

Network sniffing can also be used to identify traffic flows. Most tools will also connect to common orchestration points, such as vCenter, or the AWS console, to Continue reading

Fixed-Price, or T&M?

Recently I posted about Rewarding Effort vs Results, how different contract structures can have different outcomes. This post covers Time & Materials vs Fixed-Price a little more, looking at pros & cons, and where each one is better suited.

Definitions:

  • Time & Materials: Client & supplier agree on the requirements, and an hourly rate. The client is billed based upon the number of hours spent completing the job. Any costs for materials are also passed on. If the job takes 8 hours, the client pays for 8 hours. If it takes 800 hours, the client pays for 800 hours. To prevent bill shock, there will usually be review points to measure progress & time spent. Risk lies with the client.
  • Fixed-Price: Client & Supplier agree beforehand on what outcomes the client needs. It is crucial that this is well-documented, so there are no misunderstandings. The supplier will estimate how long the job will take, allow some extra margin, and quote a figure. The client pays the same amount, regardless of how long the job takes. Risk lies with the supplier.
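To make the risk split concrete, here is a small sketch comparing what the client pays under each model. The hourly rate, the 40-hour estimate and the 20% margin are made-up figures for illustration only.

```python
def time_and_materials_cost(hours_worked, hourly_rate, materials=0.0):
    """Client pays for every hour actually worked, plus materials."""
    return hours_worked * hourly_rate + materials

def fixed_price_quote(estimated_hours, hourly_rate, margin=0.2, materials=0.0):
    """Supplier quotes the estimate plus a margin; the client pays this regardless."""
    return estimated_hours * (1 + margin) * hourly_rate + materials

RATE = 150.0  # hypothetical hourly rate
QUOTE = fixed_price_quote(estimated_hours=40, hourly_rate=RATE)

for actual_hours in (8, 40, 800):
    tm = time_and_materials_cost(actual_hours, RATE)
    cheaper = "T&M" if tm < QUOTE else "fixed-price"
    print(f"{actual_hours:>4} h: T&M {tm:>10.2f} vs fixed {QUOTE:>10.2f} "
          f"-> client pays less under {cheaper}")
```

If the job goes quickly, the client does better on T&M; if it overruns, the fixed price caps the client's cost and the supplier absorbs the difference.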

Comparison:

Time & Materials

Pros: Little time/energy wasted on quoting – engineers can get to work faster. Customer saves money if job Continue reading

Andrisoft Wanguard: Cost-Effective Network Visibility

Andrisoft Wansight and Wanguard are tools for network traffic monitoring, visibility, anomaly detection and response. I’ve used them, and think that they do a good job, for a reasonable price.

Wanguard Overview

There are two flavours to what Andrisoft does: Wansight for network traffic monitoring, and Wanguard for monitoring and response. They both use the same underlying components; the main difference is that Wanguard can actively respond to anomalies (DDoS, etc).

Andrisoft monitors traffic in several ways – it can do flow monitoring using NetFlow/sFlow/IPFIX, or it can work in inline mode, and do full packet inspection. Once everything is set up, all configuration and reporting is done from a console. This can be on the same server as you’re using for flow collection, or you can use a distributed setup.

The software is released as packages that can run on pretty much any mainstream Linux distro. It can run on a VM or on physical hardware. If you’re processing a lot of data, you will need plenty of RAM and good disk. VMs are fine for this, provided you have the right underlying resources. Don’t listen to those who still cling to their physical boxes. They lost.

Anomaly Detection

You Continue reading

Non-Functional Requirements

I’m currently reading and enjoying “The Practice of Cloud System Administration.” It doesn’t go into great depth in any one area, but it covers a range of design patterns and implementation considerations for large-scale systems. It works for two audiences: as a primer for junior engineers who need a broad overview, or as a reference for more experienced engineers. It doesn’t cover all the implementation specifics, nor should it: it would date very quickly if it tried.

I’ve long disliked the term “non-functional requirements,” so I enjoyed this passage:

Rather than the term “operational requirements,” some organizations use the term “non-functional requirements.” We consider this term misleading. While these features are not directly responsible for the function of the application or service, the term “non-functional” implies that these features do not have a function. A service cannot exist without the support of these features; they are essential.

It is all the fashion today to separate requirements into ‘functional’ and ‘non-functional,’ but the authors are right to point out that this can be misleading. Perhaps it’s the old Operations Engineer in me, but if a product doesn’t have things like Backup & Restore, or Configuration Management, then it’s a Continue reading

Keep an Open Mind

We all know that IT changes rapidly, but we still don’t always accept what that means. Companies and technologies change over time, and good engineers recognise this. Poor engineers cling to past beliefs, refusing to accept change. Try to keep an open mind, and periodically re-evaluate your opinions.

Consider the Linux vs Microsoft debate. I’ve been an Open Source fan for a long time, and have plenty of experience running Linux on servers and desktops. Today I use OS X as my primary desktop. I’ve cursed at Microsoft many times over the years, usually when dealing with some crash, security issue, or odd design choice.

But it annoys the hell out of me when I hear engineers spouting tired old lines about Microsoft products crashing, or having poor security. This is usually accompanied by some smug look “Hur hur hur…Microsoft crash…Blue Screen of Death…hur hur hur”

I get frustrated because these people aren’t paying attention to what Microsoft has been doing. They have come a very long way since the 2002 Bill Gates email setting security as the top priority. It’s a big ship to turn, and it took time. Their overall security model and practices are far better than they were, Continue reading

Rewarding Effort vs Results

Sometimes we confuse effort with outcome. We think that hours spent are more important than outcomes achieved. Or we unintentionally create a system where effort is rewarded, rather than outcomes.

Consider a situation where you work for a consulting firm, doing capped Time & Materials jobs. The client gets charged for the amount of time actually worked. Any amount of time up to the cap will be accepted. If more time is needed to complete a task, you’ll need to go back to the client to negotiate for more time/money. Occasionally you’ll need to do that, but usually the job will be completed under the cap.

As a consultant, you’re normally measured on your utilisation, and the amount you bill. So what’s the optimum amount of work to do for each job? Funnily enough, it is very close to the amount estimated – no matter what the estimate was. Maximise revenue & utilisation, while still doing the work under budget. There’s no incentive to do the job quicker.

Look at it from the perspective of two different consultants, Alice & Bob:

  • Alice is a diligent worker, who gets through her work as quickly as possible. Repeatable tasks are scripted. She doesn’t muck around.
  • Bob is a Continue reading

APIs Alone Aren’t Enough

Yes, we know: Your product has an API. Yawn. Sorry for not getting excited. That’s just table stakes now. What I’m interested in is the pre-written integrations and code you have that does useful things with that API.

Because sure, an API lets me integrate my various systems however I want. Theoretically. Just the same way that Bunnings probably sells me all the pieces I need to build a complete house.

Random aside: If your “open API” requires signing an NDA to view details, then maybe it’s not so open after all? 

If I’m running a small company staffed by developers, then just giving me an API is acceptable. But in a larger company, or one without developer resources, an API alone isn’t enough. I want to see standard, obvious integrations already available, and supported by the vendor.

In this spirit, I’m very pleased to see that ThousandEyes now has a standard integration with PagerDuty:

ThousandEyes appears as a partner integration from which you can receive notifications; and, within ThousandEyes we now have a link to easily add alerts to your PagerDuty account.

You can read more at the ThousandEyes blog.

This is exactly the sort of obvious integration I Continue reading

Increased MTTR is Good?

In Episode 167 of The Cloudcast – “Bringing Advanced Analytics to DevOps”, Dave Hayes brings up an interesting point about Mean Time to Resolution (MTTR). At about 8:30 in, he states:

“In a counter-intuitive sense, you actually want this to be going up…If you’re removing false alerts, and you’re getting better about the quantity of alerts, you’re going to be solving far fewer, more difficult problems, so you should see a slight trend upwards in Mean Time to Resolution”

This is a really interesting way of looking at things. Obviously you don’t want to set your goal as “Increase our MTTR,” but this could be a positive side-effect of improved processes.
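A quick worked example shows why the average can rise while things get better. The alert counts and resolution times below are invented purely to illustrate the arithmetic.

```python
from statistics import mean

def mttr(resolution_minutes):
    """Mean Time to Resolution: average minutes taken to close each alert."""
    return mean(resolution_minutes)

# Hypothetical month before tuning: 90 false alerts closed in 5 minutes each,
# plus 10 real incidents that take 2 hours each.
before = [5] * 90 + [120] * 10

# After tuning: most false alerts are gone, the real incidents are unchanged.
after = [5] * 10 + [120] * 10

print(f"MTTR before: {mttr(before)} min, total effort {sum(before)} min")
print(f"MTTR after:  {mttr(after)} min, total effort {sum(after)} min")
```

With these numbers the average rises from 16.5 to 62.5 minutes, yet the total time spent on alerts drops from 1650 to 1250 minutes.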

I recommend listening to the whole episode. PagerDuty is a very cool product in itself, but this is a broader discussion about operations, analytics, and best practices.

Subscribe to the podcast while you’re there too. Lots of interesting technology discussed there.

Using Firewalls for Policy Has Been a Disaster

Almost every SDN vendor today talks about policy, how they make it easy to express and enforce network policies. Cisco ACI, VMware NSX, Nuage Networks, OpenStack Congress, etc. This sounds fantastic. Who wouldn’t want a better, simpler way to get the network to apply the policies we want? But maybe it’s worth taking a look at how we manage policy today with firewalls, and why it doesn’t work.

In traditional networks, we’ve used firewalls as network policy enforcement points. These were the only practical point where we could do so. But…it’s been a disaster. The typical modern enterprise firewall has hundreds (or thousands) of rules, has overlapping, inconsistent rules, refers to decommissioned systems, and probably allows far more access than it should. New rules are almost always just added to the bottom, rather than working within the existing framework – it’s just too hard to figure out otherwise.
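Two of those symptoms, rules that refer to decommissioned systems and rules that overlap, are easy to show in miniature. This is only a sketch: the rule format, addresses and host inventory are hypothetical, and real firewall policies are far messier than three dictionaries.

```python
import ipaddress

# Hypothetical rule base: evaluated top-down, first match wins.
RULES = [
    {"id": 1, "src": "10.0.0.0/8", "dst": "192.168.10.5/32", "port": 443, "action": "allow"},
    {"id": 2, "src": "10.1.0.0/16", "dst": "192.168.10.5/32", "port": 443, "action": "deny"},
    {"id": 3, "src": "10.2.0.0/16", "dst": "192.168.20.9/32", "port": 22, "action": "allow"},
]

# Hypothetical inventory of server addresses still in use.
ACTIVE_HOSTS = {"192.168.10.5"}

def stale_rules(rules, active_hosts):
    """IDs of rules whose destination is a single host missing from the inventory."""
    stale = []
    for rule in rules:
        dst = ipaddress.ip_network(rule["dst"])
        if dst.num_addresses == 1 and str(dst.network_address) not in active_hosts:
            stale.append(rule["id"])
    return stale

def shadowed_rules(rules):
    """IDs of rules that can never match because an earlier rule fully covers them."""
    shadowed = []
    for index, later in enumerate(rules):
        later_src = ipaddress.ip_network(later["src"])
        later_dst = ipaddress.ip_network(later["dst"])
        for earlier in rules[:index]:
            if (earlier["port"] == later["port"]
                    and later_src.subnet_of(ipaddress.ip_network(earlier["src"]))
                    and later_dst.subnet_of(ipaddress.ip_network(earlier["dst"]))):
                shadowed.append(later["id"])
                break
    return shadowed

print("Stale:", stale_rules(RULES, ACTIVE_HOSTS))
print("Shadowed:", shadowed_rules(RULES))
```

Rule 2 is a deny that never fires because rule 1 already allows the wider range, which is exactly the kind of inconsistency that builds up when new rules are only ever appended.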

Why have they been a disaster? Here’s a few thoughts:

  • Traditional firewalls use IP addresses. But there’s no automated connection between server configuration/IP allocation and firewall policies. So as servers move around or get decommissioned, firewall policies don’t get automatically updated. You end up with many irrelevant objects and Continue reading

Root Cause Analysis – It’s Not Perfect

Automated Root Cause Analysis promises a lot. High-end network monitoring systems promise that they can automatically isolate network problems, and only tell you about the thing that needs fixing. This sounds very enticing. Who wants a flood of alarms, when we could get just one alarm, telling us what we need to fix? But it’s not perfect, and you do need to pay attention to it.

Consider this contrived network:

[Figure: RCA Example]

What happens if the upstream link from the router fails?

[Figure: RCA Link Down]

From the perspective of the NMS, all systems at that site are unreachable. A simple NMS that is unaware of topology will create 4 alarms – one for each of the router, the switches and the server. A smarter NMS will recognise that it only needs one alarm, for the router WAN link being unreachable (and therefore the whole site is offline). It will know that the switches and server are unreachable, but those alarms will be suppressed by the key incident.
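The suppression logic can be sketched in a few lines. The device names and the upstream map are assumptions made for illustration; this is not how any particular NMS implements it.

```python
# Hypothetical topology: each device maps to the upstream device
# through which the NMS reaches it.
UPSTREAM = {
    "router": "nms",
    "switch-1": "router",
    "switch-2": "router",
    "server": "switch-1",
}

def correlate(unreachable, upstream):
    """Split unreachable devices into root causes and suppressed alarms.

    A device is a root cause if its upstream neighbour is still reachable
    (or unknown); otherwise its alarm is suppressed by the upstream failure.
    """
    root_causes, suppressed = [], []
    for device in sorted(unreachable):
        parent = upstream.get(device)
        if parent in unreachable:
            suppressed.append(device)
        else:
            root_causes.append(device)
    return root_causes, suppressed

# The WAN link fails, so the NMS loses contact with the whole site.
roots, hidden = correlate({"router", "switch-1", "switch-2", "server"}, UPSTREAM)
print("Raise alarm for:", roots)
print("Suppress alarms for:", hidden)
```

A topology-unaware NMS is equivalent to passing an empty map: every unreachable device becomes its own alarm.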

This all sounds like a good idea. Why wouldn’t you want that?

But what if the NMS view of the network is incomplete? What might happen then?

Consider the same network as above, but this time a new WAN router has been Continue reading

BYOD: Just another money-grab?

BYOD policies sound alluring. No more forced use of a crappy old corporate laptop – “hey look, we’ll let you choose whatever you want!” But I think it is a way to shift the cost burden over to employees. It will be done slowly, over several years, and we’ll welcome it. But it will lead to employees carrying more costs. I guess we should be careful with what we wish for.

In my teens I spent many years working in the produce & butchery departments at a local supermarket. When I started out, the contracts still had the last vestiges of union-dominated times. So we got paid allowances for laundry, extra allowances if we’d passed some school exams, higher rates for overtime, meal allowances, etc. During the years I was there, these were eroded. Each year they gave us pay rises that were nominally higher than inflation, and yet another allowance was ‘incorporated’ into my wages. Sometimes allowances would remain for older employees. When I left, I was being paid significantly more than new employees, in part because I still had several extra allowances.

I think we’ll see the same thing with BYOD programs. I think it will go like this:

  1. Announce BYOD Continue reading

In Praise of Support Lifecycles

If you’re just starting out working with ‘Enterprise’ products, you may not have come across Support Lifecycles. It’s important to know what these are, and how they affect you. They can have both a positive & a negative impact on when and why you choose to upgrade systems.

What Are Support Lifecycles?

Developers would like to only support the latest version. But customers can’t/won’t always run the latest version. They need to know that they can expect a certain level of support for the version they’re running. As a compromise, software vendors will publish a support lifecycle policy. This will outline the levels of support a product gets, from new product introduction, through to being superseded, and finally moved to end of support. Typical phases include:

  • General Support: Product is in General Availability phase, and is fully supported. You can log support cases, search KB articles, and expect both functionality enhancement and bugfix patches. The current product version will always be in this phase, and typically 1-2 major versions behind will also be included.
  • Limited Support: You can log a support case, and we’ll try to help, but we’re not planning any new patches, and you’ll probably get a suggestion Continue reading

Shellshock: One Month On

Shellshock was publicly disclosed a little over a month ago, to wide predictions of doom & gloom. But somehow the Internet survived, and we lurch on towards the next crisis. I recently gave a talk about Shellshock, the fallout, and some thoughts on wider implications and the future. The talk wasn’t recorded, so here’s a summary of what was discussed.

Background: NZ ISIG: Keeping it Local

The New Zealand Information Security Interest Group (ISIG) runs monthly meetings in Auckland and Wellington. They’re open to all, and are fairly informal affairs. There’s usually a presentation, with a wide-ranging discussion about security topics of the day. No, we don’t normally discuss “picking padlocks, debating whose beard or ponytail is better or which martial art/fitness program is cooler.”

Attend enough meetings, and sooner or later you’ll be called upon to present. I was ‘volunteered’ to speak on Shellshock, about a month after the exploit was made public. I didn’t talk about the technical aspects of the exploit itself – instead I explored some of the wider implications, and industry trends. I felt the talk went well, mainly because it wasn’t just me talking: everyone got involved and contributed to the discussion. It would be a bit Continue reading

Disappointed With Check Point

I have recently started working with Check Point products again, after a 5-year break. This has given me a different perspective on how they are progressing. It has been disappointing to see that they’re still suffering from some of the same old bugs. Some of the core functionality is now showing its age, and is no longer appropriate for modern networks.

When you’re using a product or technology on a regular basis, it can be hard to accurately gauge progress. Maybe it feels like there are only incremental changes, with nothing major happening. But then you come across a 5-year-old system, and you realise just how far we’ve come. If you don’t think iOS is changing much, find some videos of the first iPhones.

The opposite is when it feels like there are many regular enhancements…but when you step back you see that core product issues are not dealt with. It can be hard to see this when you’re working at the coal-face. You need to step away, work with other products and systems, then return.

That’s what I’ve done with Check Point recently. Through much of the 2000s, I did a huge amount of work with Check Point firewalls. Continue reading

Don’t Be Afraid of Changing Jobs

Some people are corporate survivors, sticking with one company for decades. Some people move around when it suits, while others would like to move, but are fearful of change. Here’s a few things I’ve learnt about adapting to new work environments. It’s not that scary.

Corporate Survivors

We’ve all seen the people who seem to survive in a corporate environment. They seem to know everyone, and almost everything about the business. Return to a company after 10 years, and they’re still there. Somehow they survive, through mergers, acquisitions, and round after round of re-organisation. But often they seem to be doing more or less the same job for years, with little change.

Why Do People Stay?

There’s four possible reasons for staying at a job for a long time:

  1. You’re really happy with what you do, and you’re well looked after.
  2. You just don’t care. You come to work to eat your lunch and talk to your friends. You don’t care how you’re treated, or what work you do, as long as you get paid.
  3. This is the only possible job you can get, due to location/skills/whatever.
  4. You’re comfortable where you are, and you’re scared of moving, scared of what Continue reading

HP SDN App Store Launches

HP’s SDN App Store has finally seen the light of day. This is intended to be a common platform for users and developers, to find and distribute real-world, practical SDN applications. Some of the launch apps include:

It’s interesting to look at the price points for applications. They are certainly not $0.99 apps, but they are still cheaper than typical ‘Enterprise’ software. I think it will take us a while to figure out what the right level of ‘value’ is.

HP has done well to put together a platform that developers can use to distribute SDN applications. It’s not an easy task to put together all of the back-end work required for something like this. It’s not simply hosting a website, it’s figuring out all the legal & financial implications, the support mechanisms, etc. There’s a lot of non-technical effort that goes into this.

The only challenge is that currently it is for SDN apps that use the HP VAN SDN Controller, which will limit the size of the market. I’m hoping that in future it will work with OpenDaylight. That will expand Continue reading

Utility-Based Pricing Troubles Me

Utility, or Consumption-Based pricing models offer an interesting way of matching costs to revenues. But if they’re not managed well, customer costs could blow out just trying to keep the lights on. We’ve come to expect rapidly declining hardware prices. Have vendors realised their utility prices need to decline at a similar rate?

I’ve been doing more architecture work over the last twelve months, and this has changed some of my thinking about technology. Previously I was only really interested in speeds & feeds, and technical capabilities. Scaling was only about how to add capacity – not what it would cost. When I looked at costs, it was just to shake my head at the ridiculous prices charged for things like a second power supply.

But now I find myself interested in things like cost curves, and trying to figure out how my costs will change as demand changes. The ideal is for there to be a clear relationship between costs & revenue, hopefully with costs growing at a slower rate than demand (and revenue).

Previously we had high upfront costs to buy hardware and software, and we aimed to amortise it over the life of the service. Our costs Continue reading

Knowing Your Audience…and Showing It

We all know that you’re supposed to “Know Your Audience.” Doing so improves engagement, and avoids faux pas like “Suggested Tweets.” But recently I realised that this doesn’t have to be subtle. Drop hints early on in your presentation that you’ve taken the time to understand the audience – it can really lift the mood.

Suggested Tweets – Just Say No

Companies that obsess about the wrong kind of metrics think that all they need is to get their message repeated many times. So they give employees & partners a list of “suggested tweets.” These are pre-written Tweets that people can send out from their own Twitter accounts, to “generate buzz.” I have seen many companies do this, and it is overwhelmingly lame. It devalues the message, and devalues those who send out these “suggested tweets.”

In the lead-up to the recent Cisco UCS event, many members of the Cisco Champions program sent out the same set of tweets. When I see the same tweet from several people in my stream, it’s obvious what’s going on. If you’re running a marketing Twitter account, then yeah, I expect marketing messages. But if you’re a real person, and I’ve Continue reading

The Chassis Switch is Dead

The Chassis Switch is Dead. For most networks, chassis-based switches are no longer appropriate due to cost, inflexibility and risk. I see this as similar to servers, in that server blade chassis are no longer appropriate for most organisations. The alternatives are already better for cost & flexibility. The real question is what our management model will look like for those alternatives.

Dead Collector: ‘Ere, he says he’s not dead.
Leaf-Spine: Yes he is.
Chassis: I’m not.
Dead Collector: He isn’t.
Leaf-Spine: Well, he will be soon, he’s very ill.
Chassis: I’m getting better.
Leaf-Spine: No you’re not, you’ll be stone dead in a moment.

(With apologies to Monty Python)

Blade Servers…

In the late 1990s, and early 2000s, server buying patterns changed significantly. Previously we had a few “Big Iron” Unix systems, but cheaper Intel-based systems changed the economics dramatically. This led to a rapid sprawl in the number of physical servers.

In the second half of the 2000s, server blades appeared as a seductive answer. They promised simpler management of pools of systems, greater density, better efficiencies, and operational cost savings. Vendors promised long term “investment protection”, assuring us that we could keep the chassis, and upgrade blades Continue reading