Four people now live and work in my home 24×7: my wife Andi, her mother, my daughter, and me. Many of you now live in similar situations.
Very occasionally, everyone has network trouble, as happened to us this morning. Sometimes it is our “last mile” connection: it is easy to see these failures in our cable modem log (often available at 192.168.100.1, which seems to be the default address for cable modems). Occasionally it is the ISP (in our case, Comcast), due to either a routing failure or a DNS failure. These failures can be harder to diagnose.
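For the curious, here is a minimal sketch, in Python, of the kind of checks I mean. The hostname and remote IP below are placeholders, not anything we actually use: modem up but DNS failing points at the ISP’s resolvers; modem up but a remote IP unreachable points at routing; modem unreachable is the last mile.

```python
# Hypothetical sketch: separate the failure modes described above.
# 192.168.100.1 is the common cable-modem default mentioned in the text;
# the hostname and IP below are placeholders only.
import socket

def can_connect(host, port=80, timeout=3):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

def dns_works(name="example.com"):
    """Return True if the name resolves at all."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False

print("modem reachable:", can_connect("192.168.100.1"))
print("DNS resolves:   ", dns_works())
print("remote IP works:", can_connect("93.184.216.34"))  # placeholder IP
```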
Bufferbloat, however, is insidious. It comes and goes, and most users have been “trained” over many years to ignore temporary bad behavior. When you go to diagnose it, you usually stop the very operation that is causing it. This blog has recorded our efforts to fix bufferbloat. Now that many more people are at home at the same time trying to run more demanding applications, the problem is much more common. Other people in your home can inflict bufferbloat on you without you or them understanding what is happening.
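The telltale signature is easy to measure without stopping anyone’s work for long: compare latency on an idle link with latency while a bulk transfer is running. A rough sketch follows; the probe host and the large-download URL are placeholders you would substitute, not part of any tool of ours.

```python
# Minimal sketch (not the author's tool): measure round-trip time while the
# link is idle, then while a bulk transfer saturates it. A large jump in RTT
# under load is the classic bufferbloat symptom.
import socket
import threading
import time
import urllib.request

HOST = "example.com"                        # placeholder: well-connected host
BULK_URL = "http://example.com/largefile"   # placeholder: a large download

def tcp_rtt(host, port=80):
    """Time a TCP connect as a rough RTT probe, in milliseconds."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=5):
        pass
    return (time.monotonic() - start) * 1000

def saturate(url, stop):
    """Download continuously to keep the downlink queue full."""
    while not stop.is_set():
        with urllib.request.urlopen(url) as resp:
            while not stop.is_set() and resp.read(65536):
                pass

idle = [tcp_rtt(HOST) for _ in range(5)]
stop = threading.Event()
threading.Thread(target=saturate, args=(BULK_URL, stop), daemon=True).start()
time.sleep(2)                                # let the queue fill
loaded = [tcp_rtt(HOST) for _ in range(5)]
stop.set()

print(f"idle RTT   ~ {min(idle):.0f} ms")
print(f"loaded RTT ~ {min(loaded):.0f} ms")  # hundreds of ms suggests bufferbloat
```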
Yesterday afternoon Continue reading
This is an urgent call for expert help to quickly test a possible method to sterilize used N95 masks.
In many places, hospital staff, first responders and others are at grave risk due to inadequate supplies of N95 masks. Already, some hospitals, even in the U.S., report running out of N95 masks and face reusing possibly contaminated ones. My local fire department has about 20 N95 masks available in total as it faces transporting patients to the hospital. People are faced with reusing masks without sterilization. I offer an idea that might alleviate the critical shortage. I have run this idea past my pulmonary care doctor of many years, who believes it may be viable, but everyone on the front lines of the epidemic is already working flat out.
The letter below, which I sent to Dr. Anthony Fauci, sets the context.
People with the right expertise are needed to vet the idea and ensure that it is safe and effective as quickly as possible.
Dear Dr. Fauci,
N95 masks are in critically short supply. Sterilization of disposable N95 masks could be a Continue reading
Ed Felten tweeted a few days ago: “Often hear that the reason today’s Internet is not more secure is that the early designers failed to imagine that security could ever matter. That is a myth.”
This is indeed a myth. Much of the current morass can be laid at the feet of the United States government, due to its export regulations around cryptography.
I will testify against the myth. Bob Scheifler and I started the X Window System in 1984 at MIT. It is a network-transparent window system: applications can reside on computers anywhere in the network and use the X display server. As keyboard events may be transmitted over the network, it was clear to us from the get-go that security was an issue. X is in use to this day on Linux systems all over the world (remote X11 access is no longer allowed: the ssh protocol is used to tunnel the X protocol securely for remote use). By sometime in 1985 or 1986 we were distributing X under the MIT License, which was developed originally for the MIT X Window System distribution (I’d have to go dig Continue reading
Bufferbloat is responsible for much of the poor performance seen in the Internet today; it causes latency (called “lag” by gamers), triggered even by your own routine web browsing and video playing.
But bufferbloat’s causes and solutions remind me of the old parable:
It was six men of Indostan, to learning much inclined,
who went to see the elephant (Though all of them were blind),
that each by observation, might satisfy his mind.
……. (six stanzas elided)
And so these men of Indostan, disputed loud and long,
each in his own opinion, exceeding stiff and strong,
Though each was partly in the right, and all were in the wrong!
So, oft in theologic wars, the disputants, I ween,
tread on in utter ignorance, of what each other mean,
and prate about the elephant, not one of them has seen!
John Godfrey Saxe
Most technologists are not truly wise: we are usually like the blind men of Indostan. The TCP experts, network operators, telecom operators, router vendors, Internet service providers and users have each had a grip *only* on their own piece of the elephant.
The TCP experts look at TCP and think “if only TCP were Continue reading
My New Year’s resolution is to restart blogging.
Trying to steer anything the size of the Internet in a better direction is very slow and difficult at best. From the time changes in the upstream operating systems are complete to when consumers can buy new product is typically four years, caused by the broken and insecure ecosystem of the embedded device market. Chip vendors, box vendors, I’m looking at you… So much of what is now finally appearing in the market is based on work that is often four years old. Market pull may do what push has not.
The fq_codel & cake work going on in the bufferbloat project is called SQM – “smart queue management.”
See What to do About Bufferbloat for general information. And the DSLReports Speedtest makes it easy to test for bufferbloat. But new commercial products are becoming increasingly available. Here are some of them.
First up, I’d like to call out the Evenroute IQrouter. DSL users have often suffered more than other broadband users, due to bad bloat in the modems compounded by minimal bandwidth, so the DSL version of the IQrouter is particularly welcome. Often DSL ISP’s seem to have the tendency (more Continue reading
Dave Reed just published a piece concerning network neutrality. Everyone interested in the topic should carefully read and understand Does the Internet need “Governance”?
One additional example of “light touch” help for the Internet where government may play a role is transparency: the recent M-Lab report, and the fact that Cogent’s actions made retail ISP’s look very bad, is a case in point. You can follow up on that topic on the M-Lab mailing list, if you are so inclined. If a carrier can arbitrarily delay or deprioritize traffic in secret, then the market (as there are usually alternatives in transit providers) cannot function well. And if that provider is an effective monopoly for many paths, it becomes a huge problem.
Vint Cerf wrote a wonderful piece on the problems I’ve been wrestling with for the last several years, called “Bufferbloat and Other Internet Challenges“. It is funny how one thing leads to another: I started just wanting my home network to work as I knew it should, and started turning over rocks. The swamp we’re in is very deep and dangerous, the security problem worst of all (and given how widespread bufferbloat is, that’s saying something). The “Other Challenges” dwarf bufferbloat, as large a problem as it is.
I gave a lunch talk at the Berkman Center at Harvard in June on the situation, and recommend people read the articles by Bruce Schneier and Dan Geer linked there, which give their takes on the situation I laid out to them (both articles were triggered by the information in that talk).
Dan Geer’s piece is particularly important from a policy perspective.
I also recommend reading “Familiarity Breeds Contempt: The Honeymoon Effect and the Role of Legacy Code in Zero-Day Vulnerabilities“, by Clark, Fry, Blaze and Smith, which makes clear to me that our engineering processes need fundamental reform in the face of very Continue reading
Note: Updated October 24, 2013, to fix some editorial nits, and to clarify the intended point that it is the combination of a working mark/drop algorithm with flow scheduling that is the “killer” innovation, rather than the specifics of today’s fq_codel algorithm.
Latency (called “lag” by gamers), once incurred, cannot be undone, as best first explained by Stuart Cheshire in his rant “It’s the Latency, Stupid” and more formally in “Latency and the Quest for Interactivity,” and noted recently by Stuart’s 12-year-old daughter, who sent Stuart a link to one of the myriad “Lag Kills” tee shirts, coffee mugs, and other items popular among gamers.
Out of the mouth of babes…
Any unnecessary latency is too much latency.
Many networking engineers and researchers express the opinion that 100 milliseconds of latency is “good enough”. If the Internet’s worst latency (under load) were 100ms, we’d indeed be much better off than we are today (and would have space warp technology as well!). But the speed of light and human factors research easily demonstrate that this opinion is badly flawed.
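A quick back-of-the-envelope calculation illustrates the speed-of-light half of the argument; the distances below are rough illustrative figures:

```python
# Light in fiber travels at roughly 2/3 of c, so physics alone already
# spends a large chunk of any latency budget on long paths; queueing delay
# from bufferbloat is added on top of these floors.
C = 299_792.458        # speed of light in vacuum, km/s
FIBER = C * 2 / 3      # approximate propagation speed in fiber, km/s

def fiber_rtt_ms(km):
    """Minimum physically possible round-trip time over fiber."""
    return 2 * km / FIBER * 1000

for name, km in [("across the US (~4,000 km)", 4000),
                 ("across the Atlantic (~6,000 km)", 6000)]:
    print(f"{name}: >= {fiber_rtt_ms(km):.0f} ms RTT")
# ~40 ms and ~60 ms of unavoidable RTT: if every hop also added a "good
# enough" 100 ms of queueing, interactive use (gaming, VoIP) would be hopeless.
```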
Many have understood bufferbloat to be a problem that primarily occurs when a saturating “ Continue reading
The bufferbloat project has had trouble getting consistent repeatable results from other experimenters, due to a variety of factors. This Wiki page at bufferbloat.net attempts to identify the most common omissions and mistakes. There be land mines here. Your data will be garbage if you don’t avoid them!
Note that most of these are traps for people doing network research in general, not just bufferbloat research.
I received the following question today from Ralph Droms. I include an edited version of my response to Ralph.
On Thu, Jun 20, 2013 at 9:45 AM, Ralph Droms (rdroms) <rdroms@yyy.zzz> wrote:
Someone suggested to me that bufferbloat might even be worse in switches/bridges than in routers. True fact? If so, can you point me at any published supporting data? Thanks, Ralph
It is hard to quantify whether switches or routers are “worse”, and I’ve never tried, nor seen any published systematic data. I wouldn’t believe such data if I saw it, anyway. What matters is whether you have unmanaged buffers before a bottleneck link.
I don’t have first-hand information (to just point you at particular product specs); I tend not to try to find out who is particularly guilty, as it can only get me in hot water if I compare particular vendors. I’ve generally dug into the technology to understand how and why buffering is present, to explain what I’ve seen.
You can go look at switch specs yourself and figure out from first principles that switches have problems.
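The first-principles arithmetic is simply buffer size divided by the egress link rate; the figures below are illustrative, not from any vendor’s spec sheet:

```python
# Worst-case queueing delay added by an unmanaged buffer in front of a
# bottleneck link is buffer size / link rate.
def queue_delay_ms(buffer_bytes, link_bits_per_sec):
    return buffer_bytes * 8 / link_bits_per_sec * 1000

# e.g. a modest 1 MB of packet buffer draining onto a 100 Mbit/s link:
print(f"{queue_delay_ms(1_000_000, 100_000_000):.0f} ms")    # 80 ms
# the same buffer draining onto a 1 Gbit/s link:
print(f"{queue_delay_ms(1_000_000, 1_000_000_000):.0f} ms")  # 8 ms
```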
Feel free to write a paper!
Here’s what I do know.
Linux 3.6 just shipped. As I’ve noted before, bloat occurs in multiple places in an OS stack (and in applications!). If your OS TCP implementation fills transmit queues more than needed, the full queues will cause the RTT to increase, causing TCP to misbehave. Net result: additional latency, with no increase in bandwidth. TCP Small Queues reduces this buffering, and therefore the latency, without sacrificing performance.
To quote the Kernel Newbies page:
TCP small queues is another mechanism designed to fight bufferbloat. TCP Small Queues goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) and < 8ms on 100Mbit (instead of 132 ms).
Eric Dumazet (now at Google) is the author of TSQ. It is covered in more detail at LWN. Thanks to Eric for his great work!
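For the curious: on a Linux 3.6+ machine the TSQ limit is visible as a sysctl, and the worst-case qdisc/device delay it permits per bulk sender is just that byte limit divided by the link rate. A small sketch, assuming the stock sysctl path:

```python
# Read TSQ's per-socket limit on the bytes each TCP sender may keep in the
# xmit queues (qdisc + device) at once, and the delay bound that implies.
from pathlib import Path

path = Path("/proc/sys/net/ipv4/tcp_limit_output_bytes")
if path.exists():
    limit = int(path.read_text())
    print(f"TSQ per-socket xmit-queue limit: {limit} bytes")
    # added qdisc/device delay per bulk sender is bounded by roughly
    # limit * 8 / link_rate; e.g. at 1 Gbit/s:
    print(f"~{limit * 8 / 1e9 * 1000:.2f} ms at 1 Gbit/s")
else:
    print("kernel too old for TCP Small Queues (needs 3.6+)")
```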
The combination of TSQ, fq_codel and BQL (Byte Queue Limits) gets us much of the way to solving bufferbloat on Ethernet in Linux. Unfortunately, wireless remains a challenge (the drivers Continue reading
I will be giving an updated version of my bufferbloat talk there on Saturday, October 6. The meeting, in Barcelona, Spain, October 4-7, is about community wireless networks (many of which are mesh wireless networks), on which bufferbloat is a particular issue.
We tried (and failed) to make ad-hoc mesh networking work when I was at OLPC, and I now know that one of the reasons we failed was bufferbloat.
I’ll also be giving a talk at the UKNOF (UK Network Operator’s Forum) in London on October 9, but that is now full and there is no space for new registrants.
Bufferbloat was covered in a number of sessions at the Vancouver IETF last week.
The most important of these sessions is a great explanation of Kathie Nichols and Van Jacobson’s CoDel (“coddle”) algorithm, given by Van during Tuesday’s transport area meeting. It is not to be missed by serious network engineers. It also touches on why we like fq_codel so much, though I plan to write much more extensively on this topic very soon. CoDel by itself is great, but in combination with SFQ-like algorithms that segregate flows, the results are stunning; CoDel is the first AQM algorithm that can work across an arbitrary number of queues/flows.
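To give a flavor of why CoDel is so attractive, here is a much-simplified sketch of its control law in Python. It is for intuition only, not the real algorithm (see Kathie and Van’s paper and the Linux sch_codel source for the genuine article): each packet carries its enqueue timestamp, and if sojourn time stays above a 5ms target for a full 100ms interval, CoDel starts dropping, then drops faster (at interval/√count) as long as the condition persists.

```python
# Much-simplified sketch of CoDel's control law (Nichols & Jacobson).
import time
from collections import deque
from math import sqrt

TARGET = 0.005    # 5 ms: acceptable standing queue delay
INTERVAL = 0.100  # 100 ms: on the order of a worst-case RTT

class CoDelQueue:
    def __init__(self):
        self.q = deque()          # holds (enqueue_time, packet)
        self.first_above = 0.0    # deadline set when delay first exceeds TARGET
        self.dropping = False
        self.count = 0            # drops in the current dropping episode
        self.drop_next = 0.0

    def enqueue(self, pkt):
        self.q.append((time.monotonic(), pkt))

    def dequeue(self):
        while self.q:
            t_in, pkt = self.q.popleft()
            now = time.monotonic()
            if now - t_in < TARGET:
                # delay is acceptable again: leave any dropping state
                self.first_above = 0.0
                self.dropping = False
                return pkt
            if self.first_above == 0.0:
                # first packet seen above TARGET: give the queue INTERVAL to drain
                self.first_above = now + INTERVAL
            if not self.dropping:
                if now < self.first_above:
                    return pkt    # above TARGET, but not for a full INTERVAL yet
                # standing queue confirmed: drop this packet, enter dropping state
                self.dropping = True
                self.count = 1
                self.drop_next = now + INTERVAL
                continue
            if now < self.drop_next:
                return pkt        # deliver until the next scheduled drop
            # still above TARGET at drop_next: drop and tighten the schedule
            self.count += 1
            self.drop_next = now + INTERVAL / sqrt(self.count)
        return None

# usage: q = CoDelQueue(); q.enqueue(b"pkt"); ...; pkt = q.dequeue()
```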
The Saturday before the IETF, the IAB / IRTF Workshop on Congestion Control for Interactive Real-Time Communication took place. My position paper was my blog entry of several weeks back. In short, there is no single silver bullet, though with CoDel we finally have the last missing technical bullet for a complete solution. The other, equally important but non-technical bullet will be market pressure to fix broken software/firmware/hardware all over the Internet: exposing the bloat problem is vital. You cannot successfully engineer around bufferbloat, but you can detect it, and let users know when they Continue reading
Many real-time applications, such as VoIP, gaming, teleconferencing, and performing music together, require low latency. These are increasingly unusable in today’s Internet, not because there is insufficient bandwidth, but because we’ve failed to look at the Internet as an end-to-end system. The edge of the Internet now often runs congested, and when it does, bufferbloat causes performance to fall off a cliff.
Where once a home user’s Internet connection consisted of a single computer, it now consists of a dozen or more devices: smart phones, TV’s, Apple TV/Roku devices, tablets, home security equipment, and one or more computers per household member. More Internet-connected devices arrive every year, and they often perform background activities without the user’s intervention, inducing transients on the network. These devices need to share the edge connection effectively in order to make each user happy. All can induce congestion and bufferbloat that baffle most Internet users.
The CoDel (“coddle”) AQM algorithm provides the “missing link” necessary for good TCP behavior and for solving bufferbloat. But CoDel by itself is insufficient to provide reliable, predictable low-latency performance in today’s Internet.
Bottlenecks are most common at the “edge” of the Internet and there you must Continue reading
Latency, much more than bandwidth, governs actual Internet “speed”, as best expressed in written form by Stuart Cheshire’s rant It’s the Latency, Stupid and more formally in Latency and the Quest for Interactivity.
Speed != bandwidth, despite all of what an ISP’s marketing department will tell you. This misconception reaches all the way up to FCC Chairman Julius Genachowski, is common even among technologists who should know better, and is believed by the general public. You pick an airplane to fly across the ocean, rather than a ship, even though the capacity of the ship may be far higher.
The Internet could and should degrade gracefully in the face of load; but due to bufferbloat, today’s Internet does not. Instead, performance falls off a cliff. We are lemmings on migration.
The Internet is designed to run as fast as it can, and so will fill any network link to capacity as soon as any application asks it to do so. We have more and more such applications, and the buffers get bigger with each hardware generation, even though links usually operate at a small fraction of their possible bandwidth. As soon as a network or network link reaches 100% capacity, the usually grossly Continue reading
The CoDel AQM algorithm by Kathie Nichols and Van Jacobson provides us with an essential missing tool to control queues properly. This work is the culmination of their at least three major attempts to solve the problems with AQM algorithms over the last 14 years.
Eric Dumazet wrote the codel queuing discipline (based on a quick prototype by Dave Täht, who spent the last year working 60-hour weeks on bufferbloat), which landed in net-next a week or two ago; yesterday, net-next was merged into the Linux mainline for inclusion in the next Linux release. Eric also implemented an fq_codel queuing discipline, combining fair queuing and CoDel (pronounced “coddle”), and it works very well. The CoDel implementation was dual licensed BSD/GPL to help the *BSD community. Eric and others have tested CoDel on 10G Ethernet interfaces; as expected, CoDel performance is good in what’s been tested to date.
Linux 3.5 will likely release in August. So it was less than a month from first access to the algorithm (which was formally published in ACM Queue on May 6) to Linux mainline, and it should be about four months total from availability of the algorithm to a Linux release. Not bad at all :-).
Felix Fietkau Continue reading
BitTorrent is a lightning rod on two fronts: it is used to download large files, which the MPAA sees as a nightmare for its business model, and BitTorrent has been a performance nightmare for ISP’s and some users. Bram Cohen has taken infinite grief for BitTorrent over the years, even though the end-user performance problems are not his fault.
Nor is TCP the performance problem, despite Bram Cohen’s recent flame about TCP on his blog.
I blogged about this before, but several key points seem to have been missed by most: BitTorrent was never the root cause of most of the network speed problems it triggered when it was deployed. The broadband edge of the Internet was already broken when BitTorrent arrived, with vastly too much uncontrolled buffering, which we now call bufferbloat. As my demonstration video shows, even a single simple TCP file copy can cause horrifying speed loss in an overbuffered network. Speed != bandwidth, despite what the ISP’s marketing departments tell you.
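The arithmetic behind the demonstration is simple: a single bulk TCP upload fills whatever buffer sits in front of the bottleneck, and every other packet, from DNS lookups to VoIP, waits behind it. The figures below are illustrative of home broadband of that era, not measurements:

```python
# How long everything else waits behind a single bulk upload:
# buffer size / uplink rate.
def queue_seconds(buffer_bytes, uplink_bits_per_sec):
    return buffer_bytes * 8 / uplink_bits_per_sec

# e.g. 256 KB of buffering in a cable/DSL modem with a 1 Mbit/s uplink:
print(f"{queue_seconds(256 * 1024, 1_000_000):.1f} s of added delay")  # ~2.1 s
```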