Archive

Category Archives for "Networking"

Détails de la panne Cloudflare du 2 juillet 2019

Il y a près de neuf ans, Cloudflare était une toute petite entreprise dont j’étais le client, et non l’employé. Cloudflare était sorti depuis un mois et un jour, une notification m’alerte que mon petit site,  jgc.org, semblait ne plus disposer d’un DNS fonctionnel. Cloudflare avait effectué une modification dans l’utilisation de Protocol Buffers qui avait endommagé le DNS.

J’ai contacté directement Matthew Prince avec un e-mail intitulé « Où est mon DNS ? » et il m’a envoyé une longue réponse technique et détaillée (vous pouvez lire tous nos échanges d’e-mails ici) à laquelle j’ai répondu :

De: John Graham-Cumming
Date: Jeudi 7 octobre 2010 à 09:14
Objet: Re: Où est mon DNS?
À: Matthew Prince

Superbe rapport, merci. Je veillerai à vous appeler s’il y a un
problème.  Il serait peut-être judicieux, à un certain moment, d’écrire tout cela dans un article de blog, lorsque vous aurez tous les détails techniques, car je pense que les gens apprécient beaucoup la franchise et l’honnêteté sur ce genre de choses. Surtout si vous y ajoutez les tableaux qui montrent l’augmentation du trafic suite à votre lancement.

Je dispose d’un système robuste de surveillance de mes sites qui m’envoie un  Continue reading

关于 2019 年 7 月 2 日 Cloudflare 中断的详情

大约九年前,Cloudflare 还是一家小公司,我也还是客户,而不是员工。当时,Cloudflare 早在一个月前就已发布了  jgc.org,有一天警报消息显示,这个小网站似乎不再支持 DNS 了。Cloudflare 实施了一项对 Protocol Buffers 使用的改动,这破坏了 DNS。

我直接给 Matthew Prince 写了一封题为“我的 DNS 在哪儿?”的邮件,他回复了一封篇幅很长、内容详实的技术性解答邮件(您可以点击此处查看往来邮件的全部内容),我对该邮件的回复是:

发件人:John Graham-Cumming
日期:2010 年 10 月 7 日星期四上午 9:14
主题:回复:我的 DNS 在哪儿?
收件人:Matthew Prince

谢谢,这是一篇很棒的报告。如果有问题,我一定会去电
问询。 就某种程度而言,在掌握了所有技术细节后,
将它们撰写为一篇博客文章可能会更好,因为我认为
读者会非常感谢博主对这些信息的坦诚公开。
这一点在您看到文章发布后流量增加的图表时,
会感触更深。

我在密切监控着网站,以便在出现任何故障时能够
收到短信通知。 监控显示,我的网站在 13:03:07 至
14:04:12 期间流量下降。 我会每五分钟测试一次。

这只是个小插曲,我相信您会解决这个问题。 但您确定您不需要
有人在欧洲为您分忧吗?:-)

他的回复是:

发件人:Matthew Prince
日期:2010 年 10 月 7 日星期四上午 9:57
主题:回复:我的 DNS 在哪儿?
收件人:John Graham-Cumming

谢谢。我们已经回复了所有来信。我现在要去办公室,
我们会在博客上发布些信息,或在我们的公告栏系统中
置顶一篇官方帖文。我同意 100%
透明度是最好的。

因此,今天,作为规模远胜以往的 Cloudflare 公司的一员,我要写一篇文章,清楚讲述我们所犯的错误、它的影响以及我们正在为此采取的行动。

7 月 2 日事件

7 月 2 日,我们在 WAF 托管规则中部署了一项新规则,导致全球 Cloudflare 网络上负责处理 HTTP/HTTPS 流量的各 CPU 核心上的 CPU 耗尽。我们在不断改进 WAF 托管规则,以应对新的漏洞和威胁。例如,我们在 5 月份以更新 WAF 的速度出台了一项规则,以防范严重的 SharePoint 漏洞。能够快速地全局部署规则是 WAF 的一个重要特征。

遗憾的是,上周二的更新中包含了一个规则表达式,它在极大程度上回溯并耗尽了用于 HTTP/HTTPS 服务的 CPU。这降低了 Cloudflare 的核心代理、CDN 和 WAF 功能。下图显示了专用于服务 HTTP/HTTPS 流量的 CPU,在我们网络中的服务器上,这些 CPU 的使用率几乎达到了 100%。

事件发生期间某个 PoP 的 CPU 利用率

这导致我们的客户(以及他们的客户)在访问任何 Cloudflare 域时都会看到 502 错误页面。502 错误是由前端 Cloudflare Web 服务器生成的,这些服务器仍有可用的 CPU 内核,但无法访问服务 HTTP/HTTPS 流量的进程。

我们知道这对我们的客户造成了多大的伤害。我们为发生这种事件感到羞耻。在我们处理这一事件时,它也对我们自身的运营产生了负面影响。

如果您是我们的客户,您也一定感受到了难以置信的压力、沮丧和恐惧。更令人懊恼的是,我们的六年零全球中断记录也就此打破。

CPU 耗尽是由一个 WAF 规则引起的,该规则里包含不严谨的正则表达式,最终导致了过多的回溯。作为中断核心诱因的正则表达式是 (?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*)))

尽管正则表达式本身成为很多人关注的焦点(下文将进行详细讨论),但 Cloudflare 服务中断 27 分钟的真实情况要比“正则表达式出错”复杂得多。我们已经花时间写下了导致中断并使我们无法快速响应的一系列事件。如果您想了解更多关于正则表达式回溯以及如何处理该问题的信息,可在本文末尾的附录中查找。

发生了什么情况

我们按事情发生的先后次序讲述。本博客中的所有时间均为协调世界时 (UTC)。

在 13:42,防火墙团队的一名工程师通过一个自动过程对 XSS 检测规则进行了微小改动。这生成了变更请求票证。我们使用 Jira 管理这些票证,下面是截图。

三分钟后,第一个 PagerDuty 页面出现,显示 WAF 故障。这是一项综合测试,从 Cloudflare 外部检查 WAF 的功能(我们会进行数百个此类测试),以确保其正常工作。紧接着出现了多个页面,显示许多其他的 Cloudflare 服务端到端测试失败、全球流量下降警报、众多的 502 错误,之后便是我们在全球各城市的网点 (PoP) 发来的大量指示 CPU 耗尽的报告。



我收到了其中部分警告并立马起身走出会议室,而正在我回到办公桌的途中,解决方案工程师团队的一名负责人告诉我,我们的流量已经减少了 80%。我跑向 SRE 团队,他们正在排除故障。在中断的最初时刻,有人猜测这是某种我们从未见过的攻击。

Cloudflare 的 SRE 团队成员分布在世界各地,他们全天持续监控着网络。绝大多数此类警报都指出了局部区域有限范围内的非常具体的问题,这些警报均在内部仪表板中监控,并且每天会进行多次处理。但这种页面和警报模式表明发生了严重问题,SRE 立即宣布发生 P0 事件,并上报给工程领导层和系统工程部门。

当时,伦敦工程团队正在我们的主要活动场地听取一场内部技术讲座。讲座被打断,所有人都聚集在大型会议室中,商讨问题或是接打电话。这不是 SRE 能够独立处理的一般问题,它需要所有相关团队立即在线联合处理。

在 14:00,WAF 被确定为导致问题的部分原因,并排除了攻击的可能性。性能团队从一台清楚表明 WAF Continue reading

Details zum Cloudflare-Ausfall am 2. Juli 2019

Vor etwa neun Jahren war Cloudflare noch ein winziges Unternehmen und ich war ein Kunde, kein Mitarbeiter. Cloudflare gab es erst seit einem Monat. Eines Tages wurde ich darüber benachrichtigt, dass bei meiner kleinen Website jgc.org der DNS-Service nicht mehr funktionierte. Cloudflare hat seine Verwendung von Protocol Buffers angepasst und dadurch wurde der DNS-Service unterbrochen.

Ich habe eine E-Mail mit dem Titel „Where‘s my dns?“ (Wo ist mein DNS) direkt an Matthew Prince gesendet und er hat mit einer langen, detaillierten, technischen Erklärung reagiert (Sie können den vollständigen E-Mail-Austausch hier lesen), auf die ich antwortete:

Von: John Graham-Cumming
Datum: Do., 7. Okt. 2010 um 09:14
Betreff: Re: Wo ist mein DNS?
An: Matthew Prince

Toller Bericht, danke. Ich werde auf jeden Fall anrufen, wenn es ein
Problem geben sollte.  Es wäre wahrscheinlich sinnvoll, all das in
einem Blog-Beitrag festzuhalten, wenn Sie alle technischen Details haben. Ich glaube nämlich,
dass es Kunden wirklich zu schätzen wissen, wenn mit solchen Dingen offen und ehrlich umgegangen wird.
Sie könnten auch die Traffic-Zunahme nach der Implementierung mit
Diagrammen veranschaulichen.

Ich habe eine recht zuverlässige Überwachung für meine Websites eingerichtet, deshalb bekomme ich eine SMS, wenn
etwas ausfällt.  Meine Daten zeigen,  Continue reading

Multipath routing with VPNv4

When we talk about VPNv4 prefixes – Route Distinguishers (RDs) play an incredibly important role in ensuring multipath routing. We talked about this a little bit in our last post, but I want to hit it home in this post as well as cover a couple of other items in a little great detail.

To start with, we’re going to use our same physical lab topology, but changes things up slightly. Namely….

  • vMX7 will remain a BGP route reflector but will also now participate in the dataplane by having it’s interfaces enabled with LDP and MPLS
  • vMX1 and vMX3 will act as remote PEs for our provider that are hosting an anycast service. Both will be advertising 140.10.20.0/24 into a “Provider” VPN which we will then import into our customer VPN.
  • vMX2 will now act like a Customer Edge (CE) router peering to vMX5 which will act as the provider edge (PE) router. It will peer with the provider in AS65000 from AS12345.

So our diagram will now look something like this…

Alright. So for the sake of thoroughness, I’ll start by including our base configurations again since they did change ever so slightly, and I Continue reading

Opting Out Of Toxic Culture

In a culture of Internet toxicity, a question all productivity-minded people should ponder is why they are participating in social platforms. For example, Twitter has become a predominantly negative cesspool. Even the pun-loving techies I monitor from a distance seem to lean increasingly toward darkness and anger.

Few rainbows are to be found on Twitter these days. Maybe it’s just the dark mode UI talking, but I don’t go there to laugh anymore. I don’t go there to connect with friends. Instead, I put on my virtual armor, and read through comments and responses directed at me or my company. To be sure, much of what I see is fine. However, many comments are meant to start fires, even when couched in smiling niceties.

That’s the Internet for you. We’ve always had flame wars, trolls, and haters, all the way back to my dial-up days on Delphi forums and AOL. I know how to block and mute people, and of course that helps. Even so, I find that there’s something different in the tone these days. Folks are on crusades to bring others down. To shame. To burn in digital effigy.

To destroy.

When toxicity spills over into my timeline Continue reading

The Development of DevNet’s Future

You’re probably familiar with Cisco DevNet. If not, DevNet is the place Cisco has embraced outreach to the developer community building for software-defined networking (SDN). Though initially cautious in getting into the software developer community, Cisco has embraced their new role and really opened up to help networking professionals embrace the new software normal in networking. But where is DevNet going to go from here?

Humble Beginnings

DevNet wasn’t always the darling of Cisco’s offerings. I can remember sitting in on some of the first discussions around Cisco OnePK and thinking to myself, “This is never going to work.”

My hesitation with Cisco’s first attempts to focus on software platforms came from two places. The first was what I saw as Cisco trying to figure out how to extend the platforms to include some programmability. It was more about saying they could do software and less about making that software easy to use or program against. The second place was actually the lack of a place to store all of this software knowledge. Programmers and developers are fickle lot and you have to have a repository where they can get access to the pieces they needed.

DevNet was that Continue reading

The Internet of Things: Connecting the Dots to Become a Smart Consumer

According to a recent survey conducted by Consumers International and the Internet Society, 63% of consumers think the way Internet-connected devices collect data is “creepy.” The Trust Opportunity: Exploring Consumer Attitudes to the Internet of Things, which polled people in the US, Canada, Japan, Australia, France, and the UK, also found that 73% of consumers think people using connected devices should worry about eavesdropping. And yet, new connected devices are being introduced practically every day, and sales show no sign of slowing down.

The word “smart” is used to describe almost all of these devices. But is that right?

The marketing around the Internet of Things (IoT) has become almost non-stop. Smart-this will make your life better, happier, more efficient. If only you had smart-that, you would reap the benefits of the marvelous technological age in which we live. But this often leaves out key information consumers need to make real smart choices.

It’s really about connectivity. For instance, that smart oven is a computer that happens to get hot in the middle. These IoT devices are able to perform smart functions because they are connected to the Internet. And while the marketing focuses on features and functionality, Continue reading

Avi Networks Now Part of VMware

By Tom Gillis, SVP/GM of Networking and Security BU

When we first announced our intent to acquire Avi Networks, the excitement within our customer base, with industry watchers and within our own business was overwhelming. IDC analysts wrote, “In announcing its intent to acquire software ADC vendor Avi Networks, VMware both enters the ADC market and transforms its NSX datacenter and multicloud network-virtualization overlay (NVO) into a Layer 2-7 full-stack SDN fabric (1).

Avi possesses exceptional alignment with VMware’s view of where the network is going, and how data centers must evolve to operate like public clouds to help organizations reach their full digital potential. It’s for these reasons that I am happy to announce VMware has closed the acquisition of Avi Networks and they are now officially part of the VMware family going forward.

I’ve heard Pat Gelsinger say many times that VMware wants to aggressively “automate everything.” With Avi, we’re one step closer to meeting this objective. The VMware and Avi Networks teams will work together to advance our Virtual Cloud Network vision, build out our full stack L2-7 services, and deliver the public cloud experience for on-prem environments. We will introduce the Avi platform Continue reading

Kernel of Truth season 2 episode 10: Practical open networking

Subscribe to Kernel of Truth on iTunes, Google Play, SpotifyCast Box and Sticher!

Click here for our previous episode.

But wait, there’s more! If you keep up with our podcast you may have noticed the previous episode where we talk about what open networking was, so why are we chatting about it again? Last time we talked about having open API’s and having the demarcation point between components but in this podcast, we’re extending the conversation out to show how everyone can take advantage of open networking in a wider, practical sense. Guests Rama Darbha and Roopa Prabhu join host Brian to share their thoughts, experiences and expertise on the subject. Listen, enjoy, and feel free to comment away here or on our social media channels if you have any questions or thoughts to add.

Guest Bios

Brian O’Sullivan: Brian currently heads Product Management for Cumulus Linux. For 15 or so years he’s held software Product Management positions at Juniper Networks as well as other smaller companies. Once he saw the change that was happening in the networking space, he decided to join Cumulus Networks to be a part of the open networking innovation. When not working, Brian is Continue reading

IPv6 Buzz 030: Overcoming The Big 3 Objections To IPv6 Adoption

Objections to IPv6 adoption tend to follow three tracks: we don't need it, we don't have budget, and we'll lose the security and multihoming benefits of NAT. On today's IPv6 Buzz podcast, Dr. David Holder explains why these objections don't hold water, and how to communicate with business and technical leaders to overcome them.

The post IPv6 Buzz 030: Overcoming The Big 3 Objections To IPv6 Adoption appeared first on Packet Pushers.

Smarter IoT concepts reveal creaking networks

The internet of things (IoT) needs its own infrastructure ecosystem — one that doesn't use external clouds at all, researchers at the University of Magdeburg say.The computer scientists recently obtained funding from the German government to study how to build a future-generation of revolutionary, emergent IoT systems. They say networks must be fault tolerant, secure, and traverse disparate protocols, which they aren't now.[ Read also: What is edge computing? and How edge networking and IoT will reshape data centers ] The researchers say a smarter, unique, and organic infrastructure needs to be developed for the IoT and that simply adapting the IoT to traditional networks won't work. They say services must self-organize and function autonomously and that people must accept the fact that we are using the internet in ways never originally intended. To read this article in full, please click here

The Network is the Computer: A Conversation with John Gage

The Network is the Computer: A Conversation with John Gage
The Network is the Computer: A Conversation with John Gage

To learn more about the origins of The Network is the Computer®, I spoke with John Gage, the creator of the phrase and the 21st employee of Sun Microsystems. John had a key role in shaping the vision of Sun and had a lot to share about his vision for the future. Listen to our conversation here and read the full transcript below.


[00:00:13]

John Graham-Cumming: I’m talking to John Gage who was what, the 21st employee of Sun Microsystems, which is what Wikipedia claims and it also claims that you created this phrase “The Network is the Computer,” and that's actually one of the things I want to talk about with you a little bit because I remember when I was in Silicon Valley seeing that slogan plastered about the place and not quite understanding what it meant. So do you want to tell me what you meant by it or what Sun meant by it at the time?

[00:00:40]

John Gage: Well, in 2019, recalling what it meant in 1982 or 83’ will be colored by all our experience since then but at the time it seemed so obvious that when we introduced the first scientific workstations, they Continue reading

The Network is the Computer: A Conversation with Ray Rothrock

The Network is the Computer: A Conversation with Ray Rothrock
The Network is the Computer: A Conversation with Ray Rothrock

Last week I spoke with Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to discuss his time at Sun and how the Internet has evolved. In this conversation, Ray discusses the importance of trust as a principle, the growth of Sun in sales and marketing, and that time he gave Vice President Bush a Sun demo. Listen to our conversation here and read the full transcript below.

[00:00:07]

John Graham-Cumming: Here I am very lucky to get to talk with Ray Rothrock who was I think one of the first investors in Cloudflare, a Series A investor and got the company a little bit of money to get going, but if we dial back a few earlier years than that, he was also at Sun as the Director of CAD/CAM Marketing. There is a link between Sun and Cloudflare. At least one, but probably more than one, which is that Cloudflare has recently trademarked, “The Network is the Computer”. And that was a Sun trademark, wasn’t it?

[00:00:43]

Ray Rothrock: It was, yes.

[00:00:46]

Graham-Cumming: I talked to John Gage and I asked him about this as well and I asked him to explain to me what it Continue reading