Last Monday, I decided to review and merge the “VXLAN on Cumulus Linux 5.x with NVUE” pull request. I usually run integration tests on the modified code to catch any remaining gremlins, but this time, all the integration tests started failing during the VM creation phase. I was completely weirded out, considering everything worked a week ago.
Fortunately, Vagrant debugging is pretty good1 and I was quickly able to pinpoint the issue (full printout):
Methods of steering traffic into MPLS and Segment Routing LSP is one of the least standardized and most confusing parts of traffic engineering.
Despite some existing interoperability issues, in general, the MPLS and SR control …
Donatas Abraitis asked me to spread the word about the first ever Baltic NOG meeting in the second half of September 2025 (more details)
If you were looking for a nice excuse to visit that part of Europe (it’s been on my wish list for a very long time), this might be a perfect opportunity to do it 😎.
On a tangential topic of fascinating destinations 😉, there’s also ITNOG in Bologna (May 19th-20th, 2025), Autocon in Prague (May 26th-30th, 2025), and SWINOG in Bern (late June 2025).
Civil society organizations have always been at the forefront of humanitarian relief efforts, as well as safeguarding civil and human rights. These organizations play a large role in delivering services during crises, whether it is fighting climate change, support during natural disasters, providing health services to marginalized communities and more.
What do many of these organizations have in common? Many times, it’s cyber attacks from adversaries looking to steal sensitive information or disrupt their operations. Cloudflare has seen this firsthand when providing free cybersecurity services to vulnerable groups through programs like Project Galileo, and found that in aggregate, organizations protected under the project experience an average of 95 million attacks per day. While cyber attacks are a problem across all industries in the digital age, civil society organizations are disproportionately targeted, many times due to their advocacy, and because attackers know that they typically operate with limited resources. In most cases, these organizations don’t even know they have been attacked until it is too late.
Over the last 10 years of Project Galileo, we’ve had the opportunity to work more closely with leading civil society organizations. This has led to a number of exciting new partnerships, Continue reading
The results of netlab integration tests are stored in YAML files, making it easy to track changes improvements with Git. However, once I added the time of test and netlab version to the test results, I could no longer use git diff to figure out which test results changed after a test run – everything changed.
For example, these are partial test results from the OSPFv2 tests:
Sequence-to-sequence (seq2seq) language translation and Generative Pretrained Transformer (GPT) models are subcategories of Natural Language Processing (NLP) that utilize the Transformer architecture. Seq2seq models are typically using Long Short-Term Memory (LSTM) networks or encoder-decored based Transformers. In contrast, GPT is an autoregressive language model that uses decoder-only Transformer mechanism. The purpose of this chapter is to provide an overview of the decoder-only Transformer architecture.
The Transformer consists of stacks of decoder modules. A word embedding vector, a result of the word tokenization and embbeding, is fed as input to the first decoder module. After processing, the resulting context vector is passed to the next decodeer, and so on. After the final decoder, a softmax layer evaluates the output against the complete vocabulary to predict the next word. As an autoregressive model, the predicted word vector from the softmax layer is converted into a token before being fed back into the subsequent decoder layer. This process involves a token-to-word vector transformation prior to re-entering the decoder.
Each decoder module consists of an attention layer, Add & Normalization layer and a feedforward neural network (FFNN). Rather than feeding the embedded word vector (i.e., token embedding plus positional encoding) directly Continue reading
We often try to “institutionalize” things that work into repeatable processes—and most of the time, it doesn’t work. The process ends up becoming unwieldy, eventually failing to prevent failures and stifling innovation. How can we get out of this rut? Differentiating between architecture and process. Far too many IT shops try to replace architecture with process. Our second topic for this episode is the destructive lies of the tool trope. Tools are not “neutral,” they impact the way we think and work. A primary example of a tool that can often reshape our thinking and doing in very negative ways is … the process.
Depending on your configuration, the Linux kernel can produce a hung task warning message in its log. Searching the Internet and the kernel documentation, you can find a brief explanation that the kernel process is stuck in the uninterruptable state and hasn’t been scheduled on the CPU for an unexpectedly long period of time. That explains the warning’s meaning, but doesn’t provide the reason it occurred. In this blog post we’re going to explore how the hung task warning works, why it happens, whether it is a bug in the Linux kernel or application itself, and whether it is worth monitoring at all.
The hung task message in the kernel log looks like this:
INFO: task XXX:1495882 blocked for more than YYY seconds.
Tainted: G O 6.6.39-cloudflare-2024.7.3 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:XXX state:D stack:0 pid:1495882 ppid:1 flags:0x00004002
. . .
Processes in Linux can be in different states. Some of them are running or ready to run on the CPU — they are in the TASK_RUNNING
state. Others are waiting for some signal or event to happen, e.g. network packets to arrive or terminal input Continue reading
This is my response to Rob Pike’s words On Bloat.
I’m not surprised to see this from Pike. He’s a NIH extremist. And yes, in this aspect he’s my spirit animal when coding for fun. I’ll avoid using a framework or a dependency because it’s not the way that I would have done it, and it doesn’t do it quite right… for me.
And he correctly recognizes the technical debt that an added dependency involves.
But I would say that he has two big blind spots.
He doesn’t recognize that not using the available dependency is also adding huge technical debt. Every line of code you write is code that you have to maintain, forever.
The option for most software isn’t “use the dependency” vs “implement it yourself”. It’s “use the dependency” vs “don’t do it at all”. If the latter means adding 10 human years to the product, then most of the time the trade-off makes it not worth doing at all.
He shows a dependency graph of Kubernetes. Great. So are you going to write your own Kubernetes now?
Pike is a good enough coder that he can write his own editor (wikipedia: “Pike has written many text Continue reading
As we kick off the new year, we’re excited to introduce the latest updates to Calico, designed to create a single, unified platform for all your Kubernetes networking, security, and observability needs. These new features help organizations reduce tool sprawl, streamline operations, and lower costs, making it more convenient and efficient to manage Kubernetes environments.
In this blog, we’ll highlight some of the most exciting additions that include a major new product capability, an ingress gateway.
Managing and securing traffic in Kubernetes environments is one of the most complex and critical challenges organizations face today. With more than 60% of enterprises having adopted Kubernetes, according to an annual CNCF survey, controlling and optimizing how external traffic enters clusters is more important than ever. As applications grow in scale and complexity, legacy ingress solutions often fall short, plagued by operational inefficiencies, reliance on proprietary APIs, limited scalability, and difficulty in customization. These limitations make it difficult for teams to maintain consistent performance and robust security across their environments.
To address these challenges, we’re excited to introduce the Calico Ingress Gateway, an enterprise hardened, 100% upstream distribution of Envoy Gateway that leverages and expands the Continue reading
Audit logs are a critical tool for tracking and recording changes, actions, and resource access patterns within your Cloudflare environment. They provide visibility into who performed an action, what the action was, when it occurred, where it happened, and how it was executed. This enables security teams to identify vulnerabilities, ensure regulatory compliance, and assist in troubleshooting operational issues. Audit logs provide critical transparency and accountability. That's why we're making them "automatic" — eliminating the need for individual Cloudflare product teams to manually send events. Instead, audit logs are generated automatically in a standardized format when an action is performed, providing complete visibility and ensuring comprehensive coverage across all our products.
We're excited to announce the beta release of Automatic Audit Logs — a system that unifies audit logging across Cloudflare products. This new system is designed to give you a complete and consistent view of your environment’s activity. Here’s how we’ve enhanced our audit logging capabilities:
Standardized logging: Previously, audit logs generation was dependent on separate internal teams, which could lead to gaps and inconsistencies. Now, audit logs are automatically produced in a seamless and standardized way, eliminating Continue reading