If you’re a marketer, advertiser, or a business owner that runs your own website, there’s a good chance you’ve used Google tags in order to collect analytics or measure conversions. A Google tag is a single piece of code you can use across your entire website to send events to multiple destinations like Google Analytics and Google Ads.
Historically, the common way to deploy a Google tag meant serving the JavaScript payload directly from Google’s domain. This can work quite well, but can sometimes impact performance and accurate data measurement. That’s why Google developed a way to deploy a Google tag using your own first-party infrastructure using server-side tagging. However, this server-side tagging required deploying and maintaining a separate server, which comes with a cost and requires maintenance.
That’s why we’re excited to be Google’s launch partner and announce our direct integration of Google tag gateway for advertisers, providing many of the same performance and accuracy benefits of server-side tagging without the overhead of maintaining a separate server.
Any domain proxied through Cloudflare can now serve your Google tags directly from that domain. This allows you to get better measurement signals for your website and can enhance your Continue reading
After inspecting the confusing bridging/routing/switching terminology and a brief detour into the control/data plane details, let’s talk about how packets actually move across a network.
As always, things were simpler when networks were implemented with a single cable. In that setup, all nodes were directly reachable, and the only challenge was figuring out the destination node’s address; it didn’t matter whether it was a MAC address, an IP address, or a Fiber Channel address. On a single cable, you could just broadcast, like, “Who has this service?” and someone would reply, “I’m the printer you’re looking for.” That’s how many early non-IP protocols operated.
At Cloudflare, we do everything we can to avoid interruption to our services. We frequently deploy new versions of the code that delivers the services, so we need to be able to restart the server processes to upgrade them without missing a beat. In particular, performing graceful restarts (also known as "zero downtime") for UDP servers has proven to be surprisingly difficult.
We've previously written about graceful restarts in the context of TCP, which is much easier to handle. We didn't have a strong reason to deal with UDP until recently — when protocols like HTTP3/QUIC became critical. This blog post introduces udpgrm, a lightweight daemon that helps us to upgrade UDP servers without dropping a single packet.
Here's the udpgrm GitHub repo.
In the early days of the Internet, UDP was used for stateless request/response communication with protocols like DNS or NTP. Restarts of a server process are not a problem in that context, because it does not have to retain state across multiple requests. However, modern protocols like QUIC, WireGuard, and SIP, as well as online games, use stateful flows. So what happens to the state associated with a flow when a server process is Continue reading
Dr. Tony Przygienda left a very valid (off-topic) comment to my Breaking APIs or Data Models Is a Cardinal Sin blog post:
If, on the other hand, the customers would not camp for literally tens of years on regex scripts scraping screens, lots of stuff could progress much faster.
He’s right, particularly from Juniper’s perspective; they were the first vendor to use a data-driven approach to show commands. Unfortunately, we’re still not living in a perfect world:
In a previous blog post, I described how OSPF route selection rules prevent a summary LSA from being inserted back into an area from which it was generated. That works nicely for area prefixes turned directly into summary LSAs, but how does the loop prevention logic work for summarized prefixes (what OSPF calls area ranges)?
TL&DR: It doesn’t, unless… ;)
Today the following network operating systems include support for the drop notification extension in their sFlow agent implementations:
Two additional sFlow dropped packet notification implementations are in the pipeline and should be available later this year:
Has your browsing experience ever been disrupted by this error page? Sometimes Cloudflare returns "Error 500" when our servers cannot respond to your web request. This inability to respond could have several potential causes, including problems caused by a bug in one of the services that make up Cloudflare's software stack.
We know that our testing platform will inevitably miss some software bugs, so we built guardrails to gradually and safely release new code before a feature reaches all users. Health Mediated Deployments (HMD) is Cloudflare’s data-driven solution to automating software updates across our global network. HMD works by querying Thanos, a system for storing and scaling Prometheus metrics. Prometheus collects detailed data about the performance of our services, and Thanos makes that data accessible across our distributed network. HMD uses these metrics to determine whether new code should continue to roll out, pause for further evaluation, or be automatically reverted to prevent widespread issues.
Cloudflare engineers configure signals from their service, such as alerting rules or Service Level Objectives (SLOs). For example, the following Service Level Indicator (SLI) checks the rate of HTTP 500 errors over 10 minutes returned from a service in our software stack.
sum(rate(http_request_count{code="500"}[10m])) Continue reading
In many ways, IS-IS is a simpler, and perhaps more elegant, routing protocol than OSPF. However, it often gets misunderstood. Perhaps due to its roots in OSI or perhaps because it’s not as widely deployed as OSPF. Some of the confusion come from how it behaves on multi-access networks. What is a Designated Intermediate System (DIS)? Why do we need a pseudonode? How do we flood Link State PDUs (LSPs)? In this post, I’ll cover all of that and more. This is going to be a deep dive so save this blog for when you have some time to focus.
IS-IS is a link state protocol, which means that we need to build a link state database that describes how all the intermediate systems (routers) are interconnected, and what prefixes they are associated with. Simply put, we need to build a graph. Let’s do a quick recap of graph theory.
A graph consists of vertices (nodes) and edges (links). When referring to a single node, it’s called a vertex. This is shown below:
There are different types of graphs. They can be undirected and unweighted:
With this type of graph, there is no weight assigned and there is no way Continue reading
When building a scalable, resilient GPU network fabric, the design of the rail layer, the portion of the topology that interconnects GPU servers via Top-of-Rack (ToR) switches, plays a critical role. This section explores three different models: Multi-rail-per-switch, Dual-rail-per-switch, and Single-rail-per-switch. All three support dual-NIC-per-GPU designs, allowing each GPU to connect redundantly to two separate switches, thereby removing the Rail switch as a single point of failure.
In this model, multiple small subnets and VLANs are configured per switch, with each logical rail mapped to a subset of physical interfaces. For example, a single 48-port switch might host four or eight logical rails using distinct Layer 2 and Layer 3 domains. Because all logical rails share the same physical device, isolation is logical. As a result, a hardware or software failure in the switch can impact all rails and their associated GPUs, creating a large failure domain.
This model is not part of NVIDIA’s validated Scalable Unit (SU) architecture but may suit test environments, development clusters, or small-scale GPU fabrics where hardware cost efficiency is a higher priority than strict fault isolation. From a CapEx perspective, multi-rail-per-switch is the most economical, requiring fewer switches.
Figure 13-10 illustrates the Continue reading
Hello my friend,
Today’s topic is critical to complete full picture of software development for network automation. Today’s topic is what allows you to conduct your tasks within meaningful time frame, especially when you have a lot of network devices, servers, virtual machines to manage. Today’s topic is concurrency of code execution in Python and Golang.
There are more than 100 programming languages out there. Some of them are quite universal and allow development of almost any kind of application. Others are more specific. Python is probably the most universal programming language, from what I’ve worked with or heard of. It can be used in infrastructure management at scale (e.g., OpenStack is written in Python), web applications, data science and many more. Golang is much more low-level compared to Python and, therefore, way more performant. Various benchmarks available online suggests that the same business tasks could be 3-30 times quicker in Golang compared to Python; therefore Golang is suitable for system programing (e.g, Kubernetes and Docker are created in Go). That’s what we cover in our blogs. Apart from them there are a lot of other: C/C++ if you Continue reading