Every major economy other than the United States and China, which have a disproportionate share of HPC national labs as well as hyperscaler and cloud builder tech titans, wants AI sovereignty a whole lot more than it ever worried about HPC simulation and modeling. …
Brazil Lays The Hardware Foundation For Its AI Ambitions was written by Timothy Prickett Morgan at The Next Platform.
Previous posts, "Why Go is not my favourite language" and "Go programs are not portable", have me critiquing Go for over a decade.
These things about Go are bugging me more and more. Mostly because they’re so unnecessary. The world knew better, and yet Go was created the way it was.
Readers of previous posts will find some things repeated here. Sorry about that.
Here’s an example of the language forcing you to do the wrong thing. It’s very helpful for the reader of code (and code is read more often than it’s written) to minimize the scope of a variable. If by mere syntax you can tell the reader that a variable is just used in these two lines, then that’s a good thing.
Example:
if err := foo(); err != nil {
	return err
}
(enough has been said about this verbose repeated boilerplate that I don’t have to. I also don’t particularly care)
So that’s fine. The reader knows err is here and only here.
But then you encounter this:
bar, err := foo()
if err != nil {
	return err
}
if err = Continue reading
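The scoping difference between the two forms above can be sketched in a small runnable program. This is only an illustration: `foo`, `fooWithValue`, `run`, and the value 42 are hypothetical names and data invented here, not taken from the original post.

```go
package main

import "fmt"

// foo is a hypothetical stand-in for any call that returns only an error.
func foo() error { return nil }

// fooWithValue is a hypothetical stand-in for a call that returns a value
// and an error.
func fooWithValue() (int, error) { return 42, nil }

// run returns the value from fooWithValue, or the first error encountered.
func run() (int, error) {
	// Form 1: err is declared inside the if statement, so it is not
	// visible once the statement ends.
	if err := foo(); err != nil {
		return 0, err
	}

	// Form 2: bar must outlive the if statement, so err is declared in
	// the enclosing function scope as well.
	bar, err := fooWithValue()
	if err != nil {
		return 0, err
	}

	// err is still in scope here, long after it was last needed.
	_ = err
	return bar, nil
}

func main() {
	bar, err := run()
	fmt.Println(bar, err)
}
```

In the second form nothing in the syntax tells the reader where `err` stops mattering, because it lives until `run` returns.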
Hi all, welcome back to the AWS networking series. This is actually part 3 of just Transit Gateway. I know some of you might be thinking, why are we still talking about Transit Gateway? But please bear with me. TGW is such an important concept, and it shows up in almost every architecture you come across.
So far, we've covered what a Transit Gateway is, how to create one, how route tables work, and how to manage associations and propagations. We also looked at how to create a VPN and attach it to the TGW, and we went through the process of sharing a TGW with other AWS accounts using AWS Resource Access Manager (RAM). In this post, we'll look at how to peer a Transit Gateway with another TGW, even when they are in different regions. So let's get to it.
If you're completely new to Transit Gateway, I highly recommend checking out the earlier introductory posts listed below.
In this article we will use Multipass to create a virtual machine to experiment with pwru. Multipass is a command line tool for running Ubuntu virtual machines on Mac or Windows. Multipass uses the native virtualization capabilities of the host operating system to simplify the creation of virtual machines.
multipass launch --name=ebpf noble
multipass exec ebpf -- sudo apt update
multipass exec ebpf -- sudo apt -y install git clang llvm make libbpf-dev flex bison golang
multipass exec ebpf -- git clone https://github.com/cilium/pwru.git
multipass exec ebpf --working-directory pwru -- make
multipass exec ebpf -- sudo ./pwru/pwru -h
Run the commands above to create the virtual machine and build pwru from sources.
multipass exec ebpf -- sudo ./pwru/pwru port https
Run pwru to trace https traffic on the virtual machine.
multipass exec ebpf -- curl https://sflow-rt.com
In a second window, run the above command to generate an https request from the virtual machine.
SKB CPU PROCESS NETNS MARK/x IFACE PROTO MTU LEN TUPLE FUNC
0xffff9fc40335a0e8 0 ~r/bin/curl:8966 4026531840 0 0
Continue reading
Welcome to Technology Short Take #186! Yes, it’s been quite a while since I published a Technology Short Take; life has “gotten in the way,” so to speak, of gathering links to share with all of you. However, I think this crazy phase of my life is about to start settling down (I hope so, anyway), and I’m cautiously optimistic that I’ll be able to pick up the blogging pace once again. For now, though, here’s a collection of links I’ve gathered since the last Technology Short Take. I hope you find something useful here!
Many of us old timers (and a lot of young timers) worry about the future of networking. What if the future isn’t a technology, or even AI, but a change in focus? Mike Bushong joins Tom and Russ to argue for operations as the future of networking.
As I was running the netlab pre-release integration tests, I noticed that ArubaCX failed the IPv6 Common Services test (it worked before). Here’s the gist of what that test does:
Here’s the relevant part of the netlab lab topology:
This week, Amazon Web Services announced the availability of its first UltraServer pre-configured supercomputers based on Nvidia’s “Grace” CG100 CPUs and its “Blackwell” B200 GPUs in what is called a GB200 NVL72 shared GPU memory configuration. …
Sizing Up AWS “Blackwell” GPU Systems Against Prior GPUs And Trainiums was written by Timothy Prickett Morgan at The Next Platform.
One of the biggest questions that enterprises, governments, academic institutions, and HPC centers the world over are going to have to answer very soon – if they have not made the decision already – is whether they are going to train their own AI models and the inference software stacks that make them useful or just buy them from third parties and get to work integrating AI with their applications a lot faster. …
Will Companies Build Or Buy Their GenAI Models? was written by Timothy Prickett Morgan at The Next Platform.
The metrics include:
Note: InfluxDB Cloud has a free service tier that can be used to test this example.
Save the following compose.yml file on a system running Docker.
configs:
  config.telegraf:
    content: |
      [agent]
        interval = '15s'
        round_interval = true
        omit_hostname = true
      [[outputs.influxdb_v2]]
        urls = ['https://<INFLUXDB_CLOUD_INSTANCE>.cloud2.influxdata.com']
Continue reading
Quicksilver is a key-value store developed internally by Cloudflare to enable fast global replication and low-latency access on a planet scale. It was initially designed to be a global distribution system for configurations, but over time it gained popularity and became the foundational storage system for many products in Cloudflare.
A previous post described how we moved Quicksilver to production and started replicating on all machines across our global network. That is what we called Quicksilver v1: each server has a full copy of the data and updates it through asynchronous replication. The design served us well for some time. However, as our business grew with an ever-expanding data center footprint and a growing dataset, it became more and more expensive to store everything everywhere.
We realized that storing the full dataset on every server is inefficient. Due to the uniform design, data accessed in one region or data center is replicated globally, even if it's never accessed elsewhere. This leads to wasted disk space. We decided to introduce a more efficient system with two new server roles: replica, which stores the full dataset and proxy, which acts as a persistent cache, evicting unused key-value pairs to free Continue reading
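The proxy role described above can be pictured as a read-through cache that sits in front of a replica. The sketch below is an assumption-heavy illustration, not Quicksilver's implementation: the `Proxy` type, the `replica` lookup function, the in-memory storage, and the least-recently-used eviction policy are all hypothetical choices made here. The source says only that the proxy evicts unused key-value pairs.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached key-value pair.
type entry struct {
	key, value string
}

// Proxy is a hypothetical read-through cache in front of a replica.
type Proxy struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
	replica  func(key string) (string, bool)
}

// NewProxy builds a proxy that holds at most capacity pairs locally.
func NewProxy(capacity int, replica func(string) (string, bool)) *Proxy {
	return &Proxy{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
		replica:  replica,
	}
}

// Get serves a key from the local cache and falls back to the replica on a miss.
func (p *Proxy) Get(key string) (string, bool) {
	if el, ok := p.items[key]; ok {
		p.order.MoveToFront(el)
		return el.Value.(entry).value, true
	}
	value, ok := p.replica(key)
	if !ok {
		return "", false
	}
	p.items[key] = p.order.PushFront(entry{key, value})
	if p.order.Len() > p.capacity {
		// Evict the least recently used pair to free space (assumed policy).
		oldest := p.order.Back()
		p.order.Remove(oldest)
		delete(p.items, oldest.Value.(entry).key)
	}
	return value, true
}

// Cached reports whether a key is currently held locally.
func (p *Proxy) Cached(key string) bool {
	_, ok := p.items[key]
	return ok
}

func main() {
	// full stands in for the complete dataset held by a replica.
	full := map[string]string{"a": "1", "b": "2", "c": "3"}
	p := NewProxy(2, func(k string) (string, bool) {
		v, ok := full[k]
		return v, ok
	})
	p.Get("a")
	p.Get("b")
	p.Get("a") // "a" becomes the most recently used key
	p.Get("c") // the cache is full, so "b" is evicted
	fmt.Println(p.Cached("a"), p.Cached("b"), p.Cached("c")) // true false true
}
```

Keys that are never requested on a given server never occupy its cache, which is the disk saving the two-role design is after.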
AI has the power to transform how organizations derive insights, make decisions, and unlock value, but all that depends on the quality of the data. …
How BigQuery Combines Data And AI For Business Transformation was written by Timothy Prickett Morgan at The Next Platform.
The European Union cannot practically declare its independence from Nvidia GPUs any more than any other nation can at this point. …
With Money And Rhea1 Tapeout, SiPearl Gets Real About HPC CPUs was written by Timothy Prickett Morgan at The Next Platform.
In the previous post, we covered the basics of Transit Gateway, what it is, what problem it solves, and we also looked at how to create one. We walked through attaching two VPCs to the TGW and establishing connectivity between them. We also covered the important concepts of TGW attachments, associations, and propagations.
In this post, we will build on that knowledge and look at
As always, if you find this post helpful, press the ‘clap’ button. It means a lot to me and helps me know you enjoy this type of content. If I get enough claps for this series, I’ll make sure to write more on this specific topic.
We have already seen how to create a Site-to-Site Continue reading
Did you ever wonder why pressing an up-arrow in a (Linux) terminal window sometimes recalls the previous command but other times creates ^[[A?
Julia Evans did, and spent months exploring the quirks of the Linux terminal (and writing blog posts describing what she found), finally resulting in The Secret Rules of the Terminal (including the various shells, terminal emulators, escape codes, and TTY driver). A must-read if you’re a newbie who wants to understand why things happen the way they do.
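As a small illustration of where ^[[A comes from: in the usual cursor-key mode the up-arrow key sends the three bytes ESC, [ and A, and a program that does not interpret them may simply see them echoed with ESC shown in caret notation as ^[. The sketch below reproduces that rendering. The helper name `caret` is hypothetical, and it assumes the terminal is in normal (not application) cursor-key mode.

```go
package main

import (
	"fmt"
	"strings"
)

// caret renders control bytes in caret notation (ESC, 0x1b, becomes "^[")
// and leaves all other bytes unchanged.
func caret(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c < 0x20:
			b.WriteByte('^')
			b.WriteByte(c + 0x40) // 0x1b + 0x40 = 0x5b, which is '['
		case c == 0x7f:
			b.WriteString("^?")
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	// Assumption: normal cursor-key mode, where up-arrow sends ESC [ A.
	upArrow := "\x1b[A"
	fmt.Println(caret(upArrow)) // prints ^[[A
}
```

A shell with line editing reads the same three bytes and treats them as "previous command" instead of echoing them.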
There are definitely easier businesses to be in than operating a neocloud. …
Only The Biggest Neoclouds Will Survive was written by Timothy Prickett Morgan at The Next Platform.
At Cloudflare, PostgreSQL and ClickHouse are our standard databases for transactional and analytical workloads. If you’re part of a team building products with configuration in our Dashboard, chances are you're using PostgreSQL. It’s fast, versatile, reliable, and backed by over 30 years of development and real-world use. It has been a foundational part of our infrastructure since the beginning, and today we run hundreds of PostgreSQL instances across a wide range of configurations and replication setups.
ClickHouse is a more recent addition to our stack. We started using it around 2017, and it has enabled us to ingest tens of millions of rows per second while supporting millisecond-level query performance. ClickHouse is a remarkable technology, but like all systems, it involves trade-offs.
In this post, I’ll explain why we chose TimescaleDB — a Postgres extension — over ClickHouse to build the analytics and reporting capabilities in our Zero Trust product suite.
After a decade in software development, I’ve grown to appreciate systems that are simple and boring. Over time, I’ve found myself consistently advocating for architectures with the fewest moving parts possible. Whenever I see a system diagram with more than three boxes, I ask: Why Continue reading
Rolling out network policies in a live Kubernetes cluster can feel like swapping wings mid-flight—one typo or overly broad rule and critical traffic is grounded. Calico’s Staged Network Policies remove the turbulence by letting you deploy policies in staged mode, so you can observe their impact before enforcing anything. Add Whisker, the open-source policy enforcement and testing tool (introduced as part of Calico Open Source 3.30) that captures every flow and tags it with a policy verdict, and you’ve got a safety harness that proves your change is sound long before you flip the switch. In this post, we’ll walk you through how you can leverage these capabilities to tighten security, validate intent, and ship changes confidently—without a single packet of downtime.
Calico for Policy is a CNI agnostic tool. Refer to the Calico Open Source docs for a list of supported CNIs. The git repository for this blog post can be found here.
For this post, let’s deploy a simple AKS cluster with Azure CNI.
## Configure
az group create --name calicooss --location eastus2
## Create a 3 node AKS cluster with Azure CNI
az aks create \
  --resource-group calicooss \
  --name
Continue reading