At IETF 118 in November 2023 I attended the meeting of the Measurement and Analysis of Protocols Research Group, and here are my impressions from that meeting.
The initial idea behind this blog was to have a medium to store and share notes on the different technologies I worked on in a searchable manner. I have decided to step back from work and take a year out, so this new life tab of the blog will be for all things non-IT related. I still plan to write technology-based posts during this time (I have a few automation projects and Azure tips to share), however this is unlikely to start until later next year.
A real-time, dynamic, and high-definition network map can often provide information about whether the network is functioning properly or whether there are problems with any particular device.
Integrating observability tools with automation is paramount in the realm of modern IT operations, as it fosters a symbiotic relationship between visibility and efficiency. Observability tools provide deep insights into the performance, health, and behavior of complex systems, enabling organizations to proactively identify and rectify issues before they escalate.
When seamlessly integrated with automation frameworks, these tools empower businesses to not only monitor but also respond to dynamic changes in real time. This synergy between observability and automation enables IT teams to swiftly adapt to evolving conditions, minimize downtime, and optimize resource utilization. By automating responses based on observability data, organizations can enhance their agility, reduce manual intervention, and maintain a robust and resilient infrastructure. In essence, using observability with automation is indispensable for achieving a proactive, responsive, and streamlined operational environment in the fast-paced and complex landscape of today’s technology.
In this blog post, we will look at a common use case involving the monitoring of processes on both bare-metal and virtual machines. Our exploration will focus on Dynatrace's OneAgent, a binary deployed on hosts that bundles a suite of specialized services configured for environment monitoring. These services actively gather telemetry metrics, capturing insights into …
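To make that concrete, here is a minimal sketch (not from the post) of pulling OneAgent-collected telemetry back out through the Dynatrace Metrics API v2. The environment URL, API token, and metric selector are placeholders, with `builtin:host.cpu.usage` standing in for whichever process-level metric you actually care about.

```python
# Hedged sketch: query a built-in metric reported by OneAgent via the Metrics API v2.
# ENV_URL, API_TOKEN and the metric selector are placeholders, not values from the post.
import requests

ENV_URL = "https://abc12345.live.dynatrace.com"      # hypothetical environment URL
API_TOKEN = "dt0c01.XXXX"                            # placeholder token with metrics.read scope

resp = requests.get(
    f"{ENV_URL}/api/v2/metrics/query",
    headers={"Authorization": f"Api-Token {API_TOKEN}"},
    params={
        "metricSelector": "builtin:host.cpu.usage",  # stand-in for a process-level metric
        "resolution": "5m",
    },
    timeout=30,
)
resp.raise_for_status()
for series in resp.json()["result"][0]["data"]:
    print(series["dimensions"], series["values"][-1])
```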
IPUs (Infrastructure Processing Units) are hardware accelerators designed to offload compute-intensive infrastructure tasks like packet processing, traffic shaping, and virtual switching from CPUs.
We've been relying on ML and AI for our core services like Web Application Firewall (WAF) since the early days of Cloudflare. Through this journey, we've learned many lessons about running AI deployments at scale, and all the tooling and processes necessary. We recently launched Workers AI to help abstract a lot of that away for inference, giving developers an easy way to leverage powerful models with just a few lines of code. In this post, we’re going to explore some of the lessons we’ve learned on the other side of the ML equation: training.
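As a taste of what "a few lines of code" looks like on the inference side, here is a minimal sketch against the REST endpoint Workers AI exposes; the account ID, API token, and model name below are placeholders rather than anything from this post.

```python
# Hedged sketch: run a text-generation model on Workers AI over its REST API.
# ACCOUNT_ID, API_TOKEN and MODEL are placeholders; swap in your own values.
import requests

ACCOUNT_ID = "your-account-id"
API_TOKEN = "your-api-token"
MODEL = "@cf/meta/llama-2-7b-chat-int8"   # one of the launch-era text models

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "In one sentence, what does a WAF do?"}]},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["result"]["response"])   # generated text comes back under result.response
```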
Each of these products, along with many others, has elevated ML models — including experimentation, training, and deployment — to a crucial position within …
On November 22nd, 2023, AMS-IX, one of the largest Internet exchanges in Europe, experienced a significant performance drop lasting more than four hours. While its peak performance is around 10 Tbps, it dropped to about 2.1 Tbps during the outage.
SPONSORED POST: Edge security is a growing headache. The attack surface is expanding as more operational functions migrate out of centralized locations and into distributed sites and devices. …
Welcome to Day Two Cloud! In this episode with Microsoft, it’s November 2023, and I’m in Seattle for the Microsoft Ignite conference. If you think you heard this intro before and you’re wondering if you’re hearing the same episode again, you’re not. This is the second episode from Microsoft Ignite 2023. Today, I’m talking with...
Azure Boost is a hardware offload for Azure virtual machines designed to improve VM performance. On today's Day Two Cloud we dig into how it works. We also talk about how to implement security in Virtual Network Manager, as well as how to optimize your Azure observability--meaning, how not to blow up your budget with unnecessary logging.
Note: We will be updating this story with more information once our contacts in China are awake:
China is using a domestic processor as the backbone of a system with double the performance of Tianhe-2, which topped the Top500 list from 2013 through late 2015 before being overshadowed by the Sunway system in recent years. …
Cloudflare recently announced Workers AI, giving developers the ability to run serverless GPU-powered AI inference on Cloudflare’s global network. One key area of focus in enabling this across our network was updating our Baseboard Management Controllers (BMCs). The BMC is an embedded microprocessor that sits on most servers and is responsible for remote power management, sensors, serial console, and other features such as virtual media.
To efficiently manage our BMCs, Cloudflare leverages OpenBMC, an open-source firmware stack from the Open Compute Project (OCP). For Cloudflare, OpenBMC provides transparent, auditable firmware. Below we describe some of what Cloudflare has been able to do so far with OpenBMC on our GPU-equipped servers.
Ouch! That’s HOT!
For this project, we needed a way to adjust our BMC firmware to accommodate new GPUs while maintaining operational efficiency with respect to thermals and power consumption. OpenBMC was a powerful tool in meeting this objective.
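One way to keep an eye on those thermals is through the Redfish interface that OpenBMC's bmcweb exposes. The sketch below is not Cloudflare's tooling; the BMC address, credentials, and self-signed-certificate handling are all placeholders.

```python
# Hedged sketch: read temperature sensors from an OpenBMC machine via Redfish.
# BMC address and credentials are placeholders; many BMCs ship self-signed certs.
import requests
import urllib3

urllib3.disable_warnings()                       # placeholder BMC uses a self-signed cert
BMC = "https://bmc.example.internal"             # hypothetical BMC address
AUTH = ("root", "0penBmc")                       # placeholder credentials

session = requests.Session()
session.verify = False
session.auth = AUTH

# Enumerate chassis, then read each one's (legacy) Thermal resource.
chassis = session.get(f"{BMC}/redfish/v1/Chassis").json()
for member in chassis.get("Members", []):
    thermal = session.get(f"{BMC}{member['@odata.id']}/Thermal").json()
    for sensor in thermal.get("Temperatures", []):
        print(f"{sensor.get('Name')}: {sensor.get('ReadingCelsius')} °C")
```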
OpenBMC allows us to change the hardware of our existing servers without depending on our Original Design Manufacturers (ODMs), which in turn lets our product teams get started on products quickly. To physically support this effort, our servers need to be able to supply enough power and keep …
Most organizations are terribly bad at interviewing people. They overcomplicate things by holding too many interviews (more than 2-3) and often focus their interviews on trivia and memorization rather than walking through a scenario. If you are hiring a network engineer, every interview should include some form of scenario and a whiteboard. Rather than overcomplicating things, here’s how you can interview someone using a single scenario that you can expand on and take to different depths at different stages, depending on the focus of the role.
Scenario:
You are an employee working in a large campus network. Your computer has just started up and has not communicated with anything yet when you open your browser and type in microsoft.com.
Before any communication can take place, you need an IP address. What IP protocols are there? What are the main differences between the two?
Things to look for: IPv4 vs IPv6. ARP vs ND. DHCP vs RA. Broadcast vs multicast.
What methods are there of configuring an IP address?
Things to look for: Static IP vs DHCP vs RA.
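For interviewers who like to pair the whiteboard with something concrete, here is a minimal sketch of the DHCP Discover a freshly booted IPv4 host broadcasts before it has any address; it assumes the scapy library, root privileges, and a hypothetical interface name.

```python
# Hedged sketch: craft the broadcast DHCP Discover a brand-new IPv4 host would send.
# Assumes scapy, root privileges, and an interface named "eth0" (placeholder).
from scapy.all import Ether, IP, UDP, BOOTP, DHCP, get_if_hwaddr, srp

IFACE = "eth0"                                       # hypothetical interface name
mac = get_if_hwaddr(IFACE)

discover = (
    Ether(src=mac, dst="ff:ff:ff:ff:ff:ff")          # L2 broadcast: nobody knows us yet
    / IP(src="0.0.0.0", dst="255.255.255.255")       # no address yet, so limited broadcast
    / UDP(sport=68, dport=67)                        # DHCP client -> server ports
    / BOOTP(chaddr=bytes.fromhex(mac.replace(":", "")), xid=0x12345678)
    / DHCP(options=[("message-type", "discover"), "end"])
)

ans, _ = srp(discover, iface=IFACE, timeout=3, verbose=False)   # wait for a DHCP Offer
for _, offer in ans:
    print("Offered address:", offer[BOOTP].yiaddr)
```

An IPv6 host would instead multicast a Router Solicitation or DHCPv6 Solicit, which is exactly the broadcast-versus-multicast contrast the first question is fishing for.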
When I need to communicate with something external, traffic goes through a gateway. What type of device would …