Video: netlab IP Address Management (IPAM)

Did you know that netlab includes full-blown IP address management? You can define address pools (or use predefined ones) and get IPv4 and IPv6 prefixes from those pools assigned to links, interfaces, and loopbacks. You can also assign static prefixes to links, use static IP addresses, specify interface addresses as an offset within the link subnet, or use unnumbered interfaces.
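To make the pool-and-offset idea concrete, here is a minimal Python sketch using the standard `ipaddress` module. It is not netlab's implementation: the pool names, supernets, and prefix lengths are illustrative assumptions loosely modeled on netlab's defaults, `link_prefixes`, `interface_address`, and `loopback_address` are hypothetical helpers, and only IPv4 is shown.

```python
import ipaddress
from typing import Iterator

# Illustrative pools: (supernet, prefix length handed out). These values are
# assumptions loosely modeled on netlab defaults; check your netlab version.
POOLS = {
    "loopback": ("10.0.0.0/24", 32),
    "p2p": ("10.1.0.0/16", 30),
    "lan": ("172.16.0.0/16", 24),
}


def link_prefixes(pool: str) -> Iterator[ipaddress.IPv4Network]:
    """Yield consecutive prefixes of the pool's configured length."""
    supernet, length = POOLS[pool]
    return ipaddress.ip_network(supernet).subnets(new_prefix=length)


def interface_address(subnet: ipaddress.IPv4Network, offset: int) -> ipaddress.IPv4Interface:
    """Return the host at `offset` within the link subnet, keeping the subnet mask."""
    host = subnet.network_address + offset
    if host not in set(subnet.hosts()):
        raise ValueError(f"offset {offset} is not a usable host address in {subnet}")
    return ipaddress.IPv4Interface(f"{host}/{subnet.prefixlen}")


def loopback_address(node_id: int) -> ipaddress.IPv4Interface:
    """Hypothetical rule: use the node ID as the offset within the loopback pool."""
    supernet, length = POOLS["loopback"]
    pool = ipaddress.ip_network(supernet)
    if not 0 < node_id < pool.num_addresses:
        raise ValueError(f"node ID {node_id} does not fit in {pool}")
    return ipaddress.IPv4Interface(f"{pool.network_address + node_id}/{length}")


if __name__ == "__main__":
    links = link_prefixes("p2p")
    first_link = next(links)
    print(first_link)                        # 10.1.0.0/30
    print(interface_address(first_link, 1))  # 10.1.0.1/30
    print(interface_address(first_link, 2))  # 10.1.0.2/30
    print(next(links))                       # 10.1.0.4/30
    print(loopback_address(1))               # 10.0.0.1/32
```

Carving prefixes sequentially from a pool keeps addressing predictable, and expressing interface addresses as offsets within the link subnet means you only pick a host number rather than a full address.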

For an overview of netlab IPAM, watch the netlab address management video (part of the Network Automation Tools webinar); for more details, read the netlab addressing tutorial.

You need a Free ipSpace.net Subscription to watch the video and a Standard ipSpace.net Subscription to watch the rest of the webinar.

Life is Life

The initial idea behind this blog was to have a medium to store and share notes on the different technologies I have worked on, in a searchable manner. I have decided to step back from work and take a year out, so this new Life tab of the blog will be for all things non-IT related. I still plan to write technology-based posts during this time (I have a few automation projects and Azure tips to share), but this is unlikely to start until later next year.

While you sleep, Automate resolving Dynatrace problem alerts and report them to ServiceNow!

Integrating observability tools with automation is paramount in the realm of modern IT operations, as it fosters a symbiotic relationship between visibility and efficiency. Observability tools provide deep insights into the performance, health, and behavior of complex systems, enabling organizations to proactively identify and rectify issues before they escalate. 

When seamlessly integrated with automation frameworks, these tools empower businesses not only to monitor but also to respond to dynamic changes in real time. This synergy enables IT teams to adapt quickly to evolving conditions, minimize downtime, and optimize resource utilization. By automating responses based on observability data, organizations can become more agile, reduce manual intervention, and maintain a robust and resilient infrastructure. In short, combining observability with automation is essential for a proactive, responsive, and streamlined operational environment.

In this blog post, we will look at a common use case: monitoring processes on both bare-metal and virtual machines. We will focus on Dynatrace's OneAgent, a binary deployed on each host that bundles a suite of specialized services configured for environment monitoring. These services actively gather telemetry metrics, capturing insights into …

ML Ops Platform at Cloudflare

We've been relying on ML and AI for our core services like Web Application Firewall (WAF) since the early days of Cloudflare. Through this journey, we've learned many lessons about running AI deployments at scale and about the tooling and processes they require. We recently launched Workers AI to help abstract a lot of that away for inference, giving developers an easy way to leverage powerful models with just a few lines of code. In this post, we're going to explore some of the lessons we've learned on the other side of the ML equation: training.

Cloudflare has extensive experience training models and using them to improve our products. A constantly evolving ML model drives the WAF attack score that helps protect our customers from malicious payloads. Another evolving model powers our bot management product to catch and prevent bot attacks on our customers. Our customer support is augmented by data science. We use machine learning to identify threats across our global network. To top it all off, Cloudflare is delivering machine learning at unprecedented scale across our network.

Each of these products, along with many others, has elevated ML models (including experimentation, training, and deployment) to a crucial position within …

AMS-IX Outage: Layer-2 Strikes Again

On November 22nd, 2023, AMS-IX, one of the largest Internet exchanges in Europe, experienced a significant traffic drop lasting more than four hours. While its peak traffic is around 10 Tbps, it fell to about 2.1 Tbps during the outage.
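To put those approximate figures (peak around 10 Tbps, about 2.1 Tbps during the outage) in perspective, a quick back-of-the-envelope calculation:

```latex
\frac{2.1\ \text{Tbps}}{10\ \text{Tbps}} = 0.21
\qquad\Rightarrow\qquad
1 - 0.21 = 0.79 \approx 79\%\ \text{below peak}
```

This compares against the peak figure, so it is only a rough indication: the outage did not necessarily coincide with peak hours.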

AMS-IX published a very sanitized and diplomatic post-mortem incident summary in which they explained the outage was caused by LACP leakage. That phrase should be a red flag, but let’s dig deeper into the details.

AMD Is The Undisputed Datacenter GPU Performance Champ – For Now

There is nothing quite like great hardware to motivate people to create and tune software to take full advantage of it during a boom time.

The post AMD Is The Undisputed Datacenter GPU Performance Champ – For Now first appeared on The Next Platform.

AMD Is The Undisputed Datacenter GPU Performance Champ – For Now was written by Timothy Prickett Morgan at The Next Platform.

D2C223: Accelerating VM Performance With Azure Boost

Azure Boost is a hardware offload system for Azure virtual machines, designed to improve VM performance. On today's Day Two Cloud, we dig into how it works. We also talk about how to implement security in Virtual Network Manager, and how to optimize your Azure observability, meaning how not to blow up your budget with unnecessary logging.

The post D2C223: Accelerating VM Performance With Azure Boost appeared first on Packet Pushers.

What’s Inside China’s New Homegrown “Tianhe Xingyi” Supercomputer?

Note: We will be updating this story with more information once our contacts in China are awake.

China is using a domestic processor as the backbone of Tianhe Xingyi, which doubles the performance of the Tianhe-2 system. Tianhe-2 topped the TOP500 list from 2013 through late 2015 before being overshadowed by the Sunway system.

The post What’s Inside China’s New Homegrown “Tianhe Xingyi” Supercomputer? first appeared on The Next Platform.

What’s Inside China’s New Homegrown “Tianhe Xingyi” Supercomputer? was written by Nicole Hemsoth Prickett at The Next Platform.

How we used OpenBMC to support AI inference on GPUs around the world

Cloudflare recently announced Workers AI, giving developers the ability to run serverless GPU-powered AI inference on Cloudflare’s global network. One key area of focus in enabling this across our network was updating our Baseboard Management Controllers (BMCs). The BMC is an embedded microprocessor that sits on most servers and is responsible for remote power management, sensors, serial console, and other features such as virtual media.

To efficiently manage our BMCs, Cloudflare leverages OpenBMC, an open-source firmware stack from the Open Compute Project (OCP). For Cloudflare, OpenBMC provides transparent, auditable firmware. Below, we describe some of what Cloudflare has been able to do so far with OpenBMC on our GPU-equipped servers.

Ouch! That’s HOT!

For this project, we needed a way to adjust our BMC firmware to accommodate new GPUs while maintaining operational efficiency in terms of thermals and power consumption. OpenBMC was a powerful tool in meeting this objective.

OpenBMC allows us to change the hardware of our existing servers without depending on our Original Design Manufacturers (ODMs), which lets our product teams get started on products quickly. To physically support this effort, our servers need to be able to supply enough power and keep …

How to Interview a Network Engineer Using a Single Scenario

Most organizations are terribly bad at interviewing people. They overcomplicate things by holding too many interviews (more than 2-3) and often focus on trivia and memorization rather than walking through a scenario. If you are hiring a Network Engineer, every interview should include some form of scenario and a whiteboard. Rather than overcomplicating things, here's how you can interview someone using a single scenario that you can expand on, going to different depths at different stages depending on the focus of the role.

Scenario:

You are an employee working in a large campus network. Your computer has just started up and has not communicated with anything yet when you open your browser and type in microsoft.com.

Before any communication can take place, you need an IP address. What IP protocols are there? What are the main differences between the two?

Things to look for: IPv4 vs IPv6. ARP vs ND. DHCP vs RA. Broadcast vs multicast.
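The "ARP vs ND" and "broadcast vs multicast" contrasts come down to where the address-resolution request is sent. The Python sketch below uses the standard `ipaddress` module to derive the solicited-node multicast group (RFC 4291) and its Ethernet mapping (RFC 2464); the helper names and example addresses are illustrative, not taken from the original scenario.

```python
import ipaddress

# ARP (IPv4) asks "who has this address?" with an Ethernet broadcast that
# every host on the segment must process.
ARP_BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"

# IPv6 ND sends a Neighbor Solicitation to the target's solicited-node
# multicast group instead: ff02::1:ff00:0/104 + low 24 bits of the target.
SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_multicast(unicast: str) -> ipaddress.IPv6Address:
    """Map a unicast IPv6 address to its solicited-node multicast group."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Ethernet mapping of an IPv6 multicast group: 33:33 plus its low 32 bits."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + [(low32 >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ":".join(f"{octet:02x}" for octet in octets)


if __name__ == "__main__":
    group = solicited_node_multicast("2001:db8::1234:5678")
    print(group)                 # ff02::1:ff34:5678
    print(multicast_mac(group))  # 33:33:ff:34:56:78
```

Hosts whose NICs filter on that multicast MAC can drop unrelated solicitations in hardware, whereas every host must process an ARP broadcast (switches without MLD snooping still flood the multicast frame, though).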

What methods are there for configuring an IP address?

Things to look for: Static IP vs DHCP vs RA.
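With a static address, the administrator types the address in; with DHCP, a server leases one; with RA, the router only advertises a prefix and the host builds its own address (SLAAC). A minimal Python sketch of the classic modified EUI-64 method (RFC 4291, Appendix A) is shown below; the prefix, MAC address, and helper names are hypothetical examples.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Modified EUI-64 interface ID from a 48-bit MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    octets[0] ^= 0x02  # invert the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert FF:FE in the middle
    return int.from_bytes(bytes(eui64), "big")


def slaac_address(ra_prefix: str, mac: str) -> ipaddress.IPv6Interface:
    """Combine a /64 prefix learned from a Router Advertisement with the interface ID."""
    prefix = ipaddress.IPv6Network(ra_prefix)
    if prefix.prefixlen != 64:
        raise ValueError("EUI-64 based SLAAC expects a /64 prefix")
    address = ipaddress.IPv6Address(int(prefix.network_address) | eui64_interface_id(mac))
    return ipaddress.IPv6Interface(f"{address}/64")


if __name__ == "__main__":
    print(slaac_address("2001:db8:1:2::/64", "00:1a:2b:3c:4d:5e"))
    # 2001:db8:1:2:21a:2bff:fe3c:4d5e/64
```

Many current operating systems use RFC 7217 stable-privacy or temporary (privacy extension) interface identifiers instead of EUI-64, so a good candidate might mention that the MAC-derived form is the textbook case rather than the norm.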

When I need to communicate with something external, traffic goes through a gateway. What type of device would …
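The reason traffic to an external destination goes to the gateway is a simple on-link check the host performs before resolving a next hop. The Python sketch below assumes a single IPv4 interface with one default gateway (real hosts do a longest-prefix match over their routing table); `next_hop` is a hypothetical helper, the addresses are made up, and 198.51.100.10 is a documentation address standing in for an external site such as microsoft.com.

```python
import ipaddress


def next_hop(host_interface: str, default_gateway: str, destination: str) -> ipaddress.IPv4Address:
    """Pick the address the host must resolve with ARP: the destination itself
    if it is on the local subnet, otherwise the default gateway."""
    iface = ipaddress.IPv4Interface(host_interface)
    gateway = ipaddress.IPv4Address(default_gateway)
    dst = ipaddress.IPv4Address(destination)
    if gateway not in iface.network:
        raise ValueError(f"gateway {gateway} is not on-link for {iface.network}")
    return dst if dst in iface.network else gateway


if __name__ == "__main__":
    print(next_hop("10.20.30.40/24", "10.20.30.1", "10.20.30.99"))    # 10.20.30.99
    print(next_hop("10.20.30.40/24", "10.20.30.1", "198.51.100.10"))  # 10.20.30.1
```

Note that the destination IP address in the packet never changes; only the Layer-2 next hop (the MAC address the host resolves) differs between the two cases.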