Presto is a free, open source SQL query engine. We've been using it at Meta for the past ten years and have learned a lot along the way. Running anything at scale, whether tools, processes, or services, means solving problems to overcome unexpected challenges. Here are four things we learned while scaling Presto up to Meta scale, and some advice if you're interested in running your own queries at scale.
Scaling Presto rapidly to meet growing demands: What challenges did we face?
The original article was published on the systemdesign.one website.
What Is Gossip Protocol?
The typical problems in a distributed system are the following [1], [11]:
maintaining the system state (liveness of nodes)
communication between nodes
The potential solutions to these problems are as follows [1]:
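Gossip (epidemic) protocols, the subject of this article, address both of these problems: each node periodically picks a few random peers and exchanges its view of cluster liveness with them, so state spreads through the cluster without a central coordinator. A minimal sketch of that idea follows, using a simplified in-memory model; the Node class, fanout value, and heartbeat timestamps are illustrative and not any particular implementation.

```python
# Minimal sketch of gossip-style dissemination of node liveness.
# Simplified in-memory model, not a production implementation.
import random
import time


class Node:
    def __init__(self, name: str):
        self.name = name
        # view: node name -> freshest heartbeat timestamp we know about
        self.view = {name: time.time()}

    def heartbeat(self):
        """Refresh our own liveness entry."""
        self.view[self.name] = time.time()

    def gossip(self, peers: list, fanout: int = 2):
        """Push our view to a few random peers; they keep the freshest entries."""
        for peer in random.sample(peers, min(fanout, len(peers))):
            peer.merge(self.view)

    def merge(self, remote_view: dict):
        for node, ts in remote_view.items():
            if ts > self.view.get(node, 0.0):
                self.view[node] = ts


# One gossip round across a small cluster: liveness information converges
# without any central coordinator.
nodes = [Node(f"node-{i}") for i in range(5)]
for n in nodes:
    n.heartbeat()
    n.gossip([p for p in nodes if p is not n])
```

Because each round only contacts a small, random fanout of peers, the per-node cost stays constant while information still reaches the whole cluster after a modest number of rounds.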
Internal ELB — An internal load balancer is not exposed to the internet and is deployed in a private subnet. A DNS record gets created that resolves to the private IP address of the load balancer; it's worth noting that the DNS record itself is still publicly resolvable. The main intention is to distribute traffic to EC2 instances across availability zones, provided they all have access to the VPC.
External ELB — Also called an internet-facing load balancer, deployed in a public subnet. Like an internal ELB, it can be used to distribute and balance traffic across multiple availability zones.
High Availability Aspect: An ELB (this applies to any of the load balancer types) can distribute traffic across different kinds of targets, including EC2 instances, containers, and IP addresses, in either a single AZ or multiple AZs within a region.
Health Checks: An additional health check can be included to ensure that the end host serving the application is healthy. This is typically done through HTTP status codes, with a 200 response indicating a healthy host (a minimal provisioning sketch using such a health check follows the ALB notes below). If discrepancies are found during the health check, the ELB can gracefully drain traffic and divert it to another host in the target group. If auto-scaling is set up, it can also scale capacity as needed.
Network Design: The choice of load balancer depends on the type of traffic, the application's traffic pattern, and the expected load and burst rate.
Various Features — high availability, high throughput, health checks, sticky sessions, operational monitoring and logging, and deletion protection.
TLS Termination — Integrated certificate management and SSL decryption offload CPU-intensive work from the end hosts and centralize certificate handling on the load balancer.
An ELB distributes incoming application traffic across EC2 instances, containers, and IP addresses; these entities are generally called targets and are grouped into target groups.
Another advantage is health checking, which can detect problems such as packet loss or high latency; ELBs can also be integrated with Auto Scaling, hence "Elastic".
Depending on the operational requirements, ELBs are further subdivided into CLB, ALB, GLB, and NLB.
CLB — Classic Load Balancer
AWS recommends ALB today instead of CLB
Intended for EC2 instances built in the EC2-Classic network
Layer 4 or Layer 7 load balancing
Provides SSL offloading and IPv6 support for classic networks
ALB — Application Load Balancer
Works at Layer 7 of the OSI model
Supports applications that run in Containers
Supports Content-based Routing
HTTP/HTTPS, Mobile Apps, Containers in EC2, and Microservices benefit greatly from ALB
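To make the target group and health check ideas above concrete, here is a minimal boto3 sketch that creates a target group with an HTTP health check expecting a 200 response, registers an instance, and attaches the group to an internet-facing application load balancer. The VPC, subnet, security group, and instance IDs are placeholders; treat this as an outline under those assumptions rather than a complete deployment.

```python
# Minimal sketch: ALB + target group with an HTTP health check (boto3).
# All IDs below are placeholders; this is an outline, not a full deployment.
import boto3

elbv2 = boto3.client("elbv2")

# Target group with a health check that expects HTTP 200 on /health.
tg = elbv2.create_target_group(
    Name="web-targets",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",           # placeholder VPC
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/health",
    Matcher={"HttpCode": "200"},              # 200 == healthy
    TargetType="instance",
)
tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

# Register the EC2 instances (targets) that should receive traffic.
elbv2.register_targets(
    TargetGroupArn=tg_arn,
    Targets=[{"Id": "i-0123456789abcdef0"}],  # placeholder instance
)

# Internet-facing ALB spanning two AZs (two subnets), plus an HTTP listener.
lb = elbv2.create_load_balancer(
    Name="web-alb",
    Subnets=["subnet-aaaa1111", "subnet-bbbb2222"],   # placeholder subnets
    SecurityGroups=["sg-0123456789abcdef0"],          # placeholder security group
    Scheme="internet-facing",
    Type="application",
)
elbv2.create_listener(
    LoadBalancerArn=lb["LoadBalancers"][0]["LoadBalancerArn"],
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
)
```

Once unhealthy targets fail the /health check, the ALB stops routing to them and, if Auto Scaling is attached to the target group, replacement capacity can be launched automatically.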
Note: This requires the purchase of a wireless router capable of running a WireGuard package; in this case it's the GL-A1300 Slate Plus. I don't have any affiliate or advertising arrangement with them; I simply liked the router for its effectiveness and low cost.
The Need:
Many of us want a VPN server that does decent encryption but doesn't cost a lot of money; in some cases it can even be done free of charge. We also don't want to install a variety of software that messes with the client's internal routing and runs against some IT policies, even if it's only a browser-based plugin.
The Choice:
Wireguard: https://www.wireguard.com/ — VPN software with software-based encryption; extremely fast and lightweight.
GL-A1300 Slate-Plus — A wireless router with WireGuard support, which is still missing from many routers on the market today; it came with OpenWrt as the installed software.
Features
The GL-A1300 Slate Plus wireless VPN encrypted travel router comes packed with features that make life easier while travelling.
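To give a sense of what the WireGuard side of the setup involves, here is a minimal sketch of a client profile rendered from Python. All keys, addresses, and the endpoint are placeholders; routers like this typically generate the real values for you from their WireGuard server settings.

```python
# Minimal sketch of a WireGuard client configuration, rendered from Python.
# All keys, addresses, and the endpoint below are placeholders; generate real
# keys with `wg genkey` / `wg pubkey` and substitute your router's values.

CLIENT_CONFIG_TEMPLATE = """\
[Interface]
PrivateKey = {client_private_key}
Address = {client_address}
DNS = {dns_server}

[Peer]
PublicKey = {router_public_key}
Endpoint = {router_endpoint}
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 25
"""

def render_client_config() -> str:
    """Fill in the template with example (placeholder) values."""
    return CLIENT_CONFIG_TEMPLATE.format(
        client_private_key="<client-private-key>",
        client_address="10.0.0.2/32",          # address inside the VPN subnet
        dns_server="10.0.0.1",                  # e.g. the router itself
        router_public_key="<router-public-key>",
        router_endpoint="my-home-router.example.com:51820",  # public hostname:port
    )

if __name__ == "__main__":
    print(render_client_config())
```

AllowedIPs = 0.0.0.0/0 sends all client traffic through the tunnel; narrow it to specific subnets if you only want split tunneling.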
A few weeks ago, I set up a bird feeder and used it to capture bird images. The on-board classifier itself was not that accurate, but it did a decent job. What I have realised is that we don't always end up with highly accurate on-board edge classification, especially while still learning how to implement it.
After a few weeks there were a lot of images. Some of them, sure enough, had birds, while others were taken in pitch dark, and I'm not even sure what made the classifier decide there was a birdie in those camera snapshots.
To sort the genuine bird images out, I have to rely on a re-classifier doing the job for me. Initially I thought I would write a Lambda-based classifier as a learning experiment, but since this is a one-time process every six months or so, I went ahead with a managed service option, in this case AWS Rekognition, which is quite amazing.
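To illustrate what that re-classification pass can look like, here is a minimal boto3 sketch that walks snapshots stored in S3 and asks Rekognition's detect_labels API whether each one actually contains a bird. The bucket name, key prefix, and confidence threshold are assumptions for illustration.

```python
# Minimal sketch: re-classify stored snapshots with Amazon Rekognition.
# Bucket, prefix, and threshold are placeholders; adjust to your own setup.
import boto3

s3 = boto3.client("s3")
rekognition = boto3.client("rekognition")

BUCKET = "bird-feeder-snapshots"   # hypothetical bucket name
PREFIX = "captures/"               # hypothetical key prefix

def has_bird(bucket: str, key: str, min_confidence: float = 80.0) -> bool:
    """Return True if Rekognition detects a 'Bird' label in the image."""
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=min_confidence,
    )
    return any(label["Name"] == "Bird" for label in response["Labels"])

# Walk every snapshot under the prefix and report the ones with real birds.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if has_bird(BUCKET, obj["Key"]):
            print(f"bird found in {obj['Key']}")
```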
Content delivery network (CDN) and cloud computing provider Akamai has opened three new internet point of presence (POP) data centers this week, and will open two more later in the quarter, as the company looks to take over a bigger portion of the public cloud market. The three new sites open as of this week are in Paris, Washington, D.C., and Chicago. Sites in Seattle and Chennai will follow in the coming months, Akamai said. The expansion is part of the company's push into the public cloud market dominated by incumbents like Google, Amazon and Microsoft, Akamai said. The sites were chosen carefully, according to a statement from Akamai. Washington, D.C., is one of the biggest data center hubs in the world, with Northern Virginia containing upward of half of the major data center capacity in the US. Chicago is a well-situated secondary site for latency-sensitive workloads running either locally or in nearby markets, including Philadelphia and Washington, while the Paris site represents a new option for organizations facing data sovereignty challenges posed by EU regulation.
On today’s Heavy Networking we talk LACP and link aggregation. While bonding two or more links together to act as a single virtual link has been done for decades, LACP and link aggregation aren't the same thing, and the distinction matters. Our guest to get into the differences is network instructor Tony Bourke.
MLCommons, a group that develops benchmarks for AI technology training algorithms, revealed the results for a new test that determines system speeds for training algorithms specifically used for the creation of chatbots like ChatGPT. MLPerf 3.0 is meant to provide an industry-standard set of benchmarks for evaluating ML model training. Model training can be a rather lengthy process, taking weeks and even months depending on the size of a data set. That requires an awful lot of power consumption, so training can get expensive. The MLPerf Training benchmark suite is a full series of tests that stress machine-learning models, software, and hardware for a broad range of applications. It found performance gains of up to 1.54x compared to just six months ago and between 33x and 49x compared to the first round in 2018.
Cisco is continuing its summer buying spree with the acquisition of security startup Oort for an undisclosed amount. Oort offers an identity threat detection and response platform for enterprise security. Founded in 2019, Oort raised $15 million in Series A funding that included money from Cisco’s venture capital arm. “With Oort’s API-driven, cloud-native, and agentless platform, they eliminate identity visibility gaps across disparate data sources, show misconfigurations, check for security vulnerabilities, and offer predictive identity analytics to proactively stop attacks,” wrote Raj Chopra, senior vice president and chief product officer for Cisco Security, in a blog about the acquisition.
During the week of July 10, 2023, we launched a new capability for Zone Versioning: Version Comparisons. With Version Comparisons, you can quickly get a side-by-side view of the changes made between two versions. This makes it easier to verify that a new version of your zone’s configuration is correct before deploying it to production.
A quick recap about Zone Versioning
Zone Versioning was launched at the start of 2023 for all Cloudflare Enterprise customers and allows you to create and manage independent versions of your zone configuration. This enables you to safely assemble a set of configuration changes and progressively roll them out together to predefined environments of traffic. Being able to carefully test changes in a test or staging environment before deploying them to production can help catch configuration issues before they have a large impact on your zone’s traffic. See the general availability announcement blog for a deeper dive on the overall capability.
Why we built Version Comparisons
Diff is a well-known tool, often used by software developers to quickly understand the difference between two files. While originally just a command-line utility, diff is now ubiquitous.
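Version Comparisons applies that same idea to zone configurations. As a rough illustration of the underlying concept (not Cloudflare's actual data model or API), here is a minimal sketch using Python's difflib to diff two made-up configuration snapshots:

```python
# Minimal sketch: comparing two "versions" of a configuration with difflib.
# The snippets below are invented examples, not Cloudflare's data model.
import difflib

version_1 = [
    "security_level: medium",
    "cache_ttl: 14400",
    "minify_js: off",
]
version_2 = [
    "security_level: high",
    "cache_ttl: 14400",
    "minify_js: on",
]

# Unified diff output marks removed lines with '-' and added lines with '+'.
diff = difflib.unified_diff(
    version_1, version_2,
    fromfile="zone-version-1", tofile="zone-version-2",
    lineterm="",
)
print("\n".join(diff))
```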
Our customers use Ansible Automation Platform across a multitude of platforms, in a plethora of ways. Providing an accurate accounting and reporting capability is sometimes difficult across the various types of use cases we encounter.
If you have traditionally used the platform with infrequently changing or more static types of managed hosts, you’re probably pretty much covered. If, however, you administer a more diverse and dynamic set of hosts, there may be occasions where you require more flexibility when accounting for managed hosts against your purchased subscription.
That’s why in Ansible Automation Platform 2.4, we’ve introduced a new Host Metrics dashboard tab with the ability to:
View high level automation run details per managed host
The first and last time a host was automated (this metric already existed)
The number of times automation has been run or attempted to be run against a host (new in 2.4)
The number of times a managed host has been deleted (new in 2.4)
The ability to view the number of times automation has been run on hosts is a simple but really useful metric.
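If you prefer to pull these numbers programmatically rather than through the dashboard, a sketch along the following lines should work, assuming your automation controller exposes a host metrics REST endpoint (shown here as /api/v2/host_metrics/) and you have an API token with read access; verify the exact path and field names against your controller's API browser, as they may differ by release.

```python
# A minimal sketch of reading host metrics from the automation controller API.
# The endpoint path, hostname, token, and field names are assumptions for
# illustration; check your controller's API browser for the exact details.
import requests

CONTROLLER_URL = "https://controller.example.com"   # hypothetical controller host
TOKEN = "<oauth2-token>"                            # placeholder token

def fetch_host_metrics(page_size: int = 50):
    """Yield host metric records page by page from the (assumed) endpoint."""
    url = f"{CONTROLLER_URL}/api/v2/host_metrics/?page_size={page_size}"
    headers = {"Authorization": f"Bearer {TOKEN}"}
    while url:
        data = requests.get(url, headers=headers, timeout=30).json()
        yield from data.get("results", [])
        url = data.get("next")
        if url and url.startswith("/"):
            url = CONTROLLER_URL + url

# Print per-host counters; field names here are illustrative.
for metric in fetch_host_metrics():
    print(metric.get("hostname"), metric.get("automated_counter"))
```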