Helios: hyperscale indexing for the cloud & edge, Potharaju et al., PVLDB’20
Last time out we looked at the motivations for a new reference blueprint for large-scale data processing, as embodied by Helios. Today we’re going to dive into the details of Helios itself. As a reminder:
Helios is a distributed, highly-scalable system used at Microsoft for flexible ingestion, indexing, and aggregation of large streams of real-time data that is designed to plug into relational engines. The system collects close to a quadrillion events, indexing approximately 16 trillion search keys per day, from hundreds of thousands of machines across tens of data centres around the world.
As an ingestion and indexing system, Helios separates ingestion from indexing and introduces a novel bottom-up index construction algorithm. It exposes tables and secondary indices for use by relational query engines through standard access path selection mechanisms during query optimisation. As a reference blueprint, Helios’ main feature is the ability to move computation to the edge.
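The bottom-up construction deserves a moment of intuition. As a toy sketch only (hypothetical names, not the algorithm from the paper): each agent builds a small inverted index over its own block of log data at the edge, and higher layers merge those per-block indexes upward, so the raw data never has to funnel through a central indexer.

from collections import defaultdict

def index_block(block_id, lines):
    """Build a local inverted index: search key -> set of block ids."""
    index = defaultdict(set)
    for line in lines:
        for key in line.split():
            index[key].add(block_id)
    return index

def merge_up(child_indexes):
    """Merge child indexes into a coarser parent index."""
    merged = defaultdict(set)
    for child in child_indexes:
        for key, blocks in child.items():
            merged[key] |= blocks
    return merged

# Two edge agents index their own blocks; a parent merges the results.
leaf1 = index_block("b1", ["error disk full", "warn retry"])
leaf2 = index_block("b2", ["error timeout"])
root = merge_up([leaf1, leaf2])
print(root["error"])  # {'b1', 'b2'} -- the blocks to consult for key "error"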
Helios is designed to ingest, index, and aggregate large streams of real-time data (tens of petabytes a day), for example the log data generated by Azure Cosmos. It supports key use cases such as finding …
This short article documents how I run Isso, the commenting system used by this blog, inside a Docker container on NixOS, a Linux distribution built on top of Nix. Nix is a declarative package manager for Linux and other Unix systems.
While NixOS 20.09 includes a derivation for Isso, it is unfortunately broken and relies on Python 2. As I am also using a fork of Isso, I have built my own derivation, heavily inspired by the one in master:
issoPackage = with pkgs.python3Packages; buildPythonPackage rec {
  pname = "isso";
  version = "custom";

  src = pkgs.fetchFromGitHub {
    # Use my fork
    owner = "vincentbernat";
    repo = pname;
    rev = "vbe/master";
    sha256 = "0vkkvjcvcjcdzdj73qig32hqgjly8n3ln2djzmhshc04i6g9z07j";
  };

  propagatedBuildInputs = [
    itsdangerous
    jinja2
    misaka
    html5lib
    werkzeug
    bleach
    flask-caching
  ];

  buildInputs = [ cffi ];
  checkInputs = [ nose ];
  checkPhase = ''
    ${python.interpreter} setup.py nosetests
  '';
};
I want to run Isso through Gunicorn. To this effect, I build an environment combining Isso and Gunicorn. Then, I can invoke the latter with ${issoEnv}/bin/gunicorn.
issoEnv = pkgs.python3.buildEnv.override {
  extraLibs = [
    issoPackage
    pkgs.python3Packages.gunicorn
  ];
};
As some of you know – I’m a big believer that we all learn differently. You may read something the first time and immediately grasp the topic whereas I may read it and miss the point entirely. For me, decorators have been one of those things that I felt like I was always close to understanding but still not quite getting. Sure – some of the examples I read made sense, but then I’d find another one that didn’t. In my quest to understand them, I spent a lot of time reviewing a lot of examples and asking a lot of very patient friends for help. At this point, I feel like I know enough to try and explain the topic in a manner that might hopefully help someone else who was having a hard time with the concept. With my learning philosophy out of the way, let’s jump right in….
I want to jump right into a real (albeit not super useful) example of decorators using the full decorator (or shorthand) syntax. Let’s start with this…
def a_decorator(a_function):
    print("You've been decorated!")
    return a_function

@a_decorator
def print_name_string(your_name):
    name_string = "Your name is: " + your_name
    return name_string

print(print_name_string("Jon"))
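One thing that helped me finally get this: the @ line is nothing more than shorthand. The block below is exactly equivalent to the example above, with the decoration applied by hand instead of with the @ syntax:

def a_decorator(a_function):
    print("You've been decorated!")
    return a_function

def print_name_string(your_name):
    name_string = "Your name is: " + your_name
    return name_string

# The @a_decorator line is shorthand for this reassignment, which runs
# once, at definition time -- not on every call:
print_name_string = a_decorator(print_name_string)

print(print_name_string("Jon"))

Either way, running this prints "You've been decorated!" exactly once (when the function is defined) followed by "Your name is: Jon".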
Check out our twenty-first edition of The Serverlist below. Get the latest scoop on the serverless space, get your hands dirty with new developer tutorials, engage in conversations with other serverless developers, and find upcoming meetups and conferences to attend.
Continuing our commitment to helping organizations around the world deliver a public cloud experience in the data center through VMware’s Virtual Cloud Network, we’re excited to announce the general availability of VMware NSX-T 3.1. This latest release of our full-stack Layer 2-7 networking and security platform delivers capabilities that allow you to build modern networks at cloud scale while simplifying operations and strengthening security for east-west traffic inside the data center.
As we continue to adapt to new realities, organizations need to build modern networks that can deliver any application, to any user, anywhere at any time, over any infrastructure — all while ensuring performance and connectivity objectives are met. And they need to do this at public cloud scale. NSX-T 3.1 gives organizations a way to simplify modern networks and replace legacy appliances that congest data center traffic. The Virtual Cloud Network powered by NSX-T enables you to achieve a stronger security posture and run virtual and containerized workloads anywhere.
On August 13th, we announced the implementation of rate limiting for Docker container pulls for some users. Beginning November 2, Docker will begin phasing in limits on Docker container pull requests for anonymous and free authenticated users. The limits will be gradually reduced over a number of weeks until the final levels are reached: anonymous users will be limited to 100 container pulls per six hours, and free users to 200 container pulls per six hours. All paid Docker accounts (Pro, Team, or Legacy subscribers) are exempt from rate limiting.
The rationale behind the phased implementation periods is to allow our anonymous and free tier users and integrators to see the places where anonymous CI/CD processes are pulling container images. This will allow Docker users to address the limitations in one of two ways: upgrade to an unlimited Docker Pro or Docker Team subscription, or adjust application pipelines to accommodate the container image request limits. After a lot of thought and discussion, we’ve decided on this gradual, phased approach over the upcoming weeks instead of an abrupt implementation of the policy. An up-to-date status update on rate limitations is available at https://www.docker.com/increase-rate-limits.
When it comes to data-intensive supercomputing, few centers have the challenges Pawsey Supercomputing Centre faces. …
Pawsey Finds I/O Sweet Spots for Data-Intensive Supercomputing was written by Nicole Hemsoth at The Next Platform.
Continuing with our move towards consumption-based limits, customers will see the new rate limits for Docker pulls of container images at each tier of Docker subscriptions starting from November 2, 2020.
Anonymous free users will be limited to 100 pulls per six hours, and authenticated free users will be limited to 200 pulls per six hours. Docker Pro and Team subscribers can pull container images from Docker Hub without restriction as long as the quantities are not excessive or abusive.
In this article, we’ll take a look at determining where you currently fall within the rate limiting policy using some command line tools.
Requests to Docker Hub now include rate limit information in the response headers for requests that count towards the limit. These are named as follows:
The RateLimit-Limit header contains the total number of pulls that can be performed within a six-hour window. The RateLimit-Remaining header contains the number of pulls remaining for the six-hour rolling window.
Let’s take a look at these headers using the terminal. But before we can make a request to Docker Hub, we need to obtain a bearer token, as sketched below.
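The command-line walkthrough continues in the original post; as a rough illustration of where it is headed, the whole flow might look like this Python sketch (using the public ratelimitpreview/test image that Docker suggests for checking limits; the header values in the comments are examples only):

import requests

# Fetch an anonymous bearer token scoped to pulling the test image.
AUTH_URL = ("https://auth.docker.io/token"
            "?service=registry.docker.io"
            "&scope=repository:ratelimitpreview/test:pull")
token = requests.get(AUTH_URL, timeout=10).json()["token"]

# Issue a HEAD request against the image manifest to read the
# rate-limit headers without downloading any image data.
resp = requests.head(
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest",
    headers={"Authorization": f"Bearer {token}"},
    timeout=10,
)
print(resp.headers.get("RateLimit-Limit"))      # e.g. "100;w=21600" (pulls per 21600-second window)
print(resp.headers.get("RateLimit-Remaining"))  # e.g. "98;w=21600"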
Today’s Heavy Networking show dives into Digital Experience Monitoring (DEM) with sponsor Catchpoint. Catchpoint combines synthetic testing with end user device monitoring to provide greater visibility into the end user experience while helping network engineers and IT admins support and troubleshoot a distributed workforce. Our guests from Catchpoint are Nik Koutsoukos, CMO; and Tony Ferelli, VP Operations.
The post Heavy Networking 547: Building And Monitoring A User-Centric Digital Experience With Catchpoint (Sponsored) appeared first on Packet Pushers.
If hardware doesn’t scale well, either up into a more capacious shared memory system or out across a network of modestly powered nodes, there isn’t much you can do about it. …
Where Latency Is Key And Throughput Is Of Value was written by Timothy Prickett Morgan at The Next Platform.