The case for network-accelerated query processing

The case for network-accelerated query processing Lerner et al., CIDR’19

Datastores continue to advance on a number of fronts. Some of those that come to mind are adapting to faster networks (e.g. ‘FARM: Fast Remote Memory’) and persistent memory (see e.g. ‘Let’s talk about storage and recovery methods for non-volatile memory database systems’), deeply integrating approximate query processing (e.g. ‘ApproxHadoop: Bringing approximations to MapReduce frameworks’ and ‘BlinkDB’), embedding machine learning in the core of the system (e.g. ‘SageDB’), and offloading processing into the network (e.g. KV-Direct), which is one particular example of exploiting hardware accelerators. Today’s paper gives us an exciting look at the untapped potential for network-accelerated query processing. We’re going to need all that data structure synthesis and cost-model based exploration coupled with self-learning to unlock the potential that arises from all of these advances in tandem!

NetAccel uses programmable network devices to offload some query patterns for MPP databases into the switch.

Thus, for the first time, moving data through networking equipment can contribute to query execution. Our preliminary results show that we can improve response times on even the best Continue reading

A Discussion On Storage Overhead

Let’s talk about transmission overhead.

For various types of communications protocols, ranging from Ethernet to Fibre Channel to SATA to PCIe, there are typically additional bits that are transmitted to help with error correction, error detection, and/or clock sync. These additional bits eat up some of the bandwidth, and are referred to generally as just “the overhead”.

1 Gigabit Ethernet, 8 Gigabit Fibre Channel, and SATA I, II, and III all use 8/10 overhead, which means that for every eight bits of data, an additional two bits are sent.

The difference is who pays for those extra bits. With Ethernet, Ethernet pays. With Fibre Channel and SATA, the user pays.

1 Gigabit Ethernet has a data rate of 1 gigabit per second. However, the actual transmission rate (baud, the rate at which raw 1s and 0s are transmitted) for Gigabit Ethernet is 1.25 gigabaud. This is to make up for the 8/10 overhead.
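The 8/10 arithmetic can be sketched in a few lines of Python. This is only an illustration of the ratio described above; the function names are made up for this example, and real links have other overheads (framing, headers) that are ignored here.

```python
# A minimal sketch of 8/10 overhead arithmetic.
# Function names are hypothetical, chosen for this illustration only.

def line_rate_for_payload(payload_bps, data_bits=8, coded_bits=10):
    """Baud rate needed on the wire to deliver payload_bps of user data."""
    return payload_bps * coded_bits / data_bits


def payload_for_line_rate(line_baud, data_bits=8, coded_bits=10):
    """User data rate left over when the wire runs at line_baud."""
    return line_baud * data_bits / coded_bits


GIGA = 1_000_000_000

# The link "pays": the baud rate is raised so the user still gets 1 Gbit/s.
ethernet_baud = line_rate_for_payload(1 * GIGA)

# The user "pays": the baud rate stays at 1 gigabaud, so usable data shrinks.
usable_bps = payload_for_line_rate(1 * GIGA)
usable_megabytes_per_s = usable_bps / 8 / 1_000_000

print(ethernet_baud)           # 1250000000.0
print(usable_megabytes_per_s)  # 100.0
```

The first call corresponds to the Gigabit Ethernet case, and the second to a link that keeps its baud rate at 1 gigabaud and lets the overhead come out of the usable bandwidth.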

SATA and Fibre Channel, however, do not up the baud rate to accommodate the 8/10 overhead. As such, even though 1,000 Megabits / 8 bits per byte = 125 MB/s, Gigabit Fibre Channel only provides 100 MB/s. 25 MB/s is eaten up Continue reading

Jupyter Lab Ruby Kernel Install

Jupyter Lab is the next-generation web-based user interface for the Jupyter project. I first encountered the Jupyter project back when it was known as IPython notebooks and used it for hacking on Python projects. I was pleasantly surprised to learn that the Jupyter project also supports ...

rbenv Install Ubuntu1804

rbenv is a utility for installing multiple Ruby versions on a host machine. Using rbenv allows you to install Ruby in a path you have ownership over, so you can install gems without needing sudo or root privileges. rbenv also allows you to target the exact Ruby version in development...

In India, Days Left to Comment on Rules That Could Impact Your Privacy

The public has until 31 January to comment on a draft set of rules in India that could result in big changes to online security and privacy.

The Indian government published the draft Information Technology [Intermediary Guidelines (Amendment) Rules] 2018, also known as the “Intermediary Rules”, for public comment.

When it comes to the Internet, intermediaries are companies that mediate online communication and enable various forms of online expression.

The draft Intermediary Rules would change parts of the Information Technology Act, 2000 (the “IT Act”), which sets out the requirements intermediaries must meet to be shielded from liability for the activities of their users. The draft rules would also expand the requirements for all intermediaries, which are defined by the Indian government and include Internet service providers, cybercafés, online companies, social media platforms, and others. For example, all intermediaries would have to regularly notify users on content they shouldn’t share; make unlawful content traceable; and deploy automated tools to identify and disable unlawful information or content, among other new requirements.

Here’s some more background:

  • News reports are citing a number of concerns about the draft rules. Ours centers on their potential impact on the use of encryption.
  • Encryption is the process Continue reading

This Data Privacy Day Take Steps to Protect Your Data

As champions of an open, globally-secure, and trusted Internet, we consider International Data Privacy Day a big deal around these parts.

But making sure you’re able to share what you want, when you want, should be something the world stands for more than once a year. Every day should be Data Privacy Day.

These days, it feels all too common to hear stories about policy or law enforcement officials trying to create backdoors into technologies like encryption. These backdoors could put our online security at risk.

Just a little over one month ago, Business Insider reported that smart home devices dominated Christmas 2018 sales on Amazon, while the Alexa app, which enables people to control those smart devices, was the most downloaded on Google Play and the Apple App Store on Christmas Day.

As the Internet becomes more and more a part of our everyday lives, each of us can take actions to ensure that privacy and security are a top priority.

Let’s come together on Data Privacy Day to celebrate the possibilities an open, globally connected, trusted, and secure Internet brings. Here are ways you can help make it happen where you live:

(And don’t forget to make a cake! Continue reading

My Privacy Online: Championing Trust in the Era of IoT

Data Privacy Day is a little like celebrating the anniversary of your first date.

Both are yearly occasions to reflect on the most important relationships in our lives, the former with those who know the most about us, the latter with our significant other.

It’s also a reminder that relationships are built on trust – and how fragile that trust can be.

Privacy online relies on trust at its core. But as we become more reliant on connected devices and virtual assistants to handle our most intimate health, banking, and private information, we’re putting our trust into shaky hands.

Honesty is the foundation of trust and it’s just as important in our relationships with loved ones as those with data brokers. It’s crucial for data brokers to be honest with users about who has access to their personal data, when, and how, especially as we transition into smarter homes and cities.

Let’s face it: there’s a huge market for the information we share online. Both U.S. and Canadian Internet companies are increasingly trying to collect our personal data – whether we know it or not.

It’s clear we want more control over our privacy, but each Continue reading

A Primer for Home NAS Storage Speed Units and Abbreviations

One of the most common points of confusion I see with regard to storage is how speed is measured.

In tech, there are some cultural conventions about which units speeds are discussed in.

  • In the networking world, we measure speed in bits per second
  • In the storage and server world, we measure speed in bytes per second

Of course they both say the same thing, just in different units. You could measure bytes per second in the networking world and bits per second in the server/storage world, but it’s not the “native” method and could add to confusion.

For NAS, we have a bit of a conundrum in that we’re talking about both worlds. So it’s important to communicate effectively which method you’re using to measure speed: bits or bytes.

Generally speaking, if you want to talk about bytes, you capitalize the B. If you want to talk about bits, the b is lower case. For example, 100 MB/s (100 megabytes per second) and 100 Mbit/s or Mb/s (100 megabits per second).
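To make the capital-B versus lowercase-b distinction concrete, here is a small Python sketch of the conversion. The helper names are invented for this example, and it assumes decimal "mega" prefixes on both sides, so only the factor of 8 between bits and bytes matters.

```python
# A minimal sketch: converting between megabits/s (Mb/s) and megabytes/s (MB/s).
# Helper names are hypothetical; only the 8-bits-per-byte factor is applied.

BITS_PER_BYTE = 8


def megabits_to_megabytes(mbit_per_s):
    """Mb/s (networking convention) -> MB/s (storage convention)."""
    return mbit_per_s / BITS_PER_BYTE


def megabytes_to_megabits(mbyte_per_s):
    """MB/s (storage convention) -> Mb/s (networking convention)."""
    return mbyte_per_s * BITS_PER_BYTE


print(megabits_to_megabytes(100))  # 12.5  -> 100 Mb/s is 12.5 MB/s
print(megabytes_to_megabits(100))  # 800   -> 100 MB/s is 800 Mb/s
```

The same "100" is an eightfold difference in speed depending on which unit is meant, which is why stating the unit explicitly matters for NAS discussions.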

This is important: because there are 8 bits in a byte, the difference in speed is pretty stark depending on whether you’re talking about bits per second or bytes per Continue reading

Moving a Prototype Network to Production

Network engineers create and operate prototype networks all the time. Prototype networks are used to validate designs, test features or changes, troubleshoot use-case scenarios, and often just for learning. Typically, pre-prod testing environments are set up in such a way that device host names, attributes, configurations, IP assignments, software versions, and topologies are mostly inconsistent with production environments. This inconsistency is counter-intuitive, considering that accurate design validations should closely match reality to avoid any mistakes when deploying in production.

Cumulus Linux can run as a virtual appliance, allowing network engineers to build to-scale virtual networks for activities like modeling changes and performing validations, while opening the door to the DevOps methodologies that application developers have used for years: validated testing before deploying in production for continuous integration.

Enter Cumulus VX

Cumulus VX (Virtual Experience) is a Cumulus Linux virtual appliance. You can test drive Cumulus Linux on a laptop, while those fluent with Cumulus Linux can prototype large networks and develop software integrations before deploying into production environments.

Cumulus VX is a platform — just like Cumulus Linux on a real switch — and therefore is designed to perform just like an actual switch running Cumulus Linux. Every feature you Continue reading

What is a firewall? How they work and how they fit into enterprise security

Firewalls have been around for three decades, but they’ve evolved drastically to include features that used to be sold as separate appliances and to pull in externally gathered data to make smarter decisions about what network traffic to allow and what traffic to block.

Now just one indispensable element in an ecosystem of network defenses, the latest versions are known as enterprise firewalls or next-generation firewalls (NGFW) to indicate who should use them and that they are continually adding functionality.

What is a firewall?

A firewall is a network device that monitors packets going in and out of networks and blocks or allows them according to rules that have been set up to define what traffic is permissible and what traffic isn’t.
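As a rough illustration of "blocks or allows them according to rules", the Python sketch below evaluates a packet against an ordered rule list. Everything here is an assumption made for the example: the `Rule` fields, the first-match-wins ordering, the default-block policy, and the sample addresses are not taken from the article, and real enterprise firewalls inspect far more than a source address and a destination port.

```python
# A minimal sketch of rule-based packet filtering (illustrative only).
# Assumptions: first matching rule wins; unmatched traffic is blocked.
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "block"
    src: str                        # source network in CIDR notation
    dst_port: Optional[int] = None  # None matches any destination port

    def matches(self, src_ip: str, dst_port: int) -> bool:
        in_network = ip_address(src_ip) in ip_network(self.src)
        port_ok = self.dst_port is None or self.dst_port == dst_port
        return in_network and port_ok


def decide(rules, src_ip, dst_port, default="block"):
    """Return the action of the first rule that matches, else the default."""
    for rule in rules:
        if rule.matches(src_ip, dst_port):
            return rule.action
    return default


# Hypothetical rule set using documentation-only address ranges.
RULES = [
    Rule("block", "203.0.113.0/24"),   # block everything from this network
    Rule("allow", "0.0.0.0/0", 443),   # allow HTTPS from anywhere else
]

print(decide(RULES, "203.0.113.7", 443))   # block
print(decide(RULES, "198.51.100.5", 443))  # allow
print(decide(RULES, "198.51.100.5", 23))   # block (no rule matched)
```

Rule order matters in this sketch: because the block rule comes first, traffic from the blocked network is rejected even on a port that a later rule would allow.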
