
Author Archives: adriancolyer

The measure and mismeasure of fairness: a critical review of fair machine learning

The measure and mismeasure of fairness: a critical review of fair machine learning, Corbett-Davies & Goel, arXiv 2018

With many thanks to Ben Fried and the ACM Queue editorial board for the paper recommendation.

We’ve visited the topic of fairness in the context of machine learning several times on The Morning Paper (see e.g. [1], [2], [3], [4]). I’m still picking up new insights every time I revisit the topic though, and today’s paper choice is no exception.

In 1910 Russell & Whitehead published the first volume of Principia Mathematica, with the goal of providing a solid foundation for all of mathematics. In 1931 Gödel’s Incompleteness Theorem shattered the dream, showing that any consistent axiomatic system rich enough to express arithmetic contains true statements that cannot be proven within the system. In case you’re wondering where on earth I’m going with this… it’s a very stretched analogy I’ve been playing with in my mind. One premise of many models of fairness in machine learning is that you can measure (‘prove’) fairness of a machine learning model from within the system – i.e. from properties of the Continue reading

Seamless offloading of web app computations from mobile device to edge clouds via HTML5 Web Worker migration

Seamless offloading of web app computations from mobile device to edge clouds via HTML5 web worker migration, Jeong et al., SoCC’19

This paper caught my eye for its combination of an intriguing idea (opportunistic offload of computation from mobile devices to the edge) and the elegance of the way the web worker interface supports this use case. It’s live migration – but for web workers instead of the more usual VMs or containers.
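As a reminder of the interface involved, here’s a minimal sketch of the standard Web Worker messaging API that the paper’s migration approach builds on. The worker script name, the message shapes, and the ‘physicsStep’ operation are hypothetical, and this is plain off-main-thread offload, not the paper’s edge migration.

```typescript
// Main-thread side: offload a latency-sensitive computation to a web worker.
// The worker script name and the message shapes are hypothetical.
const worker = new Worker('game-logic.worker.js');

worker.onmessage = (e: MessageEvent<{ result: number }>) => {
  console.log('worker result:', e.data.result);
};

worker.postMessage({ op: 'physicsStep', dt: 16 });

// Inside game-logic.worker.js the handler would look roughly like:
//   self.onmessage = (e) => {
//     const result = heavyComputation(e.data); // CPU-intensive work
//     self.postMessage({ result });
//   };
//
// Because all communication flows through postMessage, a runtime is free to
// run this worker locally or (as in the paper) migrate it to an edge server
// without any change to the main-thread code.
```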

Why would we want to live migrate web workers?

Emerging mobile applications, such as mobile cloud gaming or augmented reality, require strict latency constraints as well as high computing power… A survey on the latency of games has reported that less than ~50ms of network latency is preferred for time-critical games, which is hard to achieve with a traditional cloud system where computing servers are located in datacenters far from clients…

So you’ve got mobile devices without the computing power needed to deliver a great experience, and cloud computing that has all the needed power that’s too far away. Edge servers are the middle ground – more compute power than a mobile device, but with latency of just a few ms. The kind of Continue reading

Narrowing the gap between serverless and its state with storage functions

Narrowing the gap between serverless and its state with storage functions, Zhang et al., SoCC’19

"Narrowing the gap" was runner-up in the SoCC’19 best paper awards. While being motivated by serverless use cases, there’s nothing especially serverless about the key-value store, Shredder, this paper reports on. Shredder’s novelty lies in a new implementation of an old idea. On the mainframe we used to call it function shipping. In databases you probably know it as stored procedures. The advantages of function shipping (as opposed to the data shipping we would normally do in a serverless application) are that (a) you can avoid moving potentially large amounts of data over the network in order to process it, and (b) you might be able to collapse multiple remote calls into one if the function traverses structures that otherwise could not be fetched in a single call.

Shredder is "a low-latency multi-tenant cloud store that allows small units of computation to be performed directly within storage nodes."
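To make the data shipping vs function shipping contrast concrete, here’s a hedged sketch against a hypothetical key-value store client; the interface and function names are my own, not Shredder’s actual API.

```typescript
// Hypothetical key-value store client; names and signatures are illustrative,
// not Shredder's actual interface.
interface KVStore {
  get(key: string): Promise<Uint8Array>;
  // Function shipping: run a small, pre-registered function inside the storage node.
  invoke<T>(fnName: string, args: unknown[]): Promise<T>;
}

// Data shipping: walk a linked structure with one network round trip per hop.
async function sumChainDataShipping(store: KVStore, headKey: string): Promise<number> {
  let total = 0;
  let key: string | null = headKey;
  while (key) {
    const node = JSON.parse(new TextDecoder().decode(await store.get(key)));
    total += node.value;
    key = node.next; // each hop is another round trip
  }
  return total;
}

// Function shipping: a 'sumChain' function traverses the structure inside the
// storage node and returns only the final number.
async function sumChainFunctionShipping(store: KVStore, headKey: string): Promise<number> {
  return store.invoke<number>('sumChain', [headKey]);
}
```

The second version sends one request and gets back one number, no matter how long the chain is.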

Running end-user compute inside the datastore is not without its challenges of course. From an operator perspective it makes it harder to follow the classic cloud-native design in which a global storage Continue reading

Reverb: speculative debugging for web applications

Reverb: speculative debugging for web applications, Netravali & Mickens, SOCC’19

This week we’ll be looking at a selection of papers from the 2019 edition of the ACM Symposium on Cloud Computing (SoCC). First up is Reverb, which won a best paper award for its record and replay debugging framework that accommodates speculative edits (i.e., candidate bug-fixes) during replay. In the context of the papers we’ve been looking at recently, and for a constrained environment, Reverb is helping its users to form an accurate mental model of the system state, and to form and evaluate hypotheses in-situ.

Reverb has three features which enable a fundamentally more powerful debugging experience. First, Reverb tracks precise value provenance, allowing a developer to quickly identify the reads and writes to JavaScript state that affected a particular variable’s value. Second, Reverb enables speculative bug fix analysis… Third, Reverb supports wide-area debugging for applications whose server-side components use event-driven architectures.
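Reverb’s real mechanism instruments the application’s JavaScript during record and replay, but as a rough illustration of what tracking reads and writes to JavaScript state involves, here is a toy Proxy-based sketch (mine, not Reverb’s implementation).

```typescript
type ProvenanceEntry = { op: 'read' | 'write'; prop: string; value: unknown; at: number };

// Wrap an object so every property read and write is logged with a timestamp.
// A real provenance tracker would also record *which* event handler performed
// the access; this toy version only records the access itself.
function track<T extends object>(target: T, log: ProvenanceEntry[]): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      log.push({ op: 'read', prop: String(prop), value, at: Date.now() });
      return value;
    },
    set(obj, prop, value, receiver) {
      log.push({ op: 'write', prop: String(prop), value, at: Date.now() });
      return Reflect.set(obj, prop, value, receiver);
    },
  });
}

// A log like this is what lets you answer "which writes affected cart.total?"
const log: ProvenanceEntry[] = [];
const cart = track({ total: 0 }, log);
cart.total = 42;               // recorded as a write
console.log(cart.total, log);  // recorded as a read
```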

The problem

Reverb’s goal is to aid in debugging the client-side of JavaScript web applications. These are "pervasively asynchronous and event-driven" which makes it notoriously difficult to figure out what’s going on. See e.g. "Debugging data flows Continue reading

Trade-offs under pressure: heuristics and observations of teams resolving internet service outages (Part 2)

Trade-offs under pressure: heuristics and observations of teams resolving internet service outages, Allspaw, Master’s thesis, Lund University, 2015

This is part 2 of our look at Allspaw’s 2015 master’s thesis (here’s part 1). Today we’ll be digging into the analysis of an incident that took place at Etsy on December 4th, 2014.

  • 1:00pm Eastern Standard Time the Personalisation / Homepage Team for Etsy are in a conference room kicking off a lunch-and-learn session on the personalised feed feature on the Etsy.com homepage
  • 1:06pm reports of the personalised homepage having issues start appearing from multiple sources. Instead of the personalised feed, the site has fallen back to serving a generic ‘trending items’ feed. This is a big deal during the important holiday shopping season. Members of the team begin diagnosing the issue using the #sysops and #warroom internal IRC channels.
  • 1:18pm a key observation was made that an API call to populate the homepage sidebar saw a huge jump in latency
  • 1:28pm an engineer reported that the profile of errors for a specific API method matched the pattern of sidebar errors
  • 1:32pm the API errors were narrowed down to requests for data on a specific single shop. The Continue reading

Trade-offs under pressure: heuristics and observations of teams resolving internet service outages (Part 1)

Trade-offs under pressure: heuristics and observations of teams resolving internet service outages, Allspaw, Master’s thesis, Lund University, 2015

Following on from the STELLA report, today we’re going back to the first major work to study the human and organisational side of incident management in business-critical Internet services: John Allspaw’s 2015 Master’s thesis. The document runs to 87 pages, so I’m going to cover the material across two posts. Today we’ll be looking at the background and literature review sections, which place the activity in a rich context and provide many jumping off points for going deeper in areas of interest to you. In the next post we’ll look at the detailed analysis of how a team at Etsy handled a particular incident on December 4th 2014, to see what we can learn from it.

Why is this even a thing?

Perhaps it seems obvious that incident management is hard. But it’s worth recapping some of the reasons why this is the case, and what makes it an area worthy of study.

The operating environment of Internet services contains many of the ingredients necessary for ambiguity and high consequences for mistakes in the diagnosis and response of an adverse Continue reading

STELLA: report from the SNAFU-catchers workshop on coping with complexity

STELLA: report from the SNAFU-catchers workshop on coping with complexity, Woods 2017, Coping with Complexity workshop

“Coping with complexity” is about as good a three-word summary of the systems and software challenges facing us over the next decade as I can imagine. Today’s choice is a report from a 2017 workshop convened with that title, and recommended to me by John Allspaw – thank you John!

Workshop context

The workshop brought together about 20 experts from a variety of different companies to share and analyse the details of operational incidents (and their postmortems) that had taken place at their respective organisations. Six themes emerged from those discussions that sit at the intersection of resilience engineering and IT. These are all very much concerned with the interactions between humans and complex software systems, along the lines we examined in Ironies of Automation and Ten challenges for making automation a ‘team player’ in joint human-agent activity.

There’s a great quote on the very front page of the report that is worth the price of admission on its own:

Woods’ Theorem: As the complexity of a system increases, the accuracy of any single agent’s own model of that system decreases rapidly.

Remember Continue reading

Synthesizing data structure transformations from input-output examples

Synthesizing data structure transformations from input-output examples, Feser et al., PLDI’15

The Programmatically Interpretable Reinforcement Learning paper that we looked at last time out contained this passing comment coupled with a link to today’s paper choice:

It is known from prior work that such [functional] languages offer natural advantages in program synthesis.

That certainly caught my interest. The context for the quote is the synthesis of programs by machines, but when I’m programming, I’m also engaged in the activity of program synthesis! So a work showing that functional languages have an advantage for machine-driven synthesis might also contain the basis for an argument that there are natural advantages to the functional style of programming. I didn’t find that argument here. We can however say that this paper shows “functional languages are well suited to program synthesis.”
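The system described in the paper is far more sophisticated, but a toy enumerative sketch conveys the shape of the problem: given input-output examples, search a space of candidate functional programs for one that is consistent with all of them. The candidate set and examples below are my own.

```typescript
// Toy synthesis-from-examples: enumerate candidate list transformations and
// return the first one consistent with every input/output example.
type ListFn = { name: string; fn: (xs: number[]) => number[] };

const candidates: ListFn[] = [
  { name: 'map (*2)',    fn: xs => xs.map(x => x * 2) },
  { name: 'map (+1)',    fn: xs => xs.map(x => x + 1) },
  { name: 'reverse',     fn: xs => [...xs].reverse() },
  { name: 'filter even', fn: xs => xs.filter(x => x % 2 === 0) },
];

function synthesize(examples: Array<[number[], number[]]>): ListFn | undefined {
  return candidates.find(c =>
    examples.every(([input, output]) =>
      JSON.stringify(c.fn(input)) === JSON.stringify(output)));
}

// [1,2,3] -> [2,4,6] together with [5] -> [10] pins down 'map (*2)'.
console.log(synthesize([[[1, 2, 3], [2, 4, 6]], [[5], [10]]])?.name);
```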

Never mind, because the ideas in the paper are still very connected to a question I’m fascinated by at the moment: “how will we be developing software systems over this coming decade?”. There are some major themes to be grappled with: system complexity, the consequences of increasing automation and societal integration, privacy, ethics, security, trust (especially in supply chains), interpretability vs black box models, Continue reading

Programmatically interpretable reinforcement learning

Programmatically interpretable reinforcement learning, Verma et al., ICML 2018

Being able to trust (interpret, verify) a controller learned through reinforcement learning (RL) is one of the key challenges for real-world deployments of RL that we looked at earlier this week. It’s also an essential requirement for agents in human-machine collaborations (i.e., all deployments at some level) as we saw last week. Since reading some of Cynthia Rudin’s work last year I’ve been fascinated with the notion of interpretable models. I believe there is a large set of use cases where an interpretable model should be the default choice. There are so many deployment benefits, even putting aside any ethical or safety concerns.

So how do you make an interpretable model? Today’s paper choice is the third paper we’ve looked at along these lines (following CORELS and RiskSlim), enough for a recognisable pattern to start to emerge. The first step is to define a language — grammar and associated semantics — in which the ultimate model to be deployed will be expressed. For CORELS this consists of simple rule-based expressions, and for RiskSlim it is scoring sheets. For Programmatically Interpretable Reinforcement Learning (PIRL) as we shall Continue reading
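As a concrete (and entirely hypothetical) illustration of that first step, a policy ‘language’ can be as simple as an ordered list of condition/action rules; the grammar and the example rules below are mine, not PIRL’s, CORELS’, or RiskSlim’s.

```typescript
// A tiny policy DSL: an ordered list of "if condition then action" rules.
// The grammar and the example rules are illustrative only.
type Observation = Record<string, number>;

interface Rule {
  when: (obs: Observation) => boolean;  // condition over observed state
  action: string;                       // action to emit when the rule fires
}

// Because the policy *is* this list of rules, it can be read, audited, and
// checked directly, which is what makes the model interpretable.
const policy: Rule[] = [
  { when: obs => obs.speed > 30 && obs.distanceAhead < 5, action: 'brake' },
  { when: obs => obs.distanceAhead > 20,                  action: 'accelerate' },
];

function act(policy: Rule[], obs: Observation, fallback = 'coast'): string {
  return policy.find(r => r.when(obs))?.action ?? fallback;
}

console.log(act(policy, { speed: 35, distanceAhead: 3 })); // 'brake'
```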

Challenges of real-world reinforcement learning

Challenges of real-world reinforcement learning, Dulac-Arnold et al., ICML’19

Last week we looked at some of the challenges inherent in automation and in building systems where humans and software agents collaborate. When we start talking about agents, policies, and modelling the environment, my thoughts naturally turn to reinforcement learning (RL). Today’s paper choice sets out some of the current (additional) challenges we face getting reinforcement learning to work well in many real-world systems.

We consider control systems grounded in the physical world, optimization of software systems, and systems that interact with users such as recommender systems and smart phones. … RL methods have been shown to be effective on a large set of simulated environments, but uptake in real-world problems has been much slower.

Why is this? The authors posit that there’s a meaningful gap between the tightly-controlled and amenable to simulation research settings where many RL systems do well, and the messy realities and constraints of real-world systems. For example, there may be no good simulator available, exploration may be curtailed by strong safety constraints, and feedback cycles for learning may be slow.

This lack of available simulators means learning must be done using Continue reading

Ten challenges for making automation a ‘team player’ in joint human-agent activity

Ten challenges for making automation a ‘team player’ in joint human-agent activity, Klein et al., IEEE Computer Nov/Dec 2004

With thanks to Thomas Depierre for the paper suggestion.

Last time out we looked at some of the difficulties inherent in automating control systems. However much we automate, we’re always ultimately dealing with some kind of human/machine collaboration. Today’s choice looks at what it takes for machines to participate productively in collaborations with humans. Written in 2004, the ideas remind me very much of Mark Burgess’ promise theory, which was also initially developed around the same time.

Let’s work together

If a group of people (or people and machines) are going to coordinate with each other to achieve a set of shared ends then there are four basic requirements that must be met to underpin their joint activity:

  1. They must agree to work together (the authors call this agreement a Basic Compact).
  2. They must be mutually predictable in their actions.
  3. They must be mutually directable.
  4. They must maintain common ground.

A basic compact is…

… an agreement (often tacit) to facilitate coordination, work toward shared goals, and prevent breakdowns in team coordination. This Compact involves a commitment Continue reading

Ironies of automation

Ironies of automation, Bainbridge, Automatica, Vol. 19, No. 6, 1983

With thanks to Thomas Depierre for the paper recommendation.

Making predictions is a dangerous game, but as we look forward to the next decade a few things seem certain: increasing automation, increasing system complexity, faster processing, more inter-connectivity, and an even greater human and societal dependence on technology. What could possibly go wrong? Automation is supposed to make our lives easier, but ~~if~~ when it goes wrong it can put us in a very tight spot indeed. Today’s paper choice, ‘Ironies of Automation’, explores these issues. Originally published in this form in 1983, its lessons are just as relevant today as they were then.

The central irony (‘combination of circumstances, the result of which is the direct opposite of what might be expected’) referred to in this paper is that the more we automate, and the more sophisticated we make that automation, the more we become dependent on a highly skilled human operator.

Automated systems need highly skilled operators

Why do we automate?

The designer’s view of the human operator may be that the operator is unreliable and inefficient, so should be eliminated from the system.

An automated system Continue reading

The Year Ahead

Welcome to another year of The Morning Paper! Over the holidays I spent some time mapping out a partial conference calendar for the year, and thinking about the kinds of papers I want to be reading. In a typical year, I’ll cover somewhere north of 120 papers on this blog. That’s a tiny drop in the ocean compared to the amount of research published. And then as well as dipping my toes into the new, I also want to make more space for papers that have stood the test of time. Following the Lindy effect these are the ones most likely to continue giving ten years or more into the future. Where have we come from? Where are we now? And where are we heading? My only firm rule for paper selection is that I must find it interesting. As regular readers of The Morning Paper will know though, my interests are pretty broad and will no doubt take many twists and turns over the course of the year.

Through the course of a year I often have the pleasure of bumping into many readers of The Morning Paper. And very often they tell me apologetically that they don’t always Continue reading

End of term

My children broke up from school this past weekend, which seems as good a reason as any to call this ‘end of term’ for The Morning Paper. I’ll be taking a break until the New Year, topping up my reading lists and getting ready for a whole new crop of papers and discoveries. The Morning Paper will resume on Monday 6th January.

Since term began on the 19th August we’ve looked at 50 different papers, and I had the pleasure of attending VLDB and HPTS in person as well. I learned a ton! I hope you found something you enjoyed in the paper selections as well.

Here’s a small selection of my personal highlights from the term, in case you missed any of them (in the order in which they originally appeared on the blog):

How do committees invent?

How do committees invent?, Conway, Datamation magazine 1968

With thanks to Chris Frost for recommending this paper – another great example of a case where we all know the law (Conway’s law in this case), but many of us have not actually read the original ideas behind it.

We’re back in 1968, a time when it was taken for granted that before building a system, it was necessary to design it. The systems under discussion are not restricted to computer systems either by the way – one example of a system given is "the public transport network." Designs are produced by people, and the set of people working on a design are part of a design organisation.

The definition of design itself is quite interesting:

That kind of intellectual activity which creates a whole from its diverse parts may be called the design of a system.

When I think about design, I more naturally think about it the other way around: how to decompose the whole into a set of parts that will work together to accomplish the system goals. But of course Conway is right that those parts do have to fit together to produce the intended Continue reading

A tale of two abstractions: the case for object space

A tale of two abstractions: the case for object space, Bittman et al., HotStorage 2019.

This is a companion paper to the "persistent problem" piece that we looked at earlier this week, going a little deeper into the object pointer representation choices and the mapping of a virtual object space into physical address spaces.

…software operating on persistent data structures requires "global" pointers that remain valid after a process terminates, while hardware requires that a diverse set of devices all have the same mappings they need for bulk transfers to and from memory, and that they be able to do so for a potentially heterogeneous memory system. Both abstractions must be implemented in a way that is efficient using existing hardware.

Application requirements

In-memory data structures are notable for the rich inter-weaving of pointer references between them. If we take those data structures and make them also serve as the persistent representation, "then applications need a way to refer to data such that references have the same lifetime as the referenced data." Ephemeral virtual addresses don’t cut it as the basis for persistent pointers.
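To see why, here’s a toy sketch of the alternative the paper is pointing towards: a ‘global’ pointer represented as an (object ID, offset) pair rather than a raw virtual address. This is an illustration of the general idea only, not Twizzler’s actual pointer encoding.

```typescript
// A raw virtual address is only meaningful inside the process that mapped it,
// so it cannot serve as a persistent pointer. An (object ID, offset) pair can:
// the ID names a persistent object, and the offset locates data within it.
type ObjectId = string;               // stand-in for e.g. a 128-bit object ID
interface GlobalPtr { obj: ObjectId; offset: number }

// On each run, persistent objects may be mapped at *different* addresses.
// Dereferencing resolves the ID to wherever the object lives right now.
const mappings = new Map<ObjectId, Uint8Array>();

function deref(p: GlobalPtr): number {
  const base = mappings.get(p.obj);
  if (!base) throw new Error(`object ${p.obj} is not mapped`);
  return base[p.offset];
}

// The pointer value { obj: 'inventory', offset: 8 } stays valid across process
// restarts, even though the in-memory base address does not.
mappings.set('inventory', new Uint8Array(64));
console.log(deref({ obj: 'inventory', offset: 8 }));
```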

Applications running on BNVM (byte-addressable non-volatile memory) must have a way Continue reading

A persistent problem: managing pointers in NVM

A persistent problem: managing pointers in NVM, Bittman et al., PLOS’19

At the start of November I was privileged to attend HPTS (the High Performance Transaction Systems) conference in Asilomar. If you ever get the chance to go, I highly recommend it. It’s a comparatively small gathering with a great mix of people, and fabulous discussions. A big thank you to everyone that I met there for making me feel so welcome.

On the last morning of the conference Daniel Bittman presented some of the work being done in the context of the Twizzler OS project to explore new programming models for NVM. It’s a really bold project (‘let’s rethink the OS from the ground up’) and generated a lot of lively discussion.

(Byte-addressable non-volatile memory) NVM will fundamentally change the way hardware interacts, the way operating systems are designed, and the way applications operate on data.

The starting point is a set of three assumptions for an NVM-based programming model:

  1. Compared to traditional persistent media, NVM is fast. This means that the overheads of system calls become much more noticeable. Therefore any programming abstraction must be low latency and the kernel needs to be kept off the path Continue reading

Benchmarking spreadsheet systems

Benchmarking spreadsheet systems, Rahman et al., preprint

A recent Twitter thread drew my attention to this pre-print paper. When spreadsheets were originally conceived, data and formulas were input by hand and so everything operated at human scale. Increasingly we’re dealing with larger and larger datasets — for example, data imported via CSV files — and spreadsheets are creaking. I’m certainly familiar with the sinking feeling on realising I’ve accidentally asked a spreadsheet to open up a file with tens of thousands of rows, and that my computer is now going to be locked up for an age. Rahman et al. construct a set of benchmarks to try and understand what might be going on under the covers in Microsoft Excel, Google Sheets, and LibreOffice Calc.

Spreadsheets claim to support pretty large datasets these days – e.g. five million cells for Google Sheets, and even more than that for Excel. But in practice, they struggle at sizes well below this.
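The paper’s benchmark suite is much more thorough, but the basic recipe (generate synthetic datasets of increasing size, then see how core operations behave as the row count grows) can be sketched as follows; the file names, row counts, and columns are arbitrary choices of mine.

```typescript
// Generate N-row CSV files of synthetic data to stress-test spreadsheet import.
// Row counts, columns and the suggested formula are arbitrary choices.
import { writeFileSync } from 'node:fs';

function makeCsv(rows: number): string {
  const lines = ['id,value,category'];
  for (let i = 0; i < rows; i++) {
    lines.push(`${i},${Math.round(Math.random() * 1000)},cat${i % 10}`);
  }
  return lines.join('\n');
}

for (const rows of [10_000, 100_000, 1_000_000]) {
  writeFileSync(`bench-${rows}.csv`, makeCsv(rows));
  // Import each file into Excel / Google Sheets / LibreOffice Calc and time
  // opening, scrolling, and a simple formula such as =SUMIF(C:C,"cat1",B:B).
}
```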

With increasing data sizes… spreadsheets have started to break down to the point of being unusable, displaying a number of scalability problems. They often freeze during computation, and are unable to import datasets well below the size limits posed by Continue reading

Declarative assembly of web applications from pre-defined concepts

Declarative assembly of web applications from predefined concepts De Rosso et al., Onward! 2019

I chose this paper to challenge my own thinking. I’m not really a fan of low-code / no-code / just drag-and-drop-from-our-catalogue forms of application development. My fear is that all too often it’s like jumping on a motorbike and tearing off at great speed (rapid initial progress), only to ride around a bend and find a brick wall across the road in front of you. That doesn’t normally end well. I’ve seen enough generations of CASE (remember that acronym?), component-based software development, reusable software catalogues etc. to develop a healthy scepticism: lowest-common denominators, awkward or missing round-tripping behaviour, terrible debugging experiences, catalogues full of junk components, inability to accommodate custom behaviours not foreseen by the framework/component developers, limited reuse opportunities in practice compared to theory, and so on.

The thing is, on one level I know that I’m wrong. To start with, there’s Grady Booch’s observation that “the whole history of computer science is one of ever rising levels of abstraction”. Then there’s the changing demographic of software building. Heather Miller recently gave a great presentation on this topic, ‘The Continue reading

Efficient lock-free durable sets

Efficient lock-free durable sets, Zuriel et al., OOPSLA’19

Given non-volatile memory (NVRAM), the naive hope for persistence is that it would be a no-op: what happens in memory, stays in memory. Unfortunately, a very similar set of issues to those concerned with flushing volatile memory to persistent disk exist here too, just at another level. Memory might be durable, but…

…it is expected that caches and registers will remain volatile. Therefore the state of data structures underlying standard algorithms might not be complete in the NVRAM view, and after a crash this view might not be consistent because of missed writes that were in the caches but did not reach the memory. Moreover, for better performance, the processor may change the order in which writes reach the NVRAM, making it difficult for the NVRAM to even reflect a consistent prefix of the computation.

Plus ça change, plus c’est la même chose: the more things change, the more they stay the same.

So, we’re going to need to take care that everything we say is committed is truly durable, and that we can recover to a consistent state following a crash. The traditional way to accomplish this is with a write-ahead log. You’ll no doubt be familiar with the phrase Continue reading