Mergeable replicated data types Kaki et al., OOPSLA’19
This paper was published at OOPSLA, but it’s perhaps amongst the distributed systems community that I expect it to generate the greatest interest. Mergeable Replicated Data Types (MRDTs) are in the same spirit as CRDTs but with the very interesting property that they compose. Furthermore, a principled approach for deriving MRDTs is provided, allowing a wide range of data types to be lifted to their mergeable, replicated counterparts with very little effort on the part of the programmer. The paper sets the discussion in the context of geo-replicated distributed systems, but of course the same mechanisms could be equally useful in the context of local-first applications.
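To give a flavour of the core idea, here’s a minimal sketch (in Python for illustration; the paper itself works in OCaml) of the simplest MRDT, a mergeable counter. Concurrent versions are reconciled by a three-way merge against their lowest common ancestor (LCA), so the deltas from both replicas are preserved:

```python
# Three-way merge for a replicated counter: keep each replica's delta
# relative to the lowest common ancestor of the two versions.
def merge_counter(lca: int, v1: int, v2: int) -> int:
    return lca + (v1 - lca) + (v2 - lca)

# Two replicas diverge from a common ancestor value of 5:
# one adds 3 (-> 8), the other subtracts 2 (-> 3). The merge keeps both edits.
assert merge_counter(5, 8, 3) == 6
```

The interesting question, of course, is how to derive merges like this systematically for richer data types, which is exactly what the paper goes on to explore.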
There’s a lot of material in this paper’s 29 pages, and to make sure I have the space to properly understand the ideas (the process of summary writing is a critical part of that for me), I’m going to spread my coverage over two posts.
We’ll begin by looking at the context and the high-level idea behind MRDTs, then we’ll investigate how merge specifications can be derived and composed, and how merge implementations can be derived too. Finally we’ll touch on the implementation of MRDTs in Continue reading
PlanAlyzer: assessing threats to the validity of online experiments Tosch et al., OOPSLA’19
It’s easy to make experimental design mistakes that invalidate your online controlled experiments. At an organisation like Facebook (who kindly supplied the corpus of experiments used in this study), the state of the art is to have a pool of experts carefully review all experiments. PlanAlyzer acts a bit like a linter for online experiment designs, where those designs are specified in the PlanOut language.
We present the first approach for statically checking the internal validity of online experiments. Our checks are based on well-known problems that arise in experimental design and causal inference… PlanAlyzer checks PlanOut programs for a variety of threats to internal validity, including failures of randomization, treatment assignment, and causal sufficiency.
As well as pointing out any bugs in the experiment design, PlanAlyzer will also output a set of contrasts — comparisons that you can safely make given the design of the experiment. Hopefully the comparison you wanted to make when you set up the experiment is in that set!
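If you haven’t come across PlanOut before, here’s a sketch of a simple experiment written against its open-source Python reference implementation (API usage follows the examples in the PlanOut documentation; details may vary between versions):

```python
# A minimal PlanOut experiment: assignments are deterministic hashes of the
# unit (here userid), so a given user always sees the same variant.
from planout.experiment import SimpleExperiment
from planout.ops.random import UniformChoice

class ButtonExperiment(SimpleExperiment):
    def assign(self, params, userid):
        params.button_color = UniformChoice(
            choices=['#ff0000', '#00ff00'], unit=userid)
        params.button_text = UniformChoice(
            choices=['Join now!', 'Sign up!'], unit=userid)

exp = ButtonExperiment(userid=42)
print(exp.get('button_color'), exp.get('button_text'))
```

It’s designs like this, expressed as PlanOut programs, that PlanAlyzer analyses statically.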
PlanOut is an open source framework for online field experiments, developed by and extensively used at Facebook. To quote Continue reading
Local-first software: you own your data, in spite of the cloud Kleppmann et al., Onward! ’19
Watch out! If you start reading this paper you could be lost for hours following all the interesting links and ideas, and end up even more dissatisfied than you already are with the state of software today. You might also be inspired to help work towards a better future. I’m all in :).
On the one hand we have ‘cloud apps’ which make it easy to access our work from multiple devices and to collaborate online with others (e.g. Google Docs, Trello, …). On the other hand we have good old-fashioned native apps that you install on your operating system (a dying breed? See e.g. Brendan Burns’ recent tweet). Somewhere in the middle, but not quite perfect, are online (browser-based) apps with offline support.
The primary issue with cloud apps (the SaaS model) is ownership of the data.
Unfortunately, cloud apps are problematic in this regard. Although they let you access your data anywhere, all data access must go via the server, and you can only do the things that the server will let you do. Continue reading
Formal foundations of serverless computing Jangda et al., OOPSLA’19
Jangda et al. won a distinguished paper award at OOPSLA this year for their work on ‘Formal foundations of serverless computing.’ Central to the paper is their observation that the serverless execution environment has a number of unique properties (such as warm starting / reuse of execution environments) that can make it harder to build correct applications. They show the conditions under which a serverless function can safely ignore these peculiarities, and thus become much simpler to reason about. They also introduce a composition language based on arrows for function composition.
The serverless computing abstraction, despite its many advantages, exposes several low-level operational details that make it hard for programmers to write and reason about their code. For example, to reduce latency, serverless platforms try to reuse the same function instance to process multiple requests. However, this behaviour is not transparent and it is easy to write a serverless function that produces incorrect results or leaks confidential data when reused. A related problem is that serverless platforms abruptly terminate function instances when they are idle…
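To make the reuse problem concrete, here’s a minimal Python sketch in the style of an AWS Lambda handler (the handler signature and event shape are illustrative):

```python
def expensive_lookup(user):
    return {"name": user}   # stand-in for a slow datastore call

cache = {}  # survives only as long as this *instance* of the function

def handler(event, context):
    # On a warm start the cache may already be populated from an earlier
    # request; on a cold start it's empty. Correctness therefore depends on
    # the function tolerating both, and never leaking one user's data to
    # another via instance state like this.
    user = event["user"]
    if user not in cache:
        cache[user] = expensive_lookup(user)
    return cache[user]
```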
These first problems are related to Continue reading
Taiji: managing global user traffic for large-scale internet services at the edge Xu et al., SOSP’19
It’s another networking paper to close out the week (and our coverage of SOSP’19), but whereas Snap looked at traffic routing within the datacenter, Taiji is concerned with routing traffic from the edge to a datacenter. It’s been in production deployment at Facebook for the past four years.
When a user makes a request to http://www.facebook.com, DNS will route the request to one of dozens of globally deployed edge nodes. Within the edge node, a load balancer (the Edge LB) is responsible for routing requests through to frontend machines in datacenters. The question Taiji addresses is a simple one on the surface: what datacenter should a given request be routed to?
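Taiji’s answer takes the form of a periodically recomputed routing table, specifying what fraction of each edge node’s traffic should go to each datacenter; the Edge LB then makes per-request decisions against it. A toy sketch of the shape of the thing (edge names, datacenters, and fractions all invented for illustration):

```python
import random

# {edge node: {datacenter: fraction of this edge's traffic}}
routing_table = {
    "edge-london": {"dc-sweden": 0.7, "dc-ireland": 0.3},
    "edge-tokyo":  {"dc-singapore": 0.6, "dc-oregon": 0.4},
}

def pick_datacenter(edge: str) -> str:
    """Weighted random choice according to the edge's traffic fractions."""
    fractions = routing_table[edge]
    return random.choices(list(fractions), weights=list(fractions.values()))[0]

print(pick_datacenter("edge-london"))
```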
There’s one thing that Taiji doesn’t have to worry about: backbone capacity between the edge nodes and datacenters — this is provisioned in abundance such that it is not a consideration in balancing decisions. However, there are plenty of other things going on that make the decision challenging:
Scaling symbolic evaluation for automated verification of systems code with Serval Nelson et al., SOSP’19
Serval is a framework for developing automated verifiers of systems software. It makes an interesting juxtaposition to the approach Google took with Snap that we looked at last time out. I’m sure that Google engineers do indeed take extreme care when building the low level networking code that powers Google’s datacenters, but their fundamental design point was to enable frequent releases for fast iteration, feedback on their designs, and yes, early detection of problems.
Formal verification is at the other end of the spectrum. In theory it enables you to eliminate whole classes of problems and vulnerabilities entirely (in practice perfection is still hard to come by), and so it can be especially valuable in security sensitive situations. But it comes with a very high price tag:
Writing proofs requires a time investment that is usually measured in person-years, and the size of the proofs can be several times or even more than an order of magnitude larger than that of implementation code.
That’s both very expensive and an incredibly long wait for feedback. To invest in formally modelling something, you really Continue reading
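For a feel of what the ‘push-button’ alternative means in contrast, here’s a tiny example (nothing to do with Serval itself, which builds on Rosette/Racket; this just uses the Z3 SMT solver’s Python bindings) in which an optimised bit-twiddling implementation is checked against its specification for all 2^64 input pairs, with no hand-written proof:

```python
from z3 import BitVec, ZeroExt, Extract, LShR, Solver, unsat

x, y = BitVec('x', 32), BitVec('y', 32)

# Optimised implementation: overflow-free unsigned average via bit tricks.
impl = (x & y) + LShR(x ^ y, 1)

# Specification: compute the true average in 33 bits, then truncate.
spec = Extract(31, 0, LShR(ZeroExt(1, x) + ZeroExt(1, y), 1))

s = Solver()
s.add(impl != spec)         # ask the solver for any counterexample...
assert s.check() == unsat   # ...there is none: verified for all inputs
```

Serval applies this same style of automated reasoning, via symbolic evaluation, to real systems code such as security monitors.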
Snap: a microkernel approach to host networking Marty et al., SOSP’19
This paper describes the networking stack, Snap, that has been running in production at Google for the last three-plus years. It’s been clear for a while that software designed explicitly for the data center environment will increasingly want/need to make different design trade-offs from, e.g., the general-purpose systems software that you might install on your own machines. But wow, I didn’t think we’d be at the point yet where we’d be abandoning TCP/IP! You need a lot of software engineers and the willingness to rewrite a lot of software to entertain that idea. Enter Google!
I’m jumping ahead a bit here, but the component of Snap which provides the transport and communications stack is called Pony Express. Here are the bombshell paragraphs:
Our datacenter applications seek ever more CPU-efficient and lower-latency communication, which Pony Express delivers. It implements reliability, congestion control, optional ordering, flow control, and execution of remote data access operations. Rather than reimplement TCP/IP or refactor an existing transport, we started Pony Express from scratch to innovate on more efficient interfaces, architecture, and protocol. (Emphasis mine).
and later on “we are seeking to grow Continue reading
The inflection point hypothesis: a principled debugging approach for locating the root cause of a failure Zhang et al., SOSP’19
It’s been a while since we looked at debugging and troubleshooting on The Morning Paper (here’s a sample of earlier posts on the topic). Today’s paper introduces a root-cause detector for those hard-to-pin-down bugs. Whereas most root cause analysis systems start from the failure and work backwards, Kairux starts from the beginning of the program execution and works forwards. It’s based on the simple idea that the root cause is likely to be found where the failing execution deviates from successful executions.
If we model an execution as a totally ordered sequence of instructions, then the root cause can be identified by the first instruction where the failure execution deviates from the non-failure execution that has the longest instruction sequence prefix in common with that of the failure execution.
This point where the failing run deviates from the longest successful instruction sequence prefix is called the inflection point, and the Inflection Point Hypothesis says that the inflection point is likely to be the root cause.
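The hypothesis is simple enough to sketch in a few lines (a toy model in which executions are just lists of instruction identifiers; Kairux of course has to reconstruct and compare real instruction-level traces):

```python
def common_prefix_len(a, b):
    """Length of the shared prefix of two instruction sequences."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def inflection_point(failing, successful_runs):
    # Index of the first instruction where the failing run deviates from the
    # successful run sharing the longest prefix with it.
    return max(common_prefix_len(failing, s) for s in successful_runs)

ok1 = ["open", "read", "close"]
ok2 = ["open", "read", "write", "close"]
bad = ["open", "read", "free", "close"]
assert inflection_point(bad, [ok1, ok2]) == 2   # 'free' is the suspect
```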
The definition of root cause itself is taken from Continue reading
File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution Aghayev et al., SOSP’19
Ten years of hard-won lessons packed into just 17 pages (13 if you don’t count the references!) makes this paper extremely good value for your time. It’s also a fabulous example of recognising and challenging implicit assumptions. In this case, the assumption that a distributed storage backend should clearly be layered on top of a local file system. Breaking that assumption allowed Ceph to introduce a new storage backend called BlueStore with much better performance and predictability, and the ability to support the changing storage hardware landscape. In the two years since its release, 70% of all Ceph users have switched to running BlueStore in production.
Ceph is a widely-used, open-source distributed file system that followed this convention [of building on top of a local file system] for a decade. Hard lessons that the Ceph team learned using several popular file systems led them to question the fitness of file systems as storage backends. This is not surprising in hindsight.
Sometimes, things that aren’t surprising in hindsight can be the very hardest of things to spot!
An analysis of performance evolution of Linux’s core operations Ren et al., SOSP’19
I was drawn in by the headline results here:
This paper presents an analysis of how Linux’s performance has evolved over the past seven years… To our surprise, the study shows that the performance of many core operations has worsened or fluctuated significantly over the years.
Once I got into the details, though, I found it hard to come away with any strongly actionable takeaways. Perhaps the most interesting lesson/reminder is this: it takes a lot of effort to tune a Linux kernel. For example:
Meanwhile, Linux releases a new kernel every 2-3 months, with between 13,000 and 18,000 commits per release.
Clearly, performance comes at a high cost, and unfortunately, this cost is difficult to get around. Most Linux users cannot afford Continue reading
Optimized risk scores Ustun & Rudin, KDD’17
On Monday we looked at the case for interpretable models, and in Wednesday’s edition of The Morning Paper we looked at CORELS which produces provably optimal rule lists for categorical assessments. Today we’ll be looking at RiskSLIM, which produces risk score models together with a proof of optimality.
A risk score model is a very simple points-based system designed to be used (and understood!) by humans. Such models are widely used in e.g. medicine and criminal justice. Traditionally they have been built by panels of experts or by combining multiple heuristics. Here’s an example model for the CHADS2 score assessing stroke risk.
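In code form the model is almost trivially simple, which is exactly the point (the point values below are the standard published CHADS2 weights):

```python
# CHADS2: one point each for Congestive heart failure, Hypertension,
# Age >= 75, and Diabetes; two points for prior Stroke or TIA.
# Higher totals indicate higher stroke risk.
def chads2(chf: bool, hypertension: bool, age_ge_75: bool,
           diabetes: bool, prior_stroke_tia: bool) -> int:
    return (int(chf) + int(hypertension) + int(age_ge_75)
            + int(diabetes) + 2 * int(prior_stroke_tia))

# An 80-year-old with hypertension and a prior stroke scores 4 out of 6.
assert chads2(False, True, True, False, True) == 4
```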
Even when you don’t consider interpretability as a goal (which you really should!), “doing the simplest thing which could possibly work” is always a good place to start. The fact that CORELS and RiskSLIM come with an optimality guarantee given the constraints fed to them on model size etc. also means you can make informed decisions about model complexity vs performance trade-offs if a more complex model looks like it may perform better. It’s a refreshing change of mindset to shift from “finding an Continue reading
Learning certifiably optimal rule lists for categorical data Angelino et al., JMLR 2018
Today we’re taking a closer look at CORELS, the Certifiably Optimal RulE ListS algorithm that we encountered in Rudin’s arguments for interpretable models earlier this week. We’ve been able to create rule lists (decision trees) for a long time, e.g. using CART, C4.5, or ID3, so why do we need CORELS?
…despite the apparent accuracy of the rule lists generated by these algorithms, there is no way to determine either if the generated rule list is optimal or how close it is to optimal, where optimality is defined with respect to minimization of a regularized loss function. Optimality is important, because there are societal implications for lack of optimality.
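To see what’s at stake, here’s the shape of a rule list CORELS might learn, along the lines of the recidivism example the authors use (the specific predicates and thresholds here are illustrative, not the paper’s exact model):

```python
# A rule list is an ordered if/else-if chain over interpretable predicates,
# small enough to print on a page and reason about directly.
def predict_rearrest(age: int, sex: str, priors: int) -> bool:
    if 18 <= age <= 20 and sex == "male":
        return True
    if 21 <= age <= 23 and 2 <= priors <= 3:
        return True
    if priors > 3:
        return True
    return False
```

CORELS’ contribution is not the form of the model, but the certificate that no other rule list in the search space does better under the regularized loss.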
Rudin proposed a public policy that for high-stakes decisions no black-box model should be deployed when there exists a competitive interpretable model. For the class of logic problems addressable by CORELS, CORELS’ guarantees provide a technical foundation for such a policy:
…we would like to find both a transparent model that is optimal within a particular pre-determined class of models and produce a certificate of its optimality, with respect Continue reading
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead Rudin et al., arXiv 2019
With thanks to Glyn Normington for pointing out this paper to me.
It’s pretty clear from the title alone what Cynthia Rudin would like us to do! The paper is a mix of technical and philosophical arguments and comes with two main takeaways for me: firstly, a sharpening of my understanding of the difference between explainability and interpretability, and why the former may be problematic; and secondly some great pointers to techniques for creating truly interpretable models.
There has been an increasing trend in healthcare and criminal justice to leverage machine learning (ML) for high-stakes prediction applications that deeply impact human lives… The lack of transparency and accountability of predictive models can have (and has already had) severe consequences…
A model can be a black box for one of two reasons: (a) the function that the model computes is far too complicated for any human to comprehend, or (b) the model may in actual fact be simple, but its details are proprietary and not available for inspection.
In explainable ML we make predictions using a complicated Continue reading
Task-based effectiveness of basic visualizations Saket et al., IEEE Transactions on Visualization and Computer Graphics 2019
So far this week we’ve seen how to create all sorts of fantastic interactive visualisations, and taken a look at what data analysts actually do when they do ‘exploratory data analysis.’ To round off the week today’s choice is a recent paper on an age-old topic: what visualisation should I use?
No prizes for guessing “it depends!”
…the effectiveness of a visualization depends on several factors including the task at hand, and data attributes and datasets visualized.
Is this the paper to finally settle the age-old debate surrounding pie-charts??
Saket et al. look at five of the most basic visualisations —bar charts, line charts, pie charts, scatterplots, and tables— and study their effectiveness when presenting modest amounts of data (less than 50 visual marks) across 10 different tasks. The task taxonomy comes from the work of Amar et al., describing a set of ten low-level analysis tasks that describe users’ activities while using visualization tools.
Futzing and moseying: interviews with professional data analysts on exploration practices Alspaugh et al., VAST’18
What do people actually do when they do ‘exploratory data analysis’ (EDA)? This 2018 paper reports on the findings from interviews with 30 professional data analysts to see what they get up to in practice. The only caveat to the results is that the interviews were conducted in 2015, and this is a fast-moving space. The essence of what and why is probably still the same, but the tools involved have evolved.
Exploration here is defined as “open-ended information analysis,” which doesn’t require a precisely stated goal. It comes after data ingestion, wrangling and profiling (i.e., when you have the data in a good enough state to ask questions of it). The authors place it within the overall analysis process like this:
That looks a lot more waterfall-like than my experience of reality though. I’d expect to see lots of iterations between explore and model, and possibly report as well.
The guidance given to survey participants when asking about EDA is as follows:
EDA is an approach to analyzing data, usually undertaken at the beginning of an analysis, to Continue reading
Vega-lite: a grammar of interactive graphics Satyanarayan et al., IEEE transactions on visualization and computer graphics, 2016
From time to time I receive a request for more HCI (human-computer interaction) related papers in The Morning Paper. If you’ve been a follower of The Morning Paper for any time at all you can probably tell that I naturally gravitate more towards the feeds-and-speeds end of the spectrum than user experience and interaction design. But what good is a super-fast system that nobody uses! With the help of Aditya Parameswaran, who recently received a VLDB Early Career Research Contribution Award for his work on the development of tools for large-scale data exploration targeting non-programmers, I’ve chosen a selection of data exploration, visualisation and interaction papers for this week. Thank you Aditya! Fingers crossed I’ll be able to bring you more from Aditya Parameswaran in future editions.
Vega and Vega-lite follow in a long line of work that can trace its roots back to Wilkinson’s ‘The Grammar of Graphics.’ We last looked at the Vega family on The Morning Paper all the way back in 2015 (see ‘Declarative interaction design for data visualization’ and ‘[Reactive Continue reading
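As a taste of the grammar-of-graphics style: in Vega-Lite a chart is just data plus a mark plus visual encodings, captured in a JSON spec. In Python, the Altair library generates such specs (a minimal sketch, assuming altair and pandas are installed):

```python
import altair as alt
import pandas as pd

df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "sales": [10, 14, 9]})

chart = (
    alt.Chart(df)
    .mark_bar()                     # the mark...
    .encode(x="month", y="sales")   # ...and the encodings
)
print(chart.to_json())              # the underlying Vega-Lite JSON spec
```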
HackPPL: a universal probabilistic programming language Ai et al., MAPL’19
The Hack programming language, as the authors proudly tell us, is “a dominant web development language across large technology firms with over 100 million lines of production code.” Nail that niche! Does your market get any smaller if we also require those firms to have names starting with ‘F’? ;)
In all seriousness though, Hack powers large chunks of Facebook, running on top of the HipHopVM (HHVM). It started out as an optionally-typed dialect of PHP, but in recent times the Hack development team decided to discontinue PHP support ‘in order to open up opportunities for sweeping language advancements’. One of those sweeping advancements is the introduction of support for probabilistic reasoning. This goes under the unfortunate-sounding name of ‘HackPPL’. I’m not sure how it should officially be pronounced, but I can tell you that PPL stands for “probabilistic programming language”, not “people”!
This paper is interesting on a couple of levels. Firstly, the mere fact that probabilistic reasoning is becoming prevalent enough for Facebook to want to integrate it into the language, and secondly of course what that actually looks like.
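HackPPL models are written in Hack itself, but the core move of any probabilistic programming language is language-agnostic: express a generative model as ordinary code, condition on observed data, and infer the latent quantities. A crude Python sketch using rejection sampling (purely illustrative; HackPPL provides proper inference methods):

```python
import random

def model():
    bias = random.random()                        # prior: bias ~ Uniform(0, 1)
    flips = [random.random() < bias for _ in range(5)]
    return bias, flips

observed = [True, True, True, True, False]        # the data we condition on

# Rejection sampling: keep prior draws whose simulated data matches exactly.
posterior = [b for b, f in (model() for _ in range(200_000)) if f == observed]
print(sum(posterior) / len(posterior))            # posterior mean, ~0.71
```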
Probabilistic reasoning has Continue reading
“I was told to buy a software or lose my computer. I ignored it”: a study of ransomware Simoiu et al., SOUPS 2019
This is a very easy to digest paper shedding light on the prevalence of ransomware and the characteristics of those most likely to be vulnerable to it. The data comes from a survey of 1,180 US adults conducted by YouGov, an online global market research firm. YouGov works hard to ensure respondent participation representative of (in this case) the general population in the U.S., but the normal caveats apply.
We define ransomware as the class of malware that attempts to defraud users by restricting access to the user’s computer or data, typically by locking the computer or encrypting data. There are thousands of different ransomware strains in existence today, varying in design and sophistication.
The survey takes just under 10 minutes to complete, and goes to some lengths to ensure that self-reporting victims really were victims of ransomware (and not some other computer problem).
For respondents that indicated they had suffered from a ransomware attack, data was collected on month and year, the name of the ransomware variant, the ransom demanded, the payment method, Continue reading
Invisible mask: practical attacks on face recognition with infrared Zhou et al., arXiv’18
You might have seen selected write-ups from The Morning Paper appearing in ACM Queue. The editorial board there are also kind enough to send me paper recommendations when they come across something that sparks their interest. So this week things are going to get a little bit circular as we’ll be looking at three papers originally highlighted to me by the ACM Queue board!
‘Invisible Mask’ looks at the very topical subject of face authentication systems. We’ve looked at adversarial attacks on machine learning systems before, including those that can be deployed in the wild, such as decorating stop signs. Most adversarial attacks against image recognition systems require you to have pixel-level control over the input image though. Invisible Mask is different: it’s a practical attack, in that the techniques described in this paper could be used to subvert face authentication systems deployed in the wild, without any difference in the attacker’s face (e.g. specially printed glasses frames) being obvious to a casual observer. That’s the invisible part: to the face recognition system it’s as if you are wearing a mask, Continue reading
Learning a unified embedding for visual search at Pinterest Zhai et al., KDD’19
Last time out we looked at some great lessons from Airbnb as they introduced deep learning into their search system. Today’s paper choice highlights an organisation that has been deploying multiple deep learning models in search (visual search) for a while: Pinterest.
With over 600 million visual searches per month and growing, visual search is one of the fastest growing products at Pinterest and of increasing importance.
Visual search is pretty fundamental to the Pinterest experience. The paper focuses on three search-based products: Flashlight, Lens, and Shop-the-Look.
In Flashlight search the search query is a source image either from Pinterest or the web, and the search results are relevant pins. In Lens the search query is a photograph taken by the user with their camera, and the search results are relevant pins. In Shop-the-Look the search query is a source image from Pinterest or the web, and the results are products which match items in the image.
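At heart all three products come down to the same operation: embed the query image, then find the nearest pin embeddings in that space. A toy numpy sketch of the retrieval step (illustrative; Pinterest uses a single learned ‘unified’ embedding and approximate nearest-neighbour indexes at vastly larger scale):

```python
import numpy as np

def search(query_vec: np.ndarray, index: np.ndarray, k: int = 5) -> np.ndarray:
    # Cosine similarity between the query and every indexed pin embedding.
    sims = index @ query_vec / (
        np.linalg.norm(index, axis=1) * np.linalg.norm(query_vec)
    )
    return np.argsort(-sims)[:k]     # ids of the k most similar pins

index = np.random.rand(10_000, 128)  # toy index: 10k pins, 128-d embeddings
print(search(np.random.rand(128), index))
```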
Models are like microservices in one sense: it seems that they have a tendency to proliferate within organisations once they start to take hold! (Aside, I wonder if there’s Continue reading