Debugging distributed systems with why-across-time provenance
Whittaker et al., SoCC’18
This value is 17 here, and it shouldn’t be. Why did the get request return 17?
Sometimes the simplest questions can be the hardest to answer. As the opening sentence of this paper states:
Debugging distributed systems is hard.
The why questions we’re interested in for this paper are questions of provenance: what caused this output? Provenance has been studied in the context of relational databases and dataflow systems, but here we’re interested in general distributed systems (strictly, those in which the behaviour of each node can be modelled by a deterministic state machine; non-deterministic behaviour is left to future work).
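To make the determinism assumption concrete, here is a minimal Python sketch of a single node modelled as a deterministic state machine. It assumes a hypothetical key-value store: the `Put`/`Get` request types and the `step`/`run` functions are illustrative names, not anything defined in the paper. Because `step` is a pure function of (state, request), replaying the same request log always reproduces the same `get` result, which is what makes a question like "why did the get return 17?" answerable from history.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical request types, for illustration only.
@dataclass(frozen=True)
class Put:
    key: str
    value: int


@dataclass(frozen=True)
class Get:
    key: str


Request = Union[Put, Get]


def step(state: dict, req: Request) -> tuple[dict, Optional[int]]:
    """Pure transition function: (state, request) -> (new state, output).

    No clocks, randomness, or I/O, so the same inputs always give the same result.
    """
    if isinstance(req, Put):
        new_state = dict(state)      # copy rather than mutate the old state
        new_state[req.key] = req.value
        return new_state, None       # puts produce no output value
    return state, state.get(req.key)  # gets leave the state unchanged


def run(log: list[Request]) -> tuple[dict, list[Optional[int]]]:
    """Replay a request log from the empty state, collecting each output."""
    state: dict = {}
    outputs: list[Optional[int]] = []
    for req in log:
        state, out = step(state, req)
        outputs.append(out)
    return state, outputs


# Usage: the get returns 17, and replaying the log reproduces it exactly.
log = [Put("x", 3), Put("x", 17), Get("x")]
final_state, outputs = run(log)
assert outputs[-1] == 17
assert run(log) == (final_state, outputs)
```

A non-deterministic node (one that, say, consulted a random number inside `step`) would break the replay property, which is one way to see why the paper sets that case aside.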
Why why-provenance doesn’t work
Relational databases have why-provenance, which sounds on the surface exactly like what we’re looking for.
Given a relational database, a query issued against the database, and a tuple in the output of the query, why-provenance explains why the output tuple was produced. That is, why-provenance produces the input tuples that, if passed through the relational operators of the query, would produce the output tuple in question.
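As an illustration of the relational setting, here is a minimal Python sketch of why-provenance for one select-project-join query, assuming set semantics and input tuples tagged with identifiers. The relations, tags, and the `join_project_with_why` function are hypothetical. Following the common witness-based formulation, the why-provenance of an output tuple is a set of witnesses, where each witness is a set of input tuples that, passed through the join and projection, yields that output tuple.

```python
from collections import defaultdict

# Hypothetical input relations; each tuple is tagged with an id so that
# provenance can refer to it.
R = {  # Employee(name, dept)
    "r1": ("alice", "eng"),
    "r2": ("bob", "eng"),
    "r3": ("carol", "ops"),
}
S = {  # Dept(dept, city)
    "s1": ("eng", "paris"),
    "s2": ("ops", "paris"),
    "s3": ("eng", "berlin"),
}


def join_project_with_why(R: dict, S: dict) -> dict:
    """SELECT S.city FROM R JOIN S ON R.dept = S.dept, with why-provenance.

    Returns a mapping from each output tuple to its set of witnesses; each
    witness is a frozenset of input-tuple ids that together derive the output.
    """
    why = defaultdict(set)
    for rid, (_name, rdept) in R.items():
        for sid, (sdept, city) in S.items():
            if rdept == sdept:           # join predicate
                out = (city,)            # projection onto city
                why[out].add(frozenset({rid, sid}))
    return dict(why)


why = join_project_with_why(R, S)
# ("paris",) is derived three different ways: {r1, s1}, {r2, s1}, {r3, s2}.
```

Note that an output tuple can have several alternative witnesses, and any one of them on its own suffices to produce it; the projection onto `city` is what causes distinct input combinations to collapse onto the same output.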
One reason that won’t work in our distributed systems setting is that…