Smoke: fine-grained lineage at interactive speed
Psallidas et al., VLDB’18
Data lineage connects the input and output data items of a computation. A backward lineage query selects a subset of the output records and asks, “which input records contributed to these results?” A forward lineage query selects a subset of the input records and asks, “which output records depend on these inputs?” Lineage-enabled systems capture record-level relationships throughout a workflow and support lineage queries.
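To make “record-level relationships” concrete, here is a minimal sketch for a single group-by count operator. It assumes in-memory Python lists and integer record ids (rids), and it is illustrative only, not Smoke’s actual capture mechanism. While the operator runs, it records a backward index (output rid to contributing input rids) and a forward index (input rid to the output rids it feeds). The two lineage queries then become simple lookups. The names `groupby_count_with_lineage`, `backward_query` and `forward_query` are hypothetical.

```python
def groupby_count_with_lineage(rows, key):
    """Hypothetical operator: GROUP BY `key`, COUNT(*), capturing lineage as it runs.

    Returns (outputs, backward, forward):
      outputs[o]  -> (group value, count) for output record o
      backward[o] -> input rids that contributed to output record o
      forward[i]  -> output rids that input record i contributes to
    """
    group_to_out = {}             # group value -> output rid
    outputs, backward = [], []
    forward = [[] for _ in rows]  # one entry per input record
    for rid, row in enumerate(rows):
        g = row[key]
        if g not in group_to_out:
            group_to_out[g] = len(outputs)
            outputs.append([g, 0])
            backward.append([])
        o = group_to_out[g]
        outputs[o][1] += 1
        backward[o].append(rid)   # capture backward lineage
        forward[rid].append(o)    # capture forward lineage
    return [tuple(x) for x in outputs], backward, forward


def backward_query(backward, out_ids):
    """Which input records contributed to these output records?"""
    return sorted({i for o in out_ids for i in backward[o]})


def forward_query(forward, in_ids):
    """Which output records depend on these input records?"""
    return sorted({o for i in in_ids for o in forward[i]})
```

For example, grouping `[{"city": "NYC"}, {"city": "SF"}, {"city": "NYC"}]` by `"city"` yields outputs `[("NYC", 2), ("SF", 1)]`, and `backward_query(backward, [0])` returns `[0, 2]`, the two NYC input rows. Capturing the indexes while the operator executes means each lineage query afterwards is a lookup rather than a re-run of the computation.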
Data lineage is useful in lots of different applications; this paper uses interactive visualisation systems as its main example. This domain requires fast answers to queries and is typically dominated by hand-written implementations. Consider the two views in the figure below. When the user selects a set of marks in one view, marks derived from the same input records are highlighted in the other view (linked brushing).

A typical visualisation system implements this manually, but it can equally be viewed as a backward lineage query from the selected marks in the first view, followed by a forward lineage query from the resulting input records to the second view.
(See ‘Explaining outputs in modern data analytics’, which we looked at last year, for an introduction.)
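Here is a minimal sketch of linked brushing expressed as lineage queries. It assumes two bar-chart views, each a group-by over the same input table, with lineage captured while each view is built. Selecting marks in view A runs a backward query to find the input rids, then a forward query maps those rids to marks in view B. The helpers `build_view` and `linked_brush` and the flight-style columns are hypothetical, chosen only for illustration. This is not how Smoke or any particular visualisation library implements brushing.

```python
def build_view(rows, key):
    """Hypothetical bar-chart view: one mark per distinct value of `key`.

    Returns (marks, backward, forward):
      marks[m]    -> the group value drawn as mark m
      backward[m] -> input rids aggregated into mark m
      forward[i]  -> the single mark that input rid i falls into (group-by)
    """
    marks, backward, forward, index = [], [], [], {}
    for rid, row in enumerate(rows):
        g = row[key]
        if g not in index:
            index[g] = len(marks)
            marks.append(g)
            backward.append([])
        m = index[g]
        backward[m].append(rid)
        forward.append(m)  # rids are visited in order, so forward[rid] == m
    return marks, backward, forward


def linked_brush(view_a, view_b, selected_marks):
    """Highlight marks in view_b derived from the marks selected in view_a."""
    _, back_a, _ = view_a
    marks_b, _, fwd_b = view_b
    rids = {rid for m in selected_marks for rid in back_a[m]}  # backward query on view A
    hit = sorted({fwd_b[rid] for rid in rids})                 # forward query into view B
    return [marks_b[m] for m in hit]


rows = [
    {"carrier": "AA", "delay": "late"},
    {"carrier": "UA", "delay": "on-time"},
    {"carrier": "AA", "delay": "on-time"},
    {"carrier": "DL", "delay": "late"},
]
by_carrier = build_view(rows, "carrier")  # marks: AA, UA, DL
by_delay = build_view(rows, "delay")      # marks: late, on-time
```

Selecting the `AA` bar (mark 0) in `by_carrier` with `linked_brush(by_carrier, by_delay, [0])` traces back to input rows 0 and 2 and highlights both `late` and `on-time` in the second view. Selecting `late` in `by_delay` highlights `AA` and `DL`. Because the indexes already exist, the interaction costs two lookups per selected record instead of re-filtering and re-aggregating the table.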