ScootR: scaling R dataframes on dataflow systems
ScootR: scaling R dataframes on dataflow systems, Kunft et al., SoCC’18
The language of big data is Java (or Scala). The languages of data science are Python and R. So what do you do when you want to run a data science analysis over large amounts of data?
…programming languages with rich support for data manipulation and statistics, such as R and Python, have become increasingly popular… [but]… they are typically designed for single machine and in-memory usage…. In contrast, parallel dataflow systems, such as Apache Flink and Apache Spark, are able to handle large amounts of data. However, data scientists are often unfamiliar with the systems’ native language and programming abstraction, which is crucial to achieve good performance.
A tempting solution is to embed Python / R support within the dataflow engine. There are two basic approaches to this today:
- Keep the guest language components in a separate process and use IPC (inter-process communication) to exchange input and output data between the dataflow engine and the guest language process. This approach can support the full power of the guest language, but pays a heavy price in IPC and serialisation costs.
- Use source-to-source (STS) translation to translate guest…
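
To make the first approach concrete, here is a minimal sketch of the separate-process pattern. It is not taken from the paper or from any dataflow engine: the worker script, the `run_in_guest` helper, and the choice of JSON over stdin/stdout pipes are all assumptions made for illustration. It shows where the serialisation and IPC steps sit on each side of the process boundary.

```python
import json
import subprocess
import sys

# Hypothetical "guest" worker: reads a JSON list of rows from stdin,
# applies a user-defined transformation, and writes JSON to stdout.
GUEST_SOURCE = r"""
import json, sys
rows = json.load(sys.stdin)                      # deserialise in the guest
result = [{"id": r["id"], "value": r["value"] * 2} for r in rows]
json.dump(result, sys.stdout)                    # serialise the reply
"""


def run_in_guest(rows):
    """Send rows to a separate interpreter process and return its output.

    Illustrative helper (an assumption, not a real engine API): every call
    pays for serialising the input, a pipe transfer in each direction,
    and deserialising the result.
    """
    payload = json.dumps(rows)                   # serialise on the host side
    proc = subprocess.run(
        [sys.executable, "-c", GUEST_SOURCE],    # guest runs in its own process
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)               # deserialise the guest's reply


rows = [{"id": i, "value": float(i)} for i in range(3)]
doubled = run_in_guest(rows)
print(doubled)
```

The guest code can use anything its own runtime offers, which is the flexibility this approach buys. The cost is that every row crosses the process boundary twice and is encoded and decoded on each side.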
