Snuba: automating weak supervision to label training data
Snuba: automating weak supervision to label training data Varma & Ré, VLDB 2019
This week we’re moving on from ICML to start looking at some of the papers from VLDB 2019. VLDB is a huge conference, and once again I have a problem because my shortlist of “that looks really interesting, I’d love to read it” papers runs to 54 long at the moment! As a special bonus for me, I’m actually going to be at VLDB this year, where no doubt I’ll learn about even more interesting things! By the time you get to read this, it should be the first (workshop) day of the conference…
The conference may have changed, but to bridge from ICML to VLDB I’m going to start with a paper on very much the same theme as we’ve been dipping into over the past couple of weeks: how to combine and learn from multiple noisy sources of data and labels. Snuba is from the same Stanford line as Snorkel which we looked at last year. It’s tackling the same fundamental problem: how to gather enough labeled data to train a model, and how to effectively use it in a weak supervision setting (supervised learning Continue reading