By Devin Gupta
Founded in the Stanford Artificial Intelligence Lab, Snorkel began as a research project among Stanford Ph.D. students. That research became the foundation for their company, whose data management product helps companies build, manage and monitor artificial intelligence (AI) applications. Today, Snorkel works with various partners, including Stanford Medicine and Google.
Machine learning, an application of AI, allows systems to learn from data and improve with experience without being explicitly programmed. Typically, these systems need millions of examples to learn. However, the team’s research into “weak supervision” showed that researchers could instead teach the machine using imprecise, but usually correct, rules.
CEO Alex Ratner Ph.D. ’19 uses contracts as an example to explain weak supervision. In the past, the machine would need many previously labeled contracts in order to learn how to identify new ones. Weak supervision allows the administrator to provide a filter instead, telling the machine, “If you see the word ‘employment’ in the first 10 words, then probably label it an employment contract.” After applying many such filters, the machine eventually learns which are most accurate.
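Ratner's contract example can be made concrete with a short Python sketch. Everything in it is an illustrative assumption rather than Snorkel's actual code: the function names, the two extra filters and the simple majority vote are all hypothetical. The sketch shows filters that either vote for a label or abstain, and a rough way to see which filters tend to agree with the consensus. Snorkel's own method for estimating filter accuracy is more sophisticated than this.

```python
from collections import Counter

ABSTAIN = None  # a filter returns this when it has no opinion


def lf_employment_early(text):
    """Ratner's example: 'employment' within the first 10 words."""
    first_words = [w.strip(".,:;").lower() for w in text.split()[:10]]
    return "employment" if "employment" in first_words else ABSTAIN


def lf_mentions_salary(text):
    """Hypothetical extra filter: salary talk suggests an employment contract."""
    return "employment" if "salary" in text.lower() else ABSTAIN


def lf_mentions_lease(text):
    """Hypothetical extra filter: the word 'lease' suggests a lease contract."""
    return "lease" if "lease" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_employment_early, lf_mentions_salary, lf_mentions_lease]


def majority_label(text):
    """Combine the filters by majority vote, ignoring abstentions.

    Ties go to whichever label was voted for first.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


def agreement_rates(texts):
    """For each filter, the share of its votes that match the majority label.

    This is a crude stand-in for 'learning which filters are most accurate'.
    """
    rates = {}
    for lf in LABELING_FUNCTIONS:
        hits = total = 0
        for text in texts:
            vote = lf(text)
            if vote is ABSTAIN:
                continue
            total += 1
            hits += vote == majority_label(text)
        rates[lf.__name__] = hits / total if total else None
    return rates
```

Calling `majority_label` on a new document returns a provisional label with no hand annotation, and `agreement_rates` over a pile of unlabeled contracts gives a first hint about which filters deserve the most trust.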
The idea for Snorkel came about after a group of then-Ph.D. candidates noticed commonalities in their work creating various machine learning solutions for different businesses. They decided to build a widely applicable end-to-end platform for creating, capturing and using data.
Their first product, Snorkel Flow, lets users access plentiful data, clean it so that it is easy to work with and use in diverse ways, and understand and correct errors in one interface.
Snorkel cofounder Paroma Varma Ph.D. ’19 says that their technology is much faster and radically more efficient than the traditional process.
“You’ve gone from weeks or months of manual annotation required to label these data points to maybe a matter of hours or days by just writing a couple of rules.”
Part of the effectiveness of the Snorkel platform comes from the versatility of weak supervision. Where experts previously had to spend months manually categorizing and reviewing data, they can now set up the platform to do the same work much faster. This change opens machine learning to a host of new scenarios, including many where it was formerly impractical, such as medical research, contract negotiation and marketing analysis.
Ratner explained the existing difficulty of using machine learning in these kinds of fields.
“Every problem you want to tackle, you need someone to [manually] label all this training data. This is really prohibitive when you have problems that require experts, include privacy concerns or where the problem is changing all the time,” he said.
Ratner and his team believe that Snorkel can help address these issues and streamline the data analysis process.
Recounting their time in the lab, cofounder and head of technology Braden Hancock ’19 says, “Anytime you see so much industry interest in a project, it suggests that you’re not just finding something interesting, but that will actually move the bottom line.”
“When we worked with Google, we saw that even for an organization as sophisticated and well-funded as Google, it could still provide a huge lift, and that was a confirmation that this solves real problems at large scales with big dollar amounts associated,” he added.
On July 14, the company left “stealth mode” after raising a combined $15 million from Google Ventures and Greylock Partners.
“We had a team of extremely talented people and a lot of ideas after four and a half years of building [the platform] with various partners. In some sense, we had already built it,” Ratner said. “At that point, we just thought, ‘We just need some money to keep the lights on and pay people.’”
Contact Devin Gupta at devin.gupta.dg ‘at’ gmail.com.