Stanford Medicine has developed a vision-language artificial intelligence (AI) model that can predict cancer outcomes, such as forecasting melanoma relapses and patient responses to immunotherapy. However, the model is not yet ready for use in clinical practice.
The Multimodal transformer with Unified maSKed modeling, or MUSK for short, is trained on over 50 million histopathology images and one billion text tokens from clinical reports to predict cancer prognosis. By integrating both visual and language-based data, the model mirrors the approach used by oncologists, who draw from multiple sources to make informed treatment decisions. Multimodal models like this, however, have historically been difficult for precision oncology AI initiatives to build.
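For readers curious how “unified masked modeling” works in practice, the toy sketch below hides a random subset of image patches and report tokens and asks a single shared transformer to reconstruct them. It is an illustration written for this article, not the published MUSK code; the class name, dimensions and layer sizes are all assumptions.

```python
# Illustrative toy of unified masked modeling over image patches and text
# tokens (not the authors' code; names and sizes are assumptions).
import torch
import torch.nn as nn

class ToyUnifiedMaskedModel(nn.Module):
    def __init__(self, vocab_size=1000, patch_dim=768, d_model=256):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, d_model)       # embed image patches
        self.text_embed = nn.Embedding(vocab_size, d_model)   # embed report tokens
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # one shared transformer
        self.text_head = nn.Linear(d_model, vocab_size)        # predict masked words
        self.patch_head = nn.Linear(d_model, patch_dim)        # reconstruct masked patches

    def forward(self, patches, text_ids, patch_mask, text_mask):
        img = self.patch_proj(patches)
        txt = self.text_embed(text_ids)
        # Swap masked positions for a learned [MASK] embedding.
        img = torch.where(patch_mask.unsqueeze(-1), self.mask_token.expand_as(img), img)
        txt = torch.where(text_mask.unsqueeze(-1), self.mask_token.expand_as(txt), txt)
        h = self.encoder(torch.cat([img, txt], dim=1))         # joint vision-language encoding
        h_img, h_txt = h[:, :img.size(1)], h[:, img.size(1):]
        return self.patch_head(h_img), self.text_head(h_txt)
```

Because the reconstruction targets come from the data itself, this pretraining stage needs no pathologist-provided labels, which is what makes training on tens of millions of images feasible.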
“In a lot of these previous [AI developments], data is used in silo. This [AI] group is focused on imaging, the other group is focused on language, and they develop all these single-model based approaches,” said Ruijiang Li, an associate professor of radiation oncology, who oversaw the project. “But in practice, physicians now would almost never do that.”
The team, primarily made up of researchers from Stanford’s Departments of Pathology and Radiation Oncology, published a study in Nature on Jan. 8, detailing the model’s architecture and its potential to help physicians develop more effective treatment plans for cancer patients.
As a foundation model, MUSK is trained on vast amounts of pathology data and can be tailored to specific applications with minimal additional training. The model leverages unlabeled and unpaired datasets, eliminating the need for manually annotated images.
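In code, that “minimal additional training” typically looks something like the hedged sketch below: the pretrained encoder is kept frozen and only a small task-specific head is trained on a labeled cohort. The encoder interface, embedding size and data loader here are placeholders, not MUSK’s actual API.

```python
# Hedged sketch of adapting a frozen foundation model to a downstream task
# (e.g. relapse-risk prediction). `pretrained_encoder`, the 512-dim embedding
# and the data loader are assumptions for illustration only.
import torch
import torch.nn as nn

def finetune_head(pretrained_encoder, loader, embed_dim=512, n_classes=2, epochs=5):
    head = nn.Linear(embed_dim, n_classes)            # small task-specific classifier
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    pretrained_encoder.eval()                         # the foundation model stays frozen
    for _ in range(epochs):
        for images, reports, labels in loader:
            with torch.no_grad():
                feats = pretrained_encoder(images, reports)   # multimodal embedding
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```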
The researchers specifically evaluated the model’s ability to identify patients at the highest risk of melanoma relapse. MUSK outperformed existing vision-language models, correctly identifying those patients 83 percent of the time, a nearly 12 percent improvement over other models.
The model also excelled at identifying patients with advanced gastroesophageal cancer who were most likely to benefit from immune checkpoint inhibitors (ICIs), a form of immunotherapy. MUSK’s predictions of patient outcomes were 7 to 12 percent more accurate than those of other unimodal and multimodal models.
“Only 20 percent of patients get responses from immunotherapies, so we need to find out and identify who those 20 percent of patients are,” said Jinxi Xiang, the lead author of the paper and a postdoctoral scholar. “For those who will not have responses, we will not get them into treatment because of economic burden, and there are lots of side effects.”
Across 16 major cancer types, the model correctly predicted a patient’s disease-specific survival 75 percent of the time.
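Survival figures like this are commonly reported as a concordance index: the fraction of comparable patient pairs that a model ranks in the right order. The short, self-contained function below shows how such a score can be computed; it is an illustration of the metric in general, not the study’s evaluation code.

```python
# Assumed-illustrative concordance index (c-index) for survival predictions.
def concordance_index(times, events, risk_scores):
    """times: observed follow-up times; events: 1 if the event was observed,
    0 if censored; risk_scores: higher score means predicted higher risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if patient i had an observed event before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1          # model ranks the earlier event as higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5        # ties count as half
    return concordant / comparable if comparable else float("nan")

# Example: three patients; the model correctly ranks the earliest death as highest risk.
print(concordance_index([5, 10, 14], [1, 1, 0], [0.9, 0.4, 0.2]))  # -> 1.0
```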
“Obviously, everything else that comes after diagnosis is really important,” said Steven Lin, director of Stanford’s Healthcare AI Applied Research team. “How do you manage patients? How do you treat them? How do you predict whether or not they will respond? And this model gets at that latter half of the equation, which is really refreshing.”
As a practicing primary care physician, Lin believes the use of predictive AI to advance personalized medicine is exactly “where medicine needs to go.”
With the aim of deploying the model in high-risk clinical applications, the MUSK team first plans to validate its findings with more data. The model’s heavy demands on computing power and infrastructure may also complicate its transition to the clinical setting.
“We mainly relied on the data from Stanford hospital, but for a clinical and reliable model, we need to actually collect more [data] from other hospitals as well, so we can test whether it has generalized ability on other patients of different races, patients of different characteristics,” Xiang said.