Associate Professor Karen Monsen, center, discusses CaringBridge data with research assistants Sasank Maganti, left, and Christie Martin.

Spinning unstructured text into gold

Nursing informaticians comb through 14 million CaringBridge journals to better understand the healing journey
Brett Stursa

Chances are one in nine that you have read a CaringBridge page to follow someone’s health journey.

The Minnesota-based website invites people to communicate progress in their health journey, be it a cancer diagnosis or the birth of a baby, without the burdensome redundancy that can happen with phone calls or texts. CaringBridge founder Sona Mehring sees the website as a holistic intervention and an important part of anyone’s health journey.

Wanting to understand CaringBridge’s impact and learn more about the healing process, Mehring approached Mary Jo Kreitzer, PhD, RN, FAAN, School of Nursing professor and Earl E. Bakken Center for Spirituality and Healing director, to explore a research partnership.

Kreitzer recognized that more research possibilities would emerge as the data was explored. “It was clear from the earliest conversations that it would be important to convene an interdisciplinary team with diverse methods and tools to apply to the many questions that could be explored,” said Kreitzer. A team formed with researchers from the Center for Spirituality and Healing, the School of Nursing and the College of Science and Engineering.

While the center’s work has focused on how short, daily mind-body practice can help those facing health challenges, the School of Nursing’s work is using a novel text mining approach to discover important patterns in the content of the journal entries.

CaringBridge users shared more than 14 million journal entries about their healing process during a 10-year period starting in 2006. As a nurse informatician and big data researcher, Associate Professor Karen Monsen, PhD, RN, FAAN, knew the richness of the data stored in the journals, and she also knew that combing through millions of journals would be a long, challenging process. “The corpus itself was massive, and it was also unstructured,” said Monsen. “So those two aspects made it very challenging.”

While typical health data is often structured, making it easier to analyze, CaringBridge journals are written in natural language that is harder to code. Instead of the diagnosis codes clinicians use, CaringBridge users express themselves in widely divergent and individual ways. For example, they may describe mental health distress by saying they are “feeling down” or by using other unique, non-clinical phrases.

Monsen considered this a Rumpelstiltskin problem. “Rumpelstiltskin spins straw into gold. That’s what I want to do with CaringBridge text. I want to turn text into structured data so that I can apply the usual data mining algorithms and techniques we use with our big data research to look at how CaringBridge can better help and support users,” said Monsen.

To structure the text, Monsen used an approach called the Omaha System, which was particularly well-suited for this project because it comes from research that looked at how people describe health. “It is made as simple as possible so it matches the way consumers think about health, in a way that other terminologies do not,” said Monsen. The Center for Nursing Informatics at the School of Nursing is home to the research arm of the Omaha System, which defines the whole person in 42 health concepts.
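The idea of turning free text into structured tags can be illustrated with a minimal Python sketch. It is not the team's actual method: the concept labels are illustrative stand-ins for the 42 health concepts, and the phrase lists, the `CONCEPT_PHRASES` table and the `tag_entry` function are hypothetical names invented for this example.

```python
import re

# Hypothetical phrase lists for three illustrative concepts.
# These are invented for the example, not taken from any real lexicon.
CONCEPT_PHRASES = {
    "mental health": ["feeling down", "depressed", "anxious"],
    "sleep": ["couldn't sleep", "slept", "tired"],
    "spirituality": ["prayer", "prayers", "praying", "faith"],
}


def tag_entry(text):
    """Return the sorted concepts whose phrases appear in a journal entry."""
    # Lowercase and normalize curly apostrophes so phrases match reliably.
    normalized = text.lower().replace("’", "'")
    tags = set()
    for concept, phrases in CONCEPT_PHRASES.items():
        for phrase in phrases:
            # Word boundaries keep "tired" from matching inside "retired".
            if re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
                tags.add(concept)
                break
    return sorted(tags)
```

Each entry then becomes a short list of concept tags, the kind of structured record that standard data mining techniques can work with. A real lexicon would need far more phrases per concept than this toy table.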

Already, Monsen’s team is discovering interesting patterns. “People who are talking about infections are also talking about heart-related issues. And, that isn’t a natural pairing,” said Monsen. “We need to look deeper to see what’s going on with those journals so we can understand what’s going on with that group of people.”
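One simple way to surface pairings like this, once journals carry concept tags, is to count how often two tags appear in the same entry. The sketch below assumes each entry has already been reduced to a list of tags. The tag names and the `count_pairs` function are hypothetical, and the article does not say which technique the team used.

```python
from collections import Counter
from itertools import combinations


def count_pairs(tagged_entries):
    """Count how many entries contain each unordered pair of tags."""
    pairs = Counter()
    for tags in tagged_entries:
        # Deduplicate and sort so each pair is counted once per entry
        # and always appears in the same order.
        for pair in combinations(sorted(set(tags)), 2):
            pairs[pair] += 1
    return pairs


# Toy data invented for illustration only.
entries = [
    ["infection", "heart"],
    ["heart", "infection", "sleep"],
    ["sleep"],
]
pair_counts = count_pairs(entries)
```

Pairs with unexpectedly high counts would then be candidates for a closer look at the underlying journals.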

So far, sleep and spirituality top the list of topics mentioned in the journal entries. By looking at word frequency in the journal entries, the team made a sobering discovery. “There were many, many words that were just basically screams,” said Monsen. “There was the letter a repeated, and then it would end like ‘aaarrrggggh’ or ‘aaaaaagony.’” Monsen said the discovery was a reminder that behind many of the journals are people who are suffering. “Most importantly, as you explore this data, you have to be careful and respectful,” she said.

Monsen’s team continues work on tagging journals and refining methods.

“We’re really excited about clustering and moving on once we have the journals tagged,” said Monsen. “I think there’s going to be a really great opportunity to understand how consumers approach their health journeys, and it will be different from how we understand them from our perspective as health care researchers.”


In Print

Karen Monsen’s article “Use of the Omaha System for Ontology-based Text Mining to Discover Meaning within CaringBridge Social Media Journals” is published in Kontakt, Fall 2018.