Conversational AI for Patient Education

Hernia Coach is a conversational agent I designed, built, and evaluated for surgery patient education.

Dissertation defended September 2019

Lead UX researcher, developer, designer, content writer, & project manager.

Collaborators

Primary: Hernia patients, surgeons, nurses.
Secondary: Dissertation committee, UW Department of Health Informatics, clinic/hospital administrative staff, National Library of Medicine.

I designed, built, and evaluated a conversational agent to support hernia surgery patient education for my PhD dissertation.

Hernia Coach taught me that defining what actions an AI system should not perform is as important as defining what it should. This is a lesson I now apply daily to agentic AI at Microsoft Azure. By the time my dissertation research concluded, Hernia Coach had achieved a 93% true positive rate with its query responses and demonstrated that a conversational agent could effectively support patient education for hernia surgery.

This work established early evidence for patient-facing conversational AI in healthcare, years before LLM-powered agents became commercially available. Additionally, my scientific contributions became the foundation for a CHI Best Paper I co-authored that extends Nielsen's heuristics for conversational agents.

Setting the foundation

Understanding hernia surgery patient information needs and the potential role of conversational agents in their care journey

Hernia surgery patients have information needs throughout their care journey. Before surgery, they want to understand the risk of complications, the impact on their activity levels, and how the procedure is performed. After surgery, they need to know how to care for themselves at home, manage pain, and keep their surgical site healthy.

Traditionally, patients relied on their care team and printed information packets for answers. But care teams had limited time during clinical encounters and were not available outside the hospital. Additionally, the printed packets often contained dense information written at reading levels above what most patients could absorb. As a result, patients often turned to online sources, including sites with unvetted or inaccurate content. Overall, it was challenging for patients to get answers to their pressing questions quickly, easily, and accurately.

In the late 2010s, conversational agents in the form of smart assistants were gaining widespread adoption on mobile phones. My hypothesis was that a purpose-built agent for hernia surgery patient education could address the shortcomings of traditional methods. I started by mapping the hernia surgery experience end-to-end: what patients went through, what information they needed, and how they felt about using a conversational agent for support.

Key research questions

What is the end-to-end journey of hernia surgery patients from day one through recovery?
What are the common information needs of hernia surgery patients along their journey?
What are the common questions nurses and surgeons receive from patients?
How open are patients, nurses, and surgeons to patients using a conversational agent to answer common hernia surgery questions?
Which questions are most appropriate for a conversational agent to answer, and which should go to a nurse or surgeon?

Methods

Participatory design sessions with seven former hernia surgery patients, two nurses, and six surgeons incorporating brainstorming and journey mapping activities. Brainstorming exercise data was analyzed and synthesized using affinity diagramming. I considered semi-structured interviews alone, but chose participatory design because the research goal was as much about co-creating the agent's scope as it was about understanding patient needs.

Results & impact

The research surfaced three distinct phases of the hernia surgery journey: initial diagnosis, day of surgery, and post-op recovery. At each phase the patients, nurses, and surgeons were open to using a conversational agent for patient information needs. All three groups agreed that the agent should handle common questions broadly applicable to most surgery patients. Patient-specific questions like medication management, and urgent situations like potential surgical site infections, should be handled by nurses and surgeons. This work defined the scope question that shapes every patient-facing AI system: which questions or actions are appropriate for the system, and which require human judgment. That distinction became the design principle for the proof-of-concept I built and evaluated next.
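The scope split described above can be illustrated with a small Python sketch. This is not how Hernia Coach was implemented; the keyword sets, the `route` function, and the three route labels are all hypothetical and chosen only to show the idea that escalation topics are checked before anything the agent is allowed to answer.

```python
import re

# Hypothetical keyword sets, for illustration only.
# Escalation topics mirror the examples in the text: medication
# management and possible surgical site infections.
CARE_TEAM_KEYWORDS = {"medication", "medications", "infection", "infected"}
# General topics the agent may answer (assumed examples).
AGENT_KEYWORDS = {"activity", "lifting", "procedure"}


def route(question: str) -> str:
    """Return 'care_team', 'agent', or 'fallback' for a patient question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    # Check escalation first so a mixed question never reaches the agent.
    if words & CARE_TEAM_KEYWORDS:
        return "care_team"
    if words & AGENT_KEYWORDS:
        return "agent"
    # Anything unrecognized is declined rather than guessed at.
    return "fallback"


if __name__ == "__main__":
    for q in ["When can I start lifting again?",
              "Should I change my medication?",
              "What is the weather today?"]:
        print(q, "->", route(q))
```

The ordering is the design choice worth noting: a question that touches both a general topic and an escalation topic is sent to the care team, and anything outside both sets is declined.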

Building Hernia Coach

Designing and building a smartphone-native conversational agent for surgical patient education within the constraints of pre-LLM voice assistant platforms

With patient, nurse, and surgeon openness to using conversational agents for patient education established in the prior phase, the next step was building a proof-of-concept. Three design principles guided the build:

Access within a widely adopted smartphone assistant
Support for multiple interaction modalities (voice and keyboard)
Presentation of accessible information across multiple media formats (text, audio, image, video)

Key research questions

Which smartphone assistants had the largest reach, supported multiple interaction modalities, allowed multimedia content, and supported building custom conversational agents?
Was it feasible to build a proof-of-concept conversational agent with accessible, patient-centered health education content?

Methods

I conducted a systematic review of smartphone assistants using the Institute for Healthcare Improvement's Framework for Selecting Digital Health Technology, with custom conversational agent development as a key inclusion criterion. I considered building a standalone mobile app but rejected it because of the cold-start adoption problem. Patients would need to discover and install a new app, whereas smartphone assistants were already on their devices.

Results & impact

Platform selection: The systematic review surfaced five candidate smartphone assistants for evaluation: Amazon's Alexa, Apple's Siri, Google's Assistant, Microsoft's Cortana, and Samsung's Bixby. I selected Google Assistant and Dialogflow because they supported custom conversational agent development, both voice and keyboard interaction, free distribution on iOS and Android, and full multimedia (text, audio, image, video).

Content development: Next, I extracted content from hernia surgery patient education packets published by national healthcare organizations, academic medical centers, and hernia medical supply manufacturers. I rewrote the content to a 5th-grade reading level, aligning with Joint Commission patient communication guidelines. The biggest unexpected challenge was writing agent responses within Dialogflow's 300-character limit while keeping the content accessible.
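The two content constraints above (a character limit and a target reading level) lend themselves to an automated check. The Python sketch below is an assumption about how such a check could look, not the process used in the dissertation. It uses the standard Flesch-Kincaid grade formula with a deliberately naive syllable counter, so the grade is only a rough estimate. The function names, the `MAX_CHARS` constant, and the sample sentence are hypothetical.

```python
import re

MAX_CHARS = 300   # limit cited in the text, treated here as a plain character count
MAX_GRADE = 5.0   # target reading level cited in the text


def count_syllables(word: str) -> int:
    """Rough estimate: count runs of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level using the naive syllable counter above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def check_response(text: str) -> dict:
    """Report whether a draft response meets both constraints."""
    grade = fk_grade(text)
    return {
        "chars": len(text),
        "fits_limit": len(text) <= MAX_CHARS,
        "grade": round(grade, 1),
        "readable": grade <= MAX_GRADE,
    }


if __name__ == "__main__":
    # Made-up placeholder text, not actual Hernia Coach content.
    draft = "Rest for two days. Do not lift things. Call us if you feel sick."
    print(check_response(draft))
```

A check like this only flags drafts for review. A real workflow would likely pair it with a validated readability tool and human editing.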

At the end of this phase, I had a proof-of-concept conversational agent, Hernia Coach, ready for heuristic evaluation, usability testing, and query response accuracy assessments in the next phase.

Evaluating Hernia Coach

Assessing the feasibility of a conversational agent to answer common hernia surgery patient questions

The next phase of research involved evaluating Hernia Coach’s effectiveness. The evaluation had three parts: heuristic evaluation with design experts in healthcare and conversational agents, usability testing with former hernia surgery patients, and query response accuracy analysis across both.

Key research questions

Which design components of Hernia Coach are effective, and which need improvement?
How do former hernia surgery patients respond to Hernia Coach, and does it meet their expectations?
How accurately does Hernia Coach answer common hernia surgery questions?

Methods

Heuristic evaluation with six design experts in healthcare and conversational agents, using Nielsen's heuristics adapted for healthcare conversational agents. In-person moderated scenario-based usability testing sessions combined with semi-structured exit interviews, conducted with six former hernia surgery patients. I considered going straight to usability testing with patients, but ran the heuristic evaluation first so design issues could be addressed before exposing patients to potentially unidentified system issues.

Results & impact

Heuristic evaluation: The evaluation surfaced strengths and weaknesses. Effective components included clear and consistent patient education content, proficient natural language understanding across voice and keyboard, useful conversation guidance, and easy recovery from mistakes. Weaknesses included poor discoverability (Hernia Coach was embedded within Google Assistant rather than discoverable as a standalone tool), unhelpful error messages, incorrect query responses, and lost dialog history. I iterated on Hernia Coach's design within what the Google Assistant framework allowed.

Usability testing: I conducted usability testing with former hernia surgery patients after iterating on design and content. The participants reported that Hernia Coach would have been a useful part of their care journey and that they would recommend it to others undergoing surgery. They saw it as an effective supplement to their traditional education sources (surgeons, nurses, and printed materials), with the potential to replace paper information packets entirely. Participants also saw the agent as a way to reduce the burden on clinicians, freeing surgeons and nurses to focus on direct patient care and high-consequence questions.

Query response accuracy: Query response accuracy improved iteratively as I added training data based on observed errors. By the end of the dissertation research, Hernia Coach achieved a 93% true positive rate with its query responses.
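For readers unfamiliar with the metric, a true positive rate is the share of queries that should have matched a response and did. The Python sketch below shows one way it could be computed from a reviewed query log. The `LoggedQuery` structure, the intent names, and the four sample rows are hypothetical toy data, not the dissertation's log or its scoring procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoggedQuery:
    expected_intent: str            # intent a reviewer says should have matched
    matched_intent: Optional[str]   # intent the agent matched, None if it fell back


def true_positive_rate(queries: List[LoggedQuery]) -> float:
    """TP / (TP + FN) over queries the agent was expected to answer."""
    if not queries:
        return 0.0
    true_positives = sum(
        1 for q in queries if q.matched_intent == q.expected_intent
    )
    return true_positives / len(queries)


if __name__ == "__main__":
    # Toy log for illustration only.
    log = [
        LoggedQuery("activity_after_surgery", "activity_after_surgery"),
        LoggedQuery("how_procedure_works", "how_procedure_works"),
        LoggedQuery("site_care", None),             # missed: agent fell back
        LoggedQuery("site_care", "site_care"),
    ]
    print(f"TPR: {true_positive_rate(log):.2f}")
```

Each miss in a log like this points to a query that needs additional training phrases, which is the iterative loop described above.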

In summary

Reflections

My biggest blind spot was failing to train Hernia Coach on which topics or questions it should not answer. I had focused entirely on the content Hernia Coach should answer. This gap explained most of the incorrect query responses the heuristic evaluation surfaced. The lesson: defining what a system should not do is as important as defining what it should. I've applied this guiding design principle to every AI system I've worked on since.

High level impact

Hernia Coach taught me that defining what actions an AI system should not perform is as important as defining what it should. This is a lesson I now apply daily in agentic AI work at Microsoft Azure. By the time my dissertation research concluded, Hernia Coach had achieved a 93% true positive rate with its query responses and demonstrated that a conversational agent could effectively support patient education for hernia surgery.

This work established early evidence for patient-facing conversational AI in healthcare, years before LLM-powered agents became commercially available. Additionally, my scientific contributions became the foundation for a CHI Best Paper I co-authored that extends Nielsen's heuristics for conversational agents.