MCP in EHRs: Conversation with Aledade Transcript

Farzad (0:00) Josh, we started working together 15...

Josh Mandel (0:06) Yeah, yeah, early days of SMART.

Farzad (0:10) Yeah. And at the Office of the National Coordinator for Health IT, we had some nice assets for research and development, and one of the 15 R&D projects we funded ended up having the biggest impact, which was the SMART project at Harvard. Josh was there, and he has always impressed me with his generosity, his ability to go deep, but then also to share what he has learned. And we saw Josh, as per usual, on the outward edge of looking at, in this case, the model context protocol, and he's generously agreed to come and share with our team some of what he's learned, but also for us to explore together where this technology might lead us, both in terms of policy as well as implementation. Maybe the team can quickly introduce themselves and their roles, and then I'll turn it over to Josh.

Ashok Srinivasan (1:22) Ready? How about Nick?

Nicholas Gerner (1:24) Yeah, sure. Nick, nice to meet you, Josh. I head up what's called a point-of-care organization, which is primarily focused on building out an overlay that sits on top of EHR systems and presents validated insights to providers. I've been here for about two years, heavily involved with a lot of our AI and machine learning efforts. A lot of questions, but I'm sure we'll get to them. And that's it for me. Ashok, you're next.

Ashok Srinivasan (1:53) Yeah, cool. Hey, my name is Ashok. I lead engineering here for the product and platform team, and the point-of-care overlay is part of my org. I've been here for about a year and a half. I'm a long-time Microsoftie, 14 years at Microsoft, and now I'm trying out healthcare. And I've seen some of your work, Josh. Awesome.

Jonas Goldstein (2:20) Hey, definitely.

Ashok Srinivasan (2:28) Can't hear you, Jonas.

Nicholas Gerner (2:31) listening.

Ashok Srinivasan (2:35) Okay, good. Jonas is on the care transformation side, and he's actively helping with a bunch of transformation projects related to AI and things like that.

Josh Mandel (2:46) Awesome. Well, Farzad gave me the quick intro, so I'll just say: I'm Josh, I'm a physician and software engineer. I've been working on health data access and interoperability ever since I got interested in this area as a medical student, and I've had the good fortune to do work starting in the academic world that was in large part government sponsored, through federal grants. And I've been able to keep pursuing this thread even as I've transitioned into industry. So, I run a very small team in Microsoft Research: it's me and two full-time software engineers, and we're really focused on contributions to the healthcare standards community. The core of our work is contributing to the FHIR standards and a set of associated content around them. And I see your note there, Jonas, that sounds great. Oh, and just so everyone's on the same page: Farzad noted in his initial email that it would be good to record this and share it more broadly. So, I hit the record button on my side, and hopefully we can keep the discussion at a level we're comfortable having shared for the public to see. Awesome.

Ashok Srinivasan (3:59) Awesome.

Josh Mandel (4:00) So, where should we start? I've been throwing a few articles and thoughts on LinkedIn and building some demos and trying out some capabilities. So, happy to answer questions or dive in wherever you think would be best.

Farzad (4:14) I think maybe start with the basics, particularly if we're going to have a general audience that's potentially interested in this. I thought you had an enlightening conversation with Aryan Malik over email. For the average person who's been involved in the health IT community and the EHR world, why should they care about MCP?

Josh Mandel (4:41) Yeah, super. So, we are now getting to the point with frontier language models where they're becoming quite useful in working with healthcare data. In the work that my team does day-to-day, the best models from Anthropic or OpenAI or Google today are very capable of reading in structured healthcare data and processing it, transforming it into other structured formats, but also reading in unstructured healthcare data and doing on-the-fly abstractions, taking a pile of clinical notes and turning them into structured questionnaire answers. All kinds of things are possible right now. And one of the challenges has been: how do you give the models the data they need to actually work with? In a lot of end-user-facing workflows today, if a patient wants to ask ChatGPT some questions, the vehicle for that is copy and paste. So, I might literally go to my patient portal, copy some clinical notes onto my clipboard, paste them into ChatGPT, and start asking questions. And that is actually a pretty reasonable workflow. One of the nice things about it is that you know exactly what the model can see about you. There's no mystery. But it has some obvious limitations as well. First of all, it might just be very cumbersome and painful to have to copy and paste notes back and forth and keep them up to date. It's just not a good user experience. And of course, you might also run out of space: you might have more information, like piles of old data and PDFs, that you can't pass all of because the model just can't handle a context of that size, or can't handle it effectively.

So, one of the capabilities that has emerged over, say, the last one to two years has been giving models access to some set of tools, which is really just functions they can invoke. As the model is processing the data that you've given it directly, it can issue tool calls that might do something like query the outside world, the EHR in this case, for specific data. And if the model is smart enough, it can make good use of those tools, fetch what it needs just in time, pull that data in from the EHR, and then reason over it and use it in its next output generations. And that's a pretty good pattern, especially with the frontier models today; they can make pretty effective use of those tools. And there are some nice things about it from the user perspective: it's still inspectable. You can look and see exactly what tool calls the model is making and what results it's receiving. So, if you're wondering, well, how did it get this information, you can go and inspect that.

And that might sound like an obvious thing, that of course it would work that way. But we've also seen examples of approaches to this problem that don't provide that same level of transparency. For example, in the OpenAI chat interface, you can upload lots of PDF documents, and you might think that the model is going to see all those PDFs, but by default, it actually won't. The model will just be able to do a fuzzy search over the PDFs, and they don't show the user what the model searched for or what it saw. So, you might upload 10,000 pages of documents, and you might see a little animation that says something like "reading documents" pop up for a moment. And for me, that's always a red flag as an end user. I'm like, well, what did it read? What did it miss? I don't have good confidence that it's going to be using the right information if I can't inspect what it saw.
And so, explicit tool calling where you show the user the results is very helpful. Then the question becomes: cool, there might be lots of people out there who are writing tools, connecting to different underlying databases or providing different capabilities. And if you have to do a custom integration with every tool and every model out there, people waste a lot of time plugging this one particular tool into Claude and then into ChatGPT and then again into Grok and on down the line. So, that is really the problem statement where model context protocol comes in, standardizing that interface. Each of these models probably has its own underlying training protocol for teaching Grok or ChatGPT how to use tools, but the definitions of the tools can be standardized. We can say here's what they're called, here's the schema of the input, the parameters that you would pass when you call a tool, and here's what the output is going to look like. The model context protocol just provides that definitional layer. So, if you're publishing a tool, you can host it as an MCP server, and then any chat interface that supports MCP can call your system, do a little interrogation up front to say, hey, what tools do you have installed right now, and then tell the language model about those tools.

And that has been a very successful protocol, in part because of just how thin and small it is. It solves a very narrow problem, but it's given people this spark of initiative to say, we can connect all kinds of things into models this way, not just text. If you're doing development of web apps, it becomes really useful to have a tool that takes a screenshot of your website and passes it into the model. So, lots of these possibilities open up. And then it's very interesting to see how models can make use of a whole set of tools that do different things and combine the application of those tools. It's still very early days. The models aren't really all that smart when it comes to figuring out the right tool to use, but you can see the pace of progression over the last year or so, and really start to get a feeling that these are powerful capabilities and they're worth investing in right now, even if they're still pretty early. So, that's my overview. There's no magic to MCP itself. You could have had any number of protocols for doing this kind of tool discovery. It's not a hard problem, but getting mindshare around a solution is a little bit magical.
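
[To make the tool-definition layer concrete, here is a minimal sketch of hosting a tool as an MCP server, using the official TypeScript SDK (@modelcontextprotocol/sdk). The tool name, description, and in-memory notes are illustrative assumptions, not anything shown on the call; the example happens to mirror the clinical-note search tool Josh describes later.]

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Illustrative stand-in for notes prefetched from the EHR (e.g., via a FHIR
// US Core DocumentReference search); hardcoded here to keep the sketch small.
const notes = [
  { date: "2024-11-02", text: "Follow-up for hypertension. BP improved on lisinopril." },
  { date: "2025-01-15", text: "Annual physical. No acute complaints." },
];

const server = new McpServer({ name: "ehr-notes", version: "0.1.0" });

// Register a tool: its name, a description, an input schema, and the handler
// that runs when a connected client relays the model's tool call.
server.tool(
  "search_clinical_notes",
  "Keyword search over the current patient's clinical notes.",
  { query: z.string().describe("Keywords to look for") },
  async ({ query }) => {
    const hits = notes.filter((n) =>
      n.text.toLowerCase().includes(query.toLowerCase())
    );
    return { content: [{ type: "text", text: JSON.stringify(hits) }] };
  }
);

// Once connected, a client can do the "what tools do you have installed?"
// interrogation (tools/list) and pass those definitions to its language model.
await server.connect(new StdioServerTransport());
```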

Ashok Srinivasan (10:39) That's it.

Nicholas Gerner (10:45) So, the comment around the models and how fine-tuned they are on tool calling: that's certainly consistent with what we've seen. And it creates a situation where we have to decide whether to optimize for that weakness in tool calling with frameworks and code and whatnot, versus, six months from now, are they just going to incorporate this into the instruction tuning process, and therefore they'll all be fine and they'll all be able to tool call? So, just curious if you have thoughts on what that time horizon looks like and how much we should invest in solving for that problem now versus just letting these models get better over time.

Josh Mandel (11:36) Awesome. Yeah, and I see Farzad's comment about what's the EHR-specific story here, too. Here's how I would think about this. Right now, if you are trying to present a general-purpose UI where the end user could ask all kinds of questions, you're not sure what kind of data you're going to need, because it depends on what the end user just asked you to do. In this chat session they might be asking you to help with a prior auth, and a moment from now they might be having you write up patient instructions for how to take a medication. In that scenario, tool calling is really helpful because you let the model leverage its own smarts to figure out what information it needs to do a good job. On the other hand, if you're building a web service that just happens to be using an LLM, because it's flexible and useful, but really the purpose of your web service is medication list reconciliation, that's a case where maybe you don't need to lean in on tool calling. Maybe you just have a hardcoded pattern: hey, we're going to query the medication history, we're going to query the most recent clinical note, we're going to define what we want in context, and we're going to pass that to the model and have a reliable result every time. There's a big asterisk anytime we talk about reliable results with language models, but we know the model is going to be seeing exactly the right set of information every time. It doesn't have to be smart enough to tell us what it wants. So, any problem that has that shape is totally worth just coming up with some heuristics and passing the data that you know you're going to need. It's faster at runtime to do that, and it's just less random. But it depends on your use case, and there are a lot of places where you can also pick a hybrid point in the design space, where you say, we're always going to pass in the patient demographics and current problem list, whatever, there's a few things that you just want to know about, problems, meds, allergies, and then if you want to do a deep search on the clinical notes going back historically, that might be something you lean on tool calling for.

And when I think about how to plug tools into the EHR in particular, there's one question about who's going to provide the tools. It would be lovely if EHRs just supported these kinds of search tools natively, and probably that'll happen someday. It seems like a pretty natural direction for development. But right now, one of the only things that you're guaranteed to get out of the EHR is standardized data payloads like FHIR for the US Core data set. And so, in my little experiments, I've been saying, I'm only going to assume the things that are built into certification, the things I know any certified EHR can get me, and then I'm going to build out the tool logic for myself. And for something like searching clinical notes, it's not all that tremendously complicated. You just fetch all the notes once ahead of time, and then you expose a tool that does an in-memory search on the current patient. You could get as sophisticated as you want, but the basic outline of that tool is pretty straightforward. And for me, there are two main advantages of letting the model make these calls. One is just being able to search against a data set that's too big to put in context. And the other is letting the model write code that runs over data.
So, if I have a structured history of every blood pressure reading the patient's ever had, I don't exactly want to pass that as a text blob into the model context. Even though that might give the model reasonable insight, if the model really wants to calculate an average blood pressure, it can very easily spit out some JavaScript code to reduce over the blood pressure history and compute the averages, or monthly averages, or any kind of summary statistics it decides are relevant. So, giving the model the ability to run a chunk of code against a big database full of structured stuff is also a really powerful capability, and it works well for cases where you could put the information in context, but the models just have limited ability to make use of large context. If you give a model 5,000 blood pressure readings, it can't correctly take an average. It'll probably squint and give you a number that's in the ballpark, but it's just not the same as writing the code to do it. And I like to think about how I would go about this as a human: there are some problems for which I would scan several pages of text, and there are other problems for which I would write a query. Same thing with language models.
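
[As an illustration, this is the kind of code a model might emit to compute exact monthly averages instead of eyeballing 5,000 readings in context. The simplified reading shape is an assumption; real FHIR blood pressure Observations carry the systolic and diastolic values as components.]

```typescript
// Simplified shape of one blood pressure reading (a hypothetical flattening
// of a FHIR Observation).
interface BpReading {
  effectiveDateTime: string; // e.g. "2024-03-15T09:30:00Z"
  systolic: number;          // mmHg
  diastolic: number;         // mmHg
}

// Reduce the full history into exact per-month averages.
function monthlyAverages(readings: BpReading[]) {
  const buckets = new Map<string, { sys: number; dia: number; n: number }>();
  for (const r of readings) {
    const month = r.effectiveDateTime.slice(0, 7); // "YYYY-MM"
    const b = buckets.get(month) ?? { sys: 0, dia: 0, n: 0 };
    b.sys += r.systolic;
    b.dia += r.diastolic;
    b.n += 1;
    buckets.set(month, b);
  }
  return [...buckets.entries()].map(([month, b]) => ({
    month,
    systolic: b.sys / b.n,
    diastolic: b.dia / b.n,
  }));
}
```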

Ashok Srinivasan (16:01) I have a similar question, which is: Josh, where do you see this playing out with EHRs and also other software in the healthcare system, like pharmacy? What do you think about the speed of them embracing this, and the incentives for them?

Farzad (16:20) Another way we had referred to that, Josh: I tongue-in-cheek tweeted that if we were doing meaningful use and EHR certification today, we might include MCP as a requirement. Does that make sense or does that not make sense?

Josh Mandel (16:41) So, it might make sense. Here's the first part of the question: MCP is really just saying, how do you plug tools into a language model? So, the next part of the question is, well, what tools? If all you said was, hey, every certified EHR has to do something with MCP, you're just going to get a hodgepodge of different functionality, some of which might not be that useful. This is always the challenge of regulating functionality into existence. You'd always much rather see a few vendors who have organically adopted some technology because it was useful for them and their customers, and then you start to say, okay, how did vendors A, B, and C each do it? Were they different for arbitrary reasons? Can we get some energy around standardizing this? So, when it comes to MCP in particular, I guess what I would say is no, I don't find that in and of itself very compelling. I would rather say that vendors need to have useful APIs, and they need to make sure any developer can use those APIs. And then as a developer, I can wrap them in an MCP server if that's what I want to do. That's great for plugging stuff into a language model, but that's a very thin layer on top. I think today we're still having problems at a much lower level on Maslow's hierarchy of needs, which is: is there an existing API that even does what I want? As a regular old developer who's not in a deep partnership with the EHR, am I allowed to call it? Is it documented and defined in a way that I can understand? Right now, we're solving problems at that level. How you wrap it to pass the stuff into a language model, that's relatively trivial.

And then the other longer-term theme for me has been: is it really that important to have standards anymore? If language models can just read and interpret and translate all the data, I think in the long term it's going to be a lot less important. In other words, if Epic wanted to give me different shapes of data, but they defined what they were with really great documentation, then I think a language model next year or the year after is going to be able to do a pretty good job of just working with those data sets. But it's also worth saying that something similar is happening today with EHI export: every vendor needs to define a full data export. This is not a standardized format, but it's supposed to be complete. So, it's the whole data set. And in Epic's definitions, they list thousands of tables that are part of this full EHI export. It's based on some of their internal data warehouse formats. But the level of documentation you get for these tables, in my experience, has been just not enough even for me as an expert human to be able to make good use of the data. And so, if you try to pass all these table definitions into today's language models, they get pretty confused, and I find that quite understandable because I'm also pretty confused. So, the level of documentation you need to make this stuff usable is still a pretty high bar, and I don't think we're there for the non-standardized complete data sets yet.

Ashok Srinivasan (25:51) Yeah.

Ashok Srinivasan (25:54) Okay, I guess to take it further, and your point about whether the standards matter: if you think about taking this into the Playwright realm, taking screenshots, that kind of stuff, how would you think about applying MCP in that realm?

Josh Mandel (26:22) Yeah. I mean, again, I think MCP is really just a little bit of glue. The question is what API capabilities you're using on the other side, and for me there are two big buckets. Well, okay, I'll say three buckets. One is standardized APIs like FHIR. If you can get the data you need that way, wonderful. The next one is something like a full EHI export. So, maybe the data you care about right now is not showing up in FHIR, but you can do a full EHI export. Okay. And the third one is sort of RPA, or what the current model providers are calling computer use APIs, where you just emulate the kinds of things an end user would do: look at the screen, click here, drag there, scroll, type. And I think that set of capabilities is getting better and better. It's obviously general purpose, so anything the user can do, you can do this way. There's sort of a proof that you won't be missing anything that's available in the product itself, which is really powerful and important. But the computer use APIs are still sort of slow and sort of expensive, and they don't work perfectly. So, today I would say if you can get the data through FHIR, you do it that way, and you use something like computer use as a fallback. There's also, from what I can understand, still a set of legal questions about whether EHRs can prohibit the use of these kinds of technologies. I would personally be pretty surprised, at the end of the day, if information blocking doesn't guarantee that healthcare providers can put any user they want in front of the keyboard, whether that user is a language model or a human. So, I'm sort of expecting that's how this will play out, but it hasn't played out yet, and there are still some cases working their way through the system.

Ashok Srinivasan (28:19) One thought I've been thinking about; I was playing around with this a few days back. In your case, Josh, you knew which user's data to go get: you went and downloaded it from the FHIR API. But I always wonder, if we had to integrate with the underlying EHR, with all the patients' data, and then I'm working with the model and I start asking it questions, how much can we trust the model not to go and fetch other patients' data? It's no longer authenticated only to Ashok; it could go anywhere.

Josh Mandel (29:04) Yeah. So, this is a really important point. In the examples I gave, I showed an OAuth flow where, before we provided a model with access to a tool, we went through an authorization flow, very similar to what I showed you for bringing my data from Epic into my app. There's a similar authorization flow for bringing data from my tool into the language model client environment. Now, when you initiate a connection to an MCP server: in the early days, they just set this problem aside and said authorization is out of scope; the MCP server is going to be either a local process or something running on the web, but implicitly you're just authorized to see whatever you're allowed to see. More recently, they've layered on an OAuth profile to say that when you first introduce the tool to the client environment, you go through a user-facing authorization flow, and that can and should impose some kind of limitations. So, what that means is that later on, when the model goes to issue a tool call, your tool on the back end, your MCP server, is seeing: here's what the model is attempting to invoke, and here's the model's authorization token. And you've got some database of authorization tokens, so you know token 123 is associated with Josh's record, and if the model is trying to do something like querying the underlying EHR demographics database for Ashok, it's going to just reject that query. So, it's very important: somebody has to impose that authorization logic on the flow if you don't want random data popping all around. But the protocols support that.
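
[A minimal sketch of that rejection logic on the server side. The token store and identifiers are hypothetical; the point is that the MCP server, not the model, enforces the binding between an authorization token and a patient record.]

```typescript
// Hypothetical binding from OAuth access tokens to the patient record each
// token was authorized for during the user-facing flow.
const tokenGrants = new Map<string, { patientId: string }>([
  ["token-123", { patientId: "Patient/josh" }],
]);

// Run before executing any tool call: compare the token presented with the
// call against the record the model is trying to touch.
function authorizeToolCall(bearerToken: string, requestedPatientId: string): void {
  const grant = tokenGrants.get(bearerToken);
  if (!grant || grant.patientId !== requestedPatientId) {
    throw new Error("Rejected: token is not authorized for this patient record");
  }
}
```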

Ashok Srinivasan (30:39) I wonder how you do this at a platform-to-platform level, though, like machine to machine. Because if you have a user app, then you're right, you can do an OAuth flow. But that could get tricky. So, I don't know what the authorization model... Yeah, I...

Josh Mandel (30:57) There's a few ways you could tackle this. One is to say that even if there is an end user (MCP today is typically focused on clients that are user-facing), so even if that user is a clinician, you might have an automated OAuth flow that just does a couple of redirects back and forth but establishes who this user is signed into the EHR as. Maybe your LLM client is embedded in the EHR in a patient context. In that case, you can know, okay, we're running in the context of patient 123, and you can impose limitations that way. On the other hand, maybe your tool backend server really just has a system-level access token; it's able to do queries for anything it wants. This becomes a little bit of a harder partitioning problem. You could create some schemes: you could say that in the context of a tool use session, the first thing you ask the model to do is establish a clinical context. It tells you why it needs to access a certain record, your tool processes that, and you could build into the protocol a notion of a session, so the next query the model writes is going to need to include a session identifier, and your backend can impose limits there. There are all kinds of schemes. I think this is an area that does need more attention to figure out what works well.
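
[A sketch of that session scheme for the system-level-token case: the model first establishes a clinical context with a stated reason, gets a session identifier back, and later queries are checked against that context. All names here are hypothetical.]

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical session store keyed by session identifier.
const sessions = new Map<string, { patientId: string; reason: string }>();

// First tool call in a session: the model declares which record it needs and
// why. In real life you would audit-log (and possibly vet) this declaration.
function establishContext(patientId: string, reason: string): string {
  const sessionId = randomUUID();
  sessions.set(sessionId, { patientId, reason });
  return sessionId;
}

// Every later query must carry the session id and stay inside that context,
// even though the backend's own token could query anything.
function checkQuery(sessionId: string, requestedPatientId: string): void {
  const s = sessions.get(sessionId);
  if (!s || s.patientId !== requestedPatientId) {
    throw new Error("Query is outside the established clinical context");
  }
}
```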

Farzad (32:25) Do you see the possibility, if I'm an EHR vendor, around this idea of replacing software licenses with service licenses? I might want to begin to create agents who can do work for you. If I'm an EHR vendor who wants to get into that business, I'd start shipping you scalable workers.

Josh Mandel (32:59) Yeah.

Farzad (33:00) Would they wrap some functions that are not currently API-enabled in an MCP, so that my own or other people's agents could use them?

Josh Mandel (33:16) It's possible. Yeah. The question there becomes: what user experience are they providing for you? So, let's say the EHR vendor is going to give you access to some virtual hospital staff that you can use to extend your real staff members. What does it look like to manage that staff? One story there would be: you just see a dashboard that tells you how they're all doing today, and you can add items to their work queue, and you'll just see as they process the work. In that view, all of the tool calling is happening on the EHR side. They're offering this virtual staff to you, their agents are the ones deciding which tools to call, they're providing the tools, and all you see is your HR dashboard or whatever. On that view, they might be using MCP under the hood, but it's an implementation detail. If, on the other hand, what they're trying to do is give you a powerful set of tools that you can incorporate into your own language models, that's where standardizing the protocol could be pretty helpful, and MCP is definitely one option there. It's designed for things that feel like function calls: I give you some parameters, you crank on it, and you give me a result back. The other protocol in the mix here is one that Google defined maybe six weeks ago, which is agent-to-agent. That one is more geared towards: I'm going to ask a remote worker to do a thing, and eventually I'm going to get back some kind of result, but in the meantime, I expect that we're going to have a conversation about this. It explicitly models the fact that there's a long-running task, so you can spin up these conversation threads that are oriented around those tasks. So, it will be interesting to see. My guess is that EHRs, if they really were going to get into this space, would just make up their own protocols that are morally similar to agent-to-agent. If somebody was going to build this this month, they'd probably just make up their own, and it's going to take a while to see what pressure would drive standardization.
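
[A rough sketch of the long-running-task shape described here: a task that moves through states while a conversation accumulates around it. This is an illustration of the general idea, not the actual agent-to-agent wire format.]

```typescript
// Hypothetical task and message shapes.
type TaskState = "submitted" | "working" | "input-required" | "completed" | "failed";

interface AgentTask {
  id: string;
  state: TaskState;
  // The back-and-forth that happens while the remote worker is on the job.
  messages: { role: "client" | "agent"; text: string }[];
  // Present only once state === "completed".
  result?: unknown;
}

// A client might poll (or subscribe) until the remote worker finishes.
async function awaitResult(
  fetchTask: (id: string) => Promise<AgentTask>,
  id: string
): Promise<AgentTask> {
  for (;;) {
    const task = await fetchTask(id);
    if (task.state === "completed" || task.state === "failed") return task;
    await new Promise((r) => setTimeout(r, 2000)); // back off between polls
  }
}
```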

Ashok Srinivasan (35:21) I had one last question, Josh. Thanks for that. If you have time?

Josh Mandel (35:23) I do. Yeah.

Ashok Srinivasan (35:26) Yeah, thank you, Farzad.

Ashok Srinivasan (35:28) Okay. How have you seen MCP evolve with writebacks? Reads are very prevalent; do you see more writing happening eventually? With mutative operations, I wonder how MCP will evolve.

Josh Mandel (35:51) Yeah, so at the protocol level, MCP doesn't really distinguish. You send a request, something happens, you get a response or you send a notification. So, then the question becomes how are people using this in real life?
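
[To illustrate: on the wire, a read and a write are both just tools/call requests; nothing at the protocol level marks the second one as a mutation. The tool names and arguments here are made up.]

```typescript
// A read...
const readCall = {
  jsonrpc: "2.0", id: 1, method: "tools/call",
  params: { name: "search_clinical_notes", arguments: { query: "hypertension" } },
};

// ...and a write look identical at the MCP layer; it's up to the server (and
// the humans around it) to treat mutations differently.
const writeCall = {
  jsonrpc: "2.0", id: 2, method: "tools/call",
  params: { name: "file_blood_pressure", arguments: { systolic: 128, diastolic: 82 } },
};
```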

Ashok Srinivasan (36:07) Also, access could relax a bit.

Josh Mandel (36:08) That's right. So, what are the things that make people much more comfortable about opening up that access? Here's my framework for thinking about it. One is: is the model any good, or is it just going to mess up my data all the time? You really don't want to get started here if the models aren't pretty good. The next one is: can I audit it and understand exactly what was changed? Maybe under the hood your APIs are keeping really great audit logs, but you never surface them to users because the audit logs are all about medical-legal liability and you just expect there's going to be a team of lawyers reading them or something. Okay, then you don't have really good auditability for your end users. So, that's a really important thing, and it can be hard to design for. And the other thing you clearly would want is the ability to undo changes that were just wrong. Depending on how changes propagate through your system, or how you've designed it, undoing can be very hard. But there are also some design patterns that provide a little bit more protection, like giving the model the ability to write data into a holding tank, where everything in the holding tank gets reviewed at some point and either propagates or gets killed. That's something that could make you way more comfortable with models writing data back.

And the interesting thing about all this is that it has a very strong analogy in the general-purpose discussion about when EHR APIs allow writing into their systems. Leaving language models and MCP aside, I've worked on several projects over the last five years with the EHR group through the Argonaut project about writing data back into these systems. And these schemes come up over and over again, particularly if it's a patient-facing app that's going to be writing the data. Maybe you've got a blood pressure monitor at home, and it wants to be able to write the data into the EHR. Well, how do you know it's not going to accidentally write the same data a million times? Or how do you know it's not going to get some of the details wrong? Do you expect the clinician to review it when it shows up? Whose responsibility is it if somebody writes a blood pressure that shows the patient is really close to a cardiovascular event, or something bad is happening, and it's in the middle of the night: are you on the hook to notice that? So, a very similar set of concerns, and similar patterns adopted in terms of allowing the write-back of data, but leaving it up to the receiving system to decide how to route that downstream.
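
[A sketch of the holding-tank pattern: model writes land in a staging area and only reach the chart after human review. The types and the commit hook are illustrative assumptions.]

```typescript
import { randomUUID } from "node:crypto";

interface PendingWrite {
  id: string;
  resource: unknown;                           // e.g., a FHIR Observation the model produced
  author: string;                              // which model or agent proposed it
  status: "pending" | "approved" | "rejected";
}

const holdingTank: PendingWrite[] = [];

// The model's "write" lands here: it gets an id back, not a committed chart entry.
function proposeWrite(resource: unknown, author: string): string {
  const id = randomUUID();
  holdingTank.push({ id, resource, author, status: "pending" });
  return id;
}

// A human reviewer later propagates or kills each pending item; only approval
// invokes the real write path into the record.
function review(id: string, approve: boolean, commit: (r: unknown) => void): void {
  const w = holdingTank.find((x) => x.id === id);
  if (!w || w.status !== "pending") return;
  w.status = approve ? "approved" : "rejected";
  if (approve) commit(w.resource);
}
```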

Ashok Srinivasan (38:42) Yeah, I think MCP as a protocol will maybe evolve eventually to provide a human-in-the-loop concept, like workflow state management, the idea of a workflow that you can show. Right now they don't have that, but I think they might get to that state.

Josh Mandel (39:00) Yeah. I don't know whether the MCP folks will want to take that on or leave it to other projects to build on top. Part of how successful they've been is just how thin they've made the APIs. And that is a really good lesson too.

Ashok Srinivasan (39:21) This is awesome. Nick, Jonas, any other questions? I think we all need to wrap up.

Nicholas Gerner (39:25) Yeah, I really appreciate the time. We'll give you time back. I really appreciate the answers and whatnot.

Josh Mandel (39:35) Awesome. Yeah, it's great to connect with you all, and don't hesitate to reach out over email if you've got follow-up discussion.

Jonas Goldstein (39:41) You're in a flight. Thanks, Josh.

Josh Mandel (39:47) Awesome. Cool. Take care, guys.

Farzad (39:49) Thank you.

Ashok Srinivasan (39:50) Bye.

Josh Mandel (39:50) Goodbye.
