Evaluating AI in context: Which LLM is best for real health care needs?

When a large language model first passed the United States Medical Licensing Exam in 2023, it was a big deal. But two years later, what was once a notable milestone in artificial intelligence progress is more of a bare minimum.

"It's not enough for a large language model to simply answer medical test questions accurately," said Nigam H. Shah, MBBS, PhD, chief data scientist at Stanford Health Care. "That type of evaluation doesn't tell us anything about what matters."

In other words, Shah said, it says nothing about how a model might perform in a real-world clinic or hospital.

That's why Shah and a team of researchers have devised a framework to help fill that gap. It's called MedHELM, HELM standing for holistic evaluation of language models. It's a resource for accurate and reliable evaluations of LLMs, supporting the core principles that power the RAISE Health Initiative.

Stanford's Center for Research on Foundation Models had already developed HELM, an LLM evaluation infrastructure that lets people in any domain define scenarios in which AI is used. For instance, one could create a prompt for an LLM within the HELM framework that says, "Summarize this patient's medical record." A successful summary of that patient's information needs to already exist, serving as the gold-standard answer. One could then evaluate how well the model's answer agrees with the gold standard. Together, the prompt, the gold-standard answer and the comparison make up a complete scenario.
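To make the idea of a scenario concrete, here is a minimal sketch in Python. It is not the HELM or MedHELM API: the `Scenario` class, the `token_f1` metric and the `evaluate` helper are hypothetical names chosen for illustration, and word-overlap F1 is only one simple stand-in for "how well the model's answer agrees with the gold standard."

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    """Hypothetical container: one prompt plus its gold-standard answer."""
    prompt: str
    reference: str


def token_f1(candidate: str, reference: str) -> float:
    """Word-overlap F1 between a model answer and the gold standard.

    An illustrative agreement metric only; real evaluations may use
    other metrics or human or model-based review.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared word at most as often
    # as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(scenario: Scenario, model: Callable[[str], str]) -> float:
    """Run the model on the scenario prompt and score it against the reference."""
    answer = model(scenario.prompt)
    return token_f1(answer, scenario.reference)


if __name__ == "__main__":
    # Made-up example data; `stub_model` stands in for a real LLM call.
    scenario = Scenario(
        prompt="Summarize this patient's medical record.",
        reference="patient has type 2 diabetes",
    )

    def stub_model(prompt: str) -> str:
        return "patient has diabetes"

    print(f"agreement score: {evaluate(scenario, stub_model):.2f}")
```

In this sketch the model is just a function from prompt to text, so any LLM client could be wrapped and scored against the same gold-standard answers.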
