• 1 Post
  • 132 Comments
Joined 6 months ago
Cake day: August 27th, 2025

  • You’re over-egging it a bit. A well-written SOAP note, HPI, etc. should distill to a handful of possibilities; that’s true. That’s the point of them.

    The fact that the LLM can interpret those notes 95% as well as a medically trained individual (per the article) and come up with the correct diagnosis is being a little undersold.

    That’s not nothing. Actually, that’s a Big Fucking Deal™ if you think through the edge-case applications. And remember, these are just general LLMs - and pretty old ones at that (ChatGPT-4 era). We’re not even talking about a medical domain-specific LLM.

    Yeah; I think there’s more here to think on.


  • Agreed!

    I think (hope) the next application of this tech is in point-of-care testing. I recall a story of someone in Sudan(?) using a small, locally hosted LLM with vision abilities to scan handwritten doctor notes and come up with an immunisation plan for their village. I might be misremembering the story, but the anecdote was along those lines.

    We already have PoC testing for things like ultrasound… but some interpretation workflows rely on a strong net connection, IIRC. It’d be awesome to have something on-device that can be used for imaging interpretation where there is no other infra.

    Maybe someone can finally win that $10 million X Prize for the first viable tricorder (pretty sure that one wrapped up years ago? Too lazy to look)… one that isn’t smoke and mirrors like Theranos.


  • Funny how the hivemind overlooks that bit en route to stunting on LLMs.

    If anything, that 90% result supports the idea that Garbage In = Garbage Out. I imagine a properly used domain-tuned medical model with structured inputs could exceed those results in some diagnostic settings (task-dependent).

    IIRC, the 2024 Nobel Prize in Chemistry was won on the basis of using an ML expert system to investigate protein folding. ML != LLM, but at the same time, let’s not throw the baby out with the bathwater.

    EDIT: for the lulz, I posted my above comment into my locally hosted bespoke LLM. It politely called my bullshit out (AlphaFold is technically not an expert system, and I didn’t cite my source for the Med-PaLM 2 claims). Not all hope is lost with these things lol (its output is below, along with a rough sketch of that kind of loop after it).


    The statement contains a mix of plausible claims and minor logical inconsistencies. The core idea—that expert systems using ML can outperform simple LLMs in specific tasks—is reasonable.

    However, the claim that “a properly used expert system LLM (Med-PALM-2) is even better than 90% accurate in differentials” is unsupported by the provided context and overreaches from the general “Garbage In = Garbage Out” principle.

    Additionally, the assertion that the 2024 Nobel Prize in Chemistry was won “on the basis of using ML expert system to investigate protein folding” is factually incorrect; the prize was awarded for AI-assisted protein folding prediction, not an ML expert system per se.

    Confidence: medium | Source: Mixed
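
    For anyone curious, here’s a minimal sketch of that kind of local fact-check loop. It assumes the model sits behind an OpenAI-compatible endpoint (e.g. an Ollama or llama.cpp server on localhost); the URL, model name, and prompts below are placeholders, not the actual setup described above.

    ```python
    # Minimal sketch: send a comment to a locally hosted model for a fact-check.
    # Assumes an OpenAI-compatible local server (placeholder URL/model below).
    import requests

    COMMENT = (
        "Iirc, the 2024 Nobel prize in chemistry was won on the basis of "
        "using ML expert system to investigate protein folding."
    )

    SYSTEM = (
        "You are a skeptical fact-checker. Point out unsupported claims and "
        "factual errors in the user's comment, then give a confidence rating."
    )

    resp = requests.post(
        "http://localhost:11434/v1/chat/completions",  # placeholder local endpoint
        json={
            "model": "llama3",  # placeholder model name
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": COMMENT},
            ],
            "temperature": 0.2,  # keep the critique relatively deterministic
        },
        timeout=120,
    )
    # Print the model's critique of the comment.
    print(resp.json()["choices"][0]["message"]["content"])
    ```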



  • I don’t think it’s their information per se, so much as how the LLMs tend to use said information.

    LLMs are generally tuned to be expressive and lively. Part of that involves “random” (i.e., roll-the-dice) sampling of the output, based on the inputs + training data. (I’m skipping over technical details here for the sake of simplicity; there’s a toy sketch of that sampling step at the end of this comment.)

    That’s what the masses have shown they want: friendly, confident-sounding chatbots that can give plausible answers that are mostly right, sometimes.

    But for certain domains (like med) that shit gets people killed.

    TL;DR: they’re made for chit-chat engagement, not high-fidelity expert systems. You have to pay $$$$ to access those.
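
    As promised, a toy sketch of the “roll the dice” step (temperature sampling). The numbers and token names are made up purely for illustration; real models score tens of thousands of tokens, but the mechanics are the same.

    ```python
    # Toy demo of temperature-based sampling: why the same prompt can yield
    # different, more or less "lively" answers. All values are made up.
    import numpy as np

    logits = np.array([2.0, 1.5, 0.3, -1.0])            # model scores for 4 candidate tokens
    tokens = ["aspirin", "ibuprofen", "rest", "leeches"]  # hypothetical next-token options

    def sample(logits, temperature, rng):
        # Higher temperature flattens the distribution -> more varied/random picks.
        # Lower temperature sharpens it -> more deterministic picks.
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        return rng.choice(len(logits), p=probs)

    rng = np.random.default_rng(0)
    for t in (0.2, 1.0, 1.5):
        picks = [tokens[sample(logits, t, rng)] for _ in range(10)]
        print(f"temperature={t}: {picks}")
    ```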



  • Agree.

    I’m sorta kicking myself that I didn’t sign up for Google’s Med-PaLM 2 when I had the chance. Last I checked, it passed the USMLE-style exam with 96% and scored 88% on radiology interpretation / report writing.

    I remember looking at the sign-up and seeing it requested credit card details to verify identity (I didn’t have a Google account at the time). I bounced… but I gotta admit, it might have been fun to play with.

    Oh well; one door closes, another opens.

    In any case, I believe this article confirms GIGO. The LLMs appear to have been vastly more accurate when fed correct inputs by clinicians versus what laypeople fed them.