Senti — Creating real-world healthcare Artificial Intelligence


AI in Health Today

Artificial Intelligence (AI) has a health problem. Perhaps it’s nothing more than performance anxiety, but every time an AI is deployed in the real world to solve a problem in healthcare, it falls spectacularly short. Promising lab-validated work is hyped only to underperform, or, more commonly, never emerge in the real world.

Most recently, it was Google Health’s retinopathy (eye disease) AI: four years (so far) in the making by one of the most talented teams in the world, only to fail in the chaos of real-world data acquisition, shaky internet connections and poor lighting (“A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy”).

Google has certainly come to the right conclusions about understanding the healthcare processes, the patients, the staff, and the environments necessary to deploy an AI successfully. But, we argue, a fundamental problem remains, and it will always suppress the benefits of these systems. To understand this issue, let’s first understand these processes ourselves. First, the context: diabetes can cause irreversible eye damage — but this can be treated if caught early. Catching it early means catching it before symptoms (reduced vision) develop. This requires a screening programme:

  1. A patient with diabetes is invited to an eye screening clinic.
  2. A nurse takes a picture of the back of the patient’s eye.
  3. These pictures are then sent to an ophthalmologist to be graded according to the severity of damage to the eye.
  4. This grade determines how often the patient should have their eyes rechecked and whether to start any active treatment to prevent damage.
Diabetic Retinopathy
Google’s simple change was to send the images to their AI for grading, rather than to an ophthalmologist. Unfortunately, over 20% of the images were deemed ungradable. The AI set a high bar for image quality — far higher than that set by the team in usual practice. This was something of a deal-breaker, causing substantial frustration for both the clinical staff and patients.

Another major issue is the increase in turnaround time. These nurses are already stretched (as described in Google’s paper) to deliver on screening targets. Currently, it takes a nurse 3 minutes to photograph the retina and send the images to an ophthalmologist. With the Google AI, the time the nurse spends increased to 5 minutes, and, regardless of the time saved for ophthalmologists, this has the potential to overbalance an already precariously perched resource.

If nursing time and ophthalmologist time are equally constrained, taking time from nurses and giving it to ophthalmologists is simply robbing Peter to pay Paul. The only accomplishment is an increase in whole-system healthcare costs: the cut the thief takes.

This is simply the latest in a long line of healthcare AI disappointments. MD Anderson’s partnership with IBM Watson ended in 2017, with $62m down and nothing to show after four years. A read through IBM Watson Health’s scientific year-in-review documents and quarterly reports (IBM Watson — scientific reports; IBM Watson — 2019 Scientific Year In Review) reveals many small retrospective “encouraging” and “promising” studies across a range of healthcare fields.

This is in stark contrast to Watson’s real-world success as a knowledge tool in, for example, the oil & gas and engineering industries (IBM Watson — Woodside case study). Again, this is a world-leading team: we can be confident that the problem does not lie with the team, the technical implementation or the execution.

Babylon Health is another world-leading team. Their GP-at-hand app has certainly made healthcare more accessible and is rated highly amongst users. But their AI Triage system, with over $100m invested, has nothing more to show than “Output from Triage does not constitute medical advice, diagnosis or treatment; it provides triage advice based on the combination of symptoms that a user enters.”, further warning “Do not use Triage in an emergency” (Babylon Health AI Triage system). If Triage is neither medical advice, nor diagnosis, nor for use in an emergency, we’re left wondering: what is its place in the real world?

And finally, there are the meaningless successes. Google’s AI for predicting Acute Kidney Injury (AKI) (“A clinically applicable approach to continuous prediction of future acute kidney injury”) undoubtedly can. Indeed, it can predict 55.8% of all AKIs (and 90.2% of the most severe AKIs) up to 48 hours in advance of the condition developing. But the real question is whether we can do anything to stop AKI developing in these patients. We can certainly try, but do we try in vain? Regardless of someone’s probability of developing AKI, we already intervene positively to keep a person’s kidneys healthy in hospital. Therefore, the only additional option available to us is to avoid interventions which have a theoretical chance of increasing the probability of kidney damage. But those forgone interventions would have managed the patient’s actual reason for being in hospital. There is no point in saving a person’s kidneys only to let them die from their initial presenting complaint.

Stop Renal Failure
Everything in medicine is risk versus benefit. Whilst we must remember to first do no harm, we cannot become so afraid of risk that we do not do anything at all. I’m reminded of a patient whom I looked after as an on-call junior doctor. This gentleman was a day post-op from a knee replacement but was, unfortunately, suffering a heart attack. We started preparing the strong blood-thinning medication used to treat heart attacks. There’s a risk of bleeding from these medications, so the medical registrar who was with me asked that I discuss the patient with the orthopaedic team (who had performed the replacement), due to the risk of bleeding into the recently replaced knee. The orthopaedic registrar said simply “of course — but if you save his heart, we can look at salvaging the knee later”.

It was Dr Saurabh Jha (https://twitter.com/RogueRad) who coined the phrase “Quantitative Cassandras” in his opinion piece reviewing this supposed triumph (“The Rise and Rise of Quantitative Cassandras”). Cassandra, the oracle extraordinaire of Greek mythology, knew the future perfectly but could not act to stop her predictions coming to pass. These alluring Quantitative Cassandras offer action based on possibilities (admitting their imperfect knowledge) but have no way of knowing how their offered actions affect the probability of their own predictions. One knows in vain; the other acts in vain. Both are tragically futile. We recommend you read Dr Jha’s piece in full.

Why healthcare AIs fail in the real world

What is happening here? With evidence of AI’s success in other spheres, is healthcare somehow immune to the charms of AI? We do not believe so, but AI does have a health problem. Specifically, it seems to have trouble identifying real-world datasets which can provide insights that are actionable in the real world.

We’ve discussed perfect datasets failing in the imperfect world, and we’ve discussed real-world datasets failing through lack of perfect actions. However, there’s a missing piece of the puzzle: with AI, we get perfect knowledge of AI’s own imperfections. There is always a trade-off between knowing perfectly and acting perfectly. We call these quantities specificity and sensitivity. Specificity tells us what something is: allowing you to act perfectly. Sensitivity tells us what something isn’t: allowing you to know perfectly.

For example, a positive test with a specificity of 1 means your patient has the condition for which you are testing. Knowing what something is allows you to act perfectly. But, if the test is negative, what are the chances that your patient has the condition anyway? If the sensitivity of the test is less than 1, then there is a non-zero chance that the test is a false negative, and your patient does have the condition.

Similarly, a positive test with a sensitivity of 1 tells you very little if the specificity is substantially less than 1: there is a non-zero chance that the result is a false positive and that, despite the positive test, your patient does not have the condition. But a negative test with a sensitivity of 1 tells you that your patient, categorically, does not have the condition for which you’re testing. This allows us to know perfectly what something is not.

Sensitivity and specificity are often at odds in a probabilistic model — trying to improve one will worsen the other. So, an AI can tell you either whether patients have conditions (but knowing that it will miss the diagnosis in some people) or whether patients do not have a certain condition (whilst admitting that it will mistake the diagnosis in some people).
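
To make the trade-off concrete, here is a minimal Python sketch. The labels and classifier scores are invented and do not come from any of the systems discussed above; the point is only to show how moving a decision threshold improves sensitivity at the expense of specificity, and vice versa.

```python
# Illustrative only: the labels and scores below are invented and do not come
# from any of the systems discussed in this article.

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = true positive rate; specificity = true negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# A model outputs a probability; where we place the decision threshold sets the trade-off.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]                   # 1 = has the condition
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]  # hypothetical model outputs

for threshold in (0.25, 0.5, 0.75):
    # Low threshold: catch every true case but flag many who are well.
    # High threshold: only flag near-certain cases but miss some true cases.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```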

The problem is that either way (choosing imperfect knowledge or imperfect action) people die. To ground this in the real world: as an Emergency Medicine doctor, I can gather a great deal of data through tests to work out what conditions are and, often more importantly, what they are not (Emergency Medicine is often not the art of diagnosis, but the art of ruling out life-threatening illness). However, despite this, every single person I discharge gets similar advice (though tailored to the exact situation): come back if things get worse. This is how we deal with the knowledge that, sometimes, we make mistakes and miss something vital.

AI has a harder problem. As a human, you can rely on me never to send you home unless I am sure (within the limits of my reason) that I have not made a mistake. My fallibility, though certain, is unquantified. My defence is that I did not know I was making a mistake — nor could I have known. Tragedy can only be met with empathy, apology, and a commitment to improving our systems to prevent the same mistake happening again.

But AI is part of this system, and AI knows, quantifiably, its own fallibility. If a 20-something-year-old with no prior medical problems is discharged home after attending accident and emergency (A&E) with abdominal pains, and subsequently dies due to an unrecognised first presentation of Type 1 Diabetes in Diabetic Ketoacidosis (DKA), we can then change our systems to take blood glucose readings from every single patient coming through our doors. (This is exactly what has happened across most UK A&E departments in response to this particular tragedy.) An AI symptom triage system, with perfect information, on the other hand, knows that there’s a 1 in 100,000 chance (maybe smaller) of a 20-something-year-old with abdominal pain having this life-threatening condition.

The AI has two options.

  • Option 1: Choose sensitivity (perfect action) and ensure that every case of DKA is picked up by referring every patient with this presentation to A&E for blood sugar readings.
  • Option 2: Choose specificity (perfect knowledge) and accept there is a 1 in 100,000 chance of a person with this presentation dying tragically.

Option 1 is game-over for healthcare services, which would be overrun with young people with constipation or trapped wind arriving at A&E because the AI told them so.

Option 2 is game-over for the AI. Ultimately, it chose to let a real person — someone’s son or daughter — die.
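
To put rough numbers on that dilemma, the back-of-envelope sketch below uses the 1-in-100,000 figure from above together with an invented annual volume of presentations; the figures are purely illustrative, not real service data.

```python
# Back-of-envelope sketch using the 1-in-100,000 figure above and an invented
# annual volume of presentations; illustrative numbers only, not service data.
presentations_per_year = 100_000  # hypothetical: young adults presenting with abdominal pain
dka_risk = 1 / 100_000            # chance any one presentation is a first-presentation DKA

# Option 1: refer everyone (sensitivity 1). Every DKA is caught,
# but essentially every referral is a false positive.
option_1_referrals = presentations_per_year
option_1_expected_missed = 0

# Option 2: refer no one at this risk level (specificity favoured).
# No extra referrals, but the expected number of missed DKAs is non-zero.
option_2_referrals = 0
option_2_expected_missed = presentations_per_year * dka_risk

print(f"Option 1: {option_1_referrals:,} extra A&E attendances, {option_1_expected_missed} expected missed cases")
print(f"Option 2: {option_2_referrals} extra attendances, ~{option_2_expected_missed:.0f} expected missed case per year")
```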

AI in Health is in an impossible Catch-22 situation
Most AIs are optimised somewhere between perfect action and perfect knowledge (trying to maximise a statistic we call the area under the receiver operating characteristic — or ROC — curve). Unfortunately, this will only ever amount to imperfect actions based on imperfect knowledge. Sounds rather like me, doesn’t it? And like me, AI in Medicine is doomed to make mistakes. But an AI can never feel any of your pain or loss. An AI can never apologise sincerely. As a human, I can at least try, but — for an AI — failure is both effortless and meaningless.
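
For completeness, the sketch below (continuing the invented example from earlier) sweeps every possible threshold to trace the ROC curve and compute the area under it, the single number most models are trained to maximise. It uses scikit-learn’s standard metrics.

```python
# Illustrative only: sweep every threshold to trace the ROC curve and compute
# the area under it (AUC), using the same invented labels and scores as before.
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC =", roc_auc_score(y_true, scores))

# Each point on the curve is one possible operating point:
# sensitivity = tpr, specificity = 1 - fpr. A high AUC still forces a choice
# of where to sit on the curve: imperfect knowledge or imperfect action.
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  sensitivity={t:.2f}  specificity={1 - f:.2f}")
```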

This may feel like an odd post coming from the founder of an AI-enabled health technology start-up. But by dichotomising the choice between sensitivity and specificity (albeit allowing a spectrum in between), I have, I’m afraid, slightly misled you.

Creating successful real-world healthcare AIs

Fortunately, there is another option. The AI, of course, could order a blood glucose test. In fact, it could order a panel of bloods. It could examine the patient’s tummy: carefully feeling for any tell-tale rigidity of the abdominal muscles betraying underlying inflammation within the bowel. It may even consider whether a CT scan would be warranted, perhaps after giving a little analgesia and monitoring the effect. At least, that’s what I would do, if I were the AI.

This is, perhaps, all a little challenging for a triage AI which is currently built simply around the concept of asking patients a series of questions and mining the answers for probabilities. It isn’t really AI which has a health problem, but healthcare which has an AI problem. In the real world of imperfect knowledge and imperfect actions, where each decision has potentially life-altering consequences, healthcare needs an AI which can navigate within that context.

Senti — AI-enabled precision and predictive medicine
That’s how we’ve built Senti. Senti’s AI does not make a snapshot decision on how to act. Rather, it can wait until it has enough information to make both a safe and effective decision. We do this by monitoring clinically significant bio-acoustic markers. These are lung and heart sounds that change as conditions progress, and, as they change, we’re able to know whether to escalate decisions and recommend more urgent options. Even if Senti misses the prediction, by monitoring these clinically meaningful markers in real-time, we will always know when things are going wrong. If these markers change suddenly, we are there: Senti will call emergency services to save your life.
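
As a purely hypothetical illustration of that general pattern (continuous monitoring of a marker, with escalation when it deteriorates), here is a short Python sketch. It is not Senti’s actual implementation: the marker, thresholds, polling interval and helper names are all invented.

```python
# Hypothetical sketch of the general pattern described above: continuously
# monitor a clinically meaningful marker and escalate when it deteriorates.
# This is NOT Senti's actual implementation; the marker, thresholds, polling
# interval and helpers are all invented for illustration.
import random
import time

WORSENING_THRESHOLD = 0.7  # invented score above which urgent review is suggested
CRITICAL_THRESHOLD = 0.9   # invented score at which emergency escalation triggers

def read_bioacoustic_marker() -> float:
    """Stand-in for a real lung/heart-sound reading; returns a random score here."""
    return random.random()

def monitor(poll_seconds: float = 1.0, max_readings: int = 10) -> None:
    """Act on deterioration over time rather than on a single snapshot prediction."""
    for _ in range(max_readings):
        score = read_bioacoustic_marker()
        if score >= CRITICAL_THRESHOLD:
            print(f"score {score:.2f}: escalate, call emergency services")
        elif score >= WORSENING_THRESHOLD:
            print(f"score {score:.2f}: recommend urgent clinical review")
        else:
            print(f"score {score:.2f}: continue routine monitoring")
        time.sleep(poll_seconds)

if __name__ == "__main__":
    monitor()
```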

Even in the UK, over 40% of people who die from asthma die before being able to call for help or before help arrives. Senti ensures that help is always at hand. Senti is an AI, but, unlike the healthcare AIs which have gone before, Senti can check its own answers. And, rather than needing to ask forgiveness if it makes a mistake, Senti instead will ask for an ambulance: ensuring people with long-term respiratory illnesses can enjoy more days without breathlessness.

Senti is currently recruiting pilot sites for clinical trials. To learn more, reach out to us here: https://senti.care.