AI in Health Today
Artificial Intelligence (AI) has a health problem. Perhaps it’s nothing more than performance anxiety, but every time an AI is deployed in the real world to solve a problem in healthcare, it falls spectacularly short. Promising lab-validated work is hyped, only to underperform or, more commonly, never to emerge in the real world at all.
Most recently, it was Google Health’s retinopathy (eye disease) AI: four years (so far) in the making by one of the most talented teams in the world, only to fail amid the chaos of real-world data acquisition, shaky internet connections and poor lighting (“A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy”).
Google has certainly come to the right conclusions about understanding the healthcare processes, the patients, the staff, and the environments necessary to deploy an AI successfully. But, we argue, a fundamental problem remains, and it will always suppress the benefits of these systems. To understand this issue, let’s first understand these processes ourselves. First, the context: diabetes can cause irreversible eye damage, but this can be treated if caught early. Catching it early means catching it before symptoms (reduced vision) develop. This requires a screening programme, which works as follows (a sketch of the workflow in code appears after the list):
1. A patient with diabetes is invited to an eye screening clinic.
2. A nurse takes a picture of the back of the patient’s eye.
3. These pictures are then sent to an ophthalmologist to be graded according to the severity of damage to the eye.
4. This grade determines how often the patient should have their eyes rechecked and whether to start any active treatment to prevent damage.
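To make the workflow concrete, here is a minimal sketch of where an automated grader would sit in step 3. The grade scale, recheck intervals and function names are illustrative assumptions, not Google’s model or any national screening protocol.

```python
# Illustrative sketch of the screening pathway described above.
# The grade bands and recheck intervals are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    grade: int            # 0 = no retinopathy ... 4 = severe (assumed scale)
    recheck_months: int   # how soon the patient should be screened again
    refer: bool           # whether to refer for active treatment


def grade_retinal_image(image_bytes: bytes) -> int:
    """Step 3: grading. Today an ophthalmologist does this remotely;
    the proposal is that an AI grades the image at the point of care."""
    raise NotImplementedError("placeholder for the human or AI grader")


def screening_pathway(image_bytes: bytes) -> ScreeningResult:
    grade = grade_retinal_image(image_bytes)  # step 3
    if grade == 0:
        return ScreeningResult(grade, recheck_months=12, refer=False)  # routine recall
    if grade <= 2:
        return ScreeningResult(grade, recheck_months=6, refer=False)   # closer watch
    return ScreeningResult(grade, recheck_months=1, refer=True)        # step 4: treat
```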
Another major issue is the increase in turnaround time. These nurses are already stretched (as described in Google’s paper) to deliver on screening targets. Currently it takes a nurse 3 minutes to photograph the retina and send the images to an ophthalmologist. With the Google AI in place, the time the nurse spends rises to 5 minutes, and, regardless of the time saved for ophthalmologists, this has the potential to overbalance an already precariously perched resource.
If nursing time and ophthalmologist time are equally constrained, taking time from nurses and giving it to the ophthalmologists is simply robbing Peter to pay Paul. The only accomplishment is an increase in whole-system healthcare costs: the cut the thief takes.
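A rough back-of-the-envelope calculation shows why this matters at the whole-system level. The nurse times are the figures quoted above; the ophthalmologist grading time saved is an assumed figure, used only to illustrate the trade-off.

```python
# Back-of-the-envelope whole-system time cost per 1,000 patients screened.
# Nurse times (3 -> 5 minutes) are the figures quoted above; the ophthalmologist
# grading time saved per patient is an assumption for illustration.
PATIENTS = 1_000

nurse_before, nurse_after = 3, 5       # minutes of nurse time per patient
ophtho_saved_per_patient = 2           # assumed minutes of grading time the AI saves

extra_nurse_minutes = (nurse_after - nurse_before) * PATIENTS
saved_ophtho_minutes = ophtho_saved_per_patient * PATIENTS

print(f"Extra nurse time:            {extra_nurse_minutes:,} minutes")
print(f"Saved ophthalmologist time:  {saved_ophtho_minutes:,} minutes")
# If both workforces are equally constrained, 2,000 minutes saved at the cost of
# 2,000 extra nurse-minutes is no saving at all; the AI itself is not free either.
```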
This is simply the latest in a long line of healthcare AI disappointments. MD Anderson’s partnership with IBM Watson ended in 2017, with $62m down and nothing to show after 4 years. A read through IBM Watson Health’s scientific year-in-review documents and quarterly reports (IBM Watson — scientific reports; IBM Watson — 2019 Scientific Year In Review) reveals many small retrospective “encouraging” and “promising” studies across a range of healthcare fields.
This is in stark contrast to Watson’s record as a knowledge tool in, for example, the oil & gas and engineering industries (IBM Watson — Woodside case study), where it already enjoys real-world success. Again, this is a world-leading team: we can be confident that the problem does not lie in the team, the technical implementation or the execution.
Babylon Health is another world-leading team. Their GP-at-hand app has certainly made healthcare more accessible and is rated highly amongst users. But their AI Triage system, with over $100m invested, has nothing more to show than “Output from Triage does not constitute medical advice, diagnosis or treatment; it provides triage advice based on the combination of symptoms that a user enters.”, further warning “Do not use Triage in an emergency” (Babylon Health AI Triage system). If Triage is neither medical advice, nor diagnosis, nor for use in an emergency, we’re left wondering: what is its place in the real world?
And finally, there are the meaningless successes. Google’s AI for predicting Acute Kidney Injury (AKI) (“A clinically applicable approach to continuous prediction of future acute kidney injury”) undoubtedly can predict it. Indeed, it can predict 55.8% of all AKIs (and 90.2% of the most severe AKIs) up to 48 hours in advance of the condition developing. But the real question is whether we can do anything to stop AKI developing in these patients. We can certainly try, but do we try in vain? Regardless of someone’s probability of developing AKI, we already intervene to keep a person’s kidneys healthy in hospital. The only additional actions available to us, therefore, are to avoid interventions which carry a theoretical chance of increasing that probability of kidney damage. But those withheld interventions are often the very ones that would have managed the patient’s actual reason for being in hospital. There is no point in saving a person’s kidneys only to let them die from their initial presenting complaint.
It was Dr Saurabh Jha (https://twitter.com/RogueRad) who coined the phrase “Quantitative Cassandras” in his opinion piece reviewing this supposed triumph (“The Rise and Rise of Quantitative Cassandras”). Cassandra, the oracle extraordinaire of Greek mythology, knew the future perfectly but could not act to stop her predictions coming to pass. These alluring Quantitative Cassandras offer action based on possibilities (admitting their imperfect knowledge) but have no way of knowing how their offered actions affect the probability of their own predictions. One knows in vain; the other acts in vain. Both are tragically futile. We recommend you read Dr Jha’s piece in full.
Why healthcare AIs fail in the real world
What is happening here? With evidence of AI’s success in other spheres, is healthcare somehow immune to the charms of AI? We do not believe so, but AI does have a health problem. Specifically, it seems to have trouble identifying real-world datasets which can provide insights that are actionable in the real world.
We’ve discussed perfect datasets failing in the imperfect world, and we’ve discussed real-world datasets failing through a lack of perfect actions. However, there’s a missing piece of the puzzle: with AI, we get perfect knowledge of the AI’s own imperfections. There is always a trade-off between knowing perfectly and acting perfectly; we call these quantities sensitivity and specificity respectively. Specificity tells us what something is: allowing you to act perfectly. Sensitivity tells us what something isn’t: allowing you to know perfectly.
For example, a positive test with a specificity of 1 means your patient has the condition for which you are testing. Knowing what something is allows you to act perfectly. But, if the test is negative, what are the chances that your patient has the condition anyway? If the sensitivity of the test is less than 1, then there is a non-zero chance that the test is a false negative, and your patient does have the condition.
Similarly, a positive test with a sensitivity of 1 tells you very little if the specificity is substantially less than 1: there is a non-zero chance that the result is a false positive. So, despite the test, your patient does not have the condition. But a negative test with a sensitivity of 1 tells you that your patient, categorically, does not have the condition for which you’re testing. This allows us to know perfectly what something is not.
Sensitivity and specificity are often at odds in a probabilistic model: trying to improve one will worsen the other. So, an AI can either tell you whether patients have a condition (knowing that it will miss the diagnosis in some people) or whether patients do not have the condition (accepting that it will wrongly flag it in some people).
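For readers who prefer numbers, the short sketch below computes both quantities from a toy confusion matrix; the counts are invented purely to show the trade-off.

```python
# Toy confusion matrix for a diagnostic test; the counts are invented
# purely to illustrate the sensitivity/specificity trade-off.
tp, fn = 90, 10      # people with the condition: caught vs missed
tn, fp = 900, 100    # people without it: correctly cleared vs wrongly flagged

sensitivity = tp / (tp + fn)   # 0.90: 10 missed diagnoses, so we do not know perfectly
specificity = tn / (tn + fp)   # 0.90: 100 false alarms, so we do not act perfectly

print(f"sensitivity = {sensitivity:.2f} ({fn} false negatives)")
print(f"specificity = {specificity:.2f} ({fp} false positives)")
# Moving the decision threshold to catch all 100 true cases (sensitivity = 1.0)
# inevitably drags more of the 1,000 healthy people into the false-positive pile,
# and vice versa: improving one number worsens the other.
```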
The problem is that either way (choosing imperfect knowledge or imperfect action) people die. To ground this in the real world: as an Emergency Medicine doctor, I can gather a great deal of data through tests to work out what conditions are and, often more importantly, what they are not (Emergency Medicine is often not the art of diagnosis, but the art of ruling out life-threatening illness). However, despite this, every single person I discharge gets similar advice (though tailored to the exact situation): come back if things get worse. This is how we deal with the knowledge that, sometimes, we make mistakes and miss something vital.
But AI is part of this system, and AI knows, quantifiably, its own fallibility. If a 20-something-year-old with no prior medical problems is discharged home after attending accident and emergency (A&E) with abdominal pain, and subsequently dies due to an unrecognised first presentation of Type 1 Diabetes in Diabetic Ketoacidosis (DKA), we can then change our systems to take blood glucose readings from every single patient coming through our doors. (This is exactly what has happened across most UK A&E departments in response to this particular tragedy.) An AI symptom triage system, with perfect information, on the other hand, knows that there’s a 1 in 100,000 chance (maybe smaller) of a 20-something-year-old with abdominal pain having this life-threatening condition.
The AI has two options.
- Option 1: Choose sensitivity (perfect knowledge) and ensure that every case of DKA is picked up by referring every patient with this presentation to A&E for blood sugar readings.
- Option 2: Choose specificity (perfect action) and accept that there is a 1 in 100,000 chance of a person with this presentation dying tragically.
Option 1 is game-over for healthcare services which would be overrun with young people with constipation or trapped wind arriving at A&E because the AI told them so.
Option 2 is game-over for the AI. Ultimately, it chose to let a real person — someone’s son or daughter — die.
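Putting the 1-in-100,000 figure into numbers makes the dilemma explicit. The cohort size below is hypothetical, chosen only to make the expected counts whole numbers.

```python
# The triage dilemma in numbers. The 1-in-100,000 risk is the figure quoted above;
# the cohort size is hypothetical, chosen to make the expected counts whole numbers.
cohort = 1_000_000                 # young adults presenting with abdominal pain
true_cases = cohort // 100_000     # 10 people who really have DKA (1 in 100,000)

# Option 1: refer everyone (sensitivity = 1). No case is missed,
# but a million well people are sent to A&E for a blood sugar reading.
option1_referrals, option1_missed = cohort, 0

# Option 2: treat the presentation as benign (specificity first) and accept the risk.
option2_referrals, option2_missed = 0, true_cases

print(f"Option 1: {option1_referrals:,} referrals, {option1_missed} missed cases of DKA")
print(f"Option 2: {option2_referrals:,} referrals, {option2_missed} missed cases of DKA")
```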
This may feel like an odd post coming from the founder of an AI-enabled health technology start-up. But by dichotomising the choice between sensitivity and specificity (albeit allowing a spectrum in between), I have, I’m afraid, slightly misled you.
Creating successful real-world healthcare AIs
Fortunately, there is another option. The AI, of course, could order a blood glucose test. In fact, it could order a panel of bloods. It could examine the patient’s tummy: carefully feeling for any tell-tale rigidity of the abdominal muscles betraying underlying inflammation within the bowel. It may even consider whether a CT scan would be warranted, perhaps after giving a little analgesia and monitoring the effect. At least, that’s what I would do, if I were the AI.
This is, perhaps, all a little challenging for a triage AI which is currently built simply around the concept of asking patients a series of questions and mining the answers for probabilities. It isn’t really AI which has a health problem, but healthcare which has an AI problem. In the real world of imperfect knowledge and imperfect actions, where each decision has potentially life-altering consequences, healthcare needs an AI which can navigate within that context.
Even in the UK, over 40% of people who die from asthma die before they are able to call for help or before help arrives. Senti ensures that help is always at hand. Senti is an AI, but, unlike the healthcare AIs which have gone before, Senti can check its own answers. And, rather than needing to ask forgiveness if it makes a mistake, Senti will instead ask for an ambulance: ensuring people with long-term respiratory illnesses can enjoy more days without breathlessness.
Senti is currently recruiting pilot sites for clinical trials. To learn more, reach out to us here: https://senti.care.