Wouldn’t it be wonderful if there were a single blood test that could pick up not just one, but dozens of early cancers, allowing early treatment, and countless lives to be saved? According to an auspiciously named American high-tech company, GRAIL, Inc., just such a goal might be in reach. It claims its Galleri blood test, which uses clever technology to detect fragments of tumour DNA in the blood, can detect many common cancers in the early stages, with a false positive rate below 1%. On the back of this, it has secured a ‘commercial partnership program’ with the NHS to pilot the test in around 165,000 people, starting next year. Hatt Mancock, the health secretary who increasingly resembles a schoolboy who has just shot his neighbour’s dog with a poison dart, can hardly contain his excitement. ‘We are building a world-leading diagnostics industry in the UK — not just for coronavirus, but for other diseases too,’ he gushed. Could this be the ‘exciting and ground-breaking new blood test’ Mancock says it is, or is it another world-beating Operation Bonfire of taxpayers’ money?
Most of the people enrolled in the pilot will be aged 50 and over, and asymptomatic, so this is mass screening. To the simple mind, screening is intuitively attractive. It stands to reason that picking up early cancers and treating them must be a good thing, just as picking up asymptomatic covid cases early, and isolating them, must be a good thing. But, as covid has taught us, screening is never simple. Not only do we have to concern ourselves with the accuracy of the test, its sensitivity and specificity, but we also have to consider how common the condition is in the real world, as this has a major effect on the false positive rate. We have also learnt that what happens after a positive test is important: if the ‘treatment’, self-isolation, doesn’t work, because too few people do self-isolate, then the test becomes pointless. The same principle applies to all screening: if the treatment doesn’t work, or doesn’t even exist, then screening will have no benefits, leaving only potential harms.
Another major problem is bias in studies of screening that makes it appear that screening works — of course it works, stands to reason it works — when it doesn’t. A classic example is lead time bias. Imagine two people, both asymptomatic, each harbouring an early untreatable cancer that will kill them in two years’ time. Next week, one has a screening test for the cancer, the other doesn’t; the screened one tests positive, and ends up with a cancer diagnosis, while the other carries on normally, blissfully unaware of the ticking time bomb within. A year later, the unscreened person develops symptoms, and gets a cancer diagnosis; another year later, both die. Mr Stands-to-Reason hears about both patients — maybe they were friends of his — and says, look, screening obviously works, my screened friend survived twice as long as my unscreened friend. It does indeed look that way, but as you will have spotted, that is not at all what happened. Both survived the same length of time; only the screened person had the burden of knowing of the cancer, and very likely enduring treatment, for twice as long. All in all, no benefit, just the harm of being ill with cancer for twice as long.
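For readers who prefer to see the arithmetic, here is a minimal Python sketch of the two friends’ stories, with made-up dates purely for illustration. Both die on the same day; only the diagnosis dates differ:

```python
# A made-up illustration of lead time bias: two hypothetical patients,
# identical disease course, identical date of death.
from datetime import date

death = date(2022, 11, 1)            # both die on the same day
dx_screened = date(2020, 11, 1)      # diagnosed at screening, two years before death
dx_unscreened = date(2021, 11, 1)    # diagnosed on symptoms, one year before death

survival_screened = (death - dx_screened).days / 365.25
survival_unscreened = (death - dx_unscreened).days / 365.25

print(f"apparent survival, screened:   {survival_screened:.1f} years")
print(f"apparent survival, unscreened: {survival_unscreened:.1f} years")
# The screened patient appears to survive twice as long from diagnosis,
# yet both die on the same day: the 'extra' survival is pure lead time.
```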
That is perhaps an extreme example, but it serves to remind us that we should be wary of claims that screening works. Another major problem is poorly performing screening tests, with low-threshold PSA screening for prostate cancer the classic example: it picks up far too many men who would never have died from their prostate cancer. The key thing here — and we are well placed to consider this, given the recent fandango over covid testing — is the test’s specificity, that is, how good, or specific, it is at only being positive when the condition it tests for is actually present. A highly specific test implies a low false positive rate — but there is a catch. If the prevalence of the condition of interest is low, then even a highly specific test can produce a significant number of false positives. This happens because even a low false positive rate, applied to the large number of people who don’t have the disease (and when prevalence is low, that is nearly everyone), inevitably produces a large number of false positives.
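To see the catch in action, here is a back-of-envelope Python sketch. The 100,000 people screened and the prevalence figures are hypothetical round numbers chosen only for illustration; the 99.3% specificity anticipates the GRAIL figure discussed below:

```python
# How a highly specific test still floods a low prevalence population
# with false positives. The 100,000 screened and the prevalences are
# hypothetical round numbers; 99.3% is the specificity discussed below.
def expected_false_positives(n_screened, prevalence, specificity):
    disease_free = n_screened * (1 - prevalence)   # the pool that can falsely test positive
    return disease_free * (1 - specificity)

for prevalence in (0.50, 0.10, 0.01):
    fp = expected_false_positives(100_000, prevalence, 0.993)
    print(f"prevalence {prevalence:4.0%}: ~{fp:,.0f} false positives")
# The lower the prevalence, the bigger the disease-free pool, and so
# the more false positives the very same specificity produces.
```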
Manufacturers of screening tests know this only too well, and so they tend to cook the books when trialling their screening tests, by setting up studies in which the prevalence is artificially high. Information on the holy GRAIL test is sparse — never a good sign — but what there is suggests that this is indeed exactly what GRAIL have done. In a paper published in June this year, the company reports the results of a study of the sensitivity and specificity of its multi-cancer detection test. The headline figures are that the test has an overall specificity of 99.3% for any cancer (on the face of it, good), and a sensitivity for early (stage 1) cancers of 18% (not so good). Sensitivity improves if only selected cancers are included, as it does for later stage disease, but since later stage disease almost always means the cancer is already clinically apparent, by then a screening test is irrelevant.
Now let us look at the numbers a bit more closely. They are presented rather opaquely, so we will have to reverse engineer them into a conventional two by two screening table: disease present/absent against test positive/negative. In the validation arm of the trial, there were 654 people with cancer, and 610 without, with sensitivity and specificity as above, 18% and 99.3% respectively. This, using Dr No’s favourite online diagnostic/screening test calculator, gives us the following two by two table:
Table 1: conventional two by two tabulation of test results, based on figures reported in the GRAIL paper

            cancer    no cancer    total
test +         118            4      122
test -         536          606     1142
total          654          610     1264
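For those who want to check the working without the online calculator, a few lines of Python reproduce the table from the paper’s reported counts; rounding to whole people is Dr No’s own assumption:

```python
# Reproduce Table 1 from the GRAIL paper's validation figures:
# 654 with cancer, 610 without; sensitivity 18%, specificity 99.3%.
# (Rounding to whole people is an assumption.)
def two_by_two(n_cancer, n_no_cancer, sensitivity, specificity):
    tp = round(n_cancer * sensitivity)        # true positives
    fn = n_cancer - tp                        # false negatives (cancers missed)
    tn = round(n_no_cancer * specificity)     # true negatives
    fp = n_no_cancer - tn                     # false positives
    return tp, fp, fn, tn

tp, fp, fn, tn = two_by_two(654, 610, 0.18, 0.993)
print(f"test +: {tp:4d} cancer, {fp:4d} no cancer")   # 118 and 4
print(f"test -: {fn:4d} cancer, {tn:4d} no cancer")   # 536 and 606
print(f"PPV: {tp / (tp + fp):.1%}")                   # 118/122, about 97%
```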
The sensitivity is none too good, but the specificity, and with it the false positives — only 4 out of a total of 122 positive results — look promising. But there is a big red flag waving over that table: the prevalence of cancer in the study population. At 52%, it is wildly higher than any realistic real world number. What happens if we lower the prevalence to a more realistic number?
Quite what a realistic number is involves a bit of educated guesswork, given that we are trying to estimate a number we don’t know, the number of people who we don’t know have cancer. As a rough guide, let’s take the UK annual incidence of all cancers, 367,000, and say that — your guess is as good as Dr No’s — these cancers existed in screen detectable form for two years before diagnosis, meaning that the prevalence of screen detectable cancer at any one time is around 734,000, made up of 367,000 in their first screen detectable year, and 367,000 in their second. Given a UK population of 67 million, that gives us a prevalence of screen detectable cancer of around 1%. Seems reasonable enough, so let’s put that much lower but more realistic prevalence figure into our table, while leaving the sensitivity and specificity as before:
Table 2: same test, same sensitivity and specificity, but prevalence adjusted to a more realistic real world figure (note: 1% gets entered as 0.01)

            cancer    no cancer    total
test +           2            9       11
test -          11         1242     1253
total           13         1251     1264
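Again, the sceptical reader can verify the figures with a short sketch. The two year screen detectable window is, as above, Dr No’s guess, and the study total of 1,264 people is kept so the two tables are directly comparable:

```python
# Same test, but at a realistic prevalence. The prevalence estimate:
# 367,000 new cancers a year, assumed screen detectable for two years
# (a guess), in a population of 67 million.
estimated = 367_000 * 2 / 67_000_000        # about 1.1%
prevalence = 0.01                           # rounded to 1%, as entered in the calculator
print(f"estimated prevalence: {estimated:.1%}")

n = 1264                                    # keep the study total for comparability
with_cancer = round(n * prevalence)         # 13 people
without_cancer = n - with_cancer            # 1,251 people

tp = round(with_cancer * 0.18)              # 2 true positives (sensitivity 18%)
fn = with_cancer - tp                       # 11 cancers missed
tn = round(without_cancer * 0.993)          # 1,242 true negatives (specificity 99.3%)
fp = without_cancer - tn                    # 9 false positives

print(f"positives: {tp + fp} in total, of which {fp} are false")
print(f"PPV: {tp / (tp + fp):.0%}")         # 2/11, about 18%
```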
Oh dear, not so good. Not only do we miss a lot of cancers, because of the low sensitivity, but now the low prevalence has spooked the test positives: of the eleven positives, a full nine are false positives. Nine out of every eleven who test positive will suffer the anxiety of a provisional cancer diagnosis, and while we might hope subsequent diagnostic testing will in turn rule out cancer, they will nonetheless have to endure the unpleasantness of the further tests, and inevitably — because those tests too will have false positives — some will have to endure treatment for a cancer they do not have (or do have, but it was never going to harm them).
Yet the government is once again steaming ahead, rolling out a ‘pilot [of a] potentially revolutionary blood test’ — and note pilot implies a test of feasibility, not a test of the test itself — across the country, even though the scant science there is shows the test at the very least needs further clinical evaluation, in a more realistic real world population. One hopes those invited to participate will give full informed consent, but one fears perhaps they may not. No need to bother with that sort of nonsense, when you are busy chasing a Holy Grail of ‘building a world-leading diagnostics industry in the UK — not just for coronavirus, but for other diseases too’.