The Covid Inquisition has had a rather hard time of it lately. As UK test positives, hospitalisations and deaths have plummeted, they have had in recent weeks to rely on scariants and foreign outbreaks in Brazil and India to maintain fear levels. Over the last few days, however, a game changer has emerged: the virus is airborne. The WHO have marginally up-rated their assessment of the risk from aerosol transmission, triggering a raft of we-told-you-so tweets from the Inquisition, but the real bombshell is a pre-print that puts some E notation numbers and extra computer generated colours on the Milk Curdler’s earlier three colour crayon box model #CovidRiskChart. The Inquisition now know that H, your airborne infection risk parameter, for a brief, silent, masked outdoor encounter is 2.33E-05. Prolonged shouting without masks in a poorly ventilated crowded room, on the other hand, pushes your H up to 1.00E+02. If shout turns to shove, your H jumps even higher, to 2.33E+02. Cripes.
There is a lot wrong with this pre-print, starting of course with the fact that it is a pre-print, and so has not been peer-reviewed. Of the twenty authors, most, perhaps as many as eighteen, are non-medics, and none appear to be epidemiologists. They are instead a crew of engineers, ventilation experts and environmental technologists; Dr No takes it that they are indeed experts in their fields, but notes nonetheless that they are not medical experts. The methods rely heavily on numerology, by which Dr No means all but impenetrable maths, and on modelling, and as we have seen more than once, using such methods doesn’t always end well. This obfuscation, the bane of so much modern science, even extends to the presentation. There really is no need to use E notation in the chart, given that the vast majority of readers will not be familiar with it. The notation n.nnE[+/-]xx means n.nn times 10 to the power [+/-]xx, so, for example, the 1.00E+02 given above is an obfuscation of a rather more familiar real number: 100.
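For readers who want to de-obfuscate for themselves, the conversion is entirely mechanical; any language that parses scientific notation will do it, Python included. The three values below are the ones quoted above:

```python
# E notation is plain scientific notation: n.nnE+xx means n.nn times 10 to the power xx.
# float() parses it directly; the formatting below just prints the familiar real number.
for e_notation in ["2.33E-05", "1.00E+02", "2.33E+02"]:
    plain = f"{float(e_notation):.7f}".rstrip("0").rstrip(".")
    print(f"{e_notation} = {plain}")
# 2.33E-05 = 0.0000233
# 1.00E+02 = 100
# 2.33E+02 = 233
```

Note that the dressed-up 2.33E+02 for shouting and shoving is just 233, a number any reader could have coped with perfectly well.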
The idea behind the obfuscation is to deter the casual reader from asking too many questions, and so have them accept the findings by academic osmosis. It can even act as a sort of obfuscation fallacy: if the eggheads can handle maths of such complexity, then they must be right. Nothing could be further from the truth, of course: obfuscation blocks verification, because we have no ready way to assess whether the numerology and models make sense. But there are other things we can look at without popping too many little grey cells. First and foremost of these is where the data come from. For normal studies, this means looking at the population studied, along with the data that were (and weren’t) recorded; for meta-analyses (and this pre-print includes a form of meta-analysis) it means looking at the studies that provide the data used in the meta-analysis.
The pre-print’s chief exhibit is Figure 1(b). This plots the attack rate (in this pre-print, the percentage of those exposed who got infected) against log Hr (Hr is the relative infection risk parameter) for 12 covid outbreaks (and a few other diseases), along with a predicted covid attack rate trend line based on somewhat arbitrary assumptions. In a model of academic clarity, the authors state: “An Ep0 of 18.6 quanta h-1 was obtained by fitting (with B0 = 0.288 m3 h-1 assumed for all occupants for simplicity). This value is higher than that suggested by Buonanno et al. (2 quanta h-1) (33, 37), but within the uncertainties provided by those authors”. The Ep0 is the “SARS-CoV-2 exhalation rate by an infector resting and only orally breathing”, and is all but an order of magnitude larger than estimates from other studies, but it is “within the uncertainties” given in those studies, so that’s all right then, all the more so as it makes for a good fit with the observed data. It’s sort of possibly convincing, but only if the magic Ep0 is used. If other Ep0 values are used, the model moves away from the observed data, as can be seen in this reproduction of Figure 1(b) from the pre-print.
Figure 1: attack rate vs. the relative risk parameter Hr for outbreaks of COVID-19, tuberculosis, and measles reported in the literature (source)
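Just how much the fitted Ep0 matters can be shown with a toy Wells-Riley style calculation. To be clear, this is not the authors’ model, merely a minimal sketch: the two hour exposure and 50 m3/h ventilation rate below are invented for illustration, and only B0 and the two Ep0 values come from the pre-print and Buonanno et al.:

```python
import math

def toy_attack_rate(ep0_quanta_per_h, b0_m3_per_h, hours, ventilation_m3_per_h):
    """Toy Wells-Riley attack rate: 1 - exp(-quanta inhaled).
    A sketch only; the pre-print's actual model is considerably more elaborate."""
    quanta_inhaled = ep0_quanta_per_h * b0_m3_per_h * hours / ventilation_m3_per_h
    return 1 - math.exp(-quanta_inhaled)

# B0 = 0.288 m3/h as in the pre-print; 2 h exposure and 50 m3/h ventilation are invented.
for ep0 in (18.6, 2.0):  # the fitted value vs Buonanno et al.'s suggestion
    rate = toy_attack_rate(ep0, 0.288, 2, 50)
    print(f"Ep0 = {ep0:>4} quanta/h -> predicted attack rate {rate:.1%}")
```

Even in this crude sketch, swapping the fitted 18.6 for Buonanno et al.’s 2 quanta h-1 cuts the predicted attack rate by close to an order of magnitude, which is precisely why the curve only sits near the observed points when the magic Ep0 is used.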
The data behind those covid points in Figure 1 come from a variety of sources, several of which are pre-prints, so we have pre-prints using data from pre-prints. For at least four of the points, the data come from a pre-print which notes that the data are ‘not yet scientifically published’. We have a pre-print using data from pre-prints that use data that hasn’t even been pre-printed. Perhaps they heard it on the grapevine. Different studies used different case definitions: in some, only test positives counted as cases; in others, clinical suspicion was sufficient to define a case. In some studies, the persons at risk appear to have been chosen on a whim: one study uses the entire floor of an office building, for example, while another used only passengers seated in business class on a long haul flight. These things matter, because they have a big effect on the attack rate, the variable plotted on the y axis of Figure 1(b). In at least two cases, the listed source either fails to provide the attack rate, or gives a different estimate.
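How much the choice of denominator matters is plain arithmetic. The figures below are invented, for a hypothetical long haul flight like the one just mentioned; the point is not the numbers but what happens to the attack rate when the persons at risk are defined differently:

```python
def attack_rate_pct(cases, persons_at_risk):
    """Attack rate as used in the pre-print: percentage of those at risk who got infected."""
    return 100 * cases / persons_at_risk

# Hypothetical flight: 12 infections, 30 business class seats, 300 passengers on board.
print(f"business class only: {attack_rate_pct(12, 30):.0f}%")   # 40%
print(f"whole aircraft:      {attack_rate_pct(12, 300):.0f}%")  #  4%
```

Same outbreak, same cases, a tenfold difference in the attack rate, purely on the whim of whoever drew the denominator.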
Repeating the plot in Figure 1 using just the covid outbreak data given in the pre-print, and adding labels showing the setting for each outbreak, we get a plot that looks like Figure 2. A trend seems apparent, even without the eye-leading red trend line.
Figure 2: the data behind Figure 1 above, covid data points only, with setting labels added
What happens if we remove the ‘heard it on the grapevine’ and missing data points, correct one apparent error (the big bus rate), and make some not unreasonable adjustments to numerators and denominators (Skagit choir: only include PCR +ves, as most of the other studies do; include all passengers on the aircraft as persons at risk and include all diners in the same room in the restaurant as persons at risk)? Even with just these adjustments — Dr No lost the will to do any more, because the point is already made — the apparent trend effectively disappears, as seen in Figure 3.
Figure 3: Figure 2 adjusted by (a) removing/correcting questionable data and (b) changing some underlying assumptions (see text for details)
The sources are the same: all Dr No has done is remove unverifiable data, correct an error, and make some different but not unreasonable assumptions. And yet we get a very different result. And this is the point: studies such as these are extremely sensitive to (a) data selection and (b) initial assumptions. Once this is established, the robustness of the authors’ subsequent modelling collapses, and with it, the E notation numbers presented in the revised quantitative #CovidRiskChart dissolve into a haze of airborne fleas.