I AM afraid that this is a rather geeky post on the coronavirus pandemic, but I hope TCW readers will nonetheless find it interesting. It details the difficult realities that the scientists and modellers trying to combat the disease face when making predictions about its progress.
As a trained scientist with limited exposure to some of the techniques the modellers are using, I will limit my remarks to some general observations. I am certainly not going to critique the approach taken by those frantically engaged in trying to understand this terrible phenomenon because I am utterly unqualified to do so.
It is true that modellers and scientists these days have access to tools undreamt of by previous generations when trying to predict the course of highly complex systems such as an epidemic. Machine-learning engines, driven by the almost unlimited power of modern computing resources, can crunch equations correlating thousands, indeed tens of thousands, of different variables, far in excess of what the human brain could hope to achieve. The result is that equations can be arrived at that fit colossal amounts of data almost perfectly, but such models remain subject to fundamental fragilities that can greatly limit their application.
Firstly, there is the way that data is collected. Any model is only ever as good as the data used to create it, and assumptions about what the data is actually measuring can be very dangerous but very easy to make. For example, it now seems that the nightmarishly high Italian mortality rates from Covid-19 may be partially a function of the way Italy records cause of death, inflating the statistics when judged from a British perspective.
Another problem is how well a given model may generalise. Say you are studying Italian data with a view to predicting the likely future course of Covid-19 in the UK. You may well arrive at a complex multivariate equation that correlates a great many different variables, from which you seek to extract the underlying factors that are influencing the spread of the disease. However, a model complex enough to fit one specific case perfectly may be highly sensitive to the quirks of that particular data set, and may not generalise to UK data all that well. Applying inferences gleaned from modelling on one set of data to another may be worse than useless if the underlying factors that drive the disease vary from country to country.
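For the more geekily inclined, the generalisation problem can be sketched in a few lines of Python. This is only a toy illustration and bears no resemblance to what the modellers actually use: the data points, the straight-line trend behind them and the function names (fit_line, fit_interpolant, mean_abs_error) are all my own assumptions. A flexible curve forced through every point of the first data set is compared with a plain straight-line fit, and both are then tried on two further points from the same trend.

```python
# Toy illustration only: all numbers below are invented for the example.
# Assumed underlying trend: y = 2x + 1, observed with some noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1.5, 2.6, 5.6, 6.5, 9.4, 10.4]

# Two further points from the same assumed trend, unseen during fitting.
new_x = [6, 7]
new_y = [13.0, 15.0]


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial passing exactly through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mean_abs_error(model, xs, ys):
    """Average absolute gap between predictions and observations."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


simple = fit_line(train_x, train_y)
flexible = fit_interpolant(train_x, train_y)

for name, model in [("straight line", simple), ("flexible curve", flexible)]:
    print(name,
          "| error on fitted data:", round(mean_abs_error(model, train_x, train_y), 3),
          "| error on new data:", round(mean_abs_error(model, new_x, new_y), 3))
```

The flexible curve reproduces the first data set exactly, noise and all, and then goes badly astray on the new points, while the cruder straight line copes far better. Real epidemic models are vastly more sophisticated, but the same trade-off applies.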
Furthermore, sometimes you may not have reliable data that would allow you to discover the true influence of potential factors, or, even worse, those potential factors themselves may remain entirely undiscovered – what Donald Rumsfeld famously termed ‘the known unknowns and the unknown unknowns’. To an extent this is precisely what seems to be happening with Covid-19: one further factor in the high Italian mortality rates may be Italy’s older demographics and another may be Italian family life, where it is far more common for the older generations to live with the young. It may be relatively easy to prove that national demography is a factor and adapt your model accordingly, but it is altogether harder to quantify the precise dynamics of social interaction between young and old in different societies and infer its effect on disease transmission.
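The demographic point, at least, is simple enough to show with some back-of-the-envelope arithmetic. In the Python sketch below every figure is hypothetical: the fatality rates, the two age bands and the population shares are invented purely for illustration and are not Italian or British statistics. The function name crude_rate is likewise my own. Both imaginary countries face exactly the same risk within each age band, yet the older one ends up with a noticeably higher overall rate.

```python
# Hypothetical fatality rates per age band, identical in both countries.
AGE_SPECIFIC_RATE = {"under_60": 0.002, "60_and_over": 0.05}

# Hypothetical share of each population falling in each age band.
OLDER_COUNTRY = {"under_60": 0.70, "60_and_over": 0.30}
YOUNGER_COUNTRY = {"under_60": 0.80, "60_and_over": 0.20}


def crude_rate(age_shares, rates):
    """Overall rate: each band's rate weighted by that band's population share."""
    return sum(age_shares[band] * rates[band] for band in age_shares)


older = crude_rate(OLDER_COUNTRY, AGE_SPECIFIC_RATE)
younger = crude_rate(YOUNGER_COUNTRY, AGE_SPECIFIC_RATE)

print("older country overall rate:  ", round(older, 4))
print("younger country overall rate:", round(younger, 4))
```

Adjusting for this sort of thing is routine. The effect of who lives with whom, by contrast, has no such tidy formula.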
Let me strongly emphasise that I am not trying to be alarmist and suggest that all this means our scientists are simply clutching at straws: these are serious people with serious reputations, no doubt backed by an army of brilliant researchers. What I would stress is the need to critique their approach in a sober manner, rather than ranting like certain loud-mouthed armchair buffoons (yes, Piers Morgan, that means you).
The kneejerk nature of the media’s outrage that followed the UK’s initially very different course of action in combating Covid-19 was simply wrong. Furthermore, whether it was right or not, the fact that British researchers initially chose a different path from others was astonishingly brave: it is much easier to be proved wrong if you are wrong in good company, especially when the stakes are so high. We should remember that these people have to live daily with an almost unimaginable responsibility, sure in the knowledge that they cannot possibly get it entirely right. One day years from now, those same statistical engines, fed with more accurate and complete data, will no doubt cruelly expose the inevitable shortcomings in their work, and calculate with the 20/20 vision of hindsight how many lives could have otherwise been saved. That is a very heavy burden to live with, and we should all be thankful that it is them rather than us who have to do so.