Like many young boys in the UK, my first career choice was to become a soccer star. My granddad (Harry) had been something of a local soccer hero in his day, and I wanted nothing more than to be like him. Harry had a huge influence on me: he had been a goalkeeper, and consequently I became a goalkeeper too. This decision, as it turned out, wasn’t a great one because I was a bit short for my age, which meant that I never got picked to play in goal for my school. Instead, a taller boy was always chosen. I was technically a better goalkeeper than the other boy, but the trouble was that the opposition could just lob the ball over my head (so, technique aside, I was a worse goalkeeper). I was usually played at left back (‘left back in the changing room’ as the joke used to go) because, despite being right-footed, I could kick with my left foot too. The trouble was, having spent years trying to emulate my granddad’s goalkeeping skills, I didn’t really have a clue what a left back was supposed to do.1 Consequently, I didn’t exactly shine in the role, and that put an end for many years to my belief that I could play soccer. This example shows that a highly influential thing (like your granddad) can bias the conclusions you come to, and that this can lead to quite dramatic consequences. The same thing happens in data analysis: sources of influence and bias lurk within the data, and unless we identify and correct for them, we’ll end up becoming goalkeepers despite being too short. Or something like that.


1 In the 1970s at primary school, no one actually bothered to teach you anything about how to play soccer; they just shoved 11 boys onto a pitch and hoped for the best.