John Burn-Murdoch of the Guardian discusses why we should always consider how and why information is being presented to us in data visualisations.
A classic example of the "lie factor" distorting how we see the data. The X axis doesn't start at zero, greatly exaggerating the difference.
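The distortion described in the caption can be put into numbers. What follows is a minimal sketch in Python, assuming Edward Tufte's definition of the lie factor (the size of the effect shown in the graphic divided by the size of the effect in the data) and a simple bar chart whose axis starts at some baseline. The function names and the figures 50, 60 and 40 are hypothetical, chosen purely for illustration, and do not come from the chart pictured.

```python
def relative_change(first, second):
    """Size of an effect: the change from first to second, relative to first."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return (second - first) / first


def lie_factor(first, second, baseline=0.0):
    """Effect shown in the graphic divided by the effect in the data.

    Assumes a bar chart in which each bar is drawn from `baseline`,
    so the drawn length of a bar is (value - baseline).
    """
    if first <= baseline:
        raise ValueError("first value must lie above the axis baseline")
    shown = relative_change(first - baseline, second - baseline)
    actual = relative_change(first, second)
    if actual == 0:
        raise ValueError("the two values are equal, so there is no effect to compare")
    return shown / actual


# Hypothetical figures: two bars with values 50 and 60.
honest = lie_factor(50, 60, baseline=0)      # axis starts at zero
truncated = lie_factor(50, 60, baseline=40)  # axis starts at 40

print(f"Axis from zero: lie factor {honest:.1f}")
print(f"Axis from 40:   lie factor {truncated:.1f}")
```

With the axis starting at zero the bars differ by the same 20% as the data, so the lie factor is 1. Start the axis at 40 and the second bar is drawn twice as long as the first, a visual difference five times larger than the real one.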
Pete Warden is spot on about being sceptical of data, but it is data visualisation, not data science, where caution is most crucial
Data visualisation is a wonderful tool and an extremely efficient way of communicating a message. But what if the message is wrong?
First of all, let me be clear: the headline of this article is a reference to Pete Warden's post, and should be read in the same way - as a caution against blind acceptance, rather than a wholesale condemnation of data visualisation.
An excellent blogpost has been receiving a lot of attention over the last week. Pete Warden, an experienced data scientist and author for O'Reilly on all things data, writes:
"The wonderful thing about being a data scientist is that I get all of the credibility of genuine science, with none of the irritating peer review or reproducibility worries ... I thought I was publishing an entertaining view of some data I'd extracted, but it was treated like a scientific study."
This is an important acknowledgement of a very real problem, but in my view Warden has the wrong target in his crosshairs. Data presented in any medium is a powerful tool and must be used responsibly, but it is when information is expressed visually that the risks are highest.
The central example Warden uses is his visualisation of Facebook friend networks across the United States, which proved extremely popular and was even cited in the New York Times as evidence for growing social division.
As he explains in his post, the methodology behind his underlying network graph is perfectly defensible, but the subsequent clustering process was "produced by me squinting at all the lines, coloring in some areas that seemed more connected in a paint program, and picking silly names for the areas". The exercise was only ever intended as a bit of fun with a large and interesting dataset, so there really shouldn't be any problem here.
But there is: humans are visual creatures. Peer-reviewed studies have shown that we can consume information more quickly when it is expressed in diagrams than when it is presented as text.
Even something as simple as colour scheme can have a marked impact on the perceived credibility of information presented visually - often a considerably more marked impact than the actual authority of the data source.
Another great example of this phenomenon was the Washington Post's 'map of the world's most and least racially tolerant countries', which went viral back in May of this year. It was widely accepted as an objective, scientific piece of work, despite a number of social scientists identifying flaws in the methodology and the underlying data itself.
Source: The Guardian