Searching online for examples of bad data visualization is one of my guilty pleasures. Usually I am simply seeking a quick hit of schadenfreude, but a couple of times I have found errors that I would have been tempted to make myself, and learned from the experience. The other day I had an entirely different reaction when my most popular data visualization (by page views) popped up in my Google image search for “Worst Data Visualization Ever.”
Walter Hickey, a journalist now writing for FiveThirtyEight whose work I enjoy (particularly his piece on box office and gender in Hollywood films), had included the above image and a description of my MLS salary visualization in a column titled “The 27 Worst Charts of All Time,” which ran on Business Insider last June. I was taken aback: while I knew the graph wasn’t the Platonic ideal of Tufte-ian visualization standards, within the MLS blogosphere it had garnered myriad retweets, upvotes, and page views soon after I created it. I had received a few tweets seeking clarification, but I don’t remember any expressing utter confusion. Since the above image is only part of the visualization, it is unclear whether Hickey even saw the full interactive version. Also on the list were bad graphs published by the White House, the Wall Street Journal, the Human Rights Campaign, Bloomberg, Gallup, and Fox News (three of them), among others. So at least I was in mostly good company…
Luckily, Hickey also included a link to Junk Charts, whose more constructive criticism had presumably alerted him to my work. Kaiser Fung’s assessment, titled “More power brings more responsibility,” noted that:
“(1) Sorting the bars by total salary would be a start.
(2) The colors and subsections of the bars were intended to unpack the composition of the total salaries, namely, which positions took how much of the money. (3) I’m at a loss to explain why those rectangles don’t seem to be drawn to scale, or what it means to have rectangles stacked on top of each other. (4) Perhaps it’s because I don’t know much about how the cap works.
Combined with the smaller chart (shown below), the story seems to be that while all teams have similar cap numbers, the actual salaries being paid could differ by multiples.”
(numbers added by me)
There are valid criticisms here alongside an admitted knowledge gap. First off, Fung is absolutely right that I should have sorted the bars (1). That’s an inexcusable, silly oversight on my part. The criticism of my coloring the segments of this treemap by position (2) is also valid, and I shouldn’t have let the coloring become a perceived focus of the graph. A simple color scale based on player salary probably would have been much more effective.
The main takeaway I intended was the massive disparity between the best- and worst-paid players in MLS, along with a comparison of total club salaries. For some viewers, the coloring by position obviously distracted from that. Point (3) alerted me to a minor issue with the graph that had escaped my attention. Tableau should draw the rectangles to scale, but the sizing appears to be off by roughly 0.1 to 0.16 percentage points. The league’s minimum salary ($35,125) is 0.88% of Robbie Keane’s guaranteed compensation of $4,000,000, and Keane’s rectangle covers 16,555 pixels. In a screenshot of the chart, minimum-salary players’ rectangles ranged from 119 to 130 pixels, or roughly 0.72 to 0.79% of Keane’s. For me, this is an acceptable level of imprecision, since the format lets me show total spending and compare individual player salaries in a single view.
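To make the rectangle arithmetic above easy to re-check, here is a short Python sketch. It uses only the figures quoted in the paragraph (the $35,125 minimum, Keane’s $4,000,000, his 16,555-pixel rectangle, and the 119 and 130 pixel measurements from my screenshot). The helper names `expected_pixels` and `share_of_reference` are hypothetical, and the sketch simply assumes a rectangle’s area should be exactly proportional to salary. It reports the gap both in percentage points and as a relative shortfall, since those two framings of the same discrepancy produce quite different-looking numbers.

```python
# Figures quoted in the text; helper names below are hypothetical.
MIN_SALARY = 35_125        # league minimum salary, USD
KEANE_SALARY = 4_000_000   # Robbie Keane's guaranteed compensation, USD
KEANE_PIXELS = 16_555      # area of Keane's rectangle in the screenshot, pixels


def expected_pixels(salary, ref_salary=KEANE_SALARY, ref_pixels=KEANE_PIXELS):
    """Pixel area a rectangle should have if area is exactly proportional to salary."""
    return salary / ref_salary * ref_pixels


def share_of_reference(pixels, ref_pixels=KEANE_PIXELS):
    """A rectangle's area as a percentage of the reference (Keane) rectangle."""
    return 100 * pixels / ref_pixels


target_share = 100 * MIN_SALARY / KEANE_SALARY  # about 0.878 %
target_px = expected_pixels(MIN_SALARY)          # about 145.4 px

# Observed minimum-salary rectangle sizes measured from the screenshot.
for observed_px in (119, 130):
    share = share_of_reference(observed_px)
    gap_pp = target_share - share           # gap in percentage points
    rel_err = 1 - observed_px / target_px   # shortfall relative to expected area
    print(f"{observed_px} px: {share:.2f}% of Keane, "
          f"{gap_pp:.2f} pp below target, {rel_err:.1%} smaller than expected")
```

Swapping in other measured pixel counts (or a different reference player) only requires changing the constants at the top.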
Taking these points into account, here’s my first draft of a redesign:
First, no soccer fan would ever order the positions as defender, forward, midfielder, goalkeeper, other. More seriously, lumping defender-midfielders and midfielder-forwards into a single “other” group is misleading, as these two types of players are enormously different. Fung admitted earlier that he didn’t know much about how the cap works (4), and this contextual oddity probably stems from limited knowledge of the sport itself. Even setting that issue aside, I think I have come up with a better way to improve upon my positional and overall breakdown of each club’s spending:
To be honest, I wasn’t thrilled with the graph even when I created it, which is why I posted it on reddit/MLS instead of on one of the blogs I was writing for at the time. I figured I would receive some constructive criticism, but instead it quickly passed 5,000 page views and went well beyond that (its most recent version recently passed 14,000), and the popularity became an end in itself. Whenever the MLS Players Union released updated salary information, I felt obligated to update the chart. Data visualization can be oriented toward page views or toward clarity and usefulness, and I fell into the common trap of assuming that if something is popular, it must be well organized, despite plenty of evidence to the contrary.
Also, when posting something online, I need to be more cognizant of the chance that it will reach beyond its intended audience. MLS supporters were very receptive to a visualization of the league’s salaries, and may already have understood MLS wage dynamics well enough to see past the chart’s oddities. I can sympathize with Hickey’s and Fung’s confusion, and I should have organized the chart in a way that was more universally understandable. I am thankful that both of them offered a critique, though I wish one of them (or any of the readers who saw my Twitter handle within the graph) had reached out to me at the time.
I always love to receive thoughtful, constructive criticism. Please don’t hesitate to tell me if any of my work seems spurious or unclear, or if you see another website discussing my work. I have never claimed that anything I post is beyond reproach; far from it. I greatly value any opportunity to review a smart assessment of my analysis, data visualization, and writing.