Ever since I first charted a cosine curve on my Casio graphing calculator in high school pre-cal, I’ve been a fan of data visualization. The vast majority of the time, I employ it to learn things, but sometimes I just doodle for the aesthetic appeal. When this is the objective, I often use that trusty cosine curve as my basis.
Tableau has a contest this month for using their product artistically, and I decided to make a cosine graph, its vertical mirror image, and their trend lines with parameters that users could play with to change amplitude, wavelength, color, etc. of the waves. I’m not defining what each of the four parameter controls does at this point (if anyone cares, I’ll gladly post the formula that drives it). I just hope that someone enjoys tinkering with it like I do.
This graph was only supposed to be a little exercise in descriptive stats, but after I made it, I realized that I could drum up something similar in Tableau, which could easily include some predictive analytics and that would be easily repeatable for others in MLS, whose player pages all include a game log dating back to 2010. I’m just not sure if there’s a market for that, so here I’ll just post the graphic alone and ask for your feedback for now.
Landon Donovan announced his retirement yesterday, effective at the end of the current season. Currently Donovan has the most goals (138) and the 2nd most assists (124) in MLS history. Some have smartly pointed out that these figures are inflated by both his longevity (4th all-time among field players with 27,423 minutes) and his penalty kick goals (2nd most with 28). Thankfully, a common fancy stat accounts for both of these issues. Non-penalty goals plus assists per 90 is a pretty self-explanatory term: subtract PKs from a player’s goal total, add in assists, divide by minutes, then multiply by 90. I applied this calculation to MLS’ all-time top 25 for both goals and assists, and here’s the leaderboard, active players in green, retirees in blue:
The exclusion of penalty kick goals deserves a quick note. Scoring a PK is inherently a different skill than scoring in any other situation of the game. Put simply, in the run of play, on corner kicks, free kicks, etc., you will never see a single player get an unimpeded run-up to a shot 12 yards away from a keeper who’s anchored to the goal line. That situation immensely favors the shooter, who scores 70-80% of the time, while the average shot in other situations goes in only 11% of the time.
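For anyone who wants to reproduce the calculation, here is a minimal Python sketch of non-penalty goals plus assists per 90 as described above. The function name is my own invention for illustration; the Donovan inputs are the career totals quoted earlier in this post.

```python
def npg_plus_assists_per_90(goals, penalty_goals, assists, minutes):
    """Non-penalty goals plus assists, scaled to a 90-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Subtract PKs from goals, add assists, divide by minutes, multiply by 90.
    return (goals - penalty_goals + assists) / minutes * 90


# Landon Donovan's career totals as quoted above:
# 138 goals, 28 of them penalties, 124 assists, 27,423 minutes.
donovan = npg_plus_assists_per_90(138, 28, 124, 27423)
print(round(donovan, 2))  # 0.77
```

Dividing by minutes handles the longevity issue, and subtracting penalties handles the PK issue, so the two objections are addressed in one number.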
Landon’s still #1, but his lead on Preki and Taylor Twellman is pretty slim. A slim lead over MLS originals is even more impressive than you might think, because the standard for assists was very lax in the olden days. You can read about it here, but the basics are that the last two players to touch the ball before their teammate scored were usually credited with assists, even if a lot of things happened between their pass and the strike. Players used to get an assist even if they took a shot and their teammate scored off a rebound.
The chart above is a bit noisy, and I hope to build a database of MLS players with each year of their production, rather than raw career totals. For now, we have even more reason to be in awe of the face of Major League Soccer who will be hanging up his cleats in a few months.
Check out The Shin Guardian today. I wrote about my vision for improving the MLS All-Star Game. Basically, I want to see a return to a US vs the World format. But with a twist, because the event’s non-FIFA status would allow the US team to include citizenship hopefuls, too.
My contest for guessing new MLS wages ended yesterday. If the MLS Players Union holds to form, they will update their salary release today, and I can determine the winner of the contest. In the meantime, here are box and whisker plots of all the guesses put forth by my readers:
The median values (the point in each box where the hue of grey changes) mostly seem reasonable, but based on Brian Straus’ report from last night, the crowd here may be profoundly overestimating the wages Sporting Kansas City will be paying Matt Besler, and especially Graham Zusi. Per Straus, both of those players will “earn a pro-rated base salary of $600,000 this year,” while my readers’ median guesses were $1.1 million for Besler, and $1.2 million for Zusi. The median for DaMarcus Beasley, $850,000, was much closer to the $780,000 guaranteed compensation reported by Straus.
Notably, the MLSPU release could differ from Straus’ report, and that release will be the standard by which the contest is judged. Even though I trust Straus’ report more, MLSPU-listed guaranteed compensation was the standard by which I set up the contest, and those rules cannot change after submissions were received.
As is often the case in MLS (and other soccer leagues) this time of year, its clubs have been introducing quite a few new players. What’s unique to MLS right now is that the most famous ones are committing to clubs that won’t play any official matches until next March. Yesterday, Frank Lampard signed with New York City FC, joining David Villa as the faces of that club heading into their expansion season next year. Kaka joined MLS’ other incoming club, Orlando City, weeks before. Villa and Kaka are going on loan elsewhere until they have a US-based squad to play with, and it’s unclear if Lampard will do the same, or just rest up for a while.
Meanwhile, existing MLS clubs have been bringing in Americans (DaMarcus Beasley) and foreign players that each have somewhat lower profiles. Alongside these signings, some MLS players have also been getting extensions and raises, most notably Sporting Kansas City turning Matt Besler and Graham Zusi into designated players.
If you’re anxious to hear about the contest I mentioned in the title, here’s where I get to that. Whoever does the best job of guessing the salaries of the players below will get a prize. Type in your estimates of each player’s guaranteed compensation (annual wages, not weekly, as is usually reported in European leagues), alongside your name and either your email address or Twitter handle. I will shut off the poll on July 31st, or when the MLS Players Union (MLSPU) updates their listing of salaries, whichever comes first.
The contest is now closed.
After the MLSPU makes their newest salary release, which happened on the first of August in both 2012 and 2013, I’ll compare the guesses to the listed guaranteed compensation. Guessers will get a point for each percentage point above or below the actual wage for each player. Whoever ends up with the lowest point total wins. For example, if you guess $3 million and it turns out that the player makes $4 million, you get 25 points, because $3mm is 75% of $4mm, and 100-75=25. I won’t make my own guesses public, so as to avoid swaying the vote, but I’ll hold some general discussions on the topic on Twitter (@StatHunting).
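If it helps to see the scoring spelled out, here is a rough Python sketch of the rule above. The function names and the dictionary layout are my own assumptions for illustration, and the sketch assumes every entrant guessed for every player, since the rules above don’t cover missing guesses.

```python
def guess_points(guess, actual):
    """Points for one guess: percentage points above or below the actual wage."""
    if actual <= 0:
        raise ValueError("actual wage must be positive")
    return abs(guess - actual) / actual * 100


def total_points(guesses, actuals):
    """Sum of points across players; the lowest total wins.

    Both arguments are dicts keyed by player name (a hypothetical layout).
    """
    return sum(guess_points(guesses[player], actuals[player]) for player in actuals)


# The example from the text: a $3 million guess against a $4 million wage.
print(guess_points(3_000_000, 4_000_000))  # 25.0
```

Note that the rule is symmetric in percentage points, so guessing $5 million against a $4 million wage also costs 25 points.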
I will tailor the prize to the winner. I don’t have the budget to offer something that anyone would want, but I’ll talk to the winner via Twitter or email and see what they are into. It will probably end up being a soccer scarf, DVD, or something along those lines. I also have an autographed picture of Kenny Cooper circa 2008 that I’d happily give to someone who would use it for something other than a dartboard.
So, I don’t know what the prize will be, but I promise to do everything I can to make it fun. Given my lack of a budget, I might have to get pretty creative if a European wins this thing, but we’ll see.
Every time the MLSPU releases salary data, I visualize it. This time, that post will include a wisdom of crowds visualization with all the guesses I received compared to the actual listed figure for each player, as well as an announcement of the winner. Here’s what my most recent MLS salary visualization looked like, which may help some of you make an educated guess for some of these players:
I’ll make a couple tweaks when I chart the next release, because there are a couple parts of my approach above that I’m not entirely content with. Also, anytime the MLSPU salary release is discussed, it has to be mentioned that this data is often in some level of dispute, as coaches, owners, and technical directors have publicly stated that there are inaccuracies there. However, it’s the only information we have to go off of, and this seemed like a fun way to see how well people can guess these players’ wages.
After this year’s World Cup ended, I went back to my History of the World Cup visualization and added in the 2014 data. Not content with just a data enhancement, I have also decided to break World Cup history into storypoints (a really interesting new feature in Tableau) for each era, so that readers could more easily see the record of various nations over the shorter periods of time.
What follows should speak for itself, and every era’s page is fully interactive so that those who would like to explore a specific nation’s performance can still do so by clicking or hovering, and all filters are active.
What is most striking to me here is the way that focusing on eras can provide shape that the overall record of teams does not offer. We almost take it as fact that Brazil and Germany have always been world conquerors, but they both had dry spells where they were not one of the best sides. Brazilian dominance in the 1950-1962 and 1982-1994 eras drives their standing as the slim all-time leader in points and goal difference per game, while few have ever been as good as Germany have been during the current 32-participant era.
Please explore the data yourself, but beware of low-sample outliers within eras. For example, while Eusebio-led Portugal had a stellar record in 1966, it was that nation’s only appearance in the era, so their standing atop the goal difference per game table should not imply that they were the top team of the era. I hope that what I’m offering here makes it much easier for people to meaningfully explore the history of this tournament.
This article serves as my entry for Tableau’s Storytelling Viz Contest. If you like this, please vote for me by tweeting #StorytellingSteve today, as number of hashtags drives a $500 prize. Here, I’ll make it easy for you:
Google’s Doodles, the animations and games that adorn the search giant’s home page, blew past their own ridiculously high standards during the last month. That’s because they have produced 63 light, fun doodles dedicated to the World Cup. Google even sent members of their Doodle team to Brazil so they could better capture the spirit of the tournament. This led to a ton of fun little cultural references and on-the-ground imagery in many of the images. These Doodles deserve preservation beyond their relatively dry Doodle archive page, where small versions are listed out of order and without notes on which games they referenced.
Whether you choose to scroll through all of the doodles or use your browser’s find feature to search for particular teams or dates, I hope you enjoy them.
Days without fixtures in the middle of the World Cup can be disarming, and especially so after the relentlessly entertaining start to this year’s tournament. Whatever time zone you are in, a certain hour hits and you instinctively flip on the TV or streaming service, check Google just to see their match-specific doodle, or look for the latest Cup chatter on Twitter.
For the second day in a row, today you won’t find any matches, which makes it a good time to sit back and assess teams’ accomplishments thus far in the tournament. To mark the lack of occasion, I’ve taken non-PK goal, shot, and expected goal data (gathered by the excellent Colin Trainor) and broken them down to per-90-minute figures (extra time would skew per-game results). We’ll get a good look at the whole field, and focus on the United States, which along with being my home country and the team I follow most closely, has some unique attributes.
It is natural to start with goals, excluding penalty kicks and own goals, which aren’t very likely to recur. After all, outscoring opponents is the whole point of the game. The USA falls just below average, tied for 17th in the field of 32, as they finished with a -1 goal differential, while they and Algeria played more minutes than the other minus ones. However, there are issues with trying to analyze teams using only goals for and against. Performances from Switzerland, Ecuador, Croatia, and Côte d’Ivoire have not been of equal quality, but if you’re forced to look at them only in terms of rare scoring events, they are all tied with goal differentials of zero. Over a small sample, and everything in a tournament like the World Cup is a small sample, it is very difficult to say much based only on goals, which are driven by fixture difficulty, game states, and other factors that some would call luck.
One rule of thumb in statistics is that if you need to track an important but rare event, you’re often better off focusing on the more common events that lead to it. With that in mind, we step back from goals to use shots (again, without PKs) for that wider focus.
Now we’ve at least got some clearer differentiation between teams. So where is the good old USA? Oh, that’s kind of embarrassing. Klinsmann’s crew allowed 94 shots, worst in the field. Alongside their taking only the 27th-most shots per 90 minutes, that massive level of shot absorption slams them to the bottom of this shot differential list. Remember what we said about the importance of fixture difficulty, though? Belgium, Ghana, and Germany all sit in the top six here, and Portugal are 13th. The USA played against four very shot-happy teams. Surely the US’ opponents reached these levels in part because the United States was so permissive of en masse shooting, but they also sit relatively high here because of performances in their other matches.
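For those who want to run the numbers themselves, here is a small Python sketch of the per 90 shot differential used above. The function names are mine, and the 390 minutes for the USA is my own assumption (three 90-minute matches plus one 120-minute match, ignoring stoppage time), not a figure from the data set. The second example uses made-up numbers purely to show the arithmetic.

```python
def per_90(total, minutes):
    """Scale a raw total to a per-90-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total / minutes * 90


def shot_differential_per_90(shots_for, shots_against, minutes):
    """Shots taken per 90 minus shots allowed per 90."""
    return per_90(shots_for, minutes) - per_90(shots_against, minutes)


# 94 shots allowed is from the text; 390 minutes is an assumption (see above).
print(round(per_90(94, 390), 1))  # 21.7

# Hypothetical team: 50 shots taken, 40 allowed, over four 90-minute matches.
print(round(shot_differential_per_90(50, 40, 360), 2))  # 2.5
```

Working per 90 minutes rather than per game is what keeps teams that played extra time from looking artificially busy.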
Now for an even deeper cut we go to expected goals (again, invaluably provided by Colin Trainor). For the uninitiated, expected goals are an adjustment of shots based primarily on location. Different analysts approach expected goals in slightly different ways, and you can read about the specifics of Colin’s work with Constantinos Chappas on this metric here. Basically, shot distance, angle, and body part striking the ball make a big difference in the likelihood of that shot scoring, and expected goals reflect this.
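To make the idea concrete, here is a bare-bones Python sketch of how an expected goals total is typically put together: each shot gets a scoring probability, and the probabilities are added up. This is not Colin and Constantinos’ model. The per-shot probabilities below are invented for illustration, and a real model would derive them from distance, angle, body part, and so on.

```python
def expected_goals(shot_probabilities):
    """Expected goals as the sum of per-shot scoring probabilities."""
    for p in shot_probabilities:
        if not 0 <= p <= 1:
            raise ValueError("each probability must be between 0 and 1")
    return sum(shot_probabilities)


def goals_minus_expected(goals, shot_probabilities):
    """Actual goals minus expected goals; negative means fewer goals than projected."""
    return goals - expected_goals(shot_probabilities)


# Hypothetical shots: a long-range effort, a tight-angle shot,
# a close-range chance, and an average attempt.
shots = [0.03, 0.08, 0.35, 0.11]
print(round(expected_goals(shots), 2))  # 0.57
```

Four shots here add up to barely half a goal, which is why shot counts alone can flatter a team that shoots from bad spots.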
While this only lifts the USA up to 27th, keep in mind just how far behind they were on unadjusted shots. Now we see them pass the truly weaker parts of the field in Honduras, Cameroon, and Korea, alongside some more impressive teams like Chile and Nigeria. All told, the offense appears to be a little below average in this field, and the defense allowed 1.47 fewer goals than the expected goal metric would project. While some would chalk that up entirely to Tim Howard, there are other positives here for the defense. Some portray Howard’s work in this tournament as herculean, but if I had to choose a Greek mythology descriptor, I’d go with Sisyphean.
While Howard certainly had some very highlight-worthy saves, for the most part his teammates guided opponents into difficult shooting angles, and goaded them into attempts from distances unlikely to bother a keeper of Howard’s quality. The overall image for me is of consistency and persistence, pushing back a boulder’s worth of chances, not monster-conquering, unbelievable feats of agility. As another way to compare shots and expected goals allowed, here’s a scatter plot of the two:
Note the distance between the trendline and the United States here. This again shows that the Americans’ average shots allowed were markedly less promising than those allowed by most defenses in Brazil. Brazil, France, Italy, and Colombia have been similar in this respect (much better company than Cameroon, Honduras, and Spain, who swung most strongly in the other direction), though those powers have also allowed fewer shots overall. This is a nuance that gets bypassed when pundits or statisticians focus only on shots, scorelines, and levels of bracket invasion for the USA.
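The distance from the trendline can be put into numbers as a residual. Here is a rough Python sketch, assuming an ordinary least-squares line through shots allowed (x) and expected goals allowed (y). The function names are mine, I’m only assuming the chart’s trendline is a least-squares fit, and the example inputs are invented rather than taken from the tournament data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Vertical distance of each point from the fitted line (actual minus fitted)."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical per-90 figures for three teams: shots allowed, xG allowed.
shots_allowed = [10.0, 15.0, 20.0]
xg_allowed = [1.0, 1.8, 1.7]
print([round(r, 2) for r in residuals(shots_allowed, xg_allowed)])
```

A negative residual marks a defense whose shots allowed were less dangerous than its shot volume alone would suggest, which is the side of the line the USA falls on in the chart.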
None of this is to say that the USA were great in this tournament, or that they should still be in Brazil, preparing for Argentina right now. They weren’t, and they shouldn’t be, but they also weren’t as bad as some are arguing right now. Yes, they were often outplayed by their opponents, and while we should rightly praise Howard for his contributions, some credit goes to US defensive schemes and players, who mostly kept the likes of Cristiano Ronaldo, Thomas Müller, Asamoah Gyan, and Eden Hazard out of the prime real estate directly in front of Howard.
Whether it was a conscious choice or not, over the last couple weeks Jurgen Klinsmann’s team allowed shot quantity and limited quality. While that’s no recipe for overwhelming success, it did keep them in games, and gives them a platform to grow from. Stay stingy on quality while making a habit of keeping opponents out of the final third more regularly, and this team could really be onto something. Of course that’s overwhelmingly easier to say than do, but ascending the ranks is supposed to be hard.
When I was a more casual fan, World Cup group tables annoyed me. This pinnacle of the beautiful game was the only sports competition I knew of in which standings could not be viewed in total without scrolling through a website. This was particularly problematic if I wanted to look up a particular nation but had not memorized their group assignment. This year, I’ve designed an alternative. Above you’ll see the result. I’ve packed flags of every World Cup nation into images for each group, and when you hover or click on a particular group, you get a view of games played, goal differential, and points (presented in a bar graph as well as a number). The point here is ease of access to information. Anyone who is familiar with a nation’s flag will find its standing in the space of a second. The widget is small enough (320×100 pixels) that it can be placed anywhere on any website, using the following embed code:
If you have an HTML portal for a website that will be covering the World Cup, please consider pasting the above somewhere on the main page so that your readers will be able to see whatever table they want very quickly.
Enormous thank yous go out to Colin Trainor and Jerry Tweedy, who are helping me keep this as up to date as possible throughout the group stage. My wife and I just had a child yesterday, which doesn’t bode well for the likelihood that I would be able to update results in a timely fashion on my own.